
Behind the AI Curtain: How First Drafts Evaluates Its Legal Drafting Models

First Drafts Team
Our In-House Panel of Lawyers, Engineers, and Other Experts

In our last post, we announced First Drafts' upcoming AI Panel System, a trio of specialized AI models – Holmes, Ginsburg, and Marshall – designed to give you a richer, more diverse set of arguments for your litigation documents. We talked about why "perfect" AI legal drafting is a moving target, especially given the inherent subjectivity of persuasive language.

Today, we want to pull back the curtain further and show you how we evaluate these intelligent systems. Understanding our rigorous process will not only give you confidence in our tools but also empower you to choose the best AI "mind" for your specific task. After all, knowing the strengths and nuances of each model is key to leveraging them effectively.

More Than Just Code: The First Drafts Evaluation Framework

If each AI model brings a unique "perspective," how do we measure and understand those differences? We've developed a comprehensive evaluation framework that goes beyond simple accuracy scores. It’s designed to capture the subtle, yet critical, distinctions that matter in legal drafting.

Our core evaluation, which we previewed when discussing model selection, uses a custom AI system to compare the litigation drafts each model produces for ten fixed test cases that vary in complexity and task type. Each draft is assessed on four key dimensions:

  1. Argument Strength: How persuasive and logically sound is the reasoning? Does it effectively use facts and anticipate counterarguments?
  2. Thoroughness: Does the draft address all relevant issues? Is the analysis deep, and are all necessary sections included?
  3. Clarity & Style: Is the writing clear, professional, and well-organized? Is the tone appropriate for the circumstance and the task?
  4. Overall Effectiveness: Ultimately, how likely is this document to achieve its goal and impress a judge, client, or opposing counsel?

This AI-driven comparative analysis gives us a nuanced understanding of how each model performs in constructing the substance and style of a legal argument. But we don't stop there.

The Human Element: Tackling Hallucinations with Our Proprietary "Hallucination Index"

One of the most talked-about challenges with AI, especially in a field as precise as law, is the risk of "hallucinations" – particularly when it comes to legal citations. An AI might confidently cite a case that doesn't exist, misstate a legal principle, or misquote a statute. This is where even the most advanced AI needs rigorous human oversight.

That's why we developed the First Drafts Hallucination Index.

This isn't something an AI can evaluate on its own. Our experienced attorney-consultants perform this critical testing manually. Armed with access to comprehensive legal research databases, they meticulously check the citations generated by each model. The resulting Hallucination Index estimates how often a user is likely to encounter citation problems. For each citation, our reviewers ask:

  • Are the cited cases and statutes real?
  • Are they cited for the correct legal principle?
  • If a quote is provided, is it accurate and from the specified pin cite?

The Hallucination Index is further broken down into four categories of legal authority:

  • U.S. Supreme Court, federal circuit court, and major state court decisions, which appear most often in AI training data and are therefore the most likely to be cited accurately;
  • Federal trial court and less prominent state court decisions, which are more obscure and appear less often in AI training data; even when the citation itself is real, the case may be misquoted or cited for the wrong legal principle;
  • Federal and state court rules other than local rules (e.g., the Federal Rules of Civil Procedure), federal and state statutes, and federal regulations, which, like major court decisions, are well represented in AI training data and are therefore relatively unlikely to be cited incorrectly;
  • Federal and state local court rules and state regulations, whose coverage in AI training data varies significantly; because these sources are rarely cited and frequently amended, they are the most difficult category for a model to get right.

This detailed, human-verified index allows us to be transparent about each model's capabilities and limitations in this crucial area. For example:

  • Holmes ("The Workhorse"): Holmes offers a solid balance. It is generally reliable, especially with federal circuit and Supreme Court cases, but its citation accuracy for lower court decisions and highly specific local rules is a step below Ginsburg's.
  • Ginsburg ("The Analyst"): As our most meticulous model, Ginsburg consistently demonstrates superior accuracy in recalling legal authority and producing correct citations, even for more obscure state and federal district court cases. Its Hallucination Index results are the strongest across all four categories.
  • Marshall ("The Eloquent"): To achieve its unmatched stylistic elegance and persuasive power, Marshall trades away a significant degree of accuracy in recalling legal authority. Its Hallucination Index reflects a higher propensity for citation inaccuracies and fabrications than Holmes or Ginsburg. This is a deliberate design choice: Marshall prioritizes persuasive prose, on the understanding that users will need to be especially diligent in verifying every citation it produces.

Why This Deep Dive Matters for Your Practice

Our multi-faceted evaluation – combining AI-driven comparative analysis with meticulous human-led citation verification – serves a vital purpose: empowering you.

By understanding that Holmes is your reliable all-rounder, Ginsburg your go-to for deep dives and citation-heavy tasks, and Marshall your choice for crafting compelling, impactful prose (provided you check its citations carefully), you can strategically deploy the right AI for the right job.

This transparency is central to the First Drafts philosophy. AI is a powerful tool, but it's most effective when its capabilities and limitations are clearly understood. Our rigorous evaluation process ensures you have that understanding, allowing our AI panel system to truly become your trusted legal drafting partner.

We believe this combination of diverse AI strengths and clear-eyed evaluation is the future of AI-assisted legal drafting – a future where technology enhances, but never replaces, the irreplaceable judgment and expertise of a skilled attorney.

Ready to shave hours off of drafting litigation documents?
Start a free trial in minutes and see the difference!