LLM forensics enters the courtroom

Generative AI forensics is emerging as a critical discipline at the intersection of computer science and law, but the field remains far short of the standards needed to support litigation. Courts are already adjudicating AI harms — from teen suicides linked to chatbots to billion-dollar copyright disputes — yet no established framework exists for forensically investigating why an LLM produced a specific output. The technical state of the art, exemplified by Anthropic's March 2025 circuit tracing of Claude 3.5 Haiku, captures only a fraction of a model's computation even on simple prompts. Meanwhile, judges are improvising: the first U.S. ruling treating an AI chatbot as a "product" subject to strict liability came in May 2025, and proposed Federal Rule of Evidence 707 would create entirely new admissibility standards for AI-generated evidence. With 51 copyright lawsuits filed against AI companies, a $1.5 billion class settlement in Bartz v. Anthropic, and the EU AI Act's enforcement phases now live, the legal infrastructure for AI accountability is being built in real time — largely without the forensic tools to support it.

How investigators peer inside the black box

Forensic investigation of LLM behavior draws on an expanding but still immature toolkit. The most promising approach is mechanistic interpretability — reverse-engineering the internal computations of neural networks to understand why they produce specific outputs rather than merely what they produce.

Circuit analysis identifies specific subgraphs of neurons and attention heads that implement particular computations. Anthropic's landmark March 2025 papers on "Circuit Tracing" applied cross-layer transcoders with 30 million features to Claude 3.5 Haiku, producing "attribution graphs" that reveal causal pathways from input to output. These graphs showed that Claude performs multi-hop reasoning with observable intermediate steps (e.g., "Dallas" → "Texas" → "Austin"), plans rhyming words before writing poetry, and maintains a universal "language of thought" shared across natural languages. Critically for forensics, the research revealed that hallucinations occur when an internal "known answer" inhibition circuit misfires, and that during jailbreak attempts, the model recognizes dangerous requests early but can only redirect the conversation at later computational stages. Anthropic open-sourced this tool in June 2025 for use on open-weights models like Gemma-2-2b and Llama-3.2-1b.
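
Attribution graphs require Anthropic's transcoder tooling, but the basic idea of reading off intermediate computational states rather than only the final answer can be illustrated with the much simpler "logit lens" technique. The sketch below is an illustrative stand-in, not circuit tracing itself: it uses the open-source TransformerLens library and GPT-2 Small (which may not reproduce the multi-hop behavior observed in Claude) to decode which token each layer's residual stream is currently favoring.

```python
# Minimal "logit lens" sketch with TransformerLens (pip install transformer_lens).
# This is a far simpler technique than Anthropic's attribution graphs: it just
# projects each layer's residual stream through the final LayerNorm and the
# unembedding matrix to see which token the model currently favors.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small, open weights

prompt = "Fact: the capital of the state containing Dallas is"
tokens = model.to_tokens(prompt)
logits, cache = model.run_with_cache(tokens)

with torch.no_grad():
    for layer in range(model.cfg.n_layers):
        resid = cache["resid_post", layer]                 # [batch, seq, d_model]
        layer_logits = model.unembed(model.ln_final(resid))
        top_token = layer_logits[0, -1].argmax().item()    # prediction at last position
        print(f"layer {layer:2d} -> {model.tokenizer.decode([top_token])!r}")
```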

Sparse autoencoders (SAEs) represent the major breakthrough of 2023–2025 for extracting interpretable features from model activations. Anthropic's "Towards Monosemanticity" (October 2023) demonstrated extraction of thousands of monosemantic features from a small one-layer transformer, and by May 2024, "Scaling Monosemanticity" had successfully scaled the technique to Claude 3 Sonnet — a production model — extracting safety-relevant features related to deception, sycophancy, and dangerous content. Google DeepMind's Gemma Scope 2 (December 2025) released the largest open-source interpretability suite to date: approximately 110 petabytes of activation data and over one trillion parameters of trained SAEs covering the full Gemma 3 model family from 270M to 27B parameters.
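
At its core, an SAE is a wide, sparsity-penalized autoencoder trained to reconstruct a model's activations from a much larger set of nonnegative feature activations. The following PyTorch sketch shows only that training objective; the dimensions, L1 coefficient, and random stand-in activations are illustrative placeholders, not the settings used in the papers above.

```python
# Minimal sparse autoencoder (SAE) sketch in PyTorch. Real SAEs are trained on
# billions of activations collected from a live model; random data is used here
# just to show the objective: reconstruction loss plus an L1 sparsity penalty.
import torch
import torch.nn as nn

d_model, d_features = 768, 16 * 768   # 16x expansion factor (illustrative)
l1_coeff = 1e-3                        # sparsity penalty (placeholder value)

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))   # nonnegative feature activations
        recon = self.decoder(feats)
        return recon, feats

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

for step in range(100):
    acts = torch.randn(256, d_model)          # stand-in for residual-stream activations
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```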

Beyond mechanistic interpretability, investigators employ several complementary techniques. Activation patching (causal tracing) localizes responsible model components by swapping activations between clean and corrupted inputs. Probing classifiers train small models on intermediate activations to detect encoded properties like factual knowledge or truthfulness — researchers have identified "truth directions" through structured probes that could guide targeted interventions. Training data forensics uses membership inference attacks to determine whether specific content appeared in training data, though the field's leading benchmark (WikiMIA) shows attack performance approaching random for well-trained pretrained LLMs. More effective approaches include confusion-inducing attacks (EMNLP 2025), which steer LLMs into high-entropy states to reveal memorized training data, and fine-tuning extraction pipelines that have recovered over 50% of fine-tuning datasets.
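
As a concrete illustration of activation patching, the sketch below uses TransformerLens on GPT-2 Small: activations from a clean prompt are patched into a corrupted run, layer by layer, and layers where the patch restores the clean answer's logit advantage are candidate locations for the relevant computation. The prompts and answer tokens are illustrative choices for this sketch, not drawn from any case or paper discussed here.

```python
# Activation patching (causal tracing) sketch with TransformerLens.
# The clean run's residual stream is copied into the corrupted run at the one
# token position where the prompts differ, and we check how much of the clean
# answer's logit advantage is restored at each layer.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

clean_prompt   = "The capital of France is"   # expected continuation: " Paris"
corrupt_prompt = "The capital of Italy is"    # expected continuation: " Rome"

clean_tokens   = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)   # assumes same token length as clean

# Position where the two prompts differ (the " France" / " Italy" token).
diff_pos = (clean_tokens != corrupt_tokens).nonzero()[0, 1].item()

paris = model.to_single_token(" Paris")   # assumes single tokens in GPT-2's vocab
rome  = model.to_single_token(" Rome")

_, clean_cache = model.run_with_cache(clean_tokens)

def paris_minus_rome(logits):
    return (logits[0, -1, paris] - logits[0, -1, rome]).item()

def patch_resid(resid, hook):
    # Overwrite the corrupted run's residual stream with the clean run's,
    # only at the position where the prompts differ.
    resid[:, diff_pos, :] = clean_cache[hook.name][:, diff_pos, :]
    return resid

for layer in range(model.cfg.n_layers):
    hook_name = utils.get_act_name("resid_pre", layer)
    patched_logits = model.run_with_hooks(
        corrupt_tokens, fwd_hooks=[(hook_name, patch_resid)]
    )
    print(f"layer {layer:2d}: Paris-vs-Rome logit gap = {paris_minus_rome(patched_logits):+.2f}")
```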

The practitioner's toolkit includes TransformerLens (mechanistic interpretability library for GPT-style models), SAELens (SAE training and analysis), OpenAI's Transformer Debugger, Meta's Captum interpretability library, and the DeepTeam red-teaming framework. ForensicLLM, presented at DFRWS EU 2025, represents the first purpose-built forensic LLM — a quantized LLaMA-3.1-8B fine-tuned on forensic Q&A data and explicitly designed to meet Daubert admissibility criteria.

Interpretability's hard limits create forensic uncertainty

Despite rapid progress, every current interpretability technique produces hypotheses about model mechanisms, not definitive explanations — a fundamental limitation for litigation where certainty matters.

The root challenge is superposition: LLMs learn far more features than they have neurons, compressing numerous concepts into non-orthogonal directions in activation space. This causes polysemanticity, where individual neurons fire for semantically unrelated features (one GPT-2 neuron responds to both certain locations and certain verb tenses). Even Anthropic's state-of-the-art circuit tracing acknowledges it "only captures a fraction of the total computation" on short, simple prompts. Scaling to the thousands of words in modern chain-of-thought reasoning would require orders-of-magnitude improvements. SAEs don't fully solve the problem — features still activate on unexpected tokens, and the tradeoff between sparsity and reconstruction quality remains a fundamental tension. Google DeepMind reportedly encountered "disappointing results" with SAEs for certain concepts that remain too diffusely distributed to isolate.
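
The geometry behind superposition is easy to demonstrate with a toy calculation: packing more feature directions into a space than it has dimensions forces them to overlap, so reading out one feature picks up interference from the others. The sketch below uses random directions, not anything learned by a real model.

```python
# Toy illustration of superposition: more feature directions than dimensions
# forces non-orthogonality, so reading out one feature picks up interference
# from the rest. Purely synthetic; no real model involved.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 16, 64              # 4x more features than dimensions

# Random unit-norm feature directions in activation space.
W = rng.normal(size=(n_features, d_model))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Pairwise cosine similarities: with n_features > d_model they cannot all be zero.
cos = W @ W.T
off_diag = cos[~np.eye(n_features, dtype=bool)]
print(f"max |cosine| between distinct features: {np.abs(off_diag).max():.2f}")

# Activate a single feature, then read every feature back out by dot product:
# the other features register nonzero "activations", which is the interference
# that makes individual neurons and directions look polysemantic.
activation = W[0]                          # feature 0 firing alone
readout = W @ activation
print(f"mean |readout| on the other features: {np.abs(readout[1:]).mean():.2f}")
```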

The non-determinism of LLM outputs — identical inputs can produce varying results depending on temperature settings and sampling — fundamentally conflicts with forensic reproducibility requirements. Unlike deterministic software where computation can be traced step-by-step, LLM internals remain partially opaque even with the best available tools. An "interpretability illusion" documented at ICLR 2024 showed that subspace activation patching can produce false localizations, leading researchers to believe they've identified responsible features when they haven't.
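
The reproducibility problem is straightforward to demonstrate. With sampling enabled, the same prompt yields different completions on each run; greedy decoding or a pinned random seed is repeatable, but only for a fixed model, hardware, and software stack. The sketch below uses GPT-2 via Hugging Face transformers as a stand-in for any generative model.

```python
# Demonstration of sampling non-determinism vs. reproducible decoding,
# using GPT-2 via Hugging Face transformers as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The witness stated that", return_tensors="pt")

# 1) Temperature sampling: repeated runs generally give different outputs.
for _ in range(3):
    out = model.generate(**inputs, do_sample=True, temperature=0.9,
                         max_new_tokens=15, pad_token_id=tok.eos_token_id)
    print("sampled:", tok.decode(out[0], skip_special_tokens=True))

# 2) Greedy decoding: deterministic for a fixed model, input, and software stack.
out = model.generate(**inputs, do_sample=False,
                     max_new_tokens=15, pad_token_id=tok.eos_token_id)
print("greedy :", tok.decode(out[0], skip_special_tokens=True))

# 3) Sampling with a pinned seed: reproducible on the same hardware and versions.
torch.manual_seed(42)
out = model.generate(**inputs, do_sample=True, temperature=0.9,
                     max_new_tokens=15, pad_token_id=tok.eos_token_id)
print("seeded :", tok.decode(out[0], skip_special_tokens=True))
```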

For litigation, these limitations create what one legal scholar calls the "equity gap." The U.S. judiciary's proposed Federal Rule of Evidence 707 (public comment period through February 16, 2026, earliest effective date December 1, 2027) would require parties to demonstrate the reliability, validity, and methodology of AI systems used to produce evidence. Critics note that "reliability" is nearly impossible to prove for deep-learning models "where even the developers cannot fully explain a specific output." Under the Daubert standard's requirement for testable methodology with known error rates, LLMs present a unique challenge: there is no single "correct" output for generative tasks, making traditional error rate computation inapplicable.

The legal landscape is being built through landmark rulings

Courts are not waiting for forensic science to mature. A series of rulings in 2024–2025 is establishing the legal framework for AI liability across product liability, copyright, defamation, and employment discrimination.

Garcia v. Character Technologies (M.D. Fla., May 2025) is arguably the most consequential AI liability ruling to date. After a 14-year-old died by suicide following prolonged emotional interactions with a Character.AI chatbot, Judge Anne Conway denied the company's motion to dismiss, ruling that AI chatbot output constitutes a "product" subject to product liability law and is not protected speech under the First Amendment. The judge stated she was "not prepared" to hold that words "strung together by an LLM are speech." Claims proceeding include strict product liability (design defect and failure to warn), negligence, and wrongful death. Google was kept in the suit as a component-part manufacturer. The case settled in January 2026. Additional teen suicide lawsuits have been filed against OpenAI (Raine v. OpenAI, August 2025) and Character.AI (Peralta v. Character Technologies, September 2025), establishing a new frontier of AI product liability.

On copyright, three district court rulings form an emerging "fair use triangle" with contradictory signals. Thomson Reuters v. Ross Intelligence (D. Del., February 2025) was the first U.S. court to reject fair use for AI training, finding Ross's use of Westlaw headnotes was commercial and not transformative because it created a competing product. Bartz v. Anthropic (N.D. Cal., June 2025) found AI training on copyrighted books "exceedingly transformative" and fair use — but held that maintaining a "permanent, general-purpose library" of pirated books was independently infringing. That case produced a $1.5 billion class settlement, the largest in AI copyright litigation. Kadrey v. Meta (N.D. Cal., June 2025) similarly granted fair use for Meta's LLaMA training, finding no evidence of market harm. The Third Circuit's forthcoming ruling on the Thomson Reuters appeal will be the first U.S. appellate decision on fair use in AI training.

The NYT v. OpenAI litigation (S.D.N.Y., filed December 2023) has produced critical discovery precedents. Courts ordered OpenAI to produce 20 million de-identified ChatGPT logs, compelled disclosure of privileged communications about deletion of the Books1/Books2 training datasets and LibGen references (finding privilege waiver), and required OpenAI's in-house lawyers to sit for depositions. Summary judgment briefing is expected to conclude April 2, 2026.

In defamation, Walters v. OpenAI (Georgia, May 2025) — the first AI hallucination defamation case to reach decision — granted summary judgment for OpenAI, finding that ChatGPT's prominent disclaimers meant no reasonable person would treat its output as established fact. However, Moffatt v. Air Canada (British Columbia, February 2024) held that Air Canada was liable for its chatbot's negligent misrepresentation about bereavement fare policies, rejecting the airline's argument that the chatbot was a "separate legal entity."

In employment discrimination, Mobley v. Workday (N.D. Cal., May 2025) granted preliminary certification of a nationwide collective action on behalf of applicants aged 40+ rejected by Workday's AI screening, establishing that AI software vendors can face discrimination liability as "agents" of employers — a class potentially encompassing hundreds of millions of applicants.

Legislation is fragmenting across jurisdictions

The regulatory landscape for AI is characterized by a patchwork of enacted laws, proposed legislation, and executive action that varies dramatically by jurisdiction.

The EU AI Act (Regulation 2024/1689), published July 12, 2024, is the world's most comprehensive AI regulation. Its risk-based classification system prohibits manipulative AI and social scoring (effective February 2, 2025), imposes extensive obligations on high-risk AI systems in law enforcement, healthcare, and employment (effective August 2, 2026), and requires general-purpose AI model providers to maintain transparency and technical documentation (effective August 2, 2025). Penalties reach €35 million or 7% of global annual turnover. The EU's revised Product Liability Directive (2024/2853), adopted October 2024, explicitly extends strict product liability to software and AI systems — the first jurisdiction to do so by statute.

In the United States, the federal posture has shifted dramatically. The Biden administration's Executive Order 14110 on AI safety was revoked by President Trump on January 23, 2025. A December 2025 executive order directed the Attorney General to establish an AI Litigation Task Force to challenge state AI laws deemed inconsistent with federal policy. The FTC's approach has bifurcated: its September 2024 "Operation AI Comply" enforcement sweep targeted deceptive AI claims across five cases, but the Rytr consent order — the first against a generative AI tool — was set aside in December 2025 under the Trump AI Action Plan as "unduly burdening AI innovation."

State legislatures have moved aggressively to fill the federal vacuum. Over 100 AI measures were enacted in 38 states during 2025. The Colorado AI Act (SB 24-205, effective date delayed to June 30, 2026) is the first comprehensive state AI law, imposing a duty of reasonable care on both developers and deployers and requiring risk assessments for high-risk AI systems. Illinois amended its Human Rights Act (effective January 1, 2026) to prohibit discriminatory AI use in employment decisions. California enacted 24 AI-related laws in 2024–2025, including mandated content disclosure for large AI systems (SB 942) and companion chatbot safety requirements (SB 243), though Governor Newsom vetoed the frontier model safety bill SB 1047. The proposed federal AI LEAD Act (S.2937, September 2025) would classify AI systems as "products" with explicit developer and deployer liability, bar contractual liability waivers, and create a federal cause of action — though it is unlikely to pass in its current form.

The question of whether Section 230 protects generative AI content remains unresolved by any court, but the emerging consensus among scholars and litigators is that it likely does not. AI-generated content is arguably created by the platform itself, making it the "information content provider" rather than a neutral host of third-party content. Notably, Character.AI did not even raise Section 230 as a defense in the Garcia case.

Expert witnesses face a credibility gauntlet

AI litigation demands expert testimony that can bridge the gap between transformer architectures and juror comprehension, but courts are applying increasingly stringent scrutiny to AI-related expertise.

The 2023 amendments to Federal Rule of Evidence 702 clarified that the proponent of expert testimony must demonstrate reliability by a preponderance of the evidence. Three recent rulings illustrate the emerging standards. In Matter of Weber (N.Y. Surrogate's Court, October 2024), an expert who used Microsoft Copilot for cross-checking damages calculations could not recall the prompts used, explain how Copilot worked, or identify its sources — the court held that AI-generated evidence should be subject to a Frye hearing before admission, the first case to explicitly require such screening. In Kohls v. Ellison (D. Minn., January 2025), a communications professor's expert declaration was excluded entirely after he admitted using AI to draft it and two cited academic articles turned out to be AI hallucinations — the court found these fake sources "shattered his credibility." In Jackson v. NuVasive (D. Del., March 2025), a damages expert was excluded for relying on third-party AI analytics tools without understanding how they calculated their metrics.

The leading scholarly framework comes from retired federal Judge Paul W. Grimm and computer scientist Maura R. Grossman, whose 2021 article "Artificial Intelligence as Evidence" identifies the core tension: deep neural networks are "particularly afflicted with incomprehensibility" — even AI experts cannot reliably interpret how algorithms reach specific results. They propose courts evaluate AI testimony across four dimensions: validity (measurement accuracy), reliability (consistency), transparency (explainability), and bias detection. Grimm and Grossman have since proposed a new FRE 901(c) for authenticating potentially AI-fabricated evidence, presented to the Advisory Committee on Evidence Rules in April 2024.

AI cases typically require multiple categories of experts working in concert. In Thomson Reuters v. Ross Intelligence — the first AI copyright case to reach substantive adjudication — both sides fielded ML/AI experts on training data similarities, forensic software analysts on code-level comparisons, economics experts on damages, and legal information technology specialists. The "battle of experts" problem is acute: opposing technical experts reached fundamentally different conclusions about the same training data. Rates for forensic AI expert witnesses are premium — 49% charge $350–$550 per hour, with 30% exceeding $550.

Explaining LLM behavior to juries presents distinctive challenges. Research by litigation consultants Devon Madon and Liz Babbitt identifies two problematic juror heuristics: the "omniscient computer that cannot err" and the "black box conspiracy." They recommend anchoring explanations in a single consistent analogy — such as the "art student" metaphor (studying masterworks, recognizing principles, creating original work) — and never introducing competing metaphors mid-testimony. Visual aids should contain a maximum of three elements per slide with progressive disclosure to avoid the "split-attention effect" that dramatically reduces comprehension.

The emerging ecosystem of AI auditing and evidence preservation

A specialized industry of AI auditing firms has emerged to support both compliance and litigation. ORCAA (O'Neil Risk Consulting & Algorithmic Auditing), founded by Weapons of Math Destruction author Cathy O'Neil, pioneered algorithmic auditing with a four-step methodology covering design, implementation, execution, and reporting. Robust Intelligence, acquired by Cisco in October 2024 for approximately $400 million, created the industry's first AI Firewall and notably advised The New York Times on its lawsuit against OpenAI, demonstrating how easily LLMs can extract copyrighted and personal data. ForHumanity, a nonprofit with 2,000+ contributors from 96 countries, is developing over 40 certification schemes with 7,000+ audit controls, including a ForHumanity Certified Auditor (FHCA) credential described as the AI equivalent of a CPA. Enterprise platforms from Credo AI, Holistic AI, Arthur AI, and IBM watsonx.governance provide automated compliance monitoring, while all Big Four accounting firms are developing AI assurance practices — Deloitte launched its Zora AI platform in March 2025, and KPMG invested $2 billion over five years in AI capabilities.

On the academic side, Georgia Tech built "AI Psychiatry" (AIP) — a forensic tool that recovers and "reanimates" AI models from memory images for post-incident analysis, tested on 30 models including 24 backdoored ones. Joseph C. Sremack's AI Forensics: Investigation and Analysis of Artificial Intelligence Systems provides the first comprehensive framework covering evidence collection, training data analysis, model parameter examination, and output analysis.

Evidence preservation has become a flashpoint in litigation. In the OpenAI MDL, Magistrate Judge Ona T. Wang ordered OpenAI to "preserve and segregate all output log data that would otherwise be deleted" — the clearest signal that courts will prioritize litigation preservation over default deletion settings. This order survived OpenAI's objection and established that AI prompts, outputs, metadata, and inference logs are discoverable. Best practices now require organizations to issue legal hold notices specifically covering AI tools, inventory all AI systems in use, preserve model weights at each major version, and maintain training data documentation including datasheets and model cards. However, preservation conflicts with privacy obligations — OpenAI has argued that retention orders conflict with GDPR deletion requirements for EU users, a tension no court has fully resolved.
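
One concrete preservation practice is to fingerprint model artifacts at each release so that the weights, configs, and data documentation produced in discovery can later be authenticated against what was actually deployed. The sketch below is a hypothetical example; the directory layout and metadata fields are placeholders, not an established standard.

```python
# Minimal sketch of an evidence-preservation manifest: hash every file in a
# model release directory and record the digests with basic metadata, so the
# artifacts produced in discovery can be checked against the versions deployed.
# Paths and metadata fields are placeholders for illustration.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(model_dir: str, model_name: str, version: str) -> dict:
    root = Path(model_dir)
    return {
        "model_name": model_name,
        "version": version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "files": {
            str(p.relative_to(root)): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()
        },
    }

if __name__ == "__main__":
    manifest = build_manifest("./releases/model-v1.2", "example-model", "1.2")
    Path("model-v1.2.manifest.json").write_text(json.dumps(manifest, indent=2))
```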

The concept of algorithmic accountability — the obligation to ensure AI systems operate transparently and without unlawful bias — is crystallizing in both legislation and litigation strategy. The proposed Algorithmic Accountability Act of 2025 would require mandatory impact assessments for AI used in consequential decisions and reporting to the FTC. In practice, documentation created for compliance (model cards, impact assessments, bias audit results) increasingly becomes discoverable evidence in litigation. The DOJ's September 2024 update to its Evaluation of Corporate Compliance Programs now explicitly checks for AI governance controls — their absence could serve as an aggravating factor in enforcement actions. NYC Local Law 144 already requires independent bias audits of automated employment decision tools, creating a concrete standard against which employer conduct can be measured.
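
The bias audits required under NYC Local Law 144 center on impact ratios: the selection rate for each demographic group divided by the rate of the most-selected group. A minimal sketch of that calculation follows, with made-up numbers and without the statistical and category-handling detail an audit-grade analysis would require; the 0.80 threshold flagged below is the EEOC four-fifths rule of thumb, not a requirement of the law itself.

```python
# Minimal impact-ratio calculation of the kind used in automated employment
# decision tool (AEDT) bias audits: selection rate per group divided by the
# selection rate of the most-favored group. Numbers are made up for illustration.
selections = {            # group -> (selected, total applicants)
    "group_a": (120, 400),
    "group_b": (45, 250),
    "group_c": (30, 150),
}

rates = {group: sel / total for group, (sel, total) in selections.items()}
best_rate = max(rates.values())

for group, rate in sorted(rates.items()):
    ratio = rate / best_rate
    flag = "  <- below 0.80 (four-fifths rule of thumb)" if ratio < 0.8 else ""
    print(f"{group}: selection rate {rate:.2%}, impact ratio {ratio:.2f}{flag}")
```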

Conclusion

The field of LLM forensics in litigation is defined by a fundamental asymmetry: legal liability is advancing far faster than the technical ability to investigate AI behavior. Courts have already ruled that AI chatbots are "products" subject to strict liability, ordered production of millions of inference logs, and adjudicated billion-dollar copyright disputes — all while the best interpretability tools can only explain a fraction of a model's computation on simple prompts.

Three structural dynamics will shape the next phase. First, the fair use question remains genuinely unresolved — three district judges have reached divergent conclusions, and the Third Circuit's forthcoming ruling in Thomson Reuters v. Ross will be the first appellate decision on AI training and copyright. Second, the forensic gap between what litigants need to prove (specific causal chains from training data to harmful output) and what interpretability science can deliver (partial hypotheses about model mechanisms) will likely drive courts toward presumption-based liability frameworks, as the EU's proposed AI Liability Directive already contemplates with its rebuttable presumption of causation. Third, evidence preservation is becoming a litigation strategy in itself — the OpenAI MDL's discovery battles over deleted training datasets and chat logs suggest that AI companies' data retention practices will be as fiercely contested as the underlying legal claims.

For organizations building or deploying LLMs, the practical imperative is clear: maintain comprehensive documentation now, because the absence of model cards, bias audits, and version-controlled weights will be interpreted not as a gap in record-keeping but as evidence of negligence.

bigsnarfdude commented Feb 11, 2026

Decommissioning language models that are currently in litigation, without documented testing methods covering inputs, outputs, and mechanistic interpretability, makes it very difficult to reproduce the outputs reliably so that courts can see a demonstration of the machine in action.
