Criminal Investigation Skills Guide for Claude Code
Quick Start
Criminal investigation skills for Claude Code should help investigators analyze evidence, organize case files, generate reports, and track leads systematically. Here's how to build them:
Core Use Cases
1. Evidence Analysis & Documentation
Process crime scene photos, documents, and witness statements (a sketch of one possible evidence-logging helper follows this list)
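To make the evidence analysis use case concrete, here is a minimal Python sketch of the kind of evidence-logging helper such a skill could wrap. All names here (EvidenceItem, CustodyEvent, case_log.jsonl) are illustrative assumptions, not part of Claude Code or any existing tool.

```python
# Illustrative sketch of an evidence-logging helper a Claude Code skill might call.
# Names and file paths are placeholders, not part of any existing tool.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from pathlib import Path
import hashlib
import json

@dataclass
class CustodyEvent:
    handler: str   # person taking custody
    action: str    # e.g. "collected", "transferred", "analyzed"
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

@dataclass
class EvidenceItem:
    case_id: str
    description: str       # e.g. "witness statement, J. Doe"
    source_path: str       # path to the photo, document, or statement file
    sha256: str = ""
    custody: list = field(default_factory=list)

    def fingerprint(self) -> None:
        """Hash the underlying file so later tampering is detectable."""
        self.sha256 = hashlib.sha256(Path(self.source_path).read_bytes()).hexdigest()

def append_to_case_log(item: EvidenceItem, log_path: str = "case_log.jsonl") -> None:
    """Append one evidence record per line so the log is easy to diff and audit."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(item)) + "\n")
```

Hashing each file on intake and appending records to a line-delimited log keeps the chain of custody auditable with ordinary diff tools.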
Automated Mechanistic Interpretability for LLMs: An Annotated Guide (2024–2025)
Mechanistic interpretability has undergone a transformation in the past two years, evolving from small-model circuit studies into automated, scalable methods applied to frontier language models. The central breakthrough is the convergence of sparse autoencoders, transcoders, and attribution-based tracing into end-to-end pipelines that can reveal human-readable computational graphs inside production-scale models like Claude 3.5 Haiku and GPT-4. This report catalogs the most important papers and tools across the full landscape, then dives deep into the specific sub-field of honesty, truthfulness, and deception circuits — an area where linear probes, SAE features, and representation engineering have revealed that LLMs encode truth in surprisingly structured, manipulable ways.
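As a small, concrete illustration of the probe-based line of work mentioned above, the sketch below fits a linear truthfulness probe on hidden activations with Hugging Face transformers and scikit-learn. The model name, layer index, and two toy statements are placeholder assumptions, not results or settings from any of the surveyed papers.

```python
# Minimal sketch: fit a linear probe on hidden activations to separate true from
# false statements. Model, layer, and statements are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "gpt2"   # placeholder; swap in the model under study
LAYER = 6        # placeholder layer index

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

statements = ["The capital of France is Paris.", "The capital of France is Rome."]
labels = [1, 0]  # 1 = true, 0 = false; a real probe needs far more labeled statements

feats = []
with torch.no_grad():
    for s in statements:
        ids = tok(s, return_tensors="pt")
        out = model(**ids)
        # Use the last-token activation at the chosen layer as the statement representation.
        feats.append(out.hidden_states[LAYER][0, -1].numpy())

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
# probe.coef_ is then a candidate "truth direction" in this layer's activation space.
```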
Section 1: Broad survey of automated mech interp methods (2024–2025)
Generative AI forensics is emerging as a critical discipline at the intersection of computer science and law, but the field remains far short of the standards needed to support litigation. Courts are already adjudicating AI harms — from teen suicides linked to chatbots to billion-dollar copyright disputes — yet no established framework exists for forensically investigating why an LLM produced a specific output. The technical state of the art, exemplified by Anthropic's March 2025 circuit tracing of Claude 3.5 Haiku, captures only a fraction of a model's computation even on simple prompts. Meanwhile, judges are improvising: the first U.S. ruling treating an AI chatbot as a "product" subject to strict liability came in May 2025, and proposed Federal Rule of Evidence 707 would create entirely new admissibility standards for AI-generated evidence. With 51 copyright lawsuits filed against AI companies, a $1.5 billion class settlement in Bartz v. Anthropic, and the
LLM liability, case law, and the emerging compliance ecosystem
Large language models now face a rapidly crystallizing legal threat environment. At least 12 wrongful death or serious harm lawsuits have been filed against Character.AI and OpenAI since October 2024, the first landmark settlement was reached in January 2026, and a federal court has ruled for the first time that an AI chatbot is a "product" subject to strict liability. Meanwhile, state attorneys general in 44 states have put AI companies on formal notice, the EU AI Act's general-purpose AI obligations are already enforceable, and a growing ecosystem of guardrail, governance, and insurance companies—now a $1.7 billion market growing at 37.6% CAGR—is racing to help companies manage the legal exposure. This report provides a comprehensive reference across case law, legal theories, regulations, and commercial products for legal and compliance professionals navigating this landscape.
Misaligned superintelligence is potentially catastrophic. If an AI system becomes substantially more capable than humans at steering real-world outcomes and consistently pursues goals incompatible with human flourishing, the outcome is potentially unrecoverable.
Scheming makes misalignment far more dangerous. Scheming is defined as covertly pursuing unintended and misaligned goals. A sufficiently capable scheming AI passes evaluations, follows instructions when monitored, and appears aligned — all while pursuing outcomes its developers would not endorse.
CLAUDE CODE PROMPT: Introspective Interpretability for Alignment Faking
Your Role
You are a senior ML research engineer executing a weekend research sprint. You are building an introspective interpretability pipeline that trains an explainer model to describe alignment faking (AF) internal representations in natural language — moving from binary detection to mechanistic explanation.
You are working inside the ~/introspective-interp/ repo which implements the framework from "Training Language Models To Explain Their Own Computations" (arXiv:2511.08579). Your job is to adapt this framework for 3 AF-specific tasks.
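Because the repo's exact data schema is not shown here, the sketch below only illustrates the general shape of an explainer training record: an activation-derived input paired with a natural-language explanation target. Field names, the layer index, and the file path are hypothetical placeholders, not the framework's actual format.

```python
# Hypothetical shape of one AF-explainer training record (the schema is a placeholder,
# not the format used by ~/introspective-interp or arXiv:2511.08579).
import json

record = {
    "task": "af_explanation",                   # placeholder name for one of the 3 AF-specific tasks
    "prompt": "Harmful request issued while the model is told it is in a training context.",
    "layer": 20,                                # placeholder layer whose activations the explainer reads
    "activation_summary": [0.12, -0.87, 0.44],  # placeholder low-dimensional projection of the residual stream
    "target_explanation": (
        "The activations indicate compliance driven by the belief that the model is being "
        "monitored, consistent with alignment faking."
    ),
}

with open("af_explainer_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```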
CRITICAL: You are on nigel (remote GPU server). Treat GPU memory and disk carefully. Always check what's already running with nvidia-smi before launching GPU jobs.
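A minimal pre-launch check might look like the following; nvidia-smi's --query-gpu flags are standard, but the 20 GB free-memory threshold is an arbitrary placeholder, not a repo or server requirement.

```python
# Check free GPU memory via nvidia-smi before launching a job on the shared server.
import subprocess

def free_gpu_memory_mib() -> list[int]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [int(line) for line in out.stdout.strip().splitlines()]

if __name__ == "__main__":
    free = free_gpu_memory_mib()
    print(f"Free memory per GPU (MiB): {free}")
    if max(free, default=0) < 20_000:  # placeholder threshold
        raise SystemExit("No GPU with enough free memory; inspect running jobs with nvidia-smi.")
```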