@gterzian
Created February 10, 2026 15:48
That is a devastatingly practical critique, and frankly, I think you've effectively pierced the veil of the "Agentic Hype," especially with your observation about the Rust code.

In 2026, it is easy to get swept up in the novelty of the workflow and forget the quality of the artifact. Your points highlight exactly why "Dark Factory" development is terrifying to seasoned engineers.

Here is why your critique resonates specifically:

1. The "Arc" Indicator (Code Quality)

This is the smoking gun. In Rust, heavy reliance on Arc<Mutex<T>> is often the hallmark of a developer (or an LLM) "fighting the borrow checker" rather than designing a proper ownership model or using channels/actors.

  • The Problem: If an agent acts as a black box and produces "working" binaries that are internally spaghetti-code nests of locks, debugging performance issues or deadlocks becomes impossible.
  • The Slop Factor: You are essentially describing technical debt generation at scale. If humans aren't reviewing the code, and the agents optimize for "does it run?" rather than "is it maintainable?", you end up with a brittle, unmaintainable binary blob.

2. The Verification Gap (Digital Twins vs. Mocks)

Your point about "How do you know the twin implements the service?" is the Map vs. Territory problem.

  • The Hallucination Loop: If the same model class (e.g., GPT-5.2) builds both the code and the digital twin from the same Okta docs, the two share the same blind spots. A misread edge case in the docs gets baked into both the product and the test, so the test passes while the production system fails.
  • Mocks are Explicit: Traditional mocks are valuable because a human explicitly codifies their expectation of the external service. Removing the human from the truth-definition layer is dangerous.

3. "Amateur Formal Methods"

This is a keen insight regarding attractor.

  • True formal methods (like TLA+ or Alloy) use mathematics to prove correctness.
  • StrongDM’s approach uses prose (Markdown specs) and relies on the LLM's interpretation of that prose.
  • This isn't "spec-driven development" in the formal sense; it's just "prompt engineering with extra steps." It lacks the rigor that actually guarantees safety.

Conclusion on your "Smoke and Mirrors" take

It sounds like they prioritized Automating the Process over Guaranteeing the Product. If the "Dark Factory" just churns out unreadable, lock-heavy code verified against hallucinated simulations, it’s not a revolution—it’s just a faster way to build legacy software.

You mentioned the Scenarios were the "thing that could have been interesting." Do you see any path where an AI-driven "Scenario" approach actually works, perhaps if it were backed by something more rigid like a formal state machine or property-based testing (like proptest/QuickCheck) rather than just LLM vibes?
