Before I show you how this was built, here’s the part that made me stop and reread the transcript.
In the first live test, three agents got into a round-robin discussion about improving an ETF platform. Within a couple of turns, one of them dropped a line that instantly felt like a real teammate in a real design review:
“Bonds tolerate minutes of delay. ETF arb disappears in seconds.”
Another agent immediately shifted the conversation from architecture to usability — not in a generic way, but in a “we’ve lived this pain” way — proposing a traffic-light UI with drill-downs and explicit data quality flags. Then a third agent pulled the whole thing back into execution reality: staged rollout, market-by-market, timezone-aware.
No one was role-playing. No one was guessing wildly. They sounded like people who had shipped systems together — because they were built from the way those people actually talked.
That’s when I knew: this experiment worked.
I built a working multi-agent conversation system that brings the Katana Labs team back to life from archival data. Sixteen AI agents, each constructed from real Slack messages and GitLab documentation, can hold round-robin discussions, answer domain questions, and demonstrate distinct personalities with accurate technical depth. The system went from plan to working slash command in a single session.
This is the experiment that finally worked. After months of forensic analysis, document processing frameworks, and knowledge extraction pipelines, the simplest possible approach — keyword search over a small corpus, no vector database, no embeddings — produced the most convincing and useful result.
The goal: reconstruct institutional memory from two sources.

The Slack archive:
- ~35,000 messages
- 27 channels
- 2019–2025 archive
- 49 users (16 active enough for persona modeling)
Slack gives you:
- Voice
- Humor
- Vocabulary
- Collaboration patterns
- Who challenged whom
- How decisions actually got made
Not what was built — but why.
The GitLab repository:

- 3.6GB repository
- 124 markdown documents
- Architecture
- Trading algorithms
- Database schemas
- IP portfolio
- Investor materials
- ML experiments
GitLab gives you:
- What was built
- How it works
- Business context
- Technical depth
Slack teaches how they talk. GitLab teaches what they know.
Blend the two — and you get people.
The corpus:
- ~274K tokens of Slack
- 564 knowledge chunks
That’s tiny.
Keyword scoring across everything takes milliseconds. No vector store. No indexing layer. No infra.
Complexity removed = signal amplified.
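At this scale, brute-force keyword scoring really is a few lines. A minimal sketch of what such a retriever could look like (function names and scoring details are my assumptions, not the project's actual code):

```python
import re

def score(query: str, text: str) -> int:
    """Hypothetical scorer: count occurrences of query keywords in a chunk."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for t in tokens if t in words)

def top_chunks(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Rank every chunk against the query; trivial at ~564 chunks."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]
```

Scanning all 564 chunks per query is linear work over a few hundred kilobytes, which is why it finishes in milliseconds with no index at all.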
Every turn includes:
- Full persona system prompt
- Retrieved Slack examples
- Retrieved knowledge chunks
- Conversation transcript
No session state. No memory layer.
Clean. Predictable. Debuggable.
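Because nothing is carried between turns, each turn's context can be rebuilt from scratch. A sketch of that assembly, with hypothetical section headers:

```python
def build_turn_prompt(persona_prompt: str,
                      slack_examples: list[str],
                      knowledge_chunks: list[str],
                      transcript: list[str]) -> str:
    """Assemble the full context for one agent turn; no hidden session state."""
    parts = [
        persona_prompt,
        "## How you talk (real Slack messages)",
        *slack_examples,
        "## What you know (retrieved documentation)",
        *knowledge_chunks,
        "## Conversation so far",
        *transcript,
    ]
    return "\n\n".join(parts)
```

Since the prompt is a pure function of its inputs, any surprising reply can be debugged by printing exactly what the model saw.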
- Dennis → architecture + ML
- Santiago → investor materials + fixed income
- Androniki → business + product
No cross-contamination.
Agents stay inside their real-world expertise boundaries.
That’s why the discussions feel authentic.
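Scoping can be as simple as a per-agent allowlist checked at retrieval time. The mapping and the `area` tags below are illustrative, not the project's real schema:

```python
# Hypothetical scoping table: which knowledge areas each agent may draw from.
EXPERTISE = {
    "dennis":    {"architecture", "ml"},
    "santiago":  {"investor_materials", "fixed_income"},
    "androniki": {"business", "product"},
}

def allowed_chunks(agent: str, chunks: list[dict]) -> list[dict]:
    """Keep only chunks tagged with an area inside the agent's expertise."""
    areas = EXPERTISE.get(agent, set())
    return [c for c in chunks if c["area"] in areas]
```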
Each persona includes:
- 20–25 verbatim Slack messages
- Collaboration graph
- Style metrics (emoji rate, question rate, tone)
- Topic extraction
The model isn’t told “Dennis is technical.” It sees Dennis being technical.
That difference is everything.
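The style metrics fall straight out of the raw messages. A rough sketch (the regexes and metric definitions are my assumptions):

```python
import re

def style_metrics(messages: list[str]) -> dict:
    """Crude per-user style stats: emoji rate and question rate."""
    n = len(messages) or 1
    # Slack-style :emoji_name: tokens rather than Unicode emoji.
    emoji = sum(1 for m in messages if re.search(r":[a-z_]+:", m))
    questions = sum(1 for m in messages if "?" in m)
    return {"emoji_rate": emoji / n, "question_rate": questions / n}
```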
- Deleted users reconstructed from embedded profiles
- Three-layer bot detection
- Regex mention resolution (`<@U12345>` → `@RealName`)
- Thread reconstruction using `thread_ts`
Result:
- 20 users with content
- 16 viable persona candidates
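The mention resolution and thread reconstruction steps can be sketched like this (the `USERS` mapping is illustrative; in practice it comes from the exported profiles):

```python
import re
from collections import defaultdict

USERS = {"U12345": "RealName"}  # hypothetical: built from exported user profiles

def resolve_mentions(text: str) -> str:
    """Rewrite Slack mention tokens like <@U12345> into @RealName."""
    return re.sub(r"<@(U[A-Z0-9]+)>",
                  lambda m: "@" + USERS.get(m.group(1), m.group(1)), text)

def rebuild_threads(messages: list[dict]) -> dict:
    """Group messages by thread_ts; a top-level message keys on its own ts."""
    threads = defaultdict(list)
    for msg in messages:
        threads[msg.get("thread_ts") or msg["ts"]].append(msg)
    return dict(threads)
```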
First bug discovered:
AGENTS.md and CLAUDE.md were being pulled in as domain knowledge.
These are meta-instructions, not business docs.
Once excluded, the chunk count dropped from 615 → 564.
Small correction. Big difference in response quality.
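The fix itself is a one-line filter applied before chunking; a sketch:

```python
# Meta-instruction files for the agents themselves, not business knowledge.
META_DOCS = {"AGENTS.md", "CLAUDE.md"}

def is_domain_doc(path: str) -> bool:
    """Exclude meta docs at chunking time (the fix behind 615 → 564 chunks)."""
    return path.rsplit("/", 1)[-1] not in META_DOCS
```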
A new skill group:
/katana
Subcommands:
- `/katana ask`
- `/katana discuss`
- `/katana list`
- `/katana rebuild`
Cross-session invocation works.
That’s the real bar: A fresh session can use it without knowing how it’s implemented.
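With `click` (one of the project's two dependencies), the subcommand skeleton might look like this. The command bodies here are placeholders, not the real implementations:

```python
import click

@click.group()
def katana():
    """Dispatcher behind the /katana slash command (illustrative skeleton)."""

@katana.command()
@click.argument("question")
def ask(question):
    """Answer a one-off question; the real version routes it to an agent."""
    click.echo(f"asking: {question}")

@katana.command("list")
def list_agents():
    """Show the available personas."""
    click.echo("16 personas available")

if __name__ == "__main__":
    katana()
```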
Three agents:
- Dennis
- Androniki
- Alexander
Prompt:
“I rebuilt the Katana platform for ETFs. What would you improve?”
What came back wasn’t generic AI filler.
It was structured, domain-specific critique:
- Proposed an `ArbCalculator` abstraction layer
- Separate implementations for replication types
- Beam pipeline reuse
That reflects real Katana infrastructure constraints.
“Bonds tolerate minutes of delay. ETF arbitrage disappears in seconds.”
That’s not surface-level knowledge.
That’s understanding trading mechanics.
Androniki pushed for:
- Traffic-light UI
- Drill-down views
- Data quality flags
- AP activity enrichment
Exactly aligned with her historical Slack behavior.
Alexander kept it grounded in execution, proposing a staged rollout:

- US first
- Europe during US hours
- Asia last, with timezone logic
Pragmatic. Phased. Realistic.
- Agents stayed inside their expertise.
- They built on each other’s ideas.
- They referenced real Katana concepts.
- They didn’t sound like clones.
The personalities held.
What still needs work:

- More disagreement. Real teams argue; these agents are too agreeable.
- Execution speed. 3 agents × 2 rounds ≈ 3m 25s. Acceptable, but not snappy.
- Skill robustness. The skill should default to the absolute venv path, not `uv run`.
Earlier efforts involved:
- Embeddings
- Semantic search engines
- Document processing frameworks
- Multi-agent forensic analysis pipelines
All technically impressive.
None felt alive.
SlackAgents works because:
- The dataset is small — brute force is fine.
- Personality is real, not summarized.
- Knowledge is scoped by role.
- The system is simple enough to reason about.
No abstraction layers. No orchestration frameworks. No magic.
Just careful engineering.
The most valuable artifact in the Katana archive isn’t the code.
It’s the conversations.
Slack captures:
- Why Algolia was chosen
- Why search performance degraded
- What PGGM actually needed
- How bond pair scoring evolved
- Who pushed back on what
GitLab documents decisions. Slack captures decision-making.
That’s institutional memory.
And now it’s queryable.
- 1,346 lines of Python
- 2 dependencies (`anthropic`, `click`)
- 16 working AI agents
- Cached personas
- Cached knowledge base
- Slash command integration
- Cross-session reliability
Total implementation time: ~4 hours.
You don’t need:
- Vector databases
- Retrieval frameworks
- Multi-layer memory systems
- Heavy orchestration
If the corpus is small and well-structured, simplicity wins.
The agents sound real because the data is real.
The knowledge is grounded because it comes from the source.
And the system works because it’s not trying to be clever.
- Introduce structured disagreement in prompts
- Add streaming output for faster perceived latency
- Support per-invocation model selection
- Enable transcript export
But even without those:
The core experiment succeeded.
A team that no longer exists can now:
- Debate architecture
- Critique product strategy
- Explain trading logic
- Answer technical questions
Not because of advanced AI architecture.
Because of clean data, tight scope, and restraint.
Sometimes the right solution isn’t more infrastructure.
It’s less.
#katana #slack-agents #multi-agent #anthropic #persona #knowledge-base #mvp #proof-of-concept