GIT for Thoughts: Version-Controlling a Multi-Agent Conversation—with a Human in the Room
Treating a conversation as a codebase, and building a thought repository
“That’s a vitamin, not a painkiller. No one’s paying for this.”
The investor agent wasn’t pulling punches. I’d spent ten minutes explaining my revenue model, and it had taken exactly two sentences to dismiss it.
I didn’t respond. Instead, I watched the PM agent push back: “You’re thinking Series A scale. What if this is a wedge into enterprise contracts they’re already paying for?”
For the next forty minutes, I didn’t type a word. I just listened to two AI agents debate whether my idea was fundable, buildable, or neither.
Here’s the thing about being opinionated: it’s a professional asset and a personal liability.
When you’ve spent years inside a product ecosystem—building it, scaling it, debugging it at 2am—you develop strong intuitions. You know what works. The problem is, you also stop seeing what could work. Your expertise becomes a wall.
I needed to stress-test a new revenue model. I needed someone to tell me where I was wrong. Not politely wrong—actually wrong. I needed an investor who’d ask “why would I fund this?” and a product person who’d ask “who actually wants this?”
The ideal scenario: a week locked in a room with both of them, whiteboarding, arguing, poking holes. But that’s not a privilege most people have. Investors give you 30 minutes if you’re lucky. Good PMs are drowning in their own roadmaps. And even if you got them in a room, they’d probably be polite.
So I built the room myself. Two agents. An investor avatar and a product thinker. A system prompt that said: debate each other, not just me. Be critical. Ask hard questions. Don’t be a yes-man.
Three hours later, I had a proposition I couldn’t have reached alone. Not because the AI was smarter than me—but because it was willing to disagree with me in ways humans rarely are.
What I didn’t expect was the engineering problem that would emerge along the way: how do you manage a multi-agent conversation that branches, backtracks, and runs parallel experiments—without losing your mind or your context?
The Human-Centric Problem
The first version was simple. Two agents built with Pydantic AI, each with instructions to debate the other. I’d share my idea, and they’d take it from there.
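For concreteness, that first setup looked roughly like this. A minimal sketch, assuming a recent version of Pydantic AI; the model name is an illustrative choice and the real prompts were much longer:
from pydantic_ai import Agent
DEBATE_RULES = (
    "You are in a room with a human founder and one other agent. "
    "Debate the other agent, not just the human. Be critical, ask hard "
    "questions, and don't be a yes-man."
)
investor = Agent(
    "openai:gpt-4o",  # illustrative model choice
    system_prompt=DEBATE_RULES + " You are a skeptical investor. "
    "Your question is: why would I fund this?",
)
pm = Agent(
    "openai:gpt-4o",
    system_prompt=DEBATE_RULES + " You are a product thinker. "
    "Your question is: who actually wants this?",
)
# Kick off with the founder's idea; in recent Pydantic AI versions,
# result.output holds the reply text.
result = investor.run_sync("Founder: here's my revenue model ...")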
That’s not what happened.
I typed a long message—my idea, the transition I was imagining, the value of solving this particular problem in this particular way. The investor agent responded with a sharp question. Good. Exactly what I wanted.
Then the input prompt unblocked, waiting for me to type.
The question wasn’t for me. It was clearly directed at the PM agent. But the system was waiting for my input. I looked at the agent’s reasoning trace and saw the problem: somewhere in its chain-of-thought, it had decided this question was intended for a human, not another AI.
I responded anyway, trying to keep things moving. Immediately, another question—this time explicitly for me. Then another. The PM agent never got a turn. The “debate” had collapsed into a standard chatbot interaction with extra steps.
Here’s what I learned: we’ve trained LLMs so thoroughly to be helpful to humans that they struggle to not address us. The reasoning runs deep—it’s not just in the output, it’s in the thinking itself. When the model weighs “who is this question for?”, its priors scream human. Another AI agent isn’t even in the consideration set.
The other agent had the same problem in reverse. Even when a question was clearly directed at it, it wouldn’t pick up the thread. It would wait. The system would unblock my input. The multi-turn loop between agents simply wouldn’t sustain itself.
I realized the problem wasn’t the agents—it was the absence of someone managing the conversation. Someone whose job wasn’t to think about the product, but to think about the flow.
I needed an Orchestrator.
The Orchestrator Agent
The fix was almost embarrassingly simple.
I built a third agent—the Orchestrator—whose job was not to think about the product. It had no opinions on revenue models or market fit. Its only mandate: manage the flow.
The architecture is a while loop:
while conversation_active:
    message = get_latest_message()
    decision = orchestrator.evaluate(message)
    if decision == "route_to_agent":
        next_agent = orchestrator.select_agent(message)
        continue_conversation(next_agent)
    elif decision == "stop_for_human":
        surface_to_user(pending_questions)
        wait_for_human_input()
That's it. Start, converse, evaluate, route, continue.
The orchestrator watches every message and asks a simple question: “Can the agents continue productively, or do they genuinely need the human?”
Critically, the threshold isn’t “an agent asked a question.” Agents ask questions constantly—that’s the point. The threshold is: there are multiple questions accumulating for the human, AND continuing without answers would derail the conversation.
When that happens, the orchestrator takes control of the interface. It pauses the agent loop, surfaces the blocking questions with a small info message, and waits. Once I respond, it releases control back to the agents and the loop resumes.
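Concretely, the evaluate step can be a single structured-output call. A sketch, again assuming recent Pydantic AI; the schema and its field names are my own invention, not a prescribed API:
from typing import Literal
from pydantic import BaseModel
from pydantic_ai import Agent
class RouteDecision(BaseModel):
    action: Literal["route_to_agent", "stop_for_human"]
    reason: str
    next_agent: str | None = None        # "investor" or "pm"
    pending_questions: list[str] = []    # filled only when stopping
orchestrator = Agent(
    "openai:gpt-4o",  # illustrative
    output_type=RouteDecision,
    system_prompt=(
        "You manage flow between an investor agent, a PM agent, and a human. "
        "Route to an agent whenever the debate can continue productively. "
        "Stop for the human only when several questions have accumulated AND "
        "continuing without answers would derail the conversation."
    ),
)
transcript_tail = "Investor: what are your margin constraints? ..."
decision = orchestrator.run_sync(transcript_tail).output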
The effect was immediate. The agents stopped performing for me and started engaging with each other. The investor would challenge an assumption; the PM would defend or pivot; the investor would probe deeper. Turns would stack—five, ten, fifteen exchanges without my input.
And when the orchestrator did stop the conversation, the questions were worth answering. Not “what do you think?” but “we’ve identified three possible pricing models and need to know your margin constraints before we can evaluate them.”
The conversation had graduated from a chatbot interaction to something closer to what I originally wanted: a room I could observe, and occasionally steer.
Lost in the Fog
Twenty turns in, something started to nag at me.
The conversation was flowing. The agents were debating. The orchestrator was doing its job. But I had a creeping sense that we’d lost threads along the way.
Had we ever resolved the pricing question from turn 7? The PM had raised a concern about enterprise integration—did the investor ever respond? I scrolled back through the transcript, trying to reconstruct the logic. It was like reading a meeting transcript where half the action items had evaporated.
Fifty turns deep, the problem became acute. The conversation had developed what I started calling “Information Fog”—a state where so much had been discussed that no one (including me) could confidently say what had been decided, what was still open, and what had been quietly abandoned.
I realized I needed visibility into three things:
What questions were the agents asking each other? Not just questions directed at me—the entire line of inquiry between them.
When an agent asked me something, which agent asked it, and about what? Attribution matters when you’re trying to trace the evolution of an idea.
Were questions actually getting answered? Or were they being implicitly skipped, letting the conversation drift onto tangents while critical threads hung unresolved?
Welcome, Auditor Agent
So I introduced a fourth agent: the Auditor.
The Auditor doesn’t participate in the debate. It has no opinions on the product. Its job is to watch the conversation and maintain a live checklist—every question logged, attributed to its source, and tracked until it finds a resolution.
The key word is asynchronous. The Auditor doesn’t block the conversation to demand answers. It reconciles in the background, matching answers to questions as they organically emerge. Only when a question is genuinely blocking—when the orchestrator has already flagged it—does the tracking become synchronous.
Think of it as the difference between a meeting facilitator who interrupts every five minutes to check action items, versus one who quietly takes notes and sends you a reconciled summary at the end.
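In code, the checklist is just a list of records like this; a sketch with field names of my own, not a library API:
from dataclasses import dataclass
@dataclass
class TrackedQuestion:
    question: str
    asked_by: str                    # "investor", "pm", or "human"
    directed_at: str
    turn_asked: int
    blocking: bool = False           # True once the orchestrator flags it
    resolved_at_turn: int | None = None
    resolution: str | None = None
def open_questions(checklist: list[TrackedQuestion]) -> list[TrackedQuestion]:
    # The reconciliation pass runs asynchronously, after each turn.
    return [q for q in checklist if q.resolved_at_turn is None]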
With the Auditor in place, I could pause at any moment and ask: What’s still open? What have we actually decided? Where did we drop the thread?
The fog started to lift.
The Epiphany: Git Time
I had a visualization problem.
The agents were conversing. The orchestrator was managing flow. The auditor was tracking threads. But I couldn’t see any of it. The conversation was a wall of text, and I wanted to understand its shape.
My first thought was observability tooling—something like Logfire that could trace the agent interactions. But that meant pushing to an online stack, and I was still in experimentation mode. I needed something local, something I could poke at.
More importantly, I realized I wasn’t just trying to log the conversation. I was trying to understand its cause and effect. Every exchange wasn’t just messages being traded—it was decisions being made, assumptions being challenged, threads being opened and closed. I wanted to see the conversation as a graph of causal relationships, not a linear transcript.
The requirements kept stacking up:
Visualize the flow of conversation, not just read it
Replay from any point—restart the debate from turn 23 with different constraints
Branch into parallel experiments—what if we explored pricing model A and pricing model B simultaneously?
Swap models mid-conversation—start with Gemini for one perspective, switch to Claude for another, without losing context
Manage memory—see exactly when the context was getting too large, and where the bloat was coming from
Summarize based on cause and effect, not just chronology—store the summaries and transcripts as documents that evolve over time
I stepped back and looked at this list. I was trying to do with conversations what we already do with software.
Version control. Branching. Diffing. History. Blame.
GIT.
I’d just watched Linus Torvalds talking about Git on its anniversary—how he’d built it to track the evolution of the Linux kernel, to let thousands of developers branch and merge without chaos. He talked about commits as atomic units of change, about the graph structure that lets you traverse history in any direction.
And I thought: a conversation is a codebase.
Every message is a commit. Every agent is a contributor. Every decision point is a potential branch. The orchestrator’s routing decisions, the auditor’s checklists, the summaries—they’re all artifacts that belong in version control.
So I replaced the message list with a Git repository.
The shift was immediate. I didn’t have to build visualization—git log --graph already existed. I didn’t have to build replay—git checkout <commit> already existed. I didn’t have to build branching—Git is branching.
Every session became a repository. Every message became a commit with metadata: which agent, what role, timestamp, token count. The summaries, transcripts, and auditor checklists became documents in the repo, versioned alongside the conversation itself.
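Wiring that up takes surprisingly little code. A sketch using GitPython; the file layout and the trailer keys in the commit message are my illustration, not a fixed schema:
from pathlib import Path
from git import Actor, Repo
repo = Repo.init("session-001")  # one repository per session (illustrative path)
def commit_turn(agent: str, role: str, text: str, tokens: int) -> None:
    # Append the message to the transcript, then commit it with metadata.
    transcript = Path(repo.working_dir) / "transcript.md"
    with transcript.open("a") as f:
        f.write(f"\n\n## {agent}\n{text}\n")
    repo.index.add(["transcript.md"])
    author = Actor(agent, f"{agent}@agents.local")
    # Agent and token count ride along as commit-message trailers;
    # git stamps the timestamp for free.
    repo.index.commit(
        f"{role}: {text[:60]}\n\nAgent: {agent}\nTokens: {tokens}",
        author=author,
        committer=author,
    )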
When I wanted to try a different direction, I’d branch:
git checkout -b experiment/aggressive-pricing
Run the agents down that path. Then switch back:
git checkout main
git checkout -b experiment/freemium-model
Run them down that path. Compare the two branches. Merge the insights that worked.
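Comparing them was an ordinary diff; summary.md here stands in for whichever artifact you care about:
git diff experiment/aggressive-pricing..experiment/freemium-model -- summary.md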
When the context window started getting too large, I could see exactly where—git diff showed me which commits had ballooned, which summaries were bloated, where the memory had grown.
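And because each commit carries its token count as a trailer (as in the GitPython sketch above), one log command surfaces the heavy turns:
git log --format="%h %an %(trailers:key=Tokens,valueonly)"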
I wasn’t building a conversation system anymore. I was building a thought repository.
What Emerged
After three hours with the system, I had a document I couldn’t have written alone.
Not because I lacked the knowledge—I’d been living inside this product for years. But because I lacked the distance. The investor agent had forced me to articulate why this was fundable, not just buildable. The PM agent had pressure-tested the customer value, not just the technical elegance. The back-and-forth between them had surfaced assumptions I didn’t even know I was making.
The output was concrete:
A product strategy document with three potential revenue models, each stress-tested against different market conditions
A roadmap of prioritized items, sequenced by risk and dependency
A clear articulation of the moat—and more importantly, the challenges to creating it
A section on navigating competitive risk that I would have hand-waved past on my own
It wasn’t a perfect document. It was a working document—something I could take into real conversations with real investors and real product people, knowing the obvious holes had already been poked.
The system also surprised me in ways I didn’t expect.
At one point, I experimented with changing the names of the agents—swapping “Investor” for “VC Partner,” tweaking the PM’s title. The behavior shifted in subtle, sometimes flaky ways. The agents seemed to anchor on naming more than I’d anticipated, adjusting their tone and approach based on what they were called. It’s a reminder that these systems are still probabilistic, still sensitive to framing in ways we don’t fully understand.
That’s the honest truth about building with LLMs: you’re working with something powerful and unpredictable. The architecture I’ve described—orchestrator, auditor, Git-backed memory—isn’t a solution to that unpredictability. It’s a container for it. A way to make the chaos legible.
Committing the Thought
As I watched the Git logs fill up—commits from investor@agents.local, pm@agents.local, orchestrator@agents.local—I kept thinking about that Linus Torvalds interview.
He built Git to track the evolution of code. Twenty years later, it turns out to be a near-perfect tool for tracking the evolution of thought. Not because conversations are code, but because they share the same structural needs: versioning, branching, merging, attribution, history.
The system I’ve described isn’t magic. It’s a while loop, a few agents with clear mandates, and a Git repo. The insight isn’t technical—it’s architectural. It’s realizing that a multi-agent conversation is a collaborative project, and collaborative projects have been managed with version control for decades.
I’ll be open-sourcing this soon. There are commits to clean up and documentation to write. But the core idea is simple enough that you could rebuild it in a weekend: give your agents an orchestrator to manage flow, an auditor to track threads, and a Git repo to hold it all together.
The future of working with AI isn’t just better models. It’s better systems around the models. Containers that let us branch our thinking, replay our reasoning, and merge our best ideas back into the main thread.
Let’s treat our thoughts with the same respect we give our code.
git commit -m "initial product strategy"