Theory Building Has a Speed Limit
2026-04-08 - 8 min read
AI removed the bottleneck on shipping code. It didn't remove the bottleneck on understanding it. That mismatch is creating a new class of failures — and a 40-year-old CS paper predicted exactly why.

In 1985, Peter Naur published a paper called Programming as Theory Building. His argument was simple and radical: the primary output of a software project is not the code. It's the theory — the shared mental model the team builds of the problem domain, the solution, and the thousands of micro-decisions that connect them.
Naur defined theory as something richer than documentation or tribal knowledge:
"The knowledge a person must have in order not only to do certain things intelligently but also to explain them, to answer queries about them, to argue about them, and so forth."
The code is a written representation of that theory. Like all representations, it's lossy. The critical knowledge — why this service talks to that one instead of going through the API gateway, why the retry logic has a specific backoff curve, why the database schema looks the way it does — exists in the minds of the people who built the system. As Naur put it:
"The programmer having the theory of the program can explain why each part of the program is what it is."
Naur had a term for what happens when the team that holds this theory dissolves: theory death. The program doesn't stop running. But it stops being understood. And a program that nobody understands is a program that can't be safely changed.
For forty years, this was mostly a theoretical concern. Shipping software was slow enough that theory-building kept pace naturally. The bottleneck was always production — getting code written, reviewed, tested, and deployed. Understanding came along for the ride because humans were in the loop at every step.
That equilibrium is over.
AI Broke the Equilibrium
I write all my code with Claude Opus 4.6 because it's the best current frontier model. I challenge implementations using independent agents running different models. I built a homelab specifically to run local models that supplement my AI workflows. I've effectively scaled to the output of a 10-person consulting agency as a solo practitioner.
The tools are extraordinary. I'm not being hyperbolic — AI-assisted development has changed what a small team can ship in a week, a month, a quarter. The bottleneck on producing code is effectively gone.
But the bottleneck on understanding code hasn't moved.
Theory-building is still bounded by human cognition. You can't 10x the rate at which a person builds a mental model of a system. You can ship 10x faster, but you can't comprehend 10x faster. And when shipping outpaces comprehension, you get a new kind of failure that Naur didn't anticipate — not theory death from the team dissolving, but theory deficit from the team falling behind while still in the room.
The Silent Failure Problem
There's a specific failure mode that only exists at AI-scale output, and it's the one that scares me most: AI-generated infrastructure that silently skips what it's supposed to enforce.
CI pipelines, monitoring configs, Terraform modules, test suites — the code that tells you whether the rest of your code works. When an agent writes a CI config, it can produce something that looks correct, passes review, and quietly skips the check it was supposed to run. A workflow that doesn't trigger on the right events. A test suite that exits early under certain conditions. A monitoring rule that filters out the exact class of errors you need to catch.
When PRs were 200 lines and a human wrote "skip this test because X," you caught it in review. At AI-scale output, the bullshit is invisible. False confidence is worse than no confidence — at least with no confidence you know to look.
This is qualitatively different from traditional bugs. A bug in application code breaks a feature and someone notices. A bug in CI silently removes the thing that would have caught the next bug. It's a failure in the immune system, not the body. And the team doesn't know it's sick until the symptoms are undeniable.
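To make the failure mode concrete, here's a minimal sketch of a silent skip. The step name, directory, and `SERVICE_ENDPOINT` variable are invented for illustration; the pattern is what matters — a missing prerequisite is treated as success, so the pipeline goes green while zero tests run.

```python
import os
import subprocess
import sys

def run_integration_tests() -> int:
    """Hypothetical CI step that silently skips its own reason for existing."""
    endpoint = os.environ.get("SERVICE_ENDPOINT")
    if endpoint is None:
        # The silent skip: a missing variable is treated as success.
        # The job exits 0, the pipeline turns green, and nothing ran.
        print("SERVICE_ENDPOINT not set; skipping integration tests")
        return 0
    # Only this branch actually exercises the code under test.
    result = subprocess.run([sys.executable, "-m", "pytest", "tests/integration"])
    return result.returncode
```

Nothing here is a bug in the traditional sense — every line does what it says. The failure is that "skipped" and "passed" collapse into the same exit code.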
The All-Hands Moment
I recently sat in a room where a team walked through every user journey in their platform. Not the demo paths — the real flows, the edge cases, the routes nobody checks on a regular Tuesday.
Almost every single journey had a dozen meaningful bugs or completely broken features. Regional components weren't shipping because of five missing lines in a Terraform config. CI was too complex and was silently failing — or just not triggering certain workflows. Features that had been "shipped" weeks ago had never actually worked end-to-end.
- The code had shipped. The deploys had succeeded. The CI was green.
- And almost nothing worked the way it was supposed to.
This is what theory deficit looks like when it becomes visible. The gap between what was shipped and what was understood had been widening for months, silently, under the surface. It took walking the product — every flow, every edge — to see it.
Theory Deficit Compounds
The dangerous thing about theory deficit is that it compounds silently. Each feature built on top of a misunderstood foundation adds to the debt. But unlike technical debt, theory debt can't be refactored. You can't write a script that retrofits understanding into a team. You can't merge a PR that gives someone the mental model of why the system works the way it does.
- Technical debt accumulates in the code. You can measure it, prioritize it, and pay it down with focused engineering work.
- Theory debt accumulates in the team. The only way to pay it down is for humans to do the slow, unglamorous work of understanding what they've built.
And here's the compounding effect: the less the team understands, the more they rely on AI to navigate the codebase. The more they rely on AI, the less they build new understanding. The gap widens. The next all-hands walkthrough is worse than the last one.
Keeping Theory Pace with Shipping
If theory-building has a speed limit, the answer isn't to stop shipping. It's to build practices that force understanding to keep up. Here's what I've found works.
Fresh-Context Review
Anthropic has published research on effective harnesses for long-running agents showing that fresh-context agents outperform agents with accumulated context. Accumulated context degrades judgment: the agent becomes invested in its own decisions and stops seeing problems clearly.
The same principle applies to code review. An independent agent with no history of the implementation sees what's actually there, not what was intended. I use cc-debate, a Claude Code plugin that runs multiple AI reviewers in parallel and synthesizes their feedback. The fresh eyes catch what the author's eyes — human or AI — have learned to skip.
Walk the Product
Schedule regular team walkthroughs of actual user journeys. Not sprint demos. Not happy-path presentations. Real flows, edge cases, the routes users actually take. Do this often enough that the gap between "shipped" and "working" never gets large enough to be demoralizing.
This is the lowest-tech item on this list, and it's the most important. There is no substitute for the team seeing — together, in a room — what the product actually does.
Treat CI and Monitoring as First-Class Code
AI-generated CI is the most dangerous code in your repository, because it's the code that tells you whether the rest of the code works. If it silently skips, everything downstream looks green.
- Review infrastructure, CI pipelines, and monitoring configs with the same rigor as application code — or more.
- Require that every CI workflow has explicit assertions about what it checks, not just that it runs.
- When a workflow is green, ask: does green mean "passed" or does green mean "didn't run"?
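One way to implement the explicit-assertions rule is a final gate step that fails unless every required check actually reported in. A minimal sketch, with invented check names — the key move is that a check that never ran is red, not green:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CheckResult:
    name: str
    passed: bool

def pipeline_status(results: List[CheckResult], required: List[str]) -> str:
    """Green only if every required check ran AND passed.

    This encodes the distinction between "green means passed" and
    "green means didn't run": absence of a required check is a failure.
    """
    ran = {r.name for r in results}
    missing = [name for name in required if name not in ran]
    if missing:
        return "red: required checks never ran: " + ", ".join(missing)
    failed = [r.name for r in results if not r.passed]
    if failed:
        return "red: failed: " + ", ".join(failed)
    return "green"
```

With a gate like this, the silent-skip workflow from earlier can no longer turn the pipeline green — its absence is itself a reported failure.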
Shared Memory Systems: The Double-Edged Sword
Tools like Cognee build persistent knowledge graphs across agent sessions — preserving context that would otherwise be lost when a context window resets. I use Cognee as the backbone of my own knowledge base, and it genuinely helps close the theory gap by making prior decisions, architectural context, and session history queryable across sessions.
But shared memory systems are a double-edged sword. They also enable agents to ship faster and more autonomously, which can widen the gap between what's shipped and what humans understand. The memory system knows. The team might not.
Full disclosure: I don't work for Cognee — it's simply what I use. The principle applies to any shared memory tool: use it as a resource the team learns from, not as a substitute for the team building the theory themselves.
The right way to use these systems is deliberately — as a tool for humans to query and learn from, not as a crutch that lets agents operate without human comprehension catching up.
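To make "a resource the team learns from" concrete, here's a toy stand-in for the principle — emphatically not Cognee's API, just a hypothetical decision log. The useful property is that it stores the why alongside the what, so a human querying it gets Naur's test back: an explanation of why each part is what it is.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Decision:
    component: str
    what: str
    why: str

@dataclass
class DecisionLog:
    """Hypothetical shared-memory store: agents record decisions with
    rationale; humans query the 'why', not just the 'what'."""
    entries: List[Decision] = field(default_factory=list)

    def record(self, component: str, what: str, why: str) -> None:
        self.entries.append(Decision(component, what, why))

    def explain(self, component: str) -> List[str]:
        # Naur's test: can we say why this part is what it is?
        return [f"{d.what} because {d.why}"
                for d in self.entries if d.component == component]
```

The failure mode to avoid is the inverse: a store that only records what was changed. That accelerates agents and leaves humans exactly as far behind as before.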
Slow Is Smooth, Smooth Is Fast
There's a mantra from Navy SEAL training: slow is smooth, smooth is fast. The idea is that rushing creates mistakes, and mistakes create rework that costs more time than the slowness would have.
Applied here: if nobody on the team can explain why a component exists, stop shipping on top of it. The pressure is always to keep moving. There's always another feature, another sprint, another deadline. But shipping on top of a misunderstood foundation is how you get the all-hands moment — the one where you walk every flow and realize nothing works.
Slowing down to rebuild understanding isn't lost time. It's the only investment that compounds in the right direction.
The New Theory Death
Naur wrote that "the death of a program happens when the programmer team possessing its theory is dissolved." That was about people leaving — institutional knowledge walking out the door. It still happens. But AI-assisted development has introduced a new variant: theory death while the team is still in the room.
The team is present. The code is shipping. The CI is green. And the understanding is falling further behind with every commit. The program isn't dead in the traditional sense. But the theory is dying — slowly, silently, one auto-merged PR at a time.
The speed limit isn't shipping. It's understanding. And the teams that respect that limit will build systems they can actually maintain, extend, and trust — long after the last commit.