Fix It at Merge Time, or Fear It Forever

AI made writing code cheap. Understanding it is still the hard part.

People are starting to call the gap comprehension debt: the difference between how much code exists in your system and how much of it anyone on the team could explain at 2am. O'Reilly Radar wrote about it recently, and the term is sticking. Classic technical debt announces itself. Slow builds, tangled modules, that one file nobody wants to touch. Comprehension debt is sneakier. The tests are green. The deploy succeeded. Then something breaks at the worst possible moment and nobody can trace why.

We've been building toward this at QEEK. This post is about why we think merge time (not pre-merge review, not static analysis, not "review harder") is where a lot of quality work actually needs to happen now.

[Figure: flow from AI code generation through pre-merge review to merge, splitting into comprehension debt accumulation versus a merge-time ledger with Track and FYI items.]

Generation got cheap; verification and memory did not. Pre-merge optimizes for shipping. Post-merge is when you still remember why, and when a ten-minute fix is still realistic.

Generation got free. Verification got harder.

Some numbers from the last year:

→ AI accounts for roughly 42% of committed code today, heading toward 65% by 2027 (Sonar, 2026)
→ 96% of developers don't fully trust AI-generated code without checking it themselves
→ 38% say reviewing AI code takes more effort than reviewing human-written code
→ 53% report AI produces code that looks correct but introduces hidden defects (Sonar on technical debt)

Productivity didn't vanish. It just moved. The bottleneck is verifying now, and verifying at scale is a different job than reviewing a colleague's PR.

There's also an empirical study on real-world repos (arXiv, 2026) that tracked AI-introduced issues after merge. Unresolved issues climbed from hundreds in early 2025 to 110,000+ by early 2026. Plausible code merges, gets forgotten, and piles up. That's the actual problem.

Duplication is up ~48%. Refactoring activity is down ~60%. PRs are bigger. Teams ship faster. The codebase gets harder to hold in your head.

Pre-merge review can't fix this

The instinct is to add more gates: linters, SAST, AI review bots, "explain your PR" requirements. Useful, all of them. Still not enough.

Pre-merge review optimizes for throughput. Someone has another PR waiting. The diff looks fine. Tests pass. LGTM.

Comprehension debt builds in the merges nobody thinks twice about: hundreds of them where the code looked reasonable and nobody really internalized what changed. DORA-style data on high-AI-adoption teams backs this up: merge volume goes up, and review time goes up too. We're pushing more through the queue, not understanding more of what's in it.

Pre-merge tools are good at security holes, lint violations, obvious bugs in the diff. They're bad at things like:

→ "This extends a pattern that's already duplicated across four workflows"
→ "This follows an existing convention. Is that convention still what we want?"
→ "This PR touched deploy CI. Here's some adjacent ops debt worth writing down"

That's organizational memory. It doesn't fit in a blocking comment on line 47.

The missing layer: merge-time memory

There's a better window for capturing quality findings: right after merge, when the feature has already shipped, context is still warm, and the fix is still cheap.

Merge Insights is what we built for that: a post-merge review that runs when a PR merges to main, surfaces findings as Track (worth putting on the backlog) or FYI (useful context), and leaves a record tied to that specific merge.

Not a blocker. Not a bot nagging you on every line. More like a ledger.
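
To make "ledger" concrete, here is a minimal sketch of what a merge-time record could look like. It is an illustration under our own assumptions, not the Merge Insights data model: the names Badge, Finding, MergeRecord, and backlog_candidates are hypothetical, and the commit SHA and finding text are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class Badge(Enum):
    TRACK = "Track"  # worth putting on the backlog
    FYI = "FYI"      # useful context, lower follow-up expectation


@dataclass(frozen=True)
class Finding:
    badge: Badge
    title: str
    note: str


@dataclass
class MergeRecord:
    merge_sha: str  # the specific merge this record is tied to
    pr_title: str
    findings: list[Finding] = field(default_factory=list)


def backlog_candidates(ledger: list[MergeRecord]) -> list[tuple[str, Finding]]:
    """Return (merge_sha, finding) pairs for every Track item in the ledger."""
    return [
        (record.merge_sha, finding)
        for record in ledger
        for finding in record.findings
        if finding.badge is Badge.TRACK
    ]


# Placeholder data for illustration only.
ledger = [
    MergeRecord(
        merge_sha="0000000",
        pr_title="Example PR",
        findings=[
            Finding(Badge.TRACK, "duplicated setup", "extract when next in this area"),
            Finding(Badge.FYI, "follows existing convention", "no action expected"),
        ],
    )
]

for sha, finding in backlog_candidates(ledger):
    print(sha, finding.badge.value, finding.title)
```

Nothing in this sketch gates the merge. The record is written after the fact, and the only query it needs to answer is "what did we say was worth tracking, and for which merge?"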

Why post-merge?

Context is still warm. The author (or the AI session that wrote most of the code) still remembers why the choices were made. A ten-minute fix today becomes a three-point refactor in six months when nobody remembers the PR existed.

The question changes. Pre-merge asks "should we ship this?" Post-merge asks "what should we remember?" Comprehension debt needs the second question answered, and pre-merge review mostly doesn't ask it.

Adjacent debt shows up. A small CI fix might also extend a repo-wide staging pattern or copy auth setup from another workflow. The PR didn't create that debt, but someone just touched that area, which is a natural moment to log it.

We've heard versions of this in customer conversations (anonymized):

I want my engineers to run this and clean up the codebase, not PMs showing up to meetings knowing more about the code's problems than the engineers do.
— Head of Product, enterprise financial data platform

We already built an internal chat tool for tech debt identification and sloppy code detection. We haven't gotten to automatic spec generation grounded in the architecture yet.
— CDO, consumer technology company (~15 engineers)

Everyone is independently building markdown files to map out the system. The auto-generated architecture view would be directly useful.
— Engineering lead, post-acquisition integration

There's a flood of slop from people who don't understand what they're generating. Once a team outsources the thinking to the model, they've lost.
— Founding engineer, agent tooling startup

Teams are already hacking around this: internal tools, markdown maps, chat-over-the-repo. The pain is real. Nobody's really productized the workflow yet.

A real example: first production run

We ran Merge Insights on our own infrastructure recently: a merged PR that added staging deployment for Firestore indexes. Staging had been missing composite indexes and an audit-log query was returning 500s.

The PR was small and correct: ~36 lines, pinned firebase-tools, added a deploy-staging job gated on the same flag we use elsewhere. It fixed the outage.

The post-merge review came back with 2 Track and 6 FYI items. The two Track findings:

Track — workflow duplication

The new staging job copied auth setup, gating logic, and project targeting from our functions deploy workflow (~90% structural overlap). Suggestion: extract a reusable base workflow next time someone touches CI. Maybe 2–3 points of work.
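
One rough way to put a number on "structural overlap" is to compare the ordered step lists of two jobs. The sketch below uses Python's difflib for that. The step names are made up for illustration, and this is an assumption about how such a comparison could work, not how the ~90% figure above was produced.

```python
from difflib import SequenceMatcher


def structural_overlap(steps_a: list[str], steps_b: list[str]) -> float:
    """Similarity of two ordered step lists, from 0.0 (disjoint) to 1.0 (identical)."""
    return SequenceMatcher(None, steps_a, steps_b).ratio()


# Hypothetical step names; only the final step differs between the two jobs.
functions_job = [
    "checkout", "setup-node", "auth", "check-deploy-flag",
    "select-project", "deploy-functions",
]
staging_indexes_job = [
    "checkout", "setup-node", "auth", "check-deploy-flag",
    "select-project", "deploy-indexes",
]

overlap = structural_overlap(functions_job, staging_indexes_job)
print(f"{overlap:.2f}")  # 0.83 for these made-up lists (5 of 6 steps shared)
```

A score like this is only a prompt for a human to look. Two jobs can share most of their steps for good reasons, which is why the finding is a Track item and not a blocker.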

Track — deploy ordering

Production deploy runs before staging (needs: deploy). For index deploys that's low risk (indexes are additive) but it inverts validation order and matches a repo-wide pattern we probably ought to document. Suggestion: staging-first with a manual prod gate, in a future refactor.
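
The ordering finding comes down to a dependency check. A minimal sketch, assuming the workflow has been reduced to a mapping from each job name to the jobs it needs. The job names deploy and deploy-staging come from the PR described above; the depends_on helper and the "proposed" layout are hypothetical.

```python
def depends_on(jobs: dict[str, list[str]], job: str, target: str) -> bool:
    """Return True if `job` needs `target`, directly or transitively."""
    stack = list(jobs.get(job, []))
    seen: set[str] = set()
    while stack:
        dep = stack.pop()
        if dep == target:
            return True
        if dep not in seen:
            seen.add(dep)
            stack.extend(jobs.get(dep, []))
    return False


# Current layout: the staging job declares needs: deploy, so production runs first.
current = {"deploy": [], "deploy-staging": ["deploy"]}

# Staging-first layout from the suggestion (the manual prod gate is not modeled).
proposed = {"deploy-staging": [], "deploy": ["deploy-staging"]}

prod_first_today = depends_on(current, "deploy-staging", "deploy")
prod_first_proposed = depends_on(proposed, "deploy-staging", "deploy")
print(prod_first_today, prod_first_proposed)  # True False
```

Run across every workflow in a repo, a check like this would show how far the prod-before-staging pattern has spread, which is the part worth documenting.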

Blockers? No. The PR fixed a real problem. Could pre-merge review have caught these? Maybe, but reviewers were asking "does this deploy indexes to staging?" not "what CI patterns did we just extend again?"

As Track items (not "drop everything and fix this") they're the sort of thing that disappears without a merge-time record. An engineer sees them, spends ten minutes on the duplication during the next CI change, or tickets it. Without the ledger, the same pattern gets copied into the fourth workflow and nobody notices until it's painful.

That's roughly what we wanted the product to do.

Track ≠ urgent

Quick note on semantics, because "Track" reads like severity if you're used to pre-merge review:

Badge	Meaning
Track	Worth putting on the backlog. Do it when you're next in this area. Not blocking. Not on-call urgent.
FYI	Useful context. Lower follow-up expectation.

Post-merge review isn't "you shouldn't have merged this." It's "here's what this merge touched that you might want to remember."

The thing we're worried about isn't one bad PR. It's a thousand locally reasonable merges that nobody fully internalized, until the codebase works fine but feels untouchable. I've started calling that generative drift: each change fine on its own, collectively messy.

With AI-assisted coding, deferring small cleanup compounds faster than it used to. "We'll refactor later" used to mean next quarter. Now it means after ten more AI-assisted PRs in the same module, written by people (or agents) who never built the mental model in the first place.

Fix at merge time, while context is warm and the fix is still cheap.

How this fits what you already have

Merge Insights isn't trying to replace Sonar or your pre-merge bot. Different layer:

[Figure: stack of quality layers. Static analysis and pre-merge AI review before merge, post-hoc agent review during the session, and Merge Insights after merge to main.]

Sonar catches that you introduced a SQL injection. Merge Insights catches that you copied the staging deploy pattern for the fourth time and your prod-before-staging convention now lives in three workflows.

You probably want both.

Where we land on this

Most of the AI coding conversation in 2026 is still about speed: tokens, agents, vibe coding, lines generated. That part is largely working.

The harder problem is attention: who still understands the codebase, who notices when patterns spread, who pays down debt before it turns into fear.

Comprehension debt is the actual crisis, not generation capacity. Pre-merge review can't solve it: wrong incentives, wrong timing. Merge time is the window: context still warm, fix still cheap, ledger gets written. Track and FYI work because they're memory, not gatekeeping. And the primary user is the engineer maintaining the codebase, not a PM auditing their work.

We're early. One production run on our own repo so far. But the signal from conversations (teams building internal debt tools, engineers drowning in review, markdown duct tape everywhere) suggests the problem is ahead of the product category.

If you're shipping fast with AI and starting to feel like the codebase is getting ahead of you, that's probably not paranoia. The fix isn't "review harder." It's capturing what you learned at merge time, while you still can.

QEEK builds Merge Insights as part of a platform for architectural context across codebases. We're in beta — qeek.ai if you want to take a look.