Engineering That Outlasts the Paradigm

20 min read

[Image: A stack of well-worn engineering books on a desk in warm daylight, suggesting the durable layer beneath any paradigm]

“Vibe coding” was Collins Dictionary’s Word of the Year for 2025, a term coined by Andrej Karpathy in February of that year. In the same window, developers’ trust that AI coding tools produce accurate output fell from around 40% to 29% (Stack Overflow, 2025). Two record-setting numbers, in opposite directions, in the same year. They describe the same mistake.

The discourse split into two tribes. The vibe coder (“AI solved it; I can prompt my way to anything”). The AI denialist (“real engineers don’t use AI; whoever does isn’t really engineering”). Both treat AI as the variable. Neither is right.

A third path has been named before. Simon Willison called it vibe engineering. Addy Osmani called it AI-assisted engineering. This post adds the evidence the third path needed: the smoking gun (the agent-skills stack is operationalized engineering literature, and you can verify it in 60 seconds), a paradigm-portable worked example, and a four-move compass. And the closing test: if AI disappeared tomorrow, the third persona is fine, and sharper. The position was never about AI.

Key Takeaways

  • Both tribes share the same misdirection: they think the value question is about AI. It isn’t. It’s about engineering, which is upstream of any paradigm.
  • The most useful agent skills are operationalized engineering literature. Strunk’s Elements of Style (1918) through Beck’s TDD (2003); 85 years of canon, in checklist form.
  • The METR perception gap is the diagnostic: developers using early-2025 AI tools were 19% slower while believing they were 20% faster (METR, 2025). Persona 3 measures.
  • Coding got cheaper. The engineering problem didn’t.

The two tribes are arguing about the wrong thing

84% of developers use or plan to use AI tools, and 46% actively distrust them (Stack Overflow, 2025). The discourse reads this as a fight between believers and skeptics. It isn’t. Both camps share a premise: that the value question is about AI. The value was always upstream of AI.

The vibe coder tribe formed fast. Karpathy’s coinage in February 2025 (“fully give in to the vibes, embrace exponentials, and forget that the code even exists”) landed in Merriam-Webster as “slang and trending” within weeks and Collins Dictionary’s Word of the Year by year-end. 51% of professional developers now report using AI tools daily (Stack Overflow, 2025). This cohort isn’t fringe. It’s the median.

The AI denialist tribe formed in reaction. Trust in AI accuracy fell from around 40% in 2024 to 29% in 2025 (Stack Overflow, 2025). 46% actively distrust the output. Only 3% report “highly trusting” it. The frustration is genuine: 45% of developers cite “AI solutions almost right, but not quite” as their number-one daily annoyance.

But here’s the part both tribes miss. The variable that mattered in 2010, 2015, 2020, and 2025 is the same: engineering decisions made (or skipped) upstream of any code. Persona 1 treats AI as the answer. Persona 2 treats AI as the problem. Both let AI hold the steering wheel of the conversation, and the conversation drifts away from where the work actually lives.

Predecessors saw this. Willison and Osmani named the third path in 2025. What they didn’t have yet, and what this post supplies, is the empirical evidence: the agent stack everyone’s arguing about is itself a transcription of engineering classics. Open it. Read it. The books are right there.

What breaks when you skip the engineering?

AI-generated code produces approximately 1.7x more issues than human-written code (Shiplight AI, 2025). 24.2% of AI-introduced issues survive at HEAD; security issues survive at 41.1% (arXiv 2603.28592, 2026). The model fills in blanks the prompter didn’t know existed. That isn’t a model failure. It’s a missing engineering decision, surfacing after the fact, in production.

The shape of the failure is consistent. The prompted MVP works in isolation; the second feature breaks the architecture the model invented for the first. The happy path ships because the prompt described the happy path; production traffic finds everything else. Parallel prompts produce silently incompatible interfaces, and the prompter doesn’t read the diff carefully enough to catch the type drift.

The cost externality is now measurable. Code duplication rose 8x in AI-assisted repos and code churn nearly doubled (3.1% to 5.7%, 2020-2024) (Shiplight AI, 2025). Uplevel’s study of 800 developers found a 41% bug-rate increase for teams with Copilot access (Augment Code, 2024). 40% of AI-generated code in security-sensitive contexts contains critical vulnerabilities. And there’s a tail: an estimated 8,000+ AI-built startups now need rebuilds, with cleanup costs between $400 million and $4 billion (InfoQ, 2025).

This isn’t an “AI is bad” finding. Each line item maps to a missing engineering decision. The duplication is a missing second-author check. The security vulnerabilities are a missing failure-mode review. The startup rebuilds are missing scope decisions (“should we even build this”) and missing contract design (“what does this interface actually guarantee”). The model wrote the code under conditions no senior engineer would have signed off on. The variable was the engineering, not the model. For the upstream code shape that makes any of this survivable, see code shape and AI agents.

What rots when you refuse the leverage?

Senior developers report the lowest “highly trust” rate in AI accuracy (2.6%) and the highest “highly distrust” rate (20%) (Stack Overflow, 2025). The reflex is defensible. The conclusion most often drawn from it is a trap. Refusing the leverage isn’t the same as defending the craft. It freezes you in 2023.

The legitimate part of the frustration is real. “AI solutions almost right, but not quite” was the number-one frustration at 45%. The cognitive cost of detecting the “not quite” can run higher than just writing the code yourself. Greenpepper Software named the rolling tax: senior engineers face permanent “trust debt,” reverse-engineering AI-shaped logic just to ship a stable update. That’s not a complaint to dismiss. It’s a real cost of working alongside non-deterministic output.

The trap underneath the legitimate frustration is subtler. “I do TDD in my head” was a fine claim in 2018. It’s a weak claim once TDD-as-checklist exists and the head version goes unaudited. Refusing operationalized engineering tooling because the box says “AI” forfeits leverage on your own discipline. The compounding loss is invisible: senior engineers who refused the multiplier in 2023 are competing in 2026 with engineers who took it. The gap isn’t skill. It’s rep count, with feedback loops, on the same craft.

The honest version of persona 2 isn’t “AI is bad.” It’s “I will not multiply weak discipline.” That stance is correct. It’s just misdirected. “I will not multiply AI” cedes the engineering layer where the leverage actually lives. The engineering tooling AI brought with it (and we’re about to walk through what’s actually inside it) wasn’t AI’s invention. It’s the discipline persona 2 has been quietly defending all along.

A note before moving on. The denialist instinct is what makes persona 3 possible. Without skepticism, persona 1 wins by default and the whole field gets worse. The mistake is letting skepticism close the door instead of holding it open. For the architecture that makes the leverage worth taking on, see treat AI as a team member.

The smoking gun. The skills are the books.

Look inside what makes agents actually useful. Not the model. The skills. Polya’s How to Solve It (1945). Strunk’s Elements of Style (1918). Beck’s TDD (2003). Agans’s debugging rules (2002). Meyer’s Design by Contract (1988). Span: 85 years. The most useful AI tooling is operationalized engineering literature. The paradigm changed; the books didn’t.

I went looking for what made AI agents productive on real codebases, expecting to find prompt engineering. I found the bookshelf I already owned. The skills from the live agent stack, each mapped to the canonical text it transcribes:

  • brainstorming: Polya, How to Solve It (1945)
  • writing-clearly-and-concisely: Strunk, The Elements of Style (1918)
  • spec-before-implementation: Hunt and Thomas, The Pragmatic Programmer (1999)
  • test-driven-development: Beck, Test-Driven Development: By Example (2003)
  • systematic-debugging: Agans, Debugging: The 9 Indispensable Rules (2002)
  • verification-before-completion: Meyer, Design by Contract (1988)

The agent stack’s value isn’t novelty. It’s operationalized discipline. Persona 1 saw magic and skipped the discipline. Persona 2 saw an AI label and refused the discipline. Both missed the same thing in different directions.

There’s a 60-second falsifier for this section. Open the test-driven-development skill in any current agent stack. Strip the YAML frontmatter and the agent-specific instructions. What remains is Beck’s Red-Green-Refactor cycle in shorter sentences than Beck wrote. Do the same with systematic-debugging. What remains is Agans’s “make it fail, quit thinking and look, change one thing at a time.” Do it with writing-clearly-and-concisely. The skill is, almost literally, Strunk’s table of contents.
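If you’d rather script the falsifier than eyeball it, here’s a minimal sketch. The file path is hypothetical, and the frontmatter convention (a leading block fenced by --- lines) is an assumption, not a guarantee of any particular stack:

```python
# Minimal sketch of the 60-second falsifier. The skill path below is
# hypothetical; point it at whatever skill file your agent stack ships.
from pathlib import Path

def strip_frontmatter(text: str) -> str:
    """Drop a leading YAML frontmatter block delimited by '---' lines."""
    if text.startswith("---"):
        parts = text.split("---", 2)  # ["", frontmatter, body] when a block exists
        if len(parts) == 3:
            return parts[2].strip()
    return text.strip()

skill = Path(".claude/skills/test-driven-development/SKILL.md")  # hypothetical path
print(strip_frontmatter(skill.read_text()))
# What remains should read like Beck's Red-Green-Refactor in shorter sentences.
```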

I’ll be honest about one thing. The skills don’t cite these texts. They don’t have to. The disciplines walked into the operationalized form on their own, the way principles do when you put enough working engineers in a room and ask “what’s the rule we keep coming back to.” Convergence is the point. Convergence is the evidence.

A paradigm-portable decision: building local code intelligence

The agent’s bottleneck wasn’t the model. It was the agent’s search. Letting an agent grep through a 50,000-line codebase every turn is a retrieval problem, not an intelligence problem. Building an MCP indexing layer is a paradigm-portable engineering call. Decompose. Identify the real bottleneck. Build the right abstraction.

I shipped that build last quarter. The full account, with the benchmark progression and design choices, is in local code intelligence for AI agents. The short version: agents in a large codebase spent most of their tool calls re-deriving structure they had no way to retain. Each turn started over. The agent looked stupid; the agent wasn’t stupid; the agent was running blind.

I framed the problem in classical shape. Identify the bottleneck (retrieval, not intelligence). Choose the right primitive (a semantic index plus symbol resolution). Draw the boundary (an MCP server with a small surface). Test the abstraction (benchmark progression on representative tasks). Each of those decisions is the same call an engineer would have made in 2010 facing a slow internal tool. Don’t make every consumer scan; build an index.
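As a sketch of that same call, under stated assumptions (a Python codebase, “symbols” meaning function and class definitions, and none of the semantic-search machinery the real MCP build adds), here is the core move in miniature: pay the scan cost once, then answer lookups from the index.

```python
# Minimal sketch: build a symbol index once instead of grepping every turn.
import ast
from collections import defaultdict
from pathlib import Path

def build_symbol_index(root: str) -> dict[str, list[tuple[str, int]]]:
    """Map each function/class name to the (file, line) pairs defining it."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text())
        except SyntaxError:
            continue  # skip files the parser can't handle
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[node.name].append((str(path), node.lineno))
    return index

index = build_symbol_index("src")  # "src" is a placeholder root
print(index.get("build_symbol_index"))  # cheap lookup instead of a per-turn scan
```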

Three things stand out when I run this through the persona frame.

Persona 1 would have lived with grep. The model was getting confused, and the model would get better. It didn’t, on this dimension. Search isn’t an intelligence problem.

Persona 2 would have refused on principle. Scaffolding for AI is admitting AI matters. Better to ship without. The cost: a year later, the same problem is still the same problem.

Persona 3 saw the engineering problem and shipped. Decomposition, retrieval primitive, contract surface, evaluation. None of that is AI-specific. The AI is incidental to the engineering.

The compounding bonus, which I didn’t expect: building the index taught me more about my own codebase in three weeks than skim-reading would have in a year. You can’t index what you don’t understand. That’s move 2 from the compass, the next section, in disguise.

What does persona 3 actually do?

Four moves. Read the skills as engineering, not AI tooling. Point the accelerator at learning, not just shipping. Own the upstream decisions. Treat the agent as a force multiplier on discipline, knowing the multiplier is negative if discipline is weak. The METR perception gap is the diagnostic: developers using early-2025 AI tools were 19% slower while believing they were 20% faster (METR, 2025).

Move 1. Read the skills as engineering, not AI tooling. The TDD skill makes you a better human TDD practitioner with or without an agent in the loop. Run brainstorming on yourself before you open Cursor. Run verification-before-completion on yourself before you open a PR. The skills outlive the model. The agent stack is a syllabus. Read it that way.

Move 2. Point the accelerator at learning, not just shipping. Faster iteration is more reps. More reps is more feedback. More feedback is sharper instincts, if you measure.

METR’s original RCT (n=16, 246 issues, real open-source codebases) found the perception gap of the era: developers forecast +24% before the task, retrospected +20% afterward, and were measured at -19% (METR, 2025). The follow-up cohort with redesigned methodology came in at -4% with a wide confidence interval, and METR concluded “AI likely provides productivity benefits in early 2026” (METR, 2026).

The number isn’t the lesson. The 39-point self-perception gap is. Persona 1 ships without measuring. Persona 2 refuses without measuring. Persona 3 measures, and that’s the whole difference. For what “measure, don’t feel” looks like in a real cost-cutting exercise, see we tried to cut Claude output.
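What “measure, don’t feel” can look like day to day, as a minimal sketch: log a forecast and a wall-clock measurement per task, then compare the averages. The field names and numbers below are hypothetical; the shape of the comparison is the METR design in miniature.

```python
# Minimal sketch: compare felt speedup against measured speedup per task.
from statistics import mean

tasks = [
    # forecast_speedup: predicted gain (+0.20 means "I'll be 20% faster")
    # baseline_min: estimated time without AI; actual_min: measured time with AI
    {"forecast_speedup": 0.20, "baseline_min": 60, "actual_min": 72},
    {"forecast_speedup": 0.25, "baseline_min": 45, "actual_min": 50},
    {"forecast_speedup": 0.10, "baseline_min": 90, "actual_min": 84},
]

felt = mean(t["forecast_speedup"] for t in tasks)
measured = mean(t["baseline_min"] / t["actual_min"] - 1 for t in tasks)
print(f"felt: {felt:+.0%}  measured: {measured:+.0%}  gap: {felt - measured:+.0%}")
```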

Move 3. Own the upstream decisions. Decomposition. Problem framing. Failure-surface ownership. Interface boundaries. “What shouldn’t exist.” None of this got cheaper. The model can’t do it for you because the model doesn’t have your context. The work upstream of code is still yours. For the tactical version with a worked decision matrix, see context engineering in practice.

Move 4. Treat the agent as a force multiplier on discipline. A multiplier on weak discipline is negative. The 1.7x issue rate, the 41% bug-rate increase, the 40% vulnerability rate in security-sensitive code, and the $400M-$4B startup rebuild bill are what zero discipline times a multiplier looks like. Sharpen the discipline first. Discipline first looks like contracts before prompts, tests before commits, scope decisions before features. Then multiply. Multiply by what you have, not by what you wish you had.
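As one concrete instance of “contracts before prompts”, here is a minimal sketch of a post-condition check in the Design by Contract spirit (Meyer, 1988). The decorator and the example function are hypothetical; the discipline is that the guarantee is written down before any implementation, prompted or hand-written, is accepted against it.

```python
# Minimal sketch: state the guarantee first, then hold any implementation to it.
from functools import wraps

def ensures(postcondition, message):
    """Fail loudly when a result violates the stated guarantee."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            assert postcondition(result), f"contract violated: {message}"
            return result
        return wrapper
    return decorator

@ensures(lambda xs: xs == sorted(xs), "output must be sorted ascending")
def dedupe_and_sort(items: list[int]) -> list[int]:
    # The body could come from a prompt or a person; the contract doesn't care.
    return sorted(set(items))

print(dedupe_and_sort([3, 1, 3, 2]))  # [1, 2, 3]
```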

If AI disappeared tomorrow

The thought experiment AI denialists have been running for two years. Run it across all three personas and the post lands.

Persona 1 with the tools removed is naked. The crutch made the engineering instinct unnecessary, so the engineering instinct never formed. The problem isn’t “I can’t ship without AI.” The problem is “I never built the muscle to know what should be shipped.” Recovery requires going back to the books, the same books the agent skills already encoded.

Persona 2 with the tools removed continues exactly as before. Day-to-day work proceeds. The discipline they have is the discipline they had three years ago. The problem is invisible because no compounding happened. The engineer who took the multiplier in 2023 has done more reps with sharper feedback in the same calendar time. That gap doesn’t close itself.

Persona 3 with the tools removed is fine. The discipline was sharpened by the multiplier; the discipline still works without the multiplier. The agent-skills checklists they ran on themselves still run on themselves. The mental models built by faster iteration are still mental models. The position was never about AI.

That’s the layer that outlasts the paradigm. Coding got cheaper. The engineering problem didn’t.

Frequently Asked Questions

Is vibe coding “real” engineering?

Vibe coding is a cultural moment, not a discipline. It works for prototypes and small apps where the engineering decisions get absorbed by the framework or the model’s training distribution. It collapses on systems where decomposition, contracts, and failure surface matter, which is why AI-built apps are 1.7x more bug-prone (Shiplight AI, 2025) and a generation of AI-built startups need $400M-$4B in rebuilds (InfoQ, 2025).

Will AI replace senior software engineers in 2026?

No, on the evidence we have. METR found experienced developers using early-2025 AI tools were 19% slower while perceiving themselves 20% faster (METR, 2025); the 2026 follow-up shows -4% with a wide confidence interval (METR, 2026). The bottleneck isn’t typing speed. It’s decomposition, problem framing, and failure-surface ownership. The model can’t do those for you.

What skills will outlast AI in software engineering?

The skills agents are operationalizing: test-first design (Beck, 2003), systematic debugging (Agans, 2002), problem framing (Polya, 1945), post-condition verification (Meyer, 1988), spec-before-implementation (Hunt and Thomas, 1999), and clear technical writing (Strunk, 1918). Read the agent-skills stack and you’re reading a syllabus.

Why do experienced developers distrust AI coding tools?

Trust in AI accuracy fell to 29% in 2025; senior developers showed the lowest “highly trust” rate at 2.6% (Stack Overflow, 2025). The frustration is real (45% cite “almost right but not quite” as their top issue), but the conclusion most often drawn (“don’t use it”) leaves leverage on the table. The third path is to use the tooling on top of operationalized discipline, not in place of it.

What does it actually look like to “use AI well”?

Read the skills as engineering literature. Run them on yourself before you ever open the agent. Measure cycle time; don’t trust the feeling of speed. Own the upstream decisions (decomposition, contracts, failure modes). Treat the agent as a multiplier on the discipline, knowing the multiplier is negative if the discipline is weak.

Conclusion

The discourse argued about AI; the work was always engineering. The agent-skills stack is engineering literature in checklist form, spanning 85 years from Strunk to Beck. Persona 1 multiplied by zero discipline. Persona 2 refused the multiplier. Persona 3 sharpened the discipline and used the multiplier, and both the discipline and the output got better.

Three things to take from this:

  • Both tribes share the same misdirection: the value question isn’t about AI.
  • The skills are the books. Read them as a syllabus, regardless of whether you ever run an agent.
  • Measure cycle time. Don’t trust feelings of speed. The METR perception gap is the diagnostic.

If you want to start somewhere, open one agent skill (test-driven-development, systematic-debugging, brainstorming) and read it as an engineering doc, not a tool config. That’s move one. For the architectural follow-on, treat AI as a team member is where the four moves above go to live.

The paradigm changed. The books didn’t. The engineer’s job didn’t either.
