AI Made Experienced Developers 19% Slower (What the Research Actually Says)

Devs claim 10x. Studies find -19%. Businesses bet on 100x. They're all right. AI productivity isn't a number, it's a jagged frontier.

> Key Takeaways

  • > There is no single productivity multiplier for AI coding. The number everyone is arguing about doesn't exist.
  • > Tetris from scratch in seconds. A C compiler that's mostly broken. The frontier is jagged, not uniform.
  • > Three things shape where you land: the model, the loop architecture, and the complexity of the task.
  • > A mid-range model in a well-built loop can outperform a frontier model with no loop. The math is exponential.
  • > The frontier moves but stays jagged. Building and adapting your loops is the durable skill.

> Transcript

So, I asked Claude to build me Tetris from scratch. One prompt and back comes a fully playable game. A thousand times faster than a human could write that code. Anthropic asked their models to build a C compiler from scratch. What they got back was mostly broken, heavily supported by external effort, and honestly overhyped. So, which is it? A thousand times better or a mass-produced mess?

There is no single productivity multiplier for AI coding. That's the whole problem with this conversation. Before we get into the research, let's put one thing on the table. More output doesn't guarantee better results. Even if you had a genuine 10x productivity multiple, you have no guarantee you'll 10x revenue. You still have to know what to build.

Strategy still counts for a lot. So, we're not here to hype. We're here to figure out what's actually going on. I'll introduce a better way to think about this, the jagged performance frontier. We'll break down what shapes it and look through that really quickly together. And we'll take a look at where all of this stuff is heading. Developers on social media are claiming 10x and sometimes 100x productivity.

Businesses are budgeting for mass AI-driven efficiency. CEOs are telling their boards this will cut their headcount. Then you look at the research. The METR study tested experienced developers on their own open-source code bases and found they were 19% slower with AI tools. And these developers believed they were faster while measurably being slower.

Stanford found 12 to 31% improvements, but only on simple, well-defined tasks. The Remote Labor Index found AI could only complete 2.5% of real-world freelance tasks. So, you have developers saying 10x, studies saying minus 19%, and businesses expecting 100x. So, where does the real number land? Here's the thing. Most of these studies were looking at Copilot-style autocomplete, hitting tab in your editor.

That's a completely different setup from agentic loops with tests, type checking, tool access, running on its own. We're comparing different things and pretending they're the same. Even with agentic approaches, the setups vary wildly. Do you have Playwright? Do you have tests? Are you using TypeScript? Does the model have access to your code base?

These things matter enormously, and most studies don't control for any of them. The reason nobody can agree is they're all looking for a single coefficient, a global multiplier you can apply to all development work. AI makes developers X% faster. Done. But that's not how this works. Tetris from scratch, a thousand times faster than a human. A C compiler from scratch, worse than doing it on your own.

A CRUD app with a database, probably 5 to 10x. A novel distributed consensus algorithm, good luck. It was never one number. It's a jagged performance frontier where the tasks, the setup, the domain, they all sit at completely different points. Three things shape where you land on this frontier. First is the LLM itself. Yes, a more capable model gives you better results, but honestly, this is the least interesting lever to pull.

Second, and this is the big one, your loop architecture. The scaffolding around the LLM: the tests, the type checking, the browser automation, tool access, file access. Here's why this matters so much. Think of each step an agent takes as a coin flip. At 90% accuracy per step, you get about 10 steps on average before something goes wrong. You bump that up to 95% and the expected chain length doubles to 20.

It's not a linear improvement in percentage points, it's exponential. Every verification layer you add to the loop pushes per-step accuracy up a few percentage points: tests that catch errors, a type checker that rejects bad code, Playwright that checks the actual browser. A small bump in per-step accuracy makes a huge difference in how far the agent can get on its own.
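The coin-flip arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming every step succeeds independently with the same probability `p`. The 90% and 95% figures come from the transcript; the function names, the 85% "mid-range" accuracy, and the 80% verifier catch rate are made up for illustration. The "exponential" part shows up in the chance of finishing a whole task: `n` clean steps in a row happen with probability `p ** n`.

```python
def expected_steps_until_failure(p: float) -> float:
    """Mean number of steps taken up to and including the first failed one.

    Assumes each step succeeds independently with probability p, so the
    count follows a geometric distribution with mean 1 / (1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


def clean_run_probability(p: float, n: int) -> float:
    """Probability that n independent steps in a row all succeed."""
    return p ** n


def accuracy_with_verifier(p: float, catch_rate: float) -> float:
    """Effective per-step accuracy once a verifier sits in the loop.

    Toy assumptions (not from the transcript): the verifier never rejects
    a correct step, catches a wrong step with probability catch_rate, and
    a caught step is retried until some attempt is accepted.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    missed = (1.0 - p) * (1.0 - catch_rate)  # wrong step that slips through
    return p / (p + missed)


if __name__ == "__main__":
    for p in (0.90, 0.95):
        print(
            f"p={p:.2f}: about {expected_steps_until_failure(p):.1f} steps, "
            f"{clean_run_probability(p, 20):.0%} chance of 20 clean steps"
        )

    # Hypothetical comparison: mid-range model plus verifier vs. frontier model alone.
    looped = accuracy_with_verifier(p=0.85, catch_rate=0.80)
    print(f"0.85 with verifier -> {looped:.3f} per step, "
          f"about {expected_steps_until_failure(looped):.0f} steps")
```

Under these toy numbers, the 85% model with a verifier that catches 80% of its mistakes ends up at roughly 96.6% per step, or about 29 steps on average, compared with 20 for an unverified 95% model. Real agents do not fail independently or uniformly, so treat this as intuition rather than a forecast.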

This means a mid-range model inside a well-built loop can outperform a frontier model with no loop. Third is the complexity and nature of the task itself. Some tasks are just fundamentally more agent-friendly than others. Tetris has clear rules, clear success criteria, well-understood patterns. An agent can get to the right answer because it's easy to check if it's right.

With a C compiler, everything depends on everything else, correctness is subtle, edge cases pile up. The agent can't check its own work because the problem is just too complex for simple tests. Same model, same loop, wildly different results depending on where the task sits on that complexity spectrum. The frontier isn't static. It moves. Inference costs have collapsed a thousandfold in 3 years.

Models get better and cheaper almost for free every year, but they get better at what they're trained on. Code keeps improving because there are mountains of training data and clear correctness signals. Telling a new joke, it's still terrible at that. The frontier shifts, but it stays jagged. And something people often miss is that their loops are coupled to the models they're using.

A loop you built for how a particular model fails might be wasted on a next-generation model. All those guardrails and retries and defensive scaffolding might not apply anymore. Or worse, they might actively fight the model's strengths. When the model changes, your loop might need to change with it. So, the frontier moves, the shape stays jagged, and your loops need to evolve alongside the models.
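One way to keep a loop easy to re-tune is to treat the verifiers and the retry budget as configuration rather than hard-coding them. The sketch below is a hypothetical illustration, not the setup used in the video: `LoopConfig`, `run_loop`, `fake_model`, and both verifiers are invented names, and `fake_model` is a stand-in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A verifier returns None when the candidate passes, or an error message.
Verifier = Callable[[str], Optional[str]]
# A generator takes the task plus feedback from the previous attempt.
Generator = Callable[[str, List[str]], str]


@dataclass
class LoopConfig:
    verifiers: List[Verifier] = field(default_factory=list)
    max_attempts: int = 3


def run_loop(generate: Generator, task: str, config: LoopConfig) -> Optional[str]:
    """Generate, verify, and retry with feedback until all verifiers pass."""
    feedback: List[str] = []
    for _ in range(config.max_attempts):
        candidate = generate(task, feedback)
        errors = []
        for verify in config.verifiers:
            message = verify(candidate)
            if message is not None:
                errors.append(message)
        if not errors:
            return candidate
        feedback = errors  # the next attempt sees what went wrong
    return None  # retry budget exhausted


def fake_model(task: str, feedback: List[str]) -> str:
    """Stand-in for an LLM: wrong on the first try, right once it sees feedback."""
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def compiles(candidate: str) -> Optional[str]:
    try:
        compile(candidate, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"syntax error: {exc}"
    return None


def passes_test(candidate: str) -> Optional[str]:
    namespace: dict = {}
    try:
        # exec on model output is unsafe outside a sandbox; fine for a toy.
        exec(candidate, namespace)
        if namespace["add"](2, 3) != 5:
            return "add(2, 3) should be 5"
    except Exception as exc:
        return f"test crashed: {exc}"
    return None


if __name__ == "__main__":
    config = LoopConfig(verifiers=[compiles, passes_test], max_attempts=3)
    print(run_loop(fake_model, "write add(a, b)", config))
```

When a new model stops making a class of mistake, the matching verifier can be dropped from `LoopConfig`, or `max_attempts` lowered, without touching `run_loop` itself.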

Building the loop isn't a one-time thing. You need to keep tuning it. The skill is building and adapting these loops. Understanding what to scaffold, how to verify, and how to build these feedback loops. That's what determines where you sit on the frontier. And that is learnable. This channel is about exactly that. How to build these loops and structure them and keep adapting as the models evolve.

Practical, hands-on, no hype. If you want to dig into the research on your own, I have a survey of over 1,200 research papers linked in the description. And if your team is trying to figure this out on its own, I run hands-on workshops. The link's down there, too.
