Evan Lausier

The 66% Problem

I spent three hours last Tuesday chasing a bug that didn't exist.

The code looked perfect. It was syntactically correct, followed best practices, and even had thoughtful comments explaining what each function did. The problem was that one of those functions was solving a problem I never asked it to solve. Claude had decided, in its infinite pattern-matching wisdom, that my API endpoint needed pagination. I hadn't asked for pagination. I didn't want pagination. But there it was, breaking my response structure in ways that took me longer to diagnose than it would have taken to write the whole thing myself.
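
To make that concrete, here's a minimal sketch of the failure mode (the endpoint and field names are invented for illustration, not my actual code): the handler quietly wraps the list in a pagination envelope, so every caller that expects a bare array breaks.

```python
# Hypothetical illustration -- invented names, just the shape of the problem.

def list_orders(orders: list[dict]) -> list[dict]:
    """What I asked for: the orders as a flat JSON array."""
    return orders

def list_orders_paginated(orders: list[dict], page: int = 1, per_page: int = 50) -> dict:
    """What I got: an unrequested pagination envelope around the same data."""
    start = (page - 1) * per_page
    return {
        "items": orders[start:start + per_page],
        "page": page,
        "per_page": per_page,
        "total": len(orders),
    }

orders = [{"id": i} for i in range(3)]
print(list_orders(orders)[0])                     # {'id': 0} -- callers index straight into the list
# print(list_orders_paginated(orders)[0])         # KeyError: 0 -- the response shape quietly changed
print(list_orders_paginated(orders)["items"][0])  # the data is still there, just somewhere else
```

Nothing in that second version is wrong on its own. It's a perfectly reasonable answer to a question nobody asked.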

This is the 66% problem.

According to Stack Overflow's latest developer survey of over 90,000 developers, 66% said their biggest frustration with AI coding assistants is that the code is "almost right, but not quite." Another 45% said debugging AI-generated code takes more work than it's worth.

I find those numbers oddly comforting. Not because I enjoy suffering, but because it means I'm not losing my mind. The tools that were supposed to make me faster have introduced a new category of bug that didn't exist three years ago: the bug that looks like working code.

Here's the thing about completely wrong code. It fails loudly. It throws errors. It refuses to compile. Your test suite catches it. You fix it and move on with your life. But almost-right code? That's the code that passes your tests, ships to staging, and then does something subtly insane at 2am when your biggest client runs a batch job you forgot they were running.
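
Here's an invented example of what I mean, with the bug exaggerated so it's easy to spot in isolation: a chunking helper that passes the obvious test and then silently drops records on exactly the kind of input a real batch job produces.

```python
# Invented example of "almost right": passes a quick test, fails quietly on real data.

def chunk(records, size=100):
    """Split records into batches of `size` for a downstream batch job."""
    batches = []
    for i in range(len(records) // size):   # bug: integer division drops the final partial batch
        batches.append(records[i * size:(i + 1) * size])
    return batches

# The test a reviewer (or an AI) is likely to write: an input that divides evenly.
assert chunk(list(range(200))) == [list(range(100)), list(range(100, 200))]

# The 2am input: 250 records go in, 200 come back out. No error, no log line, just missing data.
leftover = 250 - sum(len(b) for b in chunk(list(range(250))))
print(f"{leftover} records silently dropped")   # 50
```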

The old bugs were honest. They announced themselves. These new bugs are polite. They wait.

I've been writing code professionally for decades... I've made every mistake you can make. I've shipped SQL injection vulnerabilities. I've accidentally deleted production data. I once re-imaged over a production database and locked myself out. Those were my mistakes, and I understood them immediately when they blew up. The feedback loop was tight: I did something dumb, the system complained, I learned not to do that again.

The AI-generated bugs don't work that way. When something breaks now, my first question isn't "what did I do wrong?" It's "what did the AI do that I didn't notice?" That's a fundamentally different kind of debugging. Instead of understanding my own logic, I'm reverse-engineering someone else's assumptions about what I probably wanted.

Microsoft Research published a study earlier this year that quantified this. They tested nine different AI models on SWE-bench Lite, a benchmark of 300 real-world debugging tasks. The best performer, Claude 3.7 Sonnet, solved 48.4% of them. Less than half. These weren't exotic edge cases. They were the kinds of bugs that wouldn't trip up an experienced developer.

The models are phenomenal at writing code. They struggle to fix it.

This makes a perverse kind of sense when you think about how they work. Code generation is pattern completion. You give the model a prompt, it predicts what code probably comes next based on billions of examples. That's genuinely useful for boilerplate, for syntax you've forgotten, for exploring unfamiliar libraries. But debugging isn't pattern completion. Debugging is hypothesis testing. It requires understanding what the code is supposed to do, what it's actually doing, and why those two things are different.
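
For what it's worth, here's my own shorthand for what that loop looks like in practice (this is just how I work, not anything from the studies): state what the code should do, observe what it actually does, and make the gap impossible to ignore.

```python
# My own debugging shorthand, not anything the models do: hypothesis in, evidence out.

def check_hypothesis(description: str, expected, actual) -> bool:
    """Print whether the observed behaviour matches what the code is supposed to do."""
    ok = expected == actual
    print(f"[{'OK' if ok else 'FAIL'}] {description}: expected {expected!r}, got {actual!r}")
    return ok

# Hypothesis: the nightly job sends ISO dates. (It doesn't -- and now I know where to look.)
check_hypothesis("date format from the nightly job", "2025-07-01", "07/01/2025")
```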

That "why" is where everything falls apart. The AI doesn't know why your system is architected the way it is. It doesn't know about the business rule your CEO insisted on in 2019 that makes no logical sense but accounts for 40% of your revenue. It doesn't know that your database schema has a quirk because you migrated from Oracle fifteen years ago and nobody wants to touch it. It just sees patterns and matches them.

The METR randomized trial from July 2025 found something that should concern all of us. They had experienced open-source developers complete tasks with and without AI assistance. The AI group was 19% slower on average. But here's the part that keeps me up at night: before starting, participants predicted AI would make them 24% faster. After finishing, even with slower results, they still believed it had helped.

We're not just getting almost-right code. We're getting almost-right code while feeling productive. The dopamine hit of instant completion masks the debugging debt accumulating behind us.

I'm not going to tell you to stop using AI tools. I use them constantly. But I've started treating them differently than I did a year ago. I used to accept suggestions and move on. Now I read every line like it was written by a junior developer who's very confident and moderately competent. Because that's essentially what it is.

The 66% aren't complaining because the tools are bad. They're complaining because the tools are good enough to be dangerous. A hammer that misses the nail is annoying. A hammer that hits almost the right spot is how you end up with a crooked house.

I don't have a solution. I'm not sure anyone does yet. The tools will get better. The context windows will get longer. The models will learn to ask clarifying questions instead of assuming. Maybe.

Until then, I'm keeping my print statements close and my test coverage closer. Some skills don't need to be automated. They need to be sharpened.

Top comments (30)

Sylwia Laskowska

Oh yes, I remember when my UX designer wrote his very first piece of Python code in ChatGPT 😄
It worked at first, but once he asked for “optimization”, something broke. In the end he asked me to take a look. I removed about 80% of the code because it was adding things he didn’t actually need, tweaked the main function a bit - and it worked.
He decided I was a genius after that 😄

Pascal CESCATO

You're a genius anyway 😉 – that said, I've written code with ChatGPT, Claude, and many others… for simple cases, it's fine. But for production-ready code… hmmm, you understand. And debugging is the same. Asking it to find the error in simple code, sure – but if there's functional logic behind it that isn't spelled out… of course, it won't find it. Worse, it might rewrite the code for you, with terrible self-assurance, explaining that it found the error… and your code will become an ocean of nonsense.

Shitij Bhatnagar

Agree, and after AI code generation, you can go on telling the AI about its code mistakes; it will apologize politely and show some crappy fix. For me, the amount of time lost getting AI to produce better code could have been better used to train a human or fix the issue myself.

Sylwia Laskowska

Haha 😄 definitely not a genius — unless we’re talking about a genius of chaos 😂

But yes, totally agree. For simple cases, quick prototypes, or a first pass at debugging, it can be really helpful. For larger projects with real business logic and context… well, that’s a whole different story — and a pretty risky one 😅

Pascal CESCATO

Hey! A genius of chaos is still a genius 😄

Evan Lausier

LOL I dunno @sylwia-lask... I've read a lot of your stuff... you might be 😊 The strict equality one was really good.

I don't think I realized how widespread this was... I thought it was just me for a while. 😂

Sounds like we're all sharing in the fun haha

Sylwia Laskowska

Haha 😄 careful, you’re raising expectations now!

Don’t worry - I’ll balance it out soon. Tomorrow I’ll probably publish a post about how bad I am at CSS 😂

Evan Lausier

HA! That's so funny! 😂

Fred Brooker

I feel your pain

I once spent 5 hours debugging nonexistent bugs until it was clear that the unit tests were flawed 😂

simply put - Gemini created bad test cases and could not solve the problem afterwards
😂 😂 🍸 💩

Evan Lausier

Oh my!! 😂😂 That is like the AI version of "The Good Idea Fairy" 😂

Web Developer Hyper

I always check AI outputs carefully and ask follow-up questions about the code. Sometimes I also go back to the official documentation to verify whether the AI’s output is really correct. Even so, I still miss bugs that I didn’t anticipate.😭
However, AI is improving very quickly and getting better day by day. So I believe that as my skills improve and AI coding improves as well, the number of bugs will decrease in the future.

Evan Lausier

It really is. I find myself using it more and more for quick analysis when I am strapped for time. More often than not it points me in the right direction, but it doesn't quite get to the detailed root cause.

KC

Microsoft Research published a study earlier this year that quantified this. They tested nine different AI models on SWE-bench Lite, a benchmark of 300 real-world debugging tasks.

@evanlausier could you share a link or reading reference for that research study? I'm interested to know which factors the study looked at.

Some skills don't need to be automated. They need to be sharpened.

Agree with this. Moreover, we need to set up metrics for how those skills can be improved. It would be better if we could set up a benchmark for the specific skills to match the market's expectations.

Richard Pascoe

Thank you, Evan. A truly thought-provoking post. I've often wondered about the productivity claims surrounding AI, as expressed in a recent discussion post - AI Productivity Gains? - but your words really put the reality of the situation into sharp focus and I really appreciate that.

Marina Eremina

It’s great to see these Stack Overflow surveys, right? I also find it reassuring when my own opinion about a tool aligns with what a large part of the community thinks. Looks like many of us try it out and reach similar conclusions :)

Ben Santora

Your comment made me realize that I haven't been to the Stack Overflow site in years. I need to visit it again.

Evan Lausier

LOL right?

Evan Lausier

I know right!? I thought I was the only one too!!

Hariprasad

I have been struggling with this problem for months now. One of our junior developers wrote 6000+ lines of testing code in a single PR. While reviewing it, I wondered whether he had actually read all of it before sending it for my review. In a highly productive month, I might write 1000 to 2000 lines at most, so seeing such a number in a single PR is insane.

Evan Lausier

Omg, yeah that's crazy!

Alois Sečkár

"almost right, but not quite" - every single time I try to generate an AI image...

Shitij Bhatnagar

I fully agree with the intent of the article and the findings, especially 'debugging isn't pattern completion. Debugging is hypothesis testing'. Debugging is often not discussed much, but every engineer knows that at times it can be very frustrating to debug code, even code you wrote or reviewed/approved earlier. Debugging skills are 'non-negotiable'; they can be a differentiator as well.

The whole narrative around code generation by AI is broken, because at the end of the day, if I were a production manager, I would never trust AI-generated code. When something breaks, a bot will not fix it; I need a person, and that person needs to be confident in the code itself. That link is missing.

AI code can be unverified assistance at most, not the main actor. I have also noticed how context-free the remarks from AI code review bots on MRs can be (take the online git management tools), and that's because they are just following rules fed to them, not because they found a real bug in your code. I feel FindBugs and PMD were more predictable than these AI code review bots.

Thanks for the article.

Evan Lausier

Right? Thank you so much!! I'm really glad it resonated. It probably helped the article that I spent half the morning debugging something one of my junior resources got from using an AI code tool 😊

Vasu Ghanta

Nailed the "66% problem"—AI's polite, lurking bugs are stealthier than honest failures; treat it like overconfident juniors and double-test to sharpen those irreplaceable debugging instincts!

Evan Lausier

Love it! "overconfident juniors" 😂
