Leon Pennings

Originally published at blog.leonpennings.com

The TDD Trap: How Test-First Becomes Bad Design Forever in Most Teams

1. Introduction: The Myth of Emergent Design

Test-Driven Development (TDD) promises a simple, seductive idea:

If you write tests first, good design will “emerge” naturally.

For two decades, this has been repeated across conferences, blogs, and books.

It sounds so logical: specifications become tests, tests become documentation, and architecture improves because you only build what’s required.

But the promise hides a structural flaw that quietly ruins codebases every day:

TDD forces design decisions long before the problem is understood.

Instead of guiding us toward good architecture, it often hardens misunderstanding into the foundation of a system. And by the time real complexity appears, the codebase is already littered with brittle tests defending a design that no longer fits the domain.

In other words:

Most teams using TDD end up preserving the wrong design — forever.


2. The Three Assumptions TDD Relies On — And Why They Fail in Reality

TDD depends on three background assumptions that rarely hold in real-world projects.

Assumption 1 — We know all relevant behavior when we write Story #1

In practice:

  • The first story exposes maybe 10–25% of real use cases.

  • The domain is still fuzzy.

  • Stakeholders don’t yet know what they really want.

  • Edge cases appear only after multiple iterations.

And most importantly:

  • True constraints never surface until deeper stories arrive.

This means the first tests encode behavior that is, at best, partial — and often simply wrong.

Assumption 2 — Behavior-first automatically leads to good structure

This is the philosophical core of TDD:

“Design emerges from tests.”

But what actually emerges is:

  • structure optimized for the first features,

  • procedural workflows disguised as architecture,

  • classes shaped by testability rather than by domain meaning,

  • boundaries that reflect user-story order, not domain reality.

TDD encourages what Kent Beck calls “the simplest thing that could possibly work.”

The problem?

The simplest thing is almost never the right thing when the domain is not yet understood.

Assumption 3 — Tests provide a stable foundation for evolution

TDD assumes tests behave like a safety net.

But early tests typically:

  • encode misunderstandings,

  • lock in accidental complexity,

  • constrain future refactoring,

  • break the moment domain insight changes the model.

So instead of enabling refactoring, they discourage it.

The foundation cracks the moment reality diverges from initial assumptions — which it always does.


3. The Reality Gap: Why Early Tests Become Design Handcuffs

The First Implementation Is Guaranteed Wrong

If your initial understanding covers only ~20% of actual scenarios, then your initial tests encode only that 20%.

This has two consequences:

  1. Your initial implementation is necessarily incorrect.

  2. Your test suite enforces that incorrect design with mechanical precision.

Developers soon face a dilemma:

  • Preserve the wrong design to keep the tests green

    or

  • Break the tests (often hundreds of them) to fix the model.

Teams almost always choose the first option.
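To make the dilemma concrete, here is a minimal, hypothetical sketch (the domain, numbers, and names are all invented for illustration): the first story's test pins not just a value but the flat-rate design behind it.

```python
import unittest

# Story #1 (hypothetical): "orders over 100 get a 10% discount".
# The tests below pin not just the number but the flat-rate shape
# of the function itself.
def discounted_total(order_total: float) -> float:
    return order_total * 0.9 if order_total > 100 else order_total

class TestDiscount(unittest.TestCase):
    def test_discount_applies_over_threshold(self):
        self.assertAlmostEqual(discounted_total(200.0), 180.0)

    def test_no_discount_under_threshold(self):
        self.assertAlmostEqual(discounted_total(50.0), 50.0)

# Story #7 later reveals the real rule: discounts depend on customer
# tier and stack with promotions. Fixing the model changes the
# function's signature and semantics, which breaks these tests and
# every fixture built on the flat-rate assumption.

if __name__ == "__main__":
    unittest.main()
```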

Behavior Becomes a Straitjacket

Because TDD ties structure directly to behavior, every new insight becomes expensive:

  • New domain invariants contradict earlier tests

  • Structural refactors break dozens of test fixtures

  • Changes require rewriting test doubles, mocks, scaffolding

This makes structural correction harder over time, not easier.

The system becomes “correct according to outdated tests,” instead of “correct according to the real domain.”


4. How TDD Encourages Design Optimized for Testability, Not Quality

TDD tries to force design from the outside-in.

But what it typically produces is:

  • tiny methods created only to isolate dependencies

  • overly granular classes driven by the desire to mock everything

  • procedural workflows because domain models are slow to emerge

  • interfaces created only to facilitate mocking

  • over-abstracted layers because TDD discourages cohesive aggregates

This results in systems that look clean in isolation, but collapse under the weight of real complexity.
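As a hypothetical sketch of that pattern (all names invented): a one-method abstraction whose only reason to exist is that the test wants a mock, and a test that ends up pinning the exact call shape.

```python
import unittest
from abc import ABC, abstractmethod
from unittest.mock import Mock

# An abstraction introduced only so the test can inject a mock;
# no second implementation exists or is planned.
class TaxRateProvider(ABC):
    @abstractmethod
    def rate_for(self, region: str) -> float: ...

class InvoiceCalculator:
    def __init__(self, tax_rates: TaxRateProvider):
        self._tax_rates = tax_rates  # a seam that exists for the test's sake

    def total(self, net: float, region: str) -> float:
        return net * (1 + self._tax_rates.rate_for(region))

class TestInvoiceCalculator(unittest.TestCase):
    def test_total_applies_tax(self):
        rates = Mock(spec=TaxRateProvider)
        rates.rate_for.return_value = 0.21
        calc = InvoiceCalculator(rates)
        self.assertAlmostEqual(calc.total(100.0, "NL"), 121.0)
        # The test now also pins *how* the collaborator is called,
        # so any structural change breaks it.
        rates.rate_for.assert_called_once_with("NL")

if __name__ == "__main__":
    unittest.main()
```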

As a result:

The architecture reflects the order of stories, not the structure of the domain.

That is the core flaw.


5. What Real-World Experience Shows (Across Many Teams)

Across industries — finance, government, logistics, compliance — the pattern is consistent:

  • Teams begin with enthusiasm for TDD

  • Early progress feels great

  • Test suites grow quickly

  • Then domain complexity appears

  • And refactoring becomes painful

  • And tests turn into liabilities

  • And the architecture fossilizes

Teams rarely admit this publicly, but privately the story is common:

The tests start driving the design, instead of the domain.

It’s not that tests are bad.

It’s that tests written before understanding create enormous inertia.


6. The Critical Variable: Team Maturity

Whether TDD helps or harms a team correlates strongly with team maturity.

Low-to-mid maturity teams (most teams)

  • still learning the domain

  • still learning modeling

  • still forming architectural habits

  • still discovering edge cases

  • have high turnover or low domain continuity

For them, TDD amplifies instability:

  • They lock misunderstandings into the code

  • Refactoring becomes scary

  • Tests break constantly

  • Stress levels rise

  • Architecture emerges accidentally

  • “Green test = good design” becomes a substitute for thinking

High maturity engineering teams (rare)

Some highly experienced teams can use TDD as a consistency tool.

Not as a design method, but as a regression net.

The difference is profound:

  • They model before they test

  • They refactor aggressively

  • They throw away early tests

  • They don’t treat TDD dogmatically

  • They evolve tests along with understanding

  • They prioritize the domain, not the test suite

TDD “works” for mature teams precisely because they don’t follow TDD as originally prescribed.


7. So Should You Use TDD? My Answer: Almost Never as a Design Philosophy

Tests are good.

Automation is good.

Confidence is good.

But using tests as the engine of design is:

  • risky

  • expensive

  • rigid

  • overly optimistic

  • and counterproductive to long-lived domain models

In complex systems, design must come from understanding, not from initial behavior guesses.

Use tests to lock in insights once you actually understand the domain.

Not before.

That is the sustainable path.


8. What To Do Instead: A Domain-First Approach

If not TDD-first, then what?

1. Start with modeling, not tests

Sketch domain concepts.

Identify invariants.

Find aggregates.

Understand constraints.

Tests should validate these insights — not substitute for them.

2. Implement core domain logic directly

Don’t fragment it for testability.

Keep it expressive and cohesive.
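A minimal sketch of steps 1 and 2 combined, assuming a hypothetical shipping domain (all names invented): the invariant discovered during modelling lives inside the aggregate itself, implemented directly and cohesively rather than fragmented for mockability.

```python
from dataclasses import dataclass, field

# Invariant found during modelling (hypothetical): a shipment may only
# be dispatched once every item has been picked, and never twice.
# The rule lives inside the aggregate, not in a test or service layer.
@dataclass
class Shipment:
    items: list[str] = field(default_factory=list)
    picked: set[str] = field(default_factory=set)
    dispatched: bool = False

    def pick(self, item: str) -> None:
        if item not in self.items:
            raise ValueError(f"{item!r} is not part of this shipment")
        self.picked.add(item)

    def dispatch(self) -> None:
        if self.dispatched:
            raise RuntimeError("shipment already dispatched")
        if set(self.items) != self.picked:
            raise RuntimeError("cannot dispatch: unpicked items remain")
        self.dispatched = True
```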

3. Add tests once the model stabilizes

Now automation works with the domain, not against it.

4. Use tests as regression, not prophecy

Tests should confirm correctness — not predict future structure.
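Continuing the hypothetical Shipment sketch above (and assuming that class is in scope), a regression test written once the model has stabilized pins the non-negotiable business rule, not the internal structure:

```python
import unittest

class TestShipmentDispatch(unittest.TestCase):
    def test_cannot_dispatch_with_unpicked_items(self):
        shipment = Shipment(items=["widget", "gadget"])
        shipment.pick("widget")
        with self.assertRaises(RuntimeError):
            shipment.dispatch()

    def test_dispatch_succeeds_when_fully_picked(self):
        shipment = Shipment(items=["widget"])
        shipment.pick("widget")
        shipment.dispatch()
        self.assertTrue(shipment.dispatched)

if __name__ == "__main__":
    unittest.main()
```

A test like this survives aggressive refactoring of the model's internals, which is exactly the property the early, structure-pinning tests lack.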


9. Conclusion: TDD Is Not Evil — Just Misapplied by Most Teams

TDD is not a bad idea in theory.

It’s just the wrong tool for the wrong stage of development.

It works beautifully when:

  • the domain is trivial,

  • the problem is well known,

  • or the team is extremely mature.

But for most real-world, evolving domains, TDD creates structural debt disguised as good engineering.

The truth is simple:

If you don’t fully understand the domain yet, TDD will lock misunderstandings into your architecture.

Test-first becomes mistake-first.

And mistakes, once encoded in hundreds of green tests, have a way of staying forever.

Top comments (12)

david duymelinck

I think the core of the post is:

Preserve the wrong design to keep the tests green
or
Break the tests (often hundreds of them) to fix the model.

If you are not willing to break the tests, the application is not advancing.

Instead of writing tests first, I have a "write tests as early as possible" attitude.

If I'm still exploring the domain, breaking tests or throwing away loads of them is part of the progress.

Software is not like a house: you can replace load-bearing walls if you have a good strategy.

Leon Pennings

The core of the article is that tests are for testing, not for designing. It's like using a fork to eat soup - wrong tool for the job.
As long as tests are used to protect non-negotiable behaviour, they’re an excellent tool.
If they’re used to avoid or replace proper domain modelling, then they’re being misapplied.

david duymelinck

You are right. But to be honest, I never had the idea of using TDD as a modeling tool.
First you model, then you write the tests that support the model. Then write the code.
When the model changes, write the tests for the updated model and write the code.

Treating tests like they are written in stone is the worst thing you can do to an application. That is why I highlighted that part as the core.
I don't think TDD is to blame for people not willing to put in the work.

Leon Pennings

Yes, true — the fundamental goal of TDD is to fulfill the behavior defined by the tests. That’s why TDD isn’t inherently a design driver: it simply tells you to make red → green.

The problem arises when “green” is treated as a definition of done. The continuous design of a rich domain model becomes invisible in the process and is all too easily skipped. That’s why the risk of treating TDD as a false idol is very real.

In my experience, rich domain models don’t suit TDD very well. Implementing the model is part of discovering and learning about it, and there’s often no single “behavior” to capture at the start — writing lots of tests for evolving domains just isn’t practical.

david duymelinck

The problem arises when “green” is treated as a definition of done.

Isn't that another way of saying don't change the tests?

writing lots of tests for evolving domains just isn’t practical.

How can you write lots of tests for an evolving domain? You can only write tests for the parts you know.
What is the chance the base functionality of an evolving domain changes?

Leon Pennings

No, it’s not another way of saying “don’t change the tests.”
What I mean is that the initial implementation is often treated as complete the moment the tests turn green. The task becomes “fulfill the user story,” not “first understand the domain.” The ongoing design of a rich domain model becomes invisible, and that’s where the risk lies.

When I say a domain is evolving, I mean our understanding of the domain is evolving. Early on you’re not just coding entities; you’re discovering invariants, boundaries, policies, and relationships. Entities are domain objects, yes — but the domain model is much more than its entities, and those other parts tend to shift significantly as insights emerge.

Because of that, early tests rarely survive long in rich models. They’re based on the first, shallowest interpretation of the domain, so they end up fossilizing assumptions that later turn out to be wrong.

Trivial or low-level tests don’t help much here. They almost never catch real bugs, but they add friction and rework whenever the model evolves. The only tests that remain valuable are the ones that protect non-negotiable, high-level business behaviour.

If by “tests” you mean the business-facing, domain-level ones, then yes — those stay stable. But classic TDD’s fine-grained, implementation-first testing simply doesn’t match how rich domain models evolve. It just multiplies the amount of work every time the domain deepens or changes.

david duymelinck

I agree that following TDD to the letter is not a viable practice. But even the inventor of TDD says you don't have to do that.
That is true of every concept: you embrace the good parts and drop the rules that make you do silly things.

The task becomes “fulfill the user story,” not “first understand the domain.”

That looks to me like a developer problem rather than a TDD problem.
Just because an architect created the domain model doesn't mean I, as a developer, mindlessly follow what has been modeled.
The best way of working is getting everyone on the same page. Sometimes a technical issue makes it impossible to follow the model; other times the architect finds a better model. It is working together to create the product that makes it good.

early tests rarely survive long

That is a given, whether you are working with models or not. But this brings us back to not being willing to do the work.

Like I mentioned before, I'm not a TDD user. But I do like the part of TDD that means writing tests as early as possible. When I have the first piece of code that feels solid enough to build on, I start writing tests.

Leon Pennings

I think this is where we see things differently.

Writing tests for the sake of writing tests quickly turns into a “must-have,” and for trivial logic those tests will never catch a real bug — they only duplicate the implementation and later become friction. As long as a process (like strict TDD) implicitly pushes you toward lots of granular tests or high coverage, it guarantees extra work every time the domain shifts.

I’m absolutely in favour of putting in the work where it matters: deep domain behaviour, non-negotiable rules, regression around complex interactions. But if extra work can be avoided by not writing tests that will never find an actual defect, that seems like a much more practical trade-off. In fact, I’ve seen this approach result in fewer defects than a large test suite. A huge number of tests often gives a false sense of security rather than real safety.

For me, that’s why rich domain modelling and strict TDD don’t pair well: one is exploratory, while the other front-loads test obligations before the model is even understood.

david duymelinck

Writing tests for the sake of writing tests

I agree that isn't useful.

I'm writing tests to make sure that when I refactor, the existing behavior keeps working.
Of course, if the behavior changes, the tests have to change, but then they make sure the extended behavior keeps working where it hasn't changed.

Most bugs appear because there is nothing monitoring code changes. For me, that monitoring is the main goal of having tests. They also provide developers with code examples, but that is just a bonus.

TDD also doesn't require you to write tests for tests' sake; that is the interpretation of the people who use the concept. A literal reading of a theory does more evil than good.

David Sporn

Fundamentally so true.

TDD can verify that code modifications do not break covered existing behaviours; if the tests are written before the operational code, they can prove that the modification complies with the given specifications, and thus serve as an element of proof that the job is "done". And that's already a great thing: no one needs to "believe me", the tests are there for that.

Design emerges when software engineers:

  • are fed enough specifications to have a glimpse of the big picture,
  • ask questions that challenge the specs and refine them; the more one has worked on diverse projects and domains, the easier it is to identify and ask the better questions,
  • are generally given time to reflect on their work as a whole and anticipate the probable direction of the project, so that they can provision some architectural groundwork that will nudge that so-elusive design to emerge.

Or in short: the right design emerges when software engineers have a clear idea of the big picture, instead of a little snippet.

Leon Pennings

You’ve hit the nail on the head. I completely agree — developers are most effective when they truly understand the business domain. Treat them like assembly-line workers, implementing snippets without seeing the bigger picture, and you end up with bloated applications, technical debt, and fragile designs. Rich domain modelling requires discovery and understanding, not just executing “red → green.” In my experience, TDD and large test suites often only add to the bloat. There’s no real shortcut for ensuring developers deeply understand the business domain.

Nadeem Zia

Good information provided