AI Testing for Legacy Codebases — Where to Start (2026)


You inherited a codebase with zero tests. The code works — mostly — but nobody wants to touch it. Every change is a gamble. Every deploy is a prayer. Sound familiar?

This is the legacy testing problem, and it’s one of the most common situations in professional software development. The good news: AI tools in 2026 have gotten remarkably good at helping you dig out of this hole. Not by magically understanding your entire system, but by doing the tedious grunt work of reading unfamiliar code and generating tests that capture what it actually does.

Here’s the practical approach.

The legacy testing problem

Legacy codebases share a few painful traits:

  • Zero or near-zero test coverage. Nobody wrote tests when the code was new, and now it’s too scary to start.
  • Tight coupling everywhere. Classes depend on other classes that depend on databases that depend on config files that depend on the phase of the moon. Isolating anything for a unit test feels impossible.
  • Fear of refactoring. You know the code needs restructuring, but without tests, any refactor could silently break something. So nothing changes.
  • Lost context. The original authors left years ago. Documentation is sparse or wrong. The code is the only source of truth, and it’s not talking.

The traditional advice is “just write tests.” But where? The codebase has 200,000 lines. You can’t test everything, and you don’t understand half of it. This is exactly where AI becomes useful — not as a replacement for thinking, but as a way to accelerate the most tedious parts of the process.

Where to start: critical paths, not 100% coverage

Forget 100% coverage. That’s not the goal and never was. For a legacy codebase, you want to identify the critical paths — the code that runs during your most important user flows — and test those first.

Start by asking:

  1. What breaks most often? Check your bug tracker and incident history. The modules that generate the most production issues are your first targets.
  2. What are you afraid to change? That function everyone avoids touching? That’s a sign it needs characterization tests before anything else.
  3. What’s on the roadmap? If you’re about to modify a module for a new feature, test it first. This gives you a safety net before you start cutting.

You don’t need to understand the entire system. You need to understand the parts you’re about to work on. This is a critical mindset shift — testing legacy code is an incremental activity, not a big-bang project.
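
One way to turn the three questions above into a working list is a rough score per module. The sketch below is only an illustration: the module names, the counts, and the weights are all made up, and you would replace them with numbers from your own bug tracker and roadmap.

```python
from dataclasses import dataclass


@dataclass
class ModuleInfo:
    name: str
    incidents_last_year: int  # taken from the bug tracker / incident history
    team_avoids_it: bool      # "nobody wants to touch it"
    on_roadmap: bool          # about to be modified for a planned feature


def priority(module: ModuleInfo) -> int:
    # The weights 5 and 10 are arbitrary assumptions; tune them to your team.
    score = module.incidents_last_year
    if module.team_avoids_it:
        score += 5
    if module.on_roadmap:
        score += 10
    return score


def rank(modules):
    # Highest score first: these modules get characterization tests first.
    return sorted(modules, key=priority, reverse=True)


# Hypothetical inventory of three modules.
modules = [
    ModuleInfo("reports/export.py", 2, False, False),
    ModuleInfo("billing/invoice.py", 7, True, True),
    ModuleInfo("auth/session.py", 4, True, False),
]

for module in rank(modules):
    print(priority(module), module.name)
```

The score matters less than the ordering it produces. Anything near the bottom of the list can stay untested for now.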

Characterization tests: documenting current behavior with AI

A characterization test doesn’t verify that code is correct. It verifies that code behaves the way it currently behaves. That distinction matters enormously for legacy systems.

The idea comes from Michael Feathers’ Working Effectively with Legacy Code: run the code, observe what it does, and write a test that asserts exactly that. If the behavior changes later, the test fails, and you know something shifted — whether intentionally or not.

This is where AI test generation shines. You can point an AI tool at a function or class and ask it to generate tests that capture its current behavior. The AI doesn’t need to know why the code does what it does. It reads the implementation, identifies branches and edge cases, and produces tests that pin down the outputs for given inputs.
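
To make the idea concrete, here is a minimal sketch of what such tests can look like. The function `legacy_shipping_cents` is a hypothetical stand-in for real legacy code, and the tests are plain `assert` functions of the kind pytest would collect. Note that two of them pin down behavior that looks like a bug. That is deliberate: the tests record what the code does, not what it should do.

```python
def legacy_shipping_cents(weight_kg, country):
    # Hypothetical legacy function, invented for this example.
    if weight_kg <= 0:
        return 0
    cents = 500 + 120 * weight_kg
    if country == "US":
        cents = cents * 9 // 10
    if weight_kg > 20:
        cents += 1500
    return cents


def test_standard_parcel():
    assert legacy_shipping_cents(10, "DE") == 1700


def test_us_parcels_get_ten_percent_off():
    assert legacy_shipping_cents(10, "US") == 1530


def test_heavy_parcel_adds_flat_surcharge():
    assert legacy_shipping_cents(25, "DE") == 5000


def test_surcharge_is_not_discounted_for_us():
    # Possibly unintended, but it is the current behavior, so it is pinned.
    assert legacy_shipping_cents(25, "US") == 4650


def test_negative_weight_ships_for_free():
    # Looks like a bug (no error is raised); recorded as-is, not fixed here.
    assert legacy_shipping_cents(-3, "DE") == 0
```

If someone later moves the surcharge above the discount, `test_surcharge_is_not_discounted_for_us` fails and forces a decision about whether the change was intended.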

A practical workflow:

  1. Pick a function or method on your critical path.
  2. Feed it to an AI tool — Claude, GPT, or a local model via Ollama if you can’t send proprietary code to the cloud.
  3. Ask for characterization tests: “Generate tests that document the current behavior of this function, including edge cases.”
  4. Run the generated tests against the actual code. Some will pass, some won’t.
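  5. Fix the tests that fail, not the code. A failure at this stage usually means the AI guessed wrong about current behavior, so update the assertion to match what the code actually does. The ones that pass are your new safety net.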
  6. Commit them. You now have regression protection for that piece of code.

For a typical function, this process takes minutes rather than the hours it can take to trace through unfamiliar logic by hand. Over a few weeks, you build meaningful coverage on the parts that matter.
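
Steps 4 and 5 can be partly automated by letting the AI propose only the inputs and recording the outputs from the real code. The sketch below shows that variation (sometimes called a golden master), not a feature of any particular tool. The helper names and the `parse_quantity` function are hypothetical.

```python
import json


def record_behavior(func, cases):
    """Run func on each argument tuple and capture the result or exception type."""
    observed = []
    for args in cases:
        try:
            outcome = {"args": list(args), "returns": func(*args)}
        except Exception as exc:
            outcome = {"args": list(args), "raises": type(exc).__name__}
        observed.append(outcome)
    return observed


def find_changes(func, recorded):
    """Return (old, new) pairs for every case whose behavior no longer matches."""
    cases = [tuple(entry["args"]) for entry in recorded]
    current = record_behavior(func, cases)
    return [(old, new) for old, new in zip(recorded, current) if old != new]


def parse_quantity(text):
    # Hypothetical legacy function: an empty string silently becomes 0.
    return int(text.strip() or 0)


def parse_quantity_refactored(text):
    # A "cleanup" that accidentally changes the empty-string case.
    return int(text.strip())


# Inputs an AI tool might suggest; the outputs come from running the code.
CASES = [("3",), (" 12 ",), ("",), ("abc",)]
golden = record_behavior(parse_quantity, CASES)
snapshot = json.dumps(golden, indent=2)  # text you could commit next to the tests

changes = find_changes(parse_quantity_refactored, json.loads(snapshot))
for old, new in changes:
    print("behavior changed:", old, "->", new)
```

Because the expected values come from execution rather than from the model, a wrong guess about behavior cannot slip into the suite. The trade-off is that the snapshot says nothing about why each output is what it is.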

Using AI to understand unfamiliar code

Before you can test legacy code, you often need to understand what it does. AI is surprisingly effective at this. Tools like Claude Code can read a file or module and give you a plain-language summary of its behavior, dependencies, and side effects.

Useful prompts for code comprehension:

  • “Explain what this function does, including all side effects.”
  • “What are the possible return values and under what conditions?”
  • “List all external dependencies this class relies on.”
  • “What would break if I changed this method’s signature?”
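
If you find yourself pasting the same prompts repeatedly, a small helper can cut one function out of a large file and pair it with a question. This is a sketch using only Python’s standard `ast` module. It builds the prompt text and deliberately stops there, since how you send it depends on the tool you use. `LEGACY_MODULE` and `load_rate` are invented for the example.

```python
import ast

QUESTIONS = [
    "Explain what this function does, including all side effects.",
    "What are the possible return values and under what conditions?",
]

# Stand-in for the text of a real legacy file.
LEGACY_MODULE = '''
import os

def load_rate(region):
    path = os.environ.get("RATES_DIR", "/etc/rates")
    with open(f"{path}/{region}.txt") as fh:
        return float(fh.read())

def unrelated():
    pass
'''


def extract_function(module_source: str, name: str) -> str:
    """Return the source text of the named function, without the rest of the file."""
    tree = ast.parse(module_source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(module_source, node)
    raise LookupError(f"no function named {name!r}")


def build_prompt(module_source: str, name: str, question: str) -> str:
    code = extract_function(module_source, name)
    return f"{question}\n\n{code}\n"


prompt = build_prompt(LEGACY_MODULE, "load_rate", QUESTIONS[0])
print(prompt)
```

Sending one function at a time keeps the prompt small, but it also hides callers and globals from the model, so questions about what would break elsewhere need more of the file.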

You can also use AI for code review on legacy code — ask it to identify potential bugs, dead code paths, or implicit assumptions that aren’t documented anywhere. This builds your mental model of the system faster than reading the code line by line.

One warning: AI can hallucinate about code behavior just like it hallucinates about anything else. Always verify its explanations by running the code. The characterization tests you generate serve double duty here: they confirm or refute what the AI told you about how the code works.

The strangler fig pattern for adding tests incrementally

The strangler fig is a tree that grows around its host, gradually replacing it. The same pattern works for testing legacy code:

  1. Don’t rewrite. Don’t try to refactor the legacy code into something testable all at once. That’s how rewrites fail.
  2. Wrap and test. When you need to modify a legacy module, first write characterization tests around it. Then make your change. Then verify the tests still pass (or update them intentionally).
  3. Extract when ready. Once a module has decent test coverage, you can start extracting pieces into cleaner, more testable structures. The tests protect you during the extraction.
  4. Repeat. Each time you touch a part of the codebase, you leave it a little better tested than you found it.

Over months, the tested portion of the codebase grows organically around the untested core — like a fig around its host tree. You never need a dedicated “testing sprint” or a mandate to stop feature work. Testing becomes part of how you work.

AI accelerates every step. It generates the initial characterization tests. It helps you understand the code before you modify it. It suggests how to break dependencies so you can isolate components. It’s not doing the thinking for you, but it’s doing the typing — and in legacy codebases, the typing is the bottleneck.
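
As an example of breaking a dependency, consider a function that reads the system clock directly. The sketch below is hypothetical code, not taken from any real codebase. It adds an optional parameter as a seam: existing callers behave exactly as before, while tests can pass a fixed date.

```python
import datetime


def is_overdue_legacy(due_date):
    # Before: the clock is hardwired, so the result depends on when you run it.
    return datetime.date.today() > due_date


def is_overdue(due_date, today=None):
    # After: same default behavior, but a test can inject a fixed date.
    if today is None:
        today = datetime.date.today()
    return today > due_date


def test_day_after_due_date_is_overdue():
    due = datetime.date(2024, 1, 10)
    assert is_overdue(due, today=datetime.date(2024, 1, 11)) is True


def test_due_date_itself_is_not_overdue():
    # Pins the current boundary behavior: due today does not count as overdue.
    due = datetime.date(2024, 1, 10)
    assert is_overdue(due, today=due) is False
```

A default argument is the smallest possible seam. Heavier dependencies such as databases usually need a wrapper object or a fake, but the principle is the same: change the signature in a backward-compatible way first, then test.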

Tools and workflow

The tooling landscape for AI-assisted legacy testing has matured significantly. Here’s what a practical workflow looks like in 2026:

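For test generation:

  • Use a model you are allowed to send code to: Claude, GPT, or a local model via Ollama if proprietary code can’t go to the cloud. Ask for characterization tests, and expect to correct some of the generated assertions.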

For code understanding:

  • Use AI assistants that can ingest entire files or directories. Context window size matters here — legacy functions tend to be long.
  • Pair AI summaries with actual debugging sessions. Trust but verify.

For the workflow itself:

  1. Identify the next module to touch (driven by your roadmap or bug backlog).
  2. Generate characterization tests with AI.
  3. Run them, fix failures, commit the passing suite.
  4. Make your change.
  5. Run the tests again.
  6. Ship with confidence.

The key insight is that you don’t need to understand every line of your legacy codebase to start testing it. You need to understand the parts you’re changing, and AI can help you get there faster than ever before. Start with your critical paths, generate characterization tests, and grow your coverage incrementally. The codebase didn’t become untested overnight, and it won’t become fully tested overnight either — but every test you add makes the next change safer.