Test-Driven Prompt Engineering

Claude will probably cover the edge cases you forgot. But being explicit in your prompts still matters — and tests are how you make that explicitness stick.

Claude is good at writing code. It’s also good at anticipating edge cases — often better than you are in the moment, because it’s seen thousands of implementations of similar things and knows where they tend to break.

But “probably covers the edge cases” is not a foundation you want to build on. The edge cases it covered might not be the ones you care about. The behaviour it assumed might not match your domain. And when something breaks in production, “I think the AI handled that” is not a useful mental model to debug from.

Tests are how you make your intentions explicit. And prompting for tests before implementation is one of the highest-leverage habits you can build when working with AI.

The default approach and why it falls short

Most people prompt like this: describe the feature, get the code, paste it in, fix what breaks. The AI generates something functional. You move on.

The problem isn’t that the code is wrong — it’s often right. The problem is that you don’t know why it’s right, which means you don’t know when it’ll be wrong. The AI made assumptions. Some of them match your requirements. Some might not. Without explicit tests, you have no way to tell the difference until a user finds out for you.

Start with the tests

Before asking for any implementation, ask for tests:

I want to build a function that takes an array of transactions
(each with an amount, date, and category) and returns:
- total spend
- average transaction amount
- top 3 categories by total spend

Write tests for this function before we write any code. Be explicit
about edge cases: empty arrays, single transactions, categories
with equal totals, negative amounts.

Two things happen here. First, the AI writes tests that tell you what it understood about the problem. If the tests don’t match your mental model, because they test the wrong behaviour or the assertions look off, you find out now, in plain text, before a single line of implementation exists. Fixing a test is usually a one-line change. Fixing a wrong implementation is a debugging session.

Second, you are now being explicit. You listed the edge cases. You defined the outputs. Claude will likely have added a few more you didn’t think of (fewer than three categories, floating-point rounding in the average, malformed dates), and that’s great. But you seeded the conversation with your specific requirements, which means the tests that come out reflect your domain, not a generic interpretation of it.
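
For illustration, the tests that come back might look something like the sketch below. Most of it is assumption rather than anything the prompt fixed: the Transaction and Summary shapes, alphabetical tie-breaking, and treating negative amounts as refunds are all choices you would need to confirm or correct. It is written as a plain table of cases so it runs without a test framework; in a real project the same cases would be written for whatever test runner you already use.

```typescript
// Hypothetical shapes: the field names are assumptions, not a fixed API.
interface Transaction {
  amount: number;
  date: string; // ISO 8601 date, e.g. "2024-03-01"
  category: string;
}

interface Summary {
  total: number;
  average: number;
  topCategories: string[]; // category names, highest total spend first
}

type Summarise = (transactions: Transaction[]) => Summary;

interface SpecCase {
  name: string;
  input: Transaction[];
  expected: Summary;
}

// Small builder so each case stays readable; the date is irrelevant to these cases.
const tx = (amount: number, category: string): Transaction => ({
  amount,
  date: "2024-03-01",
  category,
});

const spec: SpecCase[] = [
  {
    name: "empty array returns zeros and no categories",
    input: [],
    expected: { total: 0, average: 0, topCategories: [] },
  },
  {
    name: "single transaction",
    input: [tx(40, "food")],
    expected: { total: 40, average: 40, topCategories: ["food"] },
  },
  {
    name: "equal totals are ordered alphabetically (assumed tie-break)",
    input: [tx(10, "travel"), tx(10, "food")],
    expected: { total: 20, average: 10, topCategories: ["food", "travel"] },
  },
  {
    name: "negative amounts count as refunds (assumed business rule)",
    input: [tx(50, "food"), tx(-20, "food"), tx(12, "books")],
    expected: { total: 42, average: 14, topCategories: ["food", "books"] },
  },
];

// Field-by-field comparison of two summaries, including category order.
function sameSummary(a: Summary, b: Summary): boolean {
  return (
    a.total === b.total &&
    a.average === b.average &&
    a.topCategories.length === b.topCategories.length &&
    a.topCategories.every((c, i) => c === b.topCategories[i])
  );
}

// Runs every case against an implementation and returns the names of the failing cases.
function runSpec(summarise: Summarise): string[] {
  return spec
    .filter((c) => !sameSummary(summarise(c.input), c.expected))
    .map((c) => c.name);
}
```

Calling runSpec with a finished implementation should return an empty array. Before any implementation exists, every case you disagree with is something to correct in the spec itself.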

Review before you proceed

This is the step that gets skipped most often, and it’s the most important one.

Read the tests. Actually read them. Ask yourself:

  • Do these assertions match what I actually want?
  • Are there cases missing that I know about?
  • Does anything look like the AI made an assumption I wouldn’t make?

If something’s off, correct it now. Add a test case. Change an assertion. Push back on an assumption. The cost of a correction here is almost zero. The cost of the same correction after the implementation is written is much higher — not because the code is hard to change, but because now you’re second-guessing everything downstream of it.

Once you’re happy with the tests, they become the spec. Everything from here is just making them pass.

Now ask for the implementation

Good. Now implement the function. The tests we wrote are the
source of truth — the implementation should satisfy all of them.

The model now has a concrete target. It’s not guessing what “correct” looks like, because you’ve defined it explicitly. The output tends to be tighter, more focused, and easier to verify because the bar is clear.
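
As a rough sketch of what might come back, here is one implementation of summarise that returns the { total, average, topCategories } shape used in this article. Two behaviours are assumptions made for the example, not requirements from the prompt: ties between categories are broken alphabetically, and negative amounts simply reduce the totals.

```typescript
// Hypothetical shapes: the field names are assumptions, not a fixed API.
interface Transaction {
  amount: number;
  date: string;
  category: string;
}

interface Summary {
  total: number;
  average: number;
  topCategories: string[];
}

function summarise(transactions: Transaction[]): Summary {
  // Explicit empty case: avoids dividing by zero when computing the average.
  if (transactions.length === 0) {
    return { total: 0, average: 0, topCategories: [] };
  }

  let total = 0;
  // Null-prototype object so a category named "constructor" cannot collide
  // with an inherited property.
  const totalsByCategory = Object.create(null) as Record<string, number>;
  for (const t of transactions) {
    total += t.amount;
    totalsByCategory[t.category] = (totalsByCategory[t.category] ?? 0) + t.amount;
  }

  const topCategories = Object.keys(totalsByCategory)
    // Highest total first; equal totals fall back to alphabetical order (assumption).
    .sort((a, b) => totalsByCategory[b] - totalsByCategory[a] || (a < b ? -1 : 1))
    .slice(0, 3);

  return { total, average: total / transactions.length, topCategories };
}
```

If your reviewed tests encode a different tie-break or a different rule for negative amounts, those tests win and this sketch would need to change to match them.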

Run the tests. If something fails, feed the failure back:

This test is failing: [paste output]. What's wrong with the implementation?

You’re not debugging blind. You have a failing test that describes exactly what the expectation was and what the actual result was. That’s the most useful debugging context you can have.
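
To make that concrete, here is a small sketch of the kind of output worth pasting back. Both functions are hypothetical and exist only for this example: describeFailure mimics the expected-versus-actual report most test runners print, and buggyAverage is a deliberately broken function that forgets the empty-array case.

```typescript
// Hypothetical helper: formats a failed expectation with the test name,
// the expected value, and the actual value.
function describeFailure(name: string, expected: number, actual: number): string {
  return [
    "FAIL: " + name,
    "  expected: " + String(expected),
    "  actual:   " + String(actual),
  ].join("\n");
}

// Deliberately buggy: there is no guard for an empty array.
function buggyAverage(amounts: number[]): number {
  let sum = 0;
  for (const a of amounts) sum += a;
  return sum / amounts.length; // 0 / 0 evaluates to NaN
}

const expected = 0;
const actual = buggyAverage([]);
const report =
  actual === expected
    ? null
    : describeFailure("average of an empty array is 0", expected, actual);

// report now holds:
// FAIL: average of an empty array is 0
//   expected: 0
//   actual:   NaN
```

A report like this names the case, the expectation, and the observed value, which is usually enough for the model to locate the missing guard without further explanation.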

The edge case question

Here’s where Claude earns something extra. After the implementation passes your tests, ask:

What edge cases does this implementation not handle that you think it should?
Write additional tests for them.

This is where the model’s breadth of exposure pays off. It’s seen what breaks in functions like this. It’ll surface things — integer overflow, timezone handling in date comparisons, unexpected null inputs — that you didn’t think to specify. Review those too. Accept the ones that apply to your domain. Skip the ones that don’t. Add them to the test suite.

You end up with a test suite that’s yours (the cases you cared about) plus the model’s experience (the cases it’s seen matter). That combination is hard to achieve any other way.
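
As an example of the kind of test this step can produce, take the timezone case mentioned above. The sketch below assumes a hypothetical helper, utcDayKey, that a function might use if it ever grouped transactions by day; it is not part of the original prompt. The point is that a timestamp written late in the evening in one timezone already falls on the next day in UTC, and a test forces you to decide which day counts.

```typescript
// Hypothetical helper: the calendar day of a timestamp, always computed in UTC.
function utcDayKey(isoTimestamp: string): string {
  return new Date(isoTimestamp).toISOString().slice(0, 10);
}

// 23:30 on 31 January at UTC-05:00 is 04:30 on 1 February in UTC.
const lateEvening = "2024-01-31T23:30:00-05:00";

const dayAsWritten = lateEvening.slice(0, 10); // "2024-01-31"
const dayInUtc = utcDayKey(lateEvening); // "2024-02-01"
```

Whether the transaction belongs to 31 January or 1 February is a business rule, not something the model can know. Accept the test only once you have decided which answer your domain expects.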

Why being explicit still matters

It’s tempting to just trust the AI to cover everything. Sometimes that works. But the model doesn’t know your business rules, your tolerance for certain failure modes, or the specific guarantees your callers expect. It will make reasonable assumptions. Reasonable isn’t always right.

Being explicit in your prompts (defining inputs, outputs, and edge cases upfront) is how you transfer that context. The tests are the medium. They’re precise in a way that natural language descriptions aren’t. “Handle empty arrays” is vague. The assertion expect(summarise([])).toEqual({ total: 0, average: 0, topCategories: [] }) is not.

The AI will probably write good code either way. But explicit tests mean it writes your code, not a generic approximation of it.

The pattern in one paragraph

Describe what you’re building. Ask for tests before implementation, and list the edge cases you know about. Read the tests and correct anything that doesn’t match your requirements. Then ask for the implementation anchored to those tests. Run them. Feed failures back. Finally, ask the model what edge cases it would add — and decide which ones apply.

That’s it. It’s not complicated. It just requires doing the test step first, which feels slower up front but tends to produce much better results.