
How to Generate Test Data with AI Without Breaking Your Database

By the Seedfast team

The promise is simple: ask your AI agent for test data, get test data. The reality, until recently, has been messier.

You already have an AI assistant. Claude Code, Cursor, Windsurf — one of them sits in your editor and writes most of the code you ship. So when it's time to generate test data with AI, the workflow seems obvious: ask the agent, get data, keep coding. And sometimes it works. But if you've tried to do this against a real schema — one with foreign keys, unique constraints, enum columns, and five tables that all reference each other — you know the reality is less charming than the pitch.

This is a playbook for actually generating test data with AI, written for developers who already live inside an AI coding assistant and want their database populated without context-switching, without writing throwaway seed scripts, and without cleaning up half-inserted data the next morning.

Why Your AI Agent Struggles with Seed Scripts

Ask Claude or Cursor to "seed the orders table with related line_items and payments", and here's what usually happens:

  • The agent reads your schema files, then re-reads them, burning context on column types and FK relationships it'll need to recall again on the next prompt anyway.
  • It writes a Python or SQL seed script. The script looks plausible, but it misses a NOT NULL column, or inserts an orders row pointing at a user_id that doesn't exist yet.
  • It runs the script. The first batch of inserts succeeds; somewhere in the middle, a constraint fails. Now your database is in a half-seeded state.
  • The agent rewrites the script and re-runs it. But the rows from the first attempt are still there, so the second run hits unique-key collisions, leaves orphaned FK references, or quietly duplicates data — old rows tangled with new ones.
  • After several rounds of patching and re-running, the data that did land is semantically incoherent: random strings for product names, mismatched relationships, totals that don't add up. Not the realistic scenario you needed for the feature you were actually building.

This isn't hypothetical. When Neon experimented with letting Claude and GPT generate synthetic data directly, the models consistently struggled once the foreign-key graph got deep — the point where constraint complexity exceeds what a model can track in its working context. Kent C. Dodds has a whole tutorial dedicated to the realistic version of this problem, because the naive approach doesn't survive contact with a production-shaped schema.

The problem isn't that LLMs are bad at generating fake data. They're great at generating fake data. The problem is that schemas have shape, and generating coherent data that respects that shape is a constraint-solving task, not a text-completion task. That's the gap Seedfast was built to fill.

What Actually Works: Give the Agent a Tool

The trick isn't to get your agent to stop writing seed scripts. The trick is to give it a real tool it can delegate to — one that understands the constraint-solving part, so the agent can stay in the part it's good at: describing intent and reasoning about your feature.

That's what MCP (Model Context Protocol) enables, and it's what Seedfast MCP exposes. Instead of the agent writing a Python script, it calls a single tool — seedfast_run — with a natural-language scope: "seed the orders table with related line_items and payments". Seedfast handles schema introspection, FK resolution, and constraint-aware generation. Your agent reads the result and returns to whatever it was actually helping you build.
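Under the hood, that delegation is an ordinary MCP `tools/call` request. The sketch below shows the general shape such a request takes per the MCP spec; the argument key (`scope` here) is an assumption for illustration, since the source only names the tool (`seedfast_run`) and the natural-language scope it receives.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "seedfast_run",
    "arguments": {
      "scope": "seed the orders table with related line_items and payments"
    }
  }
}
```

The agent never sees the schema introspection or FK resolution; it only sees the tool's result, which is what keeps its context budget intact.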

The framing shift is small but load-bearing: the agent doesn't generate test data. The agent orders test data from Seedfast, which is the tool that specializes in generating it. The whole workflow stays conversational, but the hard part is offloaded.

"I'm building the order history feature. Seed the orders table with a few related line_items and payments, then show me a sample row"

One tool call. Clean data. Context preserved. No half-written scripts to clean up by hand.

Want to try it? Install Seedfast — free tier, no credit card.

Generate Test Data with AI: Real Workflows

Once seeding becomes a single conversational step, a lot of workflows you used to skip become cheap enough to do routinely.

Developer onboarding. New team members spend hours setting up local databases. With Seedfast MCP configured in your project's .mcp.json, onboarding becomes a conversation:

New dev: "Set up my local database with test data for the main features"
AI: Creates a plan, seeds 15 tables, reports completion in 45 seconds

Combine with Docker Compose and the whole local setup is one command plus one prompt.
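For reference, a project-level `.mcp.json` entry generally follows the standard `mcpServers` shape shown below. The server command (`seedfast mcp`) and the `SEEDFAST_API_KEY` variable name are assumptions for illustration; check the MCP setup guide for the exact invocation your assistant expects.

```json
{
  "mcpServers": {
    "seedfast": {
      "command": "seedfast",
      "args": ["mcp"],
      "env": {
        "SEEDFAST_API_KEY": "${SEEDFAST_API_KEY}"
      }
    }
  }
}
```

Checking this file into the repo is what makes onboarding a conversation: every new clone gets the same tool wired into the assistant automatically.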

Pre-demo environment setup. Before a client demo, prepare the environment in a single sentence:

"Seed the staging database with tables: users, orders, products, payments"

Always use the plan-then-execute pattern here — review what will be seeded before touching a shared environment. See the MCP setup guide for the full pattern.

Migration testing. Before running migrations on production, test them against realistic data. Seed a copy of the schema, run the migration, and check whether the ALTER TABLE takes three seconds or three minutes against a million rows. The migration testing guide covers the full flow.

Reproducing production issues. When debugging production bugs, recreate similar data shapes locally:

"Seed the accounts table with related subscriptions and billing_history tables"

Ephemeral CI databases. Seedfast works well in ephemeral CI databases — spin one up per PR, seed it, run tests, destroy it. Each run gets fresh, isolated data with no shared state between jobs.
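One way to sketch the per-PR ephemeral database is a GitHub Actions service container: the Postgres instance exists only for the job and is torn down with it. The migration and seeding steps below are placeholders, not documented Seedfast commands, so treat them as a shape to adapt.

```yaml
# .github/workflows/test.yml (sketch, assuming GitHub Actions)
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres
        ports: ["5432:5432"]
        options: >-
          --health-cmd "pg_isready"
          --health-interval 5s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      # apply your schema first, with whatever migration tool the repo uses
      - run: npm ci && npm run migrate
      # hypothetical seeding step; see the Seedfast docs for the real invocation
      - run: npx seedfast
      - run: npm test
```

Because the database lives and dies with the job, there is no cleanup step and no shared state between PRs.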

Multi-service development. When working across microservices that share a schema:

"I'm working on the orders service. Seed only the tables owned by the orders schema, but include the users table from the shared schema for foreign key references"

Get Started

If you want to generate test data with AI the way it was supposed to work — without broken scripts, half-seeded tables, or context bloat — the setup is short:

  1. Install the CLI: npm install -g seedfast@latest
  2. Get an API key at seedfa.st (free tier available)
  3. Add Seedfast to your AI assistant's MCP config — see the MCP setup guide for Claude Desktop, Cursor, VS Code, and Claude Code CLI
  4. Ask your agent to seed something, and keep coding

For the patterns that make seeding reliable at scale — scope writing, plan-then-execute, performance, error handling — see the MCP Setup Guide. For credential hygiene and what crosses the wire to AI providers, see Data Handling & Privacy.
