
Database Seed File Maintenance: Stop Patching seed.sql

By the Seedfast team

Why your team quietly stopped running seed.sql months ago. A practical guide for PostgreSQL, MySQL, and ORM-based projects — not torrent .seed files or BitTorrent seedboxes.

Your seed.sql was committed by someone who left two years ago. It worked for three weeks. Then a migration landed, and it has been quietly broken ever since. Maybe it's called fixtures.sql, dev_data.sql, or testdata/init.sql — same file, same fate.

This article is about seed file maintenance — why static seed files drift from the schema, what that drift costs in real engineering hours, how every mainstream ORM (Rails, Prisma, Django, Laravel) hits the same wall, and what to do when you stop pretending the file works. The short answer to how to maintain seed files in 2026: don't. Seedfast reads your live schema on every run and regenerates FK-valid data from a plain-English scope, so there is no static artifact to drift.

If your team has a seed file that actually works on the current schema — without modifications, without commenting out lines, without someone saying "oh yeah, you have to run the migration first and then manually fix line 847" — you are in a vanishingly small minority. Congratulations.

For everyone else: this article is about the file you all know is broken and nobody wants to fix.

TL;DR — your seed.sql drifts because it's static. Stop hand-editing a snapshot of last quarter's schema; let Seedfast read your live schema on each run and regenerate FK-valid data from a plain-English scope. Reference rows you assert on by literal value (admin accounts, country codes, feature flags) stay in a short hand-written seed.sql. Bulk development data gets regenerated.

Try Seedfast free → — connect your DB and run the first seed in under five minutes.

If you're here because npx prisma db seed (or its Rails / Django / Laravel equivalent) just broke after a migration, jump to Try it on your seed file for the install + run snippet.

  • Seed file maintenance is structural, not disciplinary. A static file cannot keep up with a schema that changes every sprint. No amount of "be more careful" fixes that.
  • Every ORM has the same problem in a different syntax. Rails db/seeds.rb, Prisma prisma/seed.ts, Django fixtures, and Laravel seeders all break for the same reason — they hard-code the shape of your data.
  • The real cost shows up in onboarding, CI, and developer trust. A broken seed file turns a 2-minute setup into a 2-hour debugging session and teaches the team to distrust all shared data tooling.
  • Replace the bulk-data portion of the file with Seedfast. Instead of maintaining 500 lines of INSERT statements, describe the data you want in plain English and Seedfast reads your live schema on every run to generate matching data. Keep a small hand-written seed.sql for production reference rows and named-by-ID test fixtures.

The lifecycle of a seed file is so predictable it could be a template:

Week 1: Creation. A motivated developer (usually someone onboarding) writes a seed file. It inserts users, orders, products — whatever the app needs to look populated. It works. The PR gets merged. The team is grateful. Local development is smooth. People actually use the app locally instead of staring at empty states.

-- seed.sql (v1, the golden age)
INSERT INTO users (id, name, email, created_at)
VALUES
  (1, 'Alice Johnson', 'alice@example.com', '2025-01-15'),
  (2, 'Bob Smith', 'bob@example.com', '2025-02-20'),
  (3, 'Carol Davis', 'carol@example.com', '2025-03-10');

INSERT INTO orders (id, user_id, total, status, created_at)
VALUES
  (1, 1, 99.99, 'completed', '2025-01-20'),
  (2, 1, 149.50, 'completed', '2025-02-15'),
  (3, 2, 75.00, 'pending', '2025-03-01');

INSERT INTO products (id, name, price, category)
VALUES
  (1, 'Widget Pro', 49.99, 'electronics'),
  (2, 'Gadget Plus', 29.99, 'electronics'),
  (3, 'Thingamajig', 19.99, 'accessories');

Week 3: First crack. A migration adds a NOT NULL column to users. The seed file doesn't include it. New developers run the seed, get an error, and ask in Slack. Someone replies "oh just add role DEFAULT 'user' to the users table insert." Nobody updates the file.

ERROR:  null value in column "role" of relation "users" violates not-null constraint
DETAIL:  Failing row contains (1, Alice Johnson, alice@example.com, 2025-01-15, null)
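For most of its life, the fix exists only in Slack scrollback. Applied to the v1 file, the workaround is one extra column per row; the 'user' literal here is just the default the Slack reply suggests:

-- seed.sql (v1.1, the patch nobody commits)
INSERT INTO users (id, name, email, created_at, role)
VALUES
  (1, 'Alice Johnson', 'alice@example.com', '2025-01-15', 'user'),
  (2, 'Bob Smith', 'bob@example.com', '2025-02-20', 'user'),
  (3, 'Carol Davis', 'carol@example.com', '2025-03-10', 'user');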

Month 2: The patch. Someone gets frustrated enough to fix it. They add the missing column. They also notice that orders now has a shipping_address_id foreign key to a new addresses table. They add an addresses insert block. The PR is 200 lines of SQL changes for a file that was supposed to be "set and forget." It passes review because nobody wants to think about it too hard.

Month 4: The second break. The products table was renamed to catalog_items as part of a domain modeling cleanup. The seed file still references products. Someone opens an issue. The issue sits in the backlog for six weeks because it's not a production bug, it's "just" developer experience.

Month 6: The workaround. The seed file has broken twice in two months. A senior developer wraps it in a script:

#!/bin/bash
# run-seed.sh — "best effort" seeding
set -e  # just kidding
psql $DATABASE_URL < seed.sql 2>/dev/null || echo "Seed had errors (this is normal)"

The || echo is doing a lot of heavy lifting there. "This is normal" is doing even more.

Month 9: Abandonment. The README still says "Run ./run-seed.sh to populate your local database". New developers try it. It fails silently on half the tables. They ask in Slack. Someone says "I just use the staging database" or "I manually insert what I need". The seed file is effectively dead. It exists in the repo. Nobody deletes it — that would require acknowledging the problem. Nobody fixes it — that would require ongoing commitment. It just sits there, a monument to good intentions.

Month 12: The zombie. A new developer finds the seed file, spends two hours fixing it for the current schema, opens a PR, and the cycle begins again.

The fundamental tension is simple: your schema changes constantly, but your seed file is static.

Consider what happens during a typical sprint. A developer adds a phone_number column to users. Another developer creates a user_preferences table with a foreign key to users. A third developer changes orders.status from a text field to an enum type. A fourth developer adds a check constraint that orders.total must be positive.

Each of these changes is small. Each migration is tested. Each PR is reviewed. And none of them update the seed file, because why would they? The seed file isn't part of the feature. It's not in the test suite. It's not in the CI pipeline (or if it is, it was removed six months ago because it kept breaking the build).

The result is that the seed file drifts from the schema roughly in proportion to the rate at which your team ships schema-changing features. The more productive your team is, the faster the seed file becomes useless. Seedfast inverts this: because it reads the live schema on each run, schema velocity stops being the enemy — the faster the team ships migrations, the less there is to maintain.

This is the human problem underneath the technical one. Who is responsible for seed.sql?

Not the developer who wrote it — they moved to another team. Not the developer who added the new column — they're shipping features, not maintaining test infrastructure. Not the tech lead — they have 40 other things to worry about. Not DevOps — it's application-level data, not infrastructure.

Seed files are communal property, and communal property is everyone's responsibility and therefore nobody's. The same thing that happens to shared kitchen spaces in offices happens to seed files in repos: slow, inevitable decay until someone snaps and does a deep clean. Except with seed files, nobody snaps. They just route around the damage.

There's a deeper structural issue: migrations transform schema forward in time, but seed files are frozen in the past. Your migration system knows how to get from schema version 47 to version 48. It doesn't know how to update the test data that was valid at version 47 to also be valid at version 48.

Some teams try to solve this by running seed files through the migration system — seeding at version 1, then migrating up. This works exactly until your first breaking migration, which is usually the third or fourth one. Then you need to version your seed files alongside your migrations, which means maintaining parallel histories of schema changes and data changes. Nobody does this for long.

Switching to a "proper" ORM seeder does not make the maintenance problem go away — it just moves it to a different file. Every mainstream stack hits the same wall:

  • Rails (db/seeds.rb). Use find_or_create_by! to stay safe to re-run, split by model, keep seeds environment-aware. Organization improves; the columns are still hard-coded, so a NOT NULL migration still breaks bin/rails db:seed.
  • Prisma (prisma/seed.ts). Recent Prisma versions decoupled seeding from prisma migrate dev and prisma migrate reset — you now run npx prisma db seed explicitly (Prisma seeding docs). Adding a required field still breaks prisma.user.create({ data: { ... } }) — at compile time instead of runtime, but the manual fix is the same.
  • Django (fixtures vs factories). loaddata fixtures break on most schema changes; factory_boy generates rows from the live model, which is why guides have been recommending factories over fixtures for a decade (Caktus Group, 2013). Factories help, but every new field is a patch to the factory code.
  • Laravel (DatabaseSeeder + model factories). User::factory()->has(Post::factory()->count(3))->create() is composable — until posts gains a NOT NULL column and every seeder throws a QueryException. The Laravel docs recommend keeping seeders deterministic and in CI; that recommendation is what creates the PR backlog when schemas move.

The pattern is the same across all four: the syntax differs, but you are still writing a file that describes the shape of your data by hand. Every schema change forces a corresponding edit. For a side-by-side comparison of these tools and seven other seeding approaches, see Database Seeder: 7 Tools Compared.

The seed file seems like a small thing. It's a convenience file for local development. How much damage can a broken convenience file really do?

More than you'd think. Seed file maintenance is the kind of work that doesn't show up in any sprint plan but eats engineering time everywhere.

A new developer joins your team. The README says to clone the repo, run migrations, and run the seed file. The seed file fails. The new developer doesn't know if the failure is expected, if their local setup is wrong, or if they did something out of order. They spend an hour debugging before asking for help. A senior developer spends 30 minutes walking them through the workaround.

Multiply this by every new developer, every quarter. Now multiply by the morale cost: the new person's first experience with the codebase is discovering that the documented setup doesn't work. That's not a great first impression of your engineering culture.

Without working seed data, local development means staring at empty states. The dashboard shows "No data found." The list views are empty. The search returns nothing. The graph components render a flat line.

Developers start creating data manually through the UI, which takes ten minutes every time they reset their database. Or they stop resetting their database, which means their local state diverges from everyone else's. Or they just develop against staging, which has its own problems (shared state, slow connections, risk of interfering with QA).

The empty local database is a productivity drain that's hard to quantify because it's spread across every developer, every day, in small increments. Five minutes here to create a test user. Ten minutes there to set up an order with the right status. Twenty minutes to create the specific data configuration needed to test a new feature. It adds up to hours per developer per week.

If your CI pipeline includes a seeding step (it should), a broken seed file means broken builds. The options are:

Fix the seed file every time it breaks. This works, but it means someone is on permanent seed-file duty, patching SQL after every migration.

Remove the seeding step from CI. This is what most teams actually do. The CI pipeline now tests against an empty database, which misses entire categories of bugs that only surface at realistic data volumes.

Make the seeding step non-fatal. The || true approach. The seed runs, fails halfway, inserts data into some tables but not others, and the test suite runs against an inconsistent partial dataset. This is arguably worse than an empty database, because the failures are intermittent and hard to diagnose.

The most corrosive effect of a broken seed file is cultural. When developers learn that the seed file is unreliable, they develop a reflexive distrust of all shared data tooling. Suggestions to invest in better seeding infrastructure are met with "we tried that, it didn't work." Proposals for data-dependent integration tests are rejected with "those will just break when the seed file drifts."

The broken seed file becomes a learned helplessness that prevents the team from investing in the thing they actually need.

The way out is structural, not motivational. If your team has been burned enough times to distrust shared seed tooling, the fix is to stop shipping a tool that requires trust. Seedfast reads your live schema on each run instead of asking a human to keep a file in sync — there is no artifact for the team to lose faith in, because there is no artifact.

Before you decide what to do next, run a quick audit. This script spins up a fresh database, applies your migrations, runs your seed file, and reports exactly how broken it is. For deeper guidance on writing a Postgres seed script that survives migrations, see Postgres Seed Script: Build One That Lasts. Copy this into scripts/audit-seed.sh:

#!/usr/bin/env bash
# audit-seed.sh — find the drift between seed.sql and your current schema.
# Usage: ./audit-seed.sh path/to/seed.sql
set -uo pipefail

SEED_FILE="${1:-seed.sql}"
DB="seed_audit_$(date +%s)"

# 1. Fresh database, latest migrations applied.
createdb "$DB"
trap 'dropdb --if-exists "$DB"' EXIT

# Replace with your project's migration runner:
#   Prisma:  DATABASE_URL=postgres:///$DB npx prisma migrate deploy
#   Rails:   DATABASE_URL=postgres:///$DB bin/rails db:migrate
#   Django:  DATABASE_URL=postgres:///$DB python manage.py migrate
#   Raw SQL: psql "$DB" -v ON_ERROR_STOP=1 -f path/to/schema.sql

# 2. Run the seed, capture every error (default ON_ERROR_STOP=off lets us collect them all).
ERRORS=$(psql "$DB" -f "$SEED_FILE" 2>&1 \
  | grep -E "ERROR|psql:.*: ERROR" || true)

# 3. Check which tables ended up empty despite being referenced in the seed.
REFERENCED_TABLES=$(grep -ioE "INSERT INTO [a-z_][a-z0-9_]*" "$SEED_FILE" \
  | awk '{ print tolower($3) }' | sort -u)  # -o + tolower handles UPPERCASE inserts too

echo "=== seed.sql health report ==="
echo
if [[ -z "$ERRORS" ]]; then
  echo "No SQL errors."
else
  echo "SQL errors found:"
  echo "$ERRORS"
fi
echo
echo "Tables referenced by the seed file and their row counts after running:"
for t in $REFERENCED_TABLES; do
  COUNT=$(psql "$DB" -Atc "SELECT count(*) FROM \"$t\"" 2>/dev/null || echo "MISSING")
  printf "  %-30s %s\n" "$t" "$COUNT"
done
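On a file that has drifted, expect a report along these lines (illustrative output; the exact errors depend on your migrations):

=== seed.sql health report ===

SQL errors found:
psql:seed.sql:7: ERROR:  null value in column "role" of relation "users" violates not-null constraint
psql:seed.sql:18: ERROR:  insert or update on table "orders" violates foreign key constraint "orders_user_id_fkey"
psql:seed.sql:29: ERROR:  relation "products" does not exist

Tables referenced by the seed file and their row counts after running:
  users                          0
  orders                         0
  products                       MISSING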

Three things to look for in the output:

  1. Error lines. Each one is a migration your seed file has not caught up with.
  2. Tables with 0 rows. The INSERT succeeded syntactically but every row was rejected by a constraint you forgot about.
  3. MISSING tables. The seed file references a table that no longer exists (it was renamed, merged, or dropped).

Anything more than zero in any of those categories is drift. If this is your first time running the check, expect the report to be longer than you want it to be.

The strongest argument for a static seed file is "it's just a file, how expensive can it be?" Turn it into a number.

Maintenance cost per month, as a formula:

cost = (minutes_per_break * breaks_per_migration * migrations_per_month)
     + (new_dev_onboarding_minutes * new_devs_per_month)
     + (minutes_per_dev_per_day_on_empty_db * devs * broken_seed_days_per_month)

These numbers are illustrative — flip any input toward your actual team and the total moves. For a 10-person team that ships ~12 migrations a month, hires one engineer a quarter, and has a seed file that breaks on roughly one in three migrations:

Input                                                                Value
Minutes to diagnose + fix one break                                  45
Break rate per migration                                             0.33
Migrations per month                                                 12
Onboarding time lost to broken seed                                  90 min
New devs per month                                                   0.33
Minutes lost per dev per day to empty-db workarounds
  (on days the seed is broken)                                       10
Broken-seed days per month (≈ one in three of 20 working days)       ~8
Developers                                                           10

That comes out to roughly 178 minutes fixing breaks + 30 minutes of onboarding friction + 800 minutes of empty-database workarounds per month — about 17 hours, more than two full engineering days, spent on a file that was supposed to save time. The dev-day-loss term is the lever: if your team only resets the local DB twice a week, halve it. The break-rate term is the other lever: a team that ships fewer schema-changing migrations pays less. The point is not the exact total — it is that the total is never zero, and it scales with the rate at which your team ships schema work.
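If you want the arithmetic in runnable form, the whole model fits in one query. A minimal sketch with the illustrative inputs from the table above, ready to paste into psql (swap in your own numbers):

-- Seed-maintenance cost per month, in minutes, using the inputs above.
SELECT
  45 * 0.33 * 12  AS fix_minutes,         -- minutes per break * break rate * migrations
  90 * 0.33       AS onboarding_minutes,  -- onboarding loss * new devs per month
  10 * 10 * 8     AS empty_db_minutes,    -- min/dev/day * devs * broken-seed days
  round((45 * 0.33 * 12 + 90 * 0.33 + 10 * 10 * 8) / 60.0, 1)
                  AS total_hours;         -- ≈ 16.8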

Once you have a number, "just keep the seed file up to date" stops looking like a discipline problem and starts looking like a tax. Seedfast replaces that variable line item with a predictable subscription that does not scale with how often your schema changes.

Teams develop creative ways to live with broken seed files. All of them are worse than fixing the root cause: seed file maintenance is structural, not behavioral, and these patterns route around the structure instead of changing it. The first pattern is the "optional" seed step:

## Local Setup
1. Run `make migrate`
2. (Optional) Run `make seed` to populate test data

When the seed step is "optional", it means "broken". Nobody makes a working tool optional. You don't see (Optional) Run the compiler in setup docs. The word "optional" is a signal that the team knows it doesn't work reliably and has decided to make that someone else's problem. The second pattern swallows errors instead:

# seed.py (run_sql: the project's thin SQL-runner helper, not shown)
for table in ["users", "orders", "products", "categories"]:
    try:
        run_sql(f"seed_{table}.sql")
    except Exception as e:
        print(f"Warning: {table} seed failed ({e}), continuing...")

Every error is swallowed. Half the tables succeed, half don't. The developer doesn't know which half. The local database has users but no orders, products but no categories. The app technically runs but half the features are untestable. Nobody investigates the warnings because there are always warnings. The third pattern versions seed deltas alongside migrations:

seeds/
  v1_initial.sql
  v2_add_roles.sql
  v3_add_addresses.sql
  v4_rename_products.sql
  v5_add_preferences.sql

This is the most disciplined approach, and it's also the most labor-intensive. Every migration that affects seeded tables requires a corresponding seed update. In practice, this means the developer writing the migration now has two files to update and test — the migration and the seed delta. The Neon team's guide to maintaining seed data lays out this approach carefully — version the file, automate execution, keep it safe to re-run — and it is the best you can do with a static file. It also describes roughly one full-time responsibility you did not have before. Compliance drops rapidly after the first month.

If your database is hosted on Neon, their branching feature offers an alternative path — instant schema-and-data snapshots per branch — but that approach is platform-specific and does not help the vast majority of teams running Postgres elsewhere. The platform-agnostic alternative is to stop versioning seed deltas entirely: Seedfast reads the live schema on each run and regenerates FK-valid rows from a scope string, so there is no v6_*.sql file to write next sprint and no parallel history to maintain.

Eventually, developers start maintaining their own personal seed files. Each one tailored to the features they work on. None of them complete. All of them incompatible with each other. The team now has N different versions of local state, where N is the number of developers.

"Works on my machine" takes on a new meaning when every machine has different data.

The team-by-team divergence is the moment most teams realize the file isn't recoverable. Seedfast replaces the per-developer sprawl with one command that reads the live schema — every developer gets the same shape of data because the schema, not a hand-edited file, is the source of truth.

All of these failures stem from one root cause: static data cannot keep up with a dynamic schema.

A seed file is a snapshot. It captures the shape of your data at a single point in time. The moment your schema evolves — which it does constantly, because that's what healthy software projects do — the snapshot is stale.

This isn't a discipline problem. It's not something that can be solved by "just keeping the seed file up to date" any more than you can solve clock drift by "just checking your watch more often". The problem is structural — you're using a static artifact to describe a moving target.

The fix isn't a better seed file. The fix is regenerating the bulk data on each run from the schema itself.

What if your seeder read your current schema every time it ran?

Not a file that was written six months ago. Not a snapshot that assumed the products table still exists. Not a script that hardcodes column names. Seedfast reads the actual, current, live schema on each invocation — every column, constraint, foreign key, and enum that exists right now, at this moment. Describe what you want in plain English; Seedfast generates FK-valid rows that match the schema as it stands. This is the core idea behind schema-aware synthetic test data — and the reason there's no static artifact left to drift.
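To make "reads the live schema" concrete: everything a schema-aware seeder needs is sitting in the database's own catalogs. A sketch of the kind of introspection involved, using the standard information_schema rather than Seedfast's internal queries:

-- Columns, types, nullability, and defaults for every base table,
-- fresh on every run. Add information_schema.table_constraints and
-- key_column_usage to pick up foreign keys as well.
SELECT c.table_name, c.column_name, c.data_type, c.is_nullable, c.column_default
FROM information_schema.columns AS c
JOIN information_schema.tables AS t USING (table_schema, table_name)
WHERE t.table_schema = 'public'
  AND t.table_type = 'BASE TABLE'
ORDER BY c.table_name, c.ordinal_position;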

Install once, then run it from any project directory:

npm install -g seedfast
# or: brew install argon-it/tap/seedfast
# or use it without installing: npx seedfast seed --scope "..."

seedfast connect       # paste your DATABASE_URL when prompted
seedfast seed --scope "seed realistic data for all tables"

That's it. No SQL file to maintain for bulk development data. No new columns to add to the seed after migrations. No foreign keys to wire up manually. Seedfast connects to your database, reads the schema as it exists today, and generates data that fits. Reference rows your tests assert on by literal value — admin accounts, country codes, feature flags — stay in a small hand-written seed.sql; only the bulk development data gets regenerated.

When a migration adds a NOT NULL column next week, Seedfast sees it the next time it runs. When a table is renamed, Seedfast uses the new name. When a foreign key is added, Seedfast generates parent rows before child rows where the schema is acyclic. When an enum type gains a new value, Seedfast includes it in the distribution.

There is no schema drift because there is no static artifact to drift.

Instead of writing SQL inserts, you describe what you need in plain English:

# Instead of maintaining 500 lines of INSERT statements
seedfast seed --scope "seed 1,000 users with orders, payments, and support tickets"

Seedfast reads your schema, builds a dependency graph, proposes a plan, and seeds. The scope description works today and will work next month, because it references concepts ("users with orders") rather than column names (user_id INTEGER NOT NULL REFERENCES users(id)).

When your schema changes, the same scope produces different data — data that matches the new schema. The command is the same. The intent is the same. The output adapts automatically.

Before (the seed.sql lifecycle):

  1. Developer writes seed.sql (2 hours)
  2. Works for 3 weeks
  3. Migration breaks it (5 minutes to discover, 30 minutes to fix)
  4. Works for 2 weeks
  5. Another migration breaks it (someone files an issue)
  6. Issue sits in backlog for 6 weeks
  7. New developer fixes it (1 hour)
  8. Works for 1 week
  9. Two migrations land in the same sprint, seed file breaks in multiple places
  10. Someone wraps it in || true
  11. Team stops using it
  12. Repeat from step 7 every few months

Cumulative time: dozens of hours per year. Effective uptime: maybe 40%.

After (seedfast):

# In your README
seedfast seed --scope "seed realistic data for all tables"

# In CI
seedfast seed --scope "seed 1,000 users with orders" --output plain

There is no step 2 through 12. The command works after every migration because it reads the current schema. There is no SQL file to patch and no error-swallowing wrapper script to maintain — the seed-after-each-migration treadmill goes away.

Toy schemas with users → orders → products are easy. The interesting question is what happens when Seedfast meets the kind of schema that has been running in production for three years. A short, honest support matrix:

  • Foreign keys (single-column and composite): Seedfast builds a dependency graph and inserts parent rows before child rows. Self-references and cycles are handled by inserting placeholder rows first, then patching FKs in a second pass.
  • Enum types and check constraints: Recognized and respected — generated values are drawn from the enum's members and pass the CHECK predicate.
  • Generated columns (GENERATED ALWAYS AS ...): Skipped on insert; the database computes them. Stored generated columns work the same way.
  • Partitioned tables (range/list/hash): Inserts go through the parent table; Postgres routes rows to the correct partition. No special configuration needed.
  • citext, jsonb, uuid, tsvector, numeric(10,2), geometric types: Supported via Postgres's type system; Seedfast generates type-correct values.
  • Triggers with side effects (audit rows, denormalized counts): They fire as written. If a trigger does something you don't want during seeding, disable it for the seed run the same way you would for a pg_restore: ALTER TABLE ... DISABLE TRIGGER USER, run the seed, re-enable (see the snippet after this list).
  • Views, materialized views, and functions: Not seeded directly (they don't hold data), though you may want to REFRESH materialized views after seeding.
  • Multi-schema databases: Tables in non-public schemas are read and seeded as long as your DATABASE_URL user has access.

If your schema has something exotic that's not listed — extension-defined types, custom domains, row-level security policies that conflict with the seeding role — seedfast doctor will report what it can and can't handle before you commit to a run. The honest answer to "does this work on my real schema" is: in most cases yes, in some cases with one extra setup step, and seedfast plan will tell you which before any rows are written.

The seed file in CI is where the pain compounds, because CI failures block everyone. If you want the full walk-through for GitHub Actions and GitLab, the CI/CD database seeding guide covers exit codes and artifact patterns. The short version:

# Before: fragile, breaks every few sprints
- name: Seed test database
  run: psql $DATABASE_URL < seed.sql # fingers crossed

# After: reads current schema every time
- name: Seed test database
  run: seedfast seed --scope "seed 5,000 users with orders and payments" --output plain
  env:
    SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
    SEEDFAST_DSN: ${{ secrets.SEEDFAST_DSN }}

The --scope flag makes it non-interactive. To make the run idempotent, point each CI job at a fresh ephemeral database (e.g., a Postgres service container) — Seedfast appends rows rather than replacing them, so re-running against an already-populated database stacks more data on top. If you need a clean slate without recreating the DB, truncate the affected tables first.
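If you take the truncate route, one statement handles the FK ordering for you. A sketch using this article's example tables:

-- CASCADE follows foreign keys so child tables empty with their parents;
-- RESTART IDENTITY resets sequences so generated IDs start from 1 again.
TRUNCATE TABLE users, orders, payments RESTART IDENTITY CASCADE;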

The before-and-after for new developers is dramatic:

Before: Clone repo. Run migrations. Run seed. Seed fails. Ask Slack. Wait for response. Get workaround. Apply workaround. Half the data loads. Manually create the rest. Time: 1-3 hours.

After: Clone repo. Run migrations. Run seedfast seed, review the proposed plan, hit Y. Time: typically 2 minutes on a small schema.

No debugging. No Slack. No workarounds. The database is populated with realistic data that matches the current schema. The new developer sees a populated dashboard on their first day, not an empty state with a TODO comment.

Your seed.sql isn't broken because your team is lazy. It's broken because the premise is flawed. Asking a static file to keep up with a dynamic schema is asking for perpetual maintenance — and perpetual maintenance of non-production tooling is exactly the kind of work that gets deprioritized, postponed, and eventually abandoned.

The teams that have working seed files are the ones spending real engineering time on seed file maintenance. That time could be spent on features, on tests, on the product. Maintaining seed files is not a valuable use of engineering time. It's a tax you pay because the tool requires it. The answer to how to maintain seed files in any production codebase, eventually, is to stop trying. At enterprise scale the math gets worse — see enterprise database test data for what compliance and volume add to the bill, and staging without prod data for why the regulated audience cannot just copy production into dev.

Stop paying the tax. Move the bulk data out of seed.sql — keep only the reference rows your tests assert on by literal value — and let Seedfast regenerate the rest from your live schema. Try Seedfast free →

If your seed.sql is the file nobody wants to own, point Seedfast at the same database and let it generate the bulk data from your live schema instead. Reference rows your tests assert on by literal value (admin accounts, country codes, feature flags) stay in a short seed.sql. Everything else — the 500 users, 2,000 orders, the relational bulk that breaks every migration — gets regenerated on each run from whatever shape the schema is in today.

# Try without installing — runs the latest CLI in this shell only:
npx seedfast connect
npx seedfast seed --scope "1,000 users with orders, payments, and support tickets"

# Or install once for repeated use:
# npm install -g seedfast
# brew install argon-it/tap/seedfast

No file to update after the next migration. No factory code to keep in sync. The schema is the source of truth, and Seedfast reads it fresh on each invocation.

On privacy. Seedfast's synthetic data generation does not require samples of your existing rows — the pipeline reads schema metadata (table and column definitions, types, FK relationships, constraints) and the scope text you provide. Your DATABASE_URL stays on your machine; the CLI connects to your database directly, and the password is never transmitted to Seedfast's servers. Note that DEFAULT expressions, CHECK constraints, and enum values are part of schema metadata and are transmitted as written; if any of those contain sensitive literals, treat them accordingly. See how it works for the full data-handling breakdown.

Free trial covers connect + first seed — small schemas typically fit within trial limits. See pricing for current terms, or read the full walkthrough in the getting started guide.

Seed files break after migrations because a seed file is a snapshot of the schema at the moment it was written, and a migration is, by definition, a change to that schema. Any migration that adds a NOT NULL column, a foreign key, a check constraint, or renames a table will break any INSERT statement that does not already account for it. The fix is not "remember to update the seed" — the fix is to generate seed data from the current schema on every run, so there is nothing to update.

A migration changes the shape of your database: CREATE TABLE, ALTER TABLE ADD COLUMN, DROP INDEX. A seed file inserts rows into the shape that migrations created. Migrations are versioned, ordered, and applied once per environment; seed data is rerun on demand. The two get conflated because both can be .sql files, but they answer different questions: migrations answer "what does my schema look like?", seeds answer "what data lives in it?". Mixing rows into migrations is a common anti-pattern — it makes migrations non-replayable across environments.
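The distinction in two statements, reusing the users table from earlier:

-- Migration: changes the shape. Versioned, ordered, applied once per environment.
ALTER TABLE users ADD COLUMN role text NOT NULL DEFAULT 'user';

-- Seed: fills the shape. Rerun on demand, against whatever shape exists today.
INSERT INTO users (name, email, role)
VALUES ('Alice Johnson', 'alice@example.com', 'admin');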

Idempotency means the second run produces the same final state as the first. In raw SQL, use INSERT ... ON CONFLICT DO NOTHING against a unique key. In Rails, find_or_create_by!. In Prisma, upsert with where/create/update. In Django, the update_or_create queryset method. None of these solve drift — they only protect against duplicate rows when the same script runs twice. A seed that is safe to re-run still breaks the day a migration adds a NOT NULL column the script doesn't know about.
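The raw-SQL version of the pattern, assuming users.email carries a UNIQUE constraint:

-- Second run is a no-op instead of a duplicate-key error.
INSERT INTO users (name, email)
VALUES ('Alice Johnson', 'alice@example.com')
ON CONFLICT (email) DO NOTHING;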

Prisma 7 decoupled seeding from migration commands: prisma migrate dev and prisma migrate reset no longer auto-run the seed, and the --skip-seed flag was retired along with it because there is no automatic seed run to skip. To seed, you call npx prisma db seed explicitly (Prisma seeding docs). The change is cleaner — migrations and seeding are now separate concerns — but it does not fix the underlying drift problem. Your prisma/seed.ts still hard-codes column shapes, and a required field added to the model still breaks db seed until you patch the file.

Versioning seed files alongside migrations can work, and some teams do it. This is the most disciplined manual approach: for every migration that changes a seeded table, add a corresponding seed delta that reshapes the existing fixtures. In practice, compliance collapses within a few weeks because every PR now costs a second file to update, and reviewers stop catching it. It works for very small teams with slow-moving schemas; it does not scale.

Fixtures and seed files are both static artifacts — JSON/YAML/SQL that describe specific rows and are loaded verbatim. They share the drift problem. Factories (Factory Bot, factory_boy, Laravel model factories) are code that generates rows at test time using the current model definitions, so renaming a field is a compile-time error instead of a silent runtime failure. Factories are a clear upgrade over fixtures, but you still hand-write one per model, so schema changes still force edits — just in Ruby/Python/PHP instead of SQL.

Built-in ORM seeders do not make the problem go away. Prisma 7 actually removed automatic seeding from prisma migrate dev and prisma migrate reset — you now call npx prisma db seed explicitly — and Rails has always kept db:seed as a manual step. Both frameworks give you a nice place to put seed code; neither frees you from writing out the shape of your data by hand. Built-in seeders move where seed file maintenance happens, not whether it happens. See the database seeding overview for how ORM seeders fit into the broader picture.

Safe-to-re-run (find_or_create_by!, ON CONFLICT DO NOTHING, upsert) is necessary but not sufficient. It prevents the second run from failing; it does nothing about the first run failing because a new required column was added. You still need to edit the seed to match every schema change. Re-runnability is table stakes, not a solution.

Two cases keep seed file maintenance worthwhile: First, reference data that ships to production — country codes, role names, default feature flags. This is part of your application's contract and belongs in migrations or a narrowly scoped seed step. Second, specific named rows your tests assert on by ID — a handful of rows, clearly separated from bulk development data. Everything else (the 500 users, 2,000 orders, 10,000 line items you need for realistic local dev and CI) should be generated on demand from the current schema.
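What the surviving hand-written file looks like once the bulk data moves out: a few reference rows, keyed and safe to re-run. A sketch with illustrative tables and values:

-- seed.sql, reduced to reference rows only.
INSERT INTO roles (id, name)
VALUES (1, 'admin'), (2, 'user')
ON CONFLICT (id) DO NOTHING;

INSERT INTO feature_flags (key, enabled)
VALUES ('new_checkout', false)
ON CONFLICT (key) DO NOTHING;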

To migrate incrementally, leave the existing file in place. Add seedfast seed --scope "..." to the end of your setup script so new developers get schema-fresh data. Once everyone is on the new command, drop the .sql file. No flag day, no coordination — the two approaches coexist fine because they both write to the same tables. The getting started guide walks through the first run.

Seedfast can replace prisma db seed, and for most teams it should. In package.json you can either swap the "prisma": { "seed": "..." } command to call seedfast seed --scope "...", or leave prisma db seed for a small handful of hand-written reference rows and point developers at seedfast seed for bulk data. Seedfast connects to the same DATABASE_URL Prisma uses, so there is no separate connection config. The Seedfast CLI doesn't care which ORM you use — it reads the schema from the live database.

Synthetic data generation does not require samples of your existing rows. The CLI transmits database schema metadata — table and column names, types, nullability, length limits, foreign keys, and constraint definitions — along with the natural-language scope you provide. Generated synthetic data is produced from the schema shape, not from samples of your real rows.

Your DATABASE_URL stays on your machine: the CLI connects to your database directly, and the password is never transmitted to Seedfast's servers. One thing to watch: DEFAULT expressions, CHECK constraints, and enum values are part of schema metadata and are transmitted as written, so if any of those contain literals you consider sensitive, treat them accordingly. The same applies to your scope text — what you type is what we read. See how it works for the full data-handling breakdown.