Database Seeding: What It Is, Methods, and Best Practices
By Mikhail Shytsko, Founder at Seedfast · · Updated
Database seeding is the process of populating a database with an initial dataset before an application runs against it. If you've ever written an INSERT statement into a file called seed.sql or seed.ts — you've done it. This guide starts there and works forward through ORM seeders, generated data, and what to do when none of those keep up with your schema.
If you're already past the "what is it" stage and comparing concrete tools (Laravel vs Prisma vs Drizzle vs standalone seeders), jump to the database seeder tool comparison.
- Database seeding starts simple (a SQL file) and gets complicated fast as schemas grow — the problem isn't generating values, it's keeping seed data valid across migrations
- Every major ORM has a built-in seeder (Prisma, Laravel, EF Core), but the data and relationships are defined manually — which means they break on schema changes just like raw SQL
- Foreign key resolution is what makes seeding hard at scale. Once tables form cycles or deep chains, hand-ordered inserts stop working — you need automated dependency resolution that preserves referential integrity
- If your seed file is the bottleneck, the next step is generation — tools that read the schema directly and produce data that satisfies constraints, so the seed doesn't need to be maintained alongside the migrations
Database seeding is the process of populating a database with an initial set of data. The term comes from agriculture: you plant seeds before anything grows. In databases, seeding means inserting a starting dataset so the application has something to work with — three admin users for local development, a thousand products for a staging demo, or a connected dataset across 40 tables for integration tests.
There are two kinds of seed data, and they're often conflated:
Reference data lives in production. Roles, categories, feature flags, country codes, default settings. This data is part of the application's contract — without it, the app doesn't function. It changes rarely and should be version-controlled.
Development/test data exists only in non-production environments. Fake users, sample products, test orders. Its purpose is to let developers build features, run tests, and demo the application without waiting for real users to show up.
The confusion starts when teams use the same seed mechanism for both. A seed.sql that inserts admin roles AND a thousand fake users couples two things that change at different rates and have different deployment rules. Production reference data should be in migrations or a dedicated seed step. Test data should be generated, not hand-written.
A freshly migrated database has tables but no rows. Every dashboard renders empty states, every list view returns nothing, every test that depends on a user or a product fails before it starts. Seeding solves three concrete problems:
- Local development gets a populated UI. New devs can click through real flows on day one instead of staring at empty tables.
- Integration and end-to-end tests get realistic data. A test that asserts "user sees their last 10 orders" needs 10 orders sitting in the database before the test starts.
- Production gets the reference rows the app needs to function. Roles, country codes, default settings — without them the application doesn't start cleanly on day one.
Without seeding, every developer ends up writing the same SQL statements by hand, every CI run starts with a different database state, and onboarding turns into "ask Marcus for a dump."
Migrations and seeds are often confused because they both run against the database before the application does. They are not the same thing.
| Migration | Seed | |
|---|---|---|
| What it does | Changes structure (tables, columns, indexes) | Inserts data |
| Versioned? | Yes — strict order | No — should be idempotent |
| Runs in production? | Always, on every deploy | Reference data only; never test data |
| Repeatable? | Each migration runs once | Should produce the same end state on every run |
The rule of thumb: if you'd lose work without it, it's a migration. If you'd lose realism without it, it's a seed. Mixing the two — putting test data inside a migration, or putting schema changes inside a seed — couples concerns that change at different rates and breaks deploys in surprising ways.
Every database seeding journey starts here:
INSERT INTO roles (id, name) VALUES
(1, 'admin'), (2, 'editor'), (3, 'viewer');
INSERT INTO teams (id, name) VALUES
(1, 'Engineering'), (2, 'Design');
INSERT INTO users (id, email, team_id, role_id) VALUES
(1, 'alice@example.com', 1, 1),
(2, 'bob@example.com', 2, 2);
Run it with psql -f seed.sql and your database has data. Simple, deterministic, version-controlled. For a 5-table schema that changes quarterly, this is fine. For a hands-on walkthrough of writing a Postgres seed file from scratch, see Seed Database: PostgreSQL Step-by-Step Guide.
The problem starts on the day someone runs ALTER TABLE users ADD COLUMN department_id INTEGER NOT NULL REFERENCES departments(id). Your seed file doesn't insert departments. The INSERT into users now fails. Someone has to fix the file, test it, commit it. Multiply this across a schema with 30 tables and monthly migrations, and the maintenance cost becomes its own workstream.
Skip the maintenance loop. Seedfast generates seed data straight from your live PostgreSQL schema — when a migration adds department_id, the next run picks it up. Free tier, no seed file required.
Every major ORM has a seeding mechanism. The syntax varies, but the pattern is identical: you write code that creates records through the ORM's API.
Prisma puts seeds in a TypeScript file:
import { PrismaClient } from "@prisma/client";
const prisma = new PrismaClient();
async function main() {
const team = await prisma.team.create({
data: { name: "Engineering" },
});
await prisma.user.create({
data: {
email: "alice@example.com",
team: { connect: { id: team.id } },
},
});
}
main()
.then(async () => {
await prisma.$disconnect();
})
.catch(async (e) => {
console.error(e);
await prisma.$disconnect();
process.exit(1);
});
// Register the script under `prisma.seed` in package.json or prisma.config.ts,
// then run: npx prisma db seed
Laravel uses seeder classes:
// database/seeders/DatabaseSeeder.php
public function run(): void
{
$team = Team::create(['name' => 'Engineering']);
User::create([
'email' => 'alice@example.com',
'team_id' => $team->id,
]);
}
// Run: php artisan db:seed
EF Core (version 9 and later) uses UseSeeding and UseAsyncSeeding on the DbContext:
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder
.UseSqlServer(connectionString)
.UseSeeding((context, _) =>
{
if (!context.Set<Team>().Any())
{
context.Set<Team>().Add(new Team { Name = "Engineering" });
context.SaveChanges();
}
})
.UseAsyncSeeding(async (context, _, ct) =>
{
if (!await context.Set<Team>().AnyAsync(ct))
{
context.Set<Team>().Add(new Team { Name = "Engineering" });
await context.SaveChangesAsync(ct);
}
});
}
// Triggered automatically by EnsureCreated / Migrate, or manually via context.Database.EnsureCreated()
EF Core's older HasData API is now classified as "model managed data" — Microsoft recommends UseSeeding for general-purpose seeding because HasData runs as part of migrations and couples schema changes with row inserts.
ORM seeders are an improvement over raw SQL — typed languages like TypeScript (Prisma) and C# (EF Core) catch errors at compile time, the seed code lives alongside your app, and ORMs handle direct relationships through their own APIs (Prisma's nested writes, Laravel's model factories, EF Core's seeding hooks).
But the core problem remains. Data is still manually defined. Relationships are still manually wired. When a migration adds a required column or changes a relation, the seeder typically breaks — the same way a seed.sql breaks, just in TypeScript or PHP or C# instead of SQL. And you're locked to one ORM: a Prisma seed file can't be reused if part of your team uses Drizzle, or if you switch ORMs next year. For a side-by-side comparison of 7 database seeder tools (Laravel, Prisma, Drizzle, TypeORM, EF Core, Mockaroo, Seedfast) on FK resolution and schema drift, see the database seeder tool comparison.
Most teams hit a wall around 15-20 tables. Sometimes it's the dependency chain. Sometimes it's circular references. Usually it's all of them at once, and they show up the same week.
The dependency chain gets deep. To insert an order_item, you need an order, a product, and a price record. The order needs a user and a shipping_address. The product needs a category and a vendor. That's three or four levels of dependencies for one line item — and your seed code becomes more about wiring IDs together than about the data itself.
Circular references show up next. A user has a manager_id that references another user. A category has a parent_category_id. An employee belongs to a department, but a department has a head_employee_id. You can't insert either side first, so the seed needs multi-pass inserts, deferred constraints, or nullable FKs.
Migrations break seeds constantly. Every NOT NULL column addition, every new FK constraint, every renamed table requires someone to update the seed file. On an active codebase with weekly migrations, seed maintenance becomes a recurring tax that nobody budgets for. Eventually the seed file stops working and the team switches to "just ask Sarah, she knows how to get a working database."
This is the point where most teams either live with broken seeds, copy production (which carries compliance considerations), or start exploring automated approaches.
TL;DR for the impatient. If this is sounding familiar, Seedfast reads your PostgreSQL schema, resolves the foreign-key graph, and generates connected data in one command. Free tier, any ORM, no seed file to maintain.
The pattern behind every seeding frustration is the same: humans write data, the schema changes, the data becomes invalid. What if the generator read the schema and produced data that satisfies constraints automatically?
This is how Seedfast works. Connect it to your schema — no production data, no PII, no compliance perimeter to expand — and it generates a complete dataset from the schema itself: tables, columns, types, constraints, foreign keys, the full dependency graph it derives from the database.
# Point Seedfast at your database (connection string or DATABASE_URL)
seedfast connect
# Generate data from the current schema
seedfast seed --scope "e-commerce store with electronics products"
Seedfast connects to your PostgreSQL database, reads the schema, and generates a dataset that respects your declared constraints and foreign-key relationships. Tell it --scope "e-commerce store with electronics" and you get products with electronics descriptions, prices in realistic ranges, reviews that reference real products, and users with orders that make sense together — not random strings scattered across disconnected tables. The --scope flag lets you describe the domain in natural language instead of writing factory code.
Seedfast reads the database directly rather than your ORM schema, so it works transparently with Prisma, Drizzle, TypeORM, Laravel Eloquent, EF Core, and other ORMs — regardless of which ORM defined your PostgreSQL schema. When a migration changes the schema, the next seedfast seed picks up the new structure, in most cases with no seed file to update.
# Quick local dev setup
seedfast seed --scope "small team with a few projects"
# Integration test data
seedfast seed --scope "3 users with 2 orders each"
# Staging demo
seedfast seed --scope "realistic e-commerce with 500 products and reviews"
For production reference data (roles, feature flags), a version-controlled SQL file is still the right tool. The two work well together: run your reference seeds first, then let Seedfast fill in the rest around the data that's already there. You spend significantly less time maintaining test data and more time on the product you're actually building.
In a pipeline, database seeding needs to be automated, fast, and isolated. No shared databases between parallel test runs, no manual "run this script first" steps, no fixtures drifting from the schema. For a full pipeline setup guide, see the CI/CD database seeding docs.
The ideal is a single command after migrations:
# GitHub Actions
steps:
- name: Run migrations
run: npx prisma migrate deploy
- name: Seed test database
env:
SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
run: seedfast seed --scope "minimal checkout flow"
- name: Run tests
run: npm test
ORM seeders work here too — npx prisma db seed, php artisan db:seed, EF Core seeding triggered by dotnet ef database update — as long as the seeder script is maintained and the CI environment matches dev. The advantage of a schema-aware generator is that it doesn't require a maintained seed script: the same command works whether the last migration added one column or ten tables.
Many teams share a single PostgreSQL database where each service owns its own tables. In that setup, Seedfast seeds everything in one command — the FK graph across all services' tables is resolved without manual ordering. No per-service seed scripts to coordinate, no "seed users first, then orders, then payments" choreography. If your services use separate databases, that's a harder problem with its own strategies.
Separate reference data from test data. Production roles, categories, and config belong in migrations or a dedicated reference seed that runs in every environment. Test data belongs in a separate step that runs only in dev, test, and staging.
Make seeds idempotent. A seed that fails on the second run is a seed that breaks CI after a retry. Use INSERT ... ON CONFLICT DO NOTHING, upsert patterns, or clear-and-reseed strategies.
Match data volume to the use case. Tiny datasets hide bugs — broken pagination, missing indexes, slow queries that only appear at scale. Local dev can be small. Integration tests should have enough data to exercise edge cases. Load tests need production-scale volumes.
Don't copy production. A pg_dump of production gives you realistic data but puts PII in any environment that receives a copy. Quality masking is harder than it sounds — you need to catalog every PII-containing column including JSON fields, free-text notes, and audit logs, and keep that catalog current as the schema evolves. Miss one and PII leaks through. Generating synthetic data from the schema substantially reduces this class of risk — the contents of production rows never leave production. (Schema metadata such as table and column names is still read by the generator, so teams in regulated verticals should review what their schema names reveal.)
Automate, don't document. If your onboarding doc says to run four commands in a specific order to get a working database, that's a seed script waiting to be written. One command, no decisions, no "ask Sarah."
Database seeding is the process of populating a database with an initial dataset — reference data for production (roles, categories) or test data for development, staging, and CI/CD. The term comes from the idea of planting seeds before anything grows. Methods range from hand-written SQL files to schema-aware generators that read the database directly and produce valid data automatically.
Migrations change database structure — creating tables, altering columns, adding indexes. Seeds populate the database with data. Migrations are versioned and run in strict order; seeds should be idempotent and can run at any time after migrations. Mixing them — putting test data inside migrations, or schema changes inside seeds — couples two concerns that change at different rates and breaks deploys in surprising ways.
Seeding gives an empty database something to work with. Without seed data, a freshly migrated database has tables but no rows — every dashboard renders empty states, every list view returns nothing, every test that depends on a user or a product fails. Seeding solves three concrete problems: local development gets a populated UI, integration and E2E tests get realistic data, and production gets the reference rows (roles, categories, feature flags) the app needs to function on day one.
Only with reference data — roles, categories, feature flags, country codes, default settings — that the application's contract depends on. Never seed development or test data into production. The two should run as separate steps: a small, version-controlled reference seed that runs in every environment including production, and a larger development/test seed that runs only in dev, staging, and CI.
Idempotent seeding is a seed script that produces the same end state no matter how many times it runs. The first run inserts the rows; the second run is a no-op instead of a duplicate-key error. Patterns: INSERT ... ON CONFLICT DO NOTHING in PostgreSQL, find_or_create_by! in Rails, upsert in Prisma. Idempotent seeds are the minimum bar for CI — a seed that fails on retry is a seed that breaks every flaky build.
A seed.sql file is a plain SQL file containing INSERT statements that populate a database with an initial dataset. You run it with psql -f seed.sql after migrations have created the tables. It's the simplest form of database seeding — version-controlled, deterministic, no dependencies — and the starting point for most projects. The downside: it's a static snapshot of the schema, so every migration that adds a required column or renames a table breaks the file until someone updates it by hand.
Add a seed step after migrations in your pipeline — one command, no manual steps, no shared state between parallel runs. ORM seeders work (npx prisma db seed, php artisan db:seed) as long as someone maintains the seed script. Seedfast doesn't need one: seedfast seed --scope "test data" reads the current schema and generates matching data on every run. Platform-specific mechanics differ: on Neon, seed the parent once and every branch inherits the data; on Supabase, preview branches re-apply seed.sql on creation.
Yes — Prisma, Drizzle, TypeORM, Sequelize, Laravel Eloquent, EF Core, Django ORM, Rails ActiveRecord, SQLAlchemy, or raw SQL — provided you're running PostgreSQL. Seedfast works with the database directly, not through your ORM. Run your migrations first, then seedfast seed — it picks up whatever schema state exists in the database.
Use a version-controlled seed file for production reference data — roles, categories, feature flags — that changes rarely and needs to be deterministic. Use a generator for development and test data that needs to cover many tables, respect foreign keys, and survive schema changes without manual updates. For a deeper comparison of all generation methods, see Test Data Generation: 7 Methods Compared.
Stop writing factory code. Seedfast reads your PostgreSQL schema and generates a connected dataset that respects your declared constraints — minimal seed files to maintain, no factory code to keep in sync with migrations, and it works across major ORMs (Prisma, Drizzle, TypeORM, Laravel Eloquent, EF Core). A free tier is available, and the getting started guide typically takes under five minutes.