Database Seeding: From Your First seed.sql to Fully Automated
By the Seedfast team
Database seeding is the process of populating a database with data before your application creates any real records. That might be three admin users for local development, a thousand products for a staging demo, or a connected dataset across 40 tables for integration tests.
If you've ever written an INSERT statement into a file called seed.sql or seed.ts — you've done database seeding. This guide starts there and works forward through ORM seeders, generated data, and what to do when none of those keep up with your schema.
Key Takeaways
- Database seeding starts simple (a SQL file) and gets complicated fast as schemas grow — the problem isn't generating values, it's keeping seed data valid across migrations
- Every ORM has a built-in seeder (Prisma, Laravel, EF Core), but the data and relationships are defined manually — which means they break on schema changes just like raw SQL
- Foreign key resolution is the real bottleneck at scale: inserting data into 30+ tables with circular dependencies requires topological sorting, not manual ordering
- Schema-aware generators read the live database and produce constraint-satisfying, connected data — reducing or eliminating the need for a maintained seed file
What is database seeding?
The term comes from agriculture: you plant seeds before anything grows. In databases, seeding means inserting an initial dataset so the application has something to work with.
There are two kinds of seed data, and they're often conflated:
Reference data lives in production. Roles, categories, feature flags, country codes, default settings. This data is part of the application's contract — without it, the app doesn't function. It changes rarely and should be version-controlled.
Development/test data exists only in non-production environments. Fake users, sample products, test orders. Its purpose is to let developers build features, run tests, and demo the application without waiting for real users to show up.
The confusion starts when teams use the same seed mechanism for both. A seed.sql that inserts admin roles AND a thousand fake users couples two things that change at different rates and have different deployment rules. Production reference data should be in migrations or a dedicated seed step. Test data should be generated, not hand-written.
Starting simple: the seed.sql file
Every database seeding journey starts here:
INSERT INTO roles (id, name) VALUES
(1, 'admin'), (2, 'editor'), (3, 'viewer');
INSERT INTO teams (id, name) VALUES
(1, 'Engineering'), (2, 'Design');
INSERT INTO users (id, email, team_id, role_id) VALUES
(1, 'alice@example.com', 1, 1),
(2, 'bob@example.com', 2, 2);
Run it with psql -f seed.sql and your database has data. Simple, deterministic, version-controlled. For a 5-table schema that changes quarterly, this is fine.
The problem starts on the day someone runs ALTER TABLE users ADD COLUMN department_id INTEGER NOT NULL REFERENCES departments(id). Your seed file doesn't insert departments. The INSERT into users now fails. Someone has to fix the file, test it, commit it. Multiply this across a schema with 30 tables and monthly migrations, and the maintenance cost becomes its own workstream.
How ORMs handle database seeding
Every major ORM has a seeding mechanism. The syntax varies, but the pattern is identical: you write code that creates records through the ORM's API.
Prisma puts seeds in a TypeScript file:
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
async function main() {
  const team = await prisma.team.create({
    data: { name: 'Engineering' },
  });
  await prisma.user.create({
    data: {
      email: 'alice@example.com',
      team: { connect: { id: team.id } },
    },
  });
}

main()
  .catch((e) => {
    console.error(e);
    process.exit(1);
  })
  .finally(() => prisma.$disconnect());
// Run: npx prisma db seed
Laravel uses seeder classes:
// database/seeders/DatabaseSeeder.php
public function run(): void
{
    $team = Team::create(['name' => 'Engineering']);

    User::create([
        'email' => 'alice@example.com',
        'team_id' => $team->id,
    ]);
}
// Run: php artisan db:seed
EF Core offers HasData in model configuration:
protected override void OnModelCreating(ModelBuilder builder)
{
    builder.Entity<Team>().HasData(
        new Team { Id = 1, Name = "Engineering" }
    );
    builder.Entity<User>().HasData(
        new User { Id = 1, Email = "alice@example.com", TeamId = 1 }
    );
}
// Run: dotnet ef database update
ORM seeders are an improvement over raw SQL — typed languages like TypeScript (Prisma) and C# (EF Core) catch errors at compile time, the seed code lives alongside your app, and ORMs handle direct relationships through their own APIs (Prisma's nested writes, Laravel's model factories, EF Core's navigation properties).
But the core problem remains. Data is still manually defined. Relationships are still manually wired. When a migration adds a required column or changes a relation, the seeder breaks — the same way a seed.sql breaks, just in TypeScript or PHP or C# instead of SQL. And you're locked to one ORM: a Prisma seed file can't be reused if part of your team uses Drizzle, or if you switch ORMs next year.
When seeding stops being simple
At some point — usually around 15-20 tables — manual database seeding starts to hurt. Three things happen simultaneously:
The dependency chain gets deep. To insert an order_item, you need an order, a product, and a price record. The order needs a user and a shipping_address. The product needs a category and a vendor. That's three or four levels of dependencies for one line item — and your seed code becomes more about wiring IDs together than about the data itself.
Circular references appear. A user has a manager_id that references another user. A category has a parent_category_id. An employee belongs to a department, but a department has a head_employee_id. You can't insert either side first. Now your seed needs multi-pass inserts, deferred constraints, or nullable FKs — and the complexity of the seed script starts exceeding the complexity of the feature you're trying to test.
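Working out this insert order by hand is exactly what a topological sort automates. Here's a minimal sketch (hypothetical table names, not any tool's actual implementation): Kahn's algorithm over the foreign-key graph, where only NOT NULL FKs constrain the order, and nullable FKs — the ones that create cycles like users.manager_id — are left out of the sort and backfilled later with UPDATEs.

```typescript
// Kahn's algorithm over a foreign-key graph: a table can only be seeded
// once every table it references with a NOT NULL FK is already seeded.
// Nullable FKs (like users.manager_id) are deferred to a second UPDATE pass.
type FK = { from: string; to: string; nullable: boolean };

// Hypothetical schema fragment for illustration.
const tables = ["users", "teams", "orders", "products", "order_items"];
const fks: FK[] = [
  { from: "users", to: "teams", nullable: false },
  { from: "users", to: "users", nullable: true }, // manager_id: cycle, deferrable
  { from: "orders", to: "users", nullable: false },
  { from: "order_items", to: "orders", nullable: false },
  { from: "order_items", to: "products", nullable: false },
];

function insertOrder(tables: string[], fks: FK[]): string[] {
  // Only non-nullable, non-self FKs constrain the insert order.
  const hard = fks.filter((fk) => !fk.nullable && fk.from !== fk.to);

  // indegree = how many unseeded parents each table still waits on.
  const indegree = new Map<string, number>();
  for (const t of tables) indegree.set(t, 0);
  for (const fk of hard) indegree.set(fk.from, (indegree.get(fk.from) ?? 0) + 1);

  const queue = tables.filter((t) => indegree.get(t) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const t = queue.shift()!;
    order.push(t);
    for (const fk of hard) {
      if (fk.to !== t) continue;
      const d = (indegree.get(fk.from) ?? 0) - 1;
      indegree.set(fk.from, d);
      if (d === 0) queue.push(fk.from);
    }
  }
  if (order.length !== tables.length) {
    // A cycle through NOT NULL FKs can't be broken by deferring inserts;
    // it needs deferred constraints or a schema change.
    throw new Error("cycle through NOT NULL FKs");
  }
  return order;
}

console.log(insertOrder(tables, fks));
// → [ 'teams', 'products', 'users', 'orders', 'order_items' ]
```

This is the choreography a schema-aware seeder performs internally; the point of the sketch is that it's mechanical once the FK graph is known, and brittle when a human maintains it by hand.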
Migrations break seeds constantly. Every NOT NULL column addition, every new FK constraint, every renamed table requires someone to update the seed file. On an active codebase with weekly migrations, seed maintenance becomes a recurring tax that nobody budgets for. Eventually the seed file stops working and the team switches to "just ask Sarah, she knows how to get a working database."
This is the point where most teams either live with broken seeds, copy production (which carries compliance considerations), or start exploring automated approaches.
Reading the schema instead of writing the data
The pattern behind every seeding frustration is the same: humans write data, the schema changes, the data becomes invalid. What if the generator read the schema and produced data that satisfies constraints automatically?
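The raw material for this is already in the database. In PostgreSQL, for example, the foreign-key graph is queryable through information_schema — a sketch of the idea (not any particular tool's internals), with sample rows standing in for a live query result:

```typescript
// Sketch: a generator can discover the FK graph from the database itself
// instead of having a human encode it. In Postgres, a query along these
// lines lists every FK edge (simplified; composite keys need more care):
const fkQuery = `
  SELECT tc.table_name AS from_table, ccu.table_name AS to_table
  FROM information_schema.table_constraints tc
  JOIN information_schema.constraint_column_usage ccu
    ON ccu.constraint_name = tc.constraint_name
  WHERE tc.constraint_type = 'FOREIGN KEY'`;

// Rows as such a query might return them for the earlier example schema:
const rows = [
  { from_table: "users", to_table: "teams" },
  { from_table: "orders", to_table: "users" },
];

// Build a dependency map: table -> tables that must be seeded first.
const dependsOn = new Map<string, string[]>();
for (const { from_table, to_table } of rows) {
  const deps = dependsOn.get(from_table) ?? [];
  deps.push(to_table);
  dependsOn.set(from_table, deps);
}
console.log(dependsOn);
```

Because the graph is rebuilt from the live schema on every run, a migration that adds a table or FK changes the generator's input automatically — there's no seed file to fall out of date.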
This is how Seedfast works. Instead of defining what data to insert, you connect Seedfast to your database and let it figure out the data from the schema — tables, columns, types, constraints, foreign keys, the full dependency graph.
# Point Seedfast at your database (connection string or DATABASE_URL)
seedfast connect
# Generate data from the current schema
seedfast seed --scope "e-commerce store with electronics products"
Seedfast connects to your PostgreSQL database, reads the schema, and generates a complete dataset that respects all your constraints and relationships. Tell it --scope "e-commerce store with electronics" and you get products with electronics descriptions, prices in realistic ranges, reviews that reference real products, and users with orders that make sense together — not random strings scattered across disconnected tables. The --scope flag lets you describe the domain in plain English instead of writing factory code.
It doesn't matter which ORM you use. Prisma, Drizzle, TypeORM, Laravel's Eloquent, EF Core — Seedfast reads the database directly, not your ORM schema. When a migration changes the schema, the next seedfast seed picks it up. No seed file to update.
# Quick local dev setup
seedfast seed --scope "small team with a few projects"
# Integration test data
seedfast seed --scope "3 users with 2 orders each"
# Staging demo
seedfast seed --scope "realistic e-commerce with 500 products and reviews"
For production reference data (roles, feature flags), a version-controlled SQL file is still the right tool. The two work well together: run your reference seeds first, then let Seedfast fill in the rest around the data that's already there. You stop maintaining test data and start spending that time on the product you're actually building.
Database seeding in CI/CD
In a pipeline, database seeding needs to be automated, fast, and isolated. No shared databases between parallel test runs, no manual "run this script first" steps, no fixtures drifting from the schema. For a full pipeline setup guide, see the CI/CD database seeding docs.
The ideal is a single command after migrations:
# GitHub Actions
steps:
  - name: Run migrations
    run: npx prisma migrate deploy
  - name: Seed test database
    run: seedfast seed --scope "minimal checkout flow"
  - name: Run tests
    run: npm test
ORM seeders work here too — npx prisma db seed, php artisan db:seed, dotnet ef database update — as long as the seeder script is maintained and the CI environment matches dev. The advantage of a schema-aware generator is that it doesn't require a maintained seed script: the same command works whether the last migration added one column or ten tables.
Many microservice teams share a single PostgreSQL database where each service owns its own tables. In that setup, Seedfast seeds everything in one command — the full FK graph across all services' tables is resolved automatically. No per-service seed scripts to coordinate, no "seed users first, then orders, then payments" choreography. If your services use separate databases, that's a harder problem with its own strategies.
Best practices
Separate reference data from test data. Production roles, categories, and config belong in migrations or a dedicated reference seed that runs in every environment. Test data belongs in a separate step that runs only in dev, test, and staging.
Make seeds idempotent. A seed that fails on the second run is a seed that breaks CI after a retry. Use INSERT ... ON CONFLICT DO NOTHING, upsert patterns, or clear-and-reseed strategies.
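The run-twice property is easy to check in isolation. A minimal sketch: an upsert keyed on a natural unique column (email) instead of a plain insert — with a real database this is `INSERT ... ON CONFLICT (email) DO NOTHING` or an ORM upsert; an in-memory Map stands in here so the behavior is visible without a connection:

```typescript
// Idempotent seeding sketch: key on a natural unique column so a second
// run is a no-op instead of a duplicate-key failure. A Map stands in for
// the database; in SQL this is INSERT ... ON CONFLICT (email) DO NOTHING.
type User = { email: string; role: string };
const usersByEmail = new Map<string, User>();

function seedUsers(seed: User[]): void {
  for (const u of seed) {
    // "DO NOTHING on conflict": keep the existing row if the key is taken.
    if (!usersByEmail.has(u.email)) usersByEmail.set(u.email, u);
  }
}

const seed = [
  { email: "alice@example.com", role: "admin" },
  { email: "bob@example.com", role: "editor" },
];
seedUsers(seed);
seedUsers(seed); // second run: no duplicates, no failure
console.log(usersByEmail.size); // 2
```

A CI retry, a flaky job, or a developer running the seed twice should all land in the same state — that's the whole test.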
Match data volume to the use case. Tiny datasets hide bugs — broken pagination, missing indexes, slow queries that only appear at scale. Local dev can be small. Integration tests should have enough data to exercise edge cases. Load tests need production-scale volumes.
Don't copy production. A pg_dump of production gives you realistic data but puts PII in every environment with a copy. Quality masking is harder than it sounds — you need to catalog every PII-containing column including JSON fields, free-text notes, and audit logs, and keep that catalog current as the schema evolves. Miss one and PII leaks through. Generating synthetic data from the schema sidesteps this class of risk entirely.
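To see why those catalogs are fragile, consider the naive approach most masking setups start from — flagging columns by name. A hypothetical sketch (patterns and column names are illustrative only):

```typescript
// Why masking catalogs leak: a naive name-based PII detector flags the
// obvious columns but is blind to PII inside free-text and JSON values.
const PII_PATTERNS = [/email/i, /phone/i, /ssn/i, /name$/i, /address/i, /birth/i];

function flagLikelyPII(columns: string[]): string[] {
  return columns.filter((c) => PII_PATTERNS.some((p) => p.test(c)));
}

const columns = ["id", "email", "full_name", "notes", "metadata_json"];
console.log(flagLikelyPII(columns));
// → [ 'email', 'full_name' ] — "notes" and "metadata_json" slip through
```

Every column the detector misses is a column that ships production PII to staging. Synthetic generation avoids the problem by never touching real values in the first place.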
Automate, don't document. If your onboarding doc says "run these 4 commands in this order to get a working database," that's a seed script waiting to be written. One command, no decisions, no "ask Sarah."
Frequently asked questions
What does database seeding mean?
Database seeding is populating a database with an initial dataset — reference data for production (roles, categories) or test data for development, staging, and CI/CD. The term comes from the idea of planting seeds before anything grows. Methods range from hand-written SQL files to schema-aware generators that read the schema and produce valid data automatically.
What's the difference between a migration and a seed?
Migrations change database structure — creating tables, altering columns, adding indexes. Seeds populate the database with data. Migrations are versioned and run in order. Seeds should be idempotent and can run at any time after migrations. Mixing them (like EF Core's HasData) couples two concerns that change at different rates.
How do I seed a database in CI/CD?
Add a seed step after migrations in your pipeline — one command, no manual steps, no shared state between parallel runs. ORM seeders work (npx prisma db seed, php artisan db:seed) as long as someone maintains the seed script. Seedfast doesn't need one: seedfast seed --scope "test data" reads the current schema and generates matching data on every run.
Does Seedfast work with my ORM?
Yes — Prisma, Drizzle, TypeORM, Sequelize, Laravel Eloquent, EF Core, Django ORM, Rails ActiveRecord, SQLAlchemy, or raw SQL. Seedfast works with the database directly, not through your ORM. Run your migrations first, then seedfast seed — it picks up whatever schema state exists in the database.
When should I use a seed file vs a generator?
Use a version-controlled seed file for production reference data — roles, categories, feature flags — that changes rarely and needs to be deterministic. Use a generator for development and test data that needs to cover many tables, respect foreign keys, and survive schema changes without manual updates. For a deeper comparison of all generation methods, see Test Data Generation: 7 Methods Compared.
Seed your database in one command
Seedfast reads your PostgreSQL schema and generates a valid, connected dataset — no seed files to maintain, no factory code to update, works with any ORM. There's a free tier, and the getting started guide takes under five minutes.