Writing a Postgres Seed Script That Survives the Next Migration

By the Seedfast team

You're staring at an empty Postgres database. Maybe it's a new project, maybe it's a contributor's first day, maybe a migration just nuked the dev DB and the existing seed file no longer applies cleanly. Either way, you're about to write or rewrite a postgres seed script — the file the README tells everyone to run before they can log in locally. This article is about that file: what a good one looks like, the failure modes that quietly turn a good one into a bad one, and the point at which the script stops being the right tool.

This isn't about running a seed file (psql -f seed.sql and you're done — covered in running the script with psql, Prisma, or Drizzle). It's about the script as a maintained artifact. Seedfast reads your live schema on every run and skips the maintenance step entirely, but the long-form answer below is worth a read because the failure modes you avoid show up in every Postgres project sooner or later.

Key Takeaways#

  • A working seed script depends on three things: foreign-key-ordered inserts, idempotent re-runs (ON CONFLICT DO NOTHING), and a sequence reset at the end. Skip any of them and the file breaks the second time someone runs it.
  • The script that works on the day you write it is not the script that works after the next migration. A NOT NULL column added without a default is the canonical failure mode and shows up in real terminal output below.
  • Hand-rolled seed files scale to a handful of stable reference rows. Past that, they accumulate maintenance debt at roughly the rate your team ships migrations.
  • The pivot point — when "fix the seed file again" stops being worth it — is the moment to stop describing data by hand and start generating it from the live schema.

What a postgres seed script actually does#

A postgres seed script is a file (seed.sql, seed.ts, db/seeds.rb, whatever your stack calls it) that inserts a known set of rows into an empty or partially populated database. Two distinct jobs hide under the same name:

  • Reference data — country codes, role names, default tenants, feature flags. Small. Production-shipped. Versioned with the schema. This kind of data belongs in a seed step or in a migration.
  • Development and test data — the 50 users, 200 orders, and 1,000 line items you need so that the dashboard isn't empty on localhost. Bigger. Throwaway. Has to look plausible, has to satisfy every constraint, has to come back the same way every time you reset the database.

For the conceptual ground around methods and tradeoffs across ORMs, database seeding covers that picture. This article is narrower: a real worked example, the script that fits the schema, and the moments where it falls apart.

A worked example schema#

Six tables, real foreign-key relationships. A users table; a teams table, each team owned by a user; a team_members join table; projects per team; tasks per project; comments on tasks. The dependency graph reads top-down: users → teams → team_members → projects → tasks → comments.

-- schema.sql
CREATE TABLE users (
    id          BIGSERIAL PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    full_name   TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE teams (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    owner_id    BIGINT NOT NULL REFERENCES users(id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE team_members (
    team_id     BIGINT NOT NULL REFERENCES teams(id),
    user_id     BIGINT NOT NULL REFERENCES users(id),
    role        TEXT NOT NULL CHECK (role IN ('admin','member','viewer')),
    PRIMARY KEY (team_id, user_id)
);

CREATE TABLE projects (
    id          BIGSERIAL PRIMARY KEY,
    team_id     BIGINT NOT NULL REFERENCES teams(id),
    name        TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE tasks (
    id          BIGSERIAL PRIMARY KEY,
    project_id  BIGINT NOT NULL REFERENCES projects(id),
    assignee_id BIGINT REFERENCES users(id),
    title       TEXT NOT NULL,
    status      TEXT NOT NULL CHECK (status IN ('todo','in_progress','done')) DEFAULT 'todo',
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE comments (
    id          BIGSERIAL PRIMARY KEY,
    task_id     BIGINT NOT NULL REFERENCES tasks(id),
    author_id   BIGINT NOT NULL REFERENCES users(id),
    body        TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Two things to notice. The tasks.assignee_id column is nullable — it's a real-world detail (an unassigned task is valid) and it's the kind of nuance a seed script has to get right. The team_members table has a composite primary key, which means its idempotency strategy is slightly different from the others. We'll come back to both.

Writing the seed file#

A reasonable hand-rolled seed for the schema above looks like this. Every block has a job: explicit IDs (so test code can assert against them), foreign-key-ordered inserts, ON CONFLICT DO NOTHING for re-run safety, a transaction wrapper, and a sequence reset at the end so the next inserted row doesn't collide with id 3.

-- seed.sql
BEGIN;

INSERT INTO users (id, email, full_name) VALUES
  (1, 'alice@example.com', 'Alice Anderson'),
  (2, 'bob@example.com',   'Bob Brown'),
  (3, 'carol@example.com', 'Carol Chen')
ON CONFLICT (id) DO NOTHING;

INSERT INTO teams (id, name, owner_id) VALUES
  (1, 'Platform',  1),
  (2, 'Frontend',  2)
ON CONFLICT (id) DO NOTHING;

INSERT INTO team_members (team_id, user_id, role) VALUES
  (1, 1, 'admin'),
  (1, 2, 'member'),
  (2, 2, 'admin'),
  (2, 3, 'member')
ON CONFLICT DO NOTHING;

INSERT INTO projects (id, team_id, name) VALUES
  (1, 1, 'Auth refactor'),
  (2, 1, 'Billing cleanup'),
  (3, 2, 'Design system v2')
ON CONFLICT (id) DO NOTHING;

INSERT INTO tasks (id, project_id, assignee_id, title, status) VALUES
  (1, 1, 1, 'Move login to OAuth flow', 'in_progress'),
  (2, 1, 2, 'Audit JWT expiry handling', 'todo'),
  (3, 2, 1, 'Reconcile payment webhooks',  'done'),
  (4, 3, 3, 'Token migration plan',       'todo')
ON CONFLICT (id) DO NOTHING;

INSERT INTO comments (id, task_id, author_id, body) VALUES
  (1, 1, 2, 'Started a branch yesterday — almost there.'),
  (2, 1, 1, 'Nice. Let me know when it is ready for review.'),
  (3, 3, 1, 'Webhook handler now retries on 5xx.')
ON CONFLICT (id) DO NOTHING;

SELECT setval('users_id_seq',    (SELECT MAX(id) FROM users));
SELECT setval('teams_id_seq',    (SELECT MAX(id) FROM teams));
SELECT setval('projects_id_seq', (SELECT MAX(id) FROM projects));
SELECT setval('tasks_id_seq',    (SELECT MAX(id) FROM tasks));
SELECT setval('comments_id_seq', (SELECT MAX(id) FROM comments));

COMMIT;

Run against a fresh postgres:16-alpine container, the captured output is small and boring — which is what you want from a seed script:

BEGIN
INSERT 0 3
INSERT 0 2
INSERT 0 4
INSERT 0 3
INSERT 0 4
INSERT 0 3
COMMIT
 users | teams | team_members | projects | tasks | comments
-------+-------+--------------+----------+-------+----------
     3 |     2 |            4 |        3 |     4 |        3
(1 row)

Three users, two teams, four team_members, three projects, four tasks, three comments. Re-running the file produces the same row counts because every block is idempotent.

If you'd rather generate richer data than three named users, the standard column-level option is a Node.js + @faker-js/faker loop — open a pg connection, call faker.person.fullName() per row, insert. Faker handles column values; you handle the relationships and the insert order yourself. That's a good fit for flat tables, and it's the tool most teams reach for first.
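
The shape of that loop can be sketched without the library itself. In the sketch below, `buildSeed` is a hypothetical helper (not from any tool mentioned here), the string templates stand in for Faker calls, and the pg client is reduced to a comment — the point is the part Faker leaves to you: parents before children, and child rows only referencing ids that were actually generated.

```javascript
// Sketch of a faker-style seed loop. `User ${i}` stands in for
// faker.person.fullName(); swap in real Faker calls per column.
// Output: { sql, params } pairs in FK-safe order for a pg client.
function buildSeed(userCount = 5) {
  const statements = [];
  const userIds = [];

  // Parents first: users carry no foreign keys.
  for (let i = 1; i <= userCount; i++) {
    userIds.push(i);
    statements.push({
      sql: 'INSERT INTO users (id, email, full_name) VALUES ($1, $2, $3) ON CONFLICT (id) DO NOTHING',
      params: [i, `user${i}@example.com`, `User ${i}`],
    });
  }

  // Children second: owner_id must point at a generated user id.
  for (let t = 1; t <= 2; t++) {
    const ownerId = userIds[(t - 1) % userIds.length];
    statements.push({
      sql: 'INSERT INTO teams (id, name, owner_id) VALUES ($1, $2, $3) ON CONFLICT (id) DO NOTHING',
      params: [t, `Team ${t}`, ownerId],
    });
  }
  return statements;
}

// With a real pg client, roughly:
//   for (const s of buildSeed()) await client.query(s.sql, s.params);
console.log(buildSeed(5).length); // 7 — five users, two teams
```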

Where postgres seed scripts go wrong#

Most of the bad days a seed file causes trace back to one of five mistakes. Each is worth naming out loud, because reviewers tend to skim past them, and each has a fix.

Foreign-key ordering#

Every INSERT for a child row needs the parent row to already exist. Inserting tasks before projects fails. Inserting comments before tasks fails. The dependency graph in the schema dictates the order in the file, and once you have more than five or six tables it stops being obvious.
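
When the order has stopped being obvious, Postgres can list the dependency edges for you. A catalog query like this (assuming everything lives in the public schema) returns every child → parent foreign-key pair; a topological sort of the result is a valid insert order:

```sql
-- One row per FK edge: insert parent_table before child_table.
SELECT con.conrelid::regclass  AS child_table,
       con.confrelid::regclass AS parent_table
FROM pg_constraint con
WHERE con.contype = 'f'                          -- foreign keys only
  AND con.connamespace = 'public'::regnamespace;
```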

The harder edge cases are not linear chains. Circular foreign key cases — where two tables reference each other, or a table references itself — need either deferrable constraints (SET CONSTRAINTS ALL DEFERRED) or a two-pass insert.
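
A minimal sketch of the deferrable route, using a hypothetical self-referencing employees table (not part of the worked schema): declare the constraint DEFERRABLE, then defer checks to commit time so rows inside one transaction can reference each other freely.

```sql
-- Hypothetical self-referencing table, for illustration only.
CREATE TABLE employees (
    id         BIGSERIAL PRIMARY KEY,
    full_name  TEXT NOT NULL,
    manager_id BIGINT REFERENCES employees(id) DEFERRABLE INITIALLY IMMEDIATE
);

BEGIN;
SET CONSTRAINTS ALL DEFERRED;   -- FK checks now wait until COMMIT
INSERT INTO employees (id, full_name, manager_id) VALUES
  (1, 'Dana', 2),               -- forward reference: row 2 doesn't exist yet
  (2, 'Evan', 1);
COMMIT;                         -- both rows present by now, so the checks pass
```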

Idempotency#

A seed file that works once and fails on the second run is broken. Two patterns make a script safe to re-run: INSERT ... ON CONFLICT DO NOTHING against a unique constraint, and wrapping the whole script in BEGIN; ... COMMIT; so a partial failure rolls back instead of leaving the database in a half-seeded state. For tables with a composite primary key (team_members above), a bare ON CONFLICT DO NOTHING — no column list — is enough: without a conflict target, Postgres skips the row on any unique violation, the composite primary key included.

Deterministic vs randomized data#

Two paths, with a real tradeoff. Deterministic seeds ((1, 'alice@example.com')) are reproducible — your tests can assert that user 1 is Alice — but they look fake and they hide bugs that only show up at scale. Randomized seeds (Faker, UUIDs) look real and stress more code paths, but the rows change every run and you can't write tests that pin to specific IDs. Most projects end up with both: a small block of deterministic reference rows for tests and a randomized block for volume.
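
One way to get both blocks without giving up reproducibility is to seed the random half. Math.random can't be seeded in JavaScript, so the sketch below uses mulberry32, a tiny public-domain PRNG (an illustrative choice, not something the article's tooling prescribes); same seed in, same "random" rows out, every reset.

```javascript
// mulberry32: a small seedable PRNG, so the randomized block is
// reproducible run-to-run. Returns a function yielding floats in [0, 1).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const STATUSES = ['todo', 'in_progress', 'done'];

function buildRows(seed, count) {
  const rand = mulberry32(seed);
  // Deterministic block: tests can pin against this exact row.
  const rows = [{ id: 1, title: 'Move login to OAuth flow', status: 'in_progress' }];
  // Randomized block: volume data, stable because the seed is fixed.
  for (let i = 2; i <= count; i++) {
    rows.push({
      id: i,
      title: `Task ${i}`,
      status: STATUSES[Math.floor(rand() * STATUSES.length)],
    });
  }
  return rows;
}
```

Change the seed and the volume data reshuffles; keep it and every reset of the database produces byte-identical rows.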

Optional vs required columns#

The seed file above only sets assignee_id on tasks where it makes sense, because the column is nullable. If you blindly populate every column, you erase the difference between "this task has no owner yet" and "this task is owned by user 1." The schema documents which columns can be NULL; the seed file should respect that distinction or you lose a real piece of test coverage.

Sequence collisions#

Inserting an explicit id = 1 into a BIGSERIAL column doesn't advance the sequence. The next INSERT without an explicit id tries 1 again and trips the primary key. The setval(...) block at the end of the script above is the fix: walk the table, find MAX(id), point the sequence at it. Skipping this step is the most common reason a seed file works under psql -f seed.sql and then breaks the moment the application tries to insert anything new.
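
A slightly more defensive variant of the same reset avoids hardcoding the sequence name and also copes with an empty table, where MAX(id) is NULL:

```sql
-- pg_get_serial_sequence looks up the sequence behind users.id, so a
-- renamed or non-default sequence can't silently break the reset. The
-- third argument (is_called) covers the empty-table case: when no rows
-- exist, the next nextval() should return 1, not 2.
SELECT setval(pg_get_serial_sequence('users', 'id'),
              COALESCE((SELECT MAX(id) FROM users), 1),
              (SELECT MAX(id) IS NOT NULL FROM users));
```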

A side note on packaging: if you ship a Postgres image to other developers via Docker, dropping seed.sql and the schema file into /docker-entrypoint-initdb.d/ makes the official postgres image run them automatically the first time the data directory is empty. That gets you a one-command local environment without a separate seed step in your dev script.
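
A minimal compose file for that setup might look like the sketch below. The file names and paths are assumptions about your repo layout; the mount point and run-on-first-boot behavior are the official image's documented contract, and the scripts run in alphabetical order, hence the numeric prefixes.

```yaml
# docker-compose.yml — sketch; adjust paths to your repo.
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      # Runs alphabetically on first boot of an empty data directory,
      # so the prefixes force schema-before-seed ordering.
      - ./db/01-schema.sql:/docker-entrypoint-initdb.d/01-schema.sql
      - ./db/02-seed.sql:/docker-entrypoint-initdb.d/02-seed.sql
    ports:
      - "5432:5432"
```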

What happens after the next migration#

Here's the failure mode no tutorial wants to show. The seed file works. You merge it. Three weeks later, somebody writes a migration that adds a visibility column to projects — required, no default — because the product needs public and private projects.

-- migration-after.sql
ALTER TABLE projects
    ADD COLUMN visibility TEXT NOT NULL DEFAULT 'private';

ALTER TABLE projects
    ALTER COLUMN visibility DROP DEFAULT;

Adding a NOT NULL column with a default and then dropping the default is a common migration pattern: existing rows get the default, future inserts have to supply the value explicitly. The migration is fine. The seed file is now wrong, and it doesn't know it.

Re-run the same seed against the post-migration schema and you get this — captured verbatim from a postgres:16-alpine container:

ALTER TABLE
ALTER TABLE
--- now re-run hand-rolled-seed.sql, expect FAILURE on projects insert ---
BEGIN
INSERT 0 0
INSERT 0 0
INSERT 0 0
psql:/work/hand-rolled-seed.sql:29: ERROR:  null value in column "visibility" of relation "projects" violates not-null constraint
DETAIL:  Failing row contains (1, 1, Auth refactor, 2026-05-01 04:29:21.587522+00, null).

The first three inserts report zero rows because the rows from the original run are still in place and ON CONFLICT DO NOTHING skips them. Then the projects insert fails, and since the whole script runs in one transaction, the error rolls everything back — no partial state lands in the database. The error blames visibility, but the deeper problem is structural: the seed file was written against schema version N and the database is now at version N+1. Every migration that adds a required column, a check constraint, or a foreign key creates one more place where the same kind of failure can happen.

When the script is no longer the right tool#

A hand-rolled seed file is fine when the schema is small and stable. It pays for itself when most of your sprint goes into application code and migrations are rare. The math changes when you've fixed the same file three times in two months — you're now spending more time keeping the seed in step with the schema than writing the features the seed is supposed to demonstrate. If your file is already in this state, seed file maintenance covers the cost side of the same problem in more depth.

The structural fix is to stop describing data by hand and start generating it from the schema as it exists right now. Seedfast reads the live database every time it runs — including the new visibility column — and resolves the insert order across the whole table set automatically. Against the post-migration schema above, with no edits to seed code, the captured run looks like this:

[2026-05-01T06:30:34+02:00] INFO: Seeding started
[2026-05-01T06:30:46+02:00] INFO: Tables in scope: public.users, public.teams, public.team_members, public.projects, public.tasks, public.comments
[2026-05-01T06:30:46+02:00] INFO: Planned: 115 records across 6 tables
[2026-05-01T06:30:46+02:00] INFO: Auto-approving plan (scope provided)
[2026-05-01T06:30:48+02:00] INFO: Seeding table public.users (1/6)
[2026-05-01T06:30:59+02:00] INFO: Table public.users completed: 8 rows in 11.249s
[2026-05-01T06:30:59+02:00] INFO: Seeding table public.teams (2/6)
[2026-05-01T06:31:07+02:00] INFO: Table public.teams completed: 3 rows in 8.017s
[2026-05-01T06:31:08+02:00] INFO: Seeding table public.team_members (3/6)
[2026-05-01T06:31:08+02:00] INFO: Seeding table public.projects (4/6)
[2026-05-01T06:31:19+02:00] INFO: Table public.team_members completed: 8 rows in 11.532s
[2026-05-01T06:31:19+02:00] INFO: Table public.projects completed: 6 rows in 11.685s
[2026-05-01T06:31:19+02:00] INFO: Seeding table public.tasks (5/6)
[2026-05-01T06:31:33+02:00] INFO: Table public.tasks completed: 30 rows in 13.736s
[2026-05-01T06:31:33+02:00] INFO: Seeding table public.comments (6/6)
[2026-05-01T06:31:44+02:00] INFO: Table public.comments completed: 60 rows in 10.852s
[2026-05-01T06:31:44+02:00] INFO: Seeding completed: 6/6 tables succeeded, 115 rows, 72.45s

Six of six tables filled, 115 rows, FK ordering resolved by reading the schema. The hand-rolled file had to be edited to know about visibility; Seedfast did not — it noticed the new column and supplied a value. That's the load-bearing claim from this run: schema-aware insert ordering against the live database, with no seed-code changes when the schema moves. The bigger arguments about data realism live in articles where the evidence supports them.

If you want to see the same workflow end-to-end on your own schema, seedfa.st walks through the first run.