PostgreSQL Test Data: What Actually Works in Practice
By the Seedfast team
## Key Takeaways
- generate_series() is the fastest path for simple numeric-heavy test data, but it can't satisfy FK constraints across tables without manual orchestration
- Copying production data is the easiest way to get realistic data and the fastest way to create a compliance incident
- Faker-based scripts give you control but become a maintenance burden the moment your schema has more than a handful of tables with relationships
- The FK constraint problem is the one that kills every hand-rolled test data approach at scale — insert order matters, circular dependencies break everything, and nothing enforces it for you
- If your schema changes regularly, any script you write is already becoming stale
There's a specific kind of afternoon every PostgreSQL developer has had at least once.
You're writing a query, or testing a new index, or trying to reproduce a performance issue that only shows up with real data volumes. Your development database has three users and one order. Production has 4 million rows across 60 tables, with realistic distributions and edge cases baked in over years of actual use.
You need test data. The question is how much you want to invest in getting it.
Here's what the options actually look like in practice.
## generate_series() — fast, limited, good for the right job
Every PostgreSQL developer reaches for generate_series() first, and usually rightly so:
```sql
INSERT INTO users (email, created_at)
SELECT
  'user' || n || '@example.com',
  NOW() - (random() * interval '365 days')
FROM generate_series(1, 100000) AS n;
```
Ten seconds. A hundred thousand users. No dependencies, no setup, runs anywhere you have psql access.
This works remarkably well when the problem is simple: you need volume in a single table, the data doesn't need to look realistic, and you're not testing anything that depends on how the data actually relates to other tables. Performance benchmarks, index tuning, capacity planning — this is where generate_series() earns its keep.
Where it breaks down is FK constraints. The moment you have an orders table that references users, and order_items that reference both orders and products, you're orchestrating insert order by hand. Add a nullable FK, a circular reference between two tables, or a schema migration that introduces a new required column, and the script needs updating.
That's not a knock on generate_series() — it's just the wrong tool for relational test data. It generates rows, not datasets.
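To make the limitation concrete, here is what orchestrating a dependent table by hand looks like. This is a sketch only: it assumes an orders table exists with these columns, and that users.id is a SERIAL key populated 1 through 100,000 by the insert above.

```sql
-- Sketch: assumes orders(user_id, total, created_at) exists and that
-- users.id is a SERIAL column holding ids 1..100000 with no gaps.
INSERT INTO orders (user_id, total, created_at)
SELECT
  (1 + floor(random() * 100000))::int,   -- fails with an FK violation if ids have gaps
  round((random() * 500)::numeric, 2),
  NOW() - (random() * interval '90 days')
FROM generate_series(1, 500000) AS n;
```

The fragility is the point: the id-range assumption, the insert order, and the column list all live in your head, and a single migration can invalidate any of them.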
## The production copy temptation
Before going further: copying production data into your development environment looks like the obvious solution. It's realistic, it's complete, it requires almost no effort.
The problems are well-documented: GDPR and similar regulations treat development environments as data processing contexts. A developer laptop with a copy of your user table isn't a theoretical risk. It's an actual one, and "we needed it for testing" isn't a defense that has held up well in enforcement actions.
Beyond compliance, there's the practical issue that production data ages into your development environment. Last week's dump becomes the baseline people are accustomed to. Queries get tuned against it. Features get tested against the specific edge cases that happen to be in that dump. Then the dump becomes stale and nobody wants to refresh it because now their tests will break.
If your team is doing this today, staging without production data covers the migration path in more detail.
## Faker and factory libraries — more control, more upkeep
The next thing most teams reach for is a Faker-based script. Python, JavaScript, Ruby — they all have solid Faker libraries, and the ergonomics are genuinely good for simple cases:
```python
from faker import Faker
import psycopg2

fake = Faker()
conn = psycopg2.connect("postgresql://localhost/mydb")
cur = conn.cursor()

for _ in range(1000):
    cur.execute(
        "INSERT INTO users (name, email, created_at) VALUES (%s, %s, %s)",
        (fake.name(), fake.email(), fake.date_time_this_year())
    )

conn.commit()
```
The value here is realistic-looking data — names that look like names, emails that look like emails, dates with plausible distributions. For demos, for screenshot-worthy UIs, for testing features that display data to users, this matters.
The limitation is the same as generate_series(): Faker generates values, not relationships. You still have to handle the case where order_items needs a valid order_id that references an actual row in orders, which in turn needs a valid user_id.
Teams usually solve this by making the seed script progressively more elaborate — seeding one table, capturing its IDs, then using those IDs in the next table, and so on. This works until a migration changes a FK, adds a new constraint, or introduces a new table. At that point, someone has to update the script. That someone is usually whoever is on-call or whoever notices the CI failure.
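The capture-and-reuse pattern itself is simple enough to sketch. The version below uses the standard library's sqlite3 so the sketch is self-contained and runnable anywhere; against PostgreSQL you would use psycopg2 and `INSERT ... RETURNING id` instead of `lastrowid`, but the shape is the same. Table and column names are illustrative.

```python
# The "seed parents, capture ids, reuse them in children" pattern.
# Sketched with stdlib sqlite3; in PostgreSQL, use psycopg2 and
# "INSERT ... RETURNING id" instead of cursor.lastrowid.
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
    " user_id INTEGER REFERENCES users(id), total REAL)"
)

# Seed the parent table first and remember its generated ids
user_ids = []
for n in range(100):
    cur = conn.execute("INSERT INTO users (email) VALUES (?)",
                       (f"user{n}@example.com",))
    user_ids.append(cur.lastrowid)

# Children can now only reference ids that are guaranteed to exist
for _ in range(500):
    conn.execute("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                 (random.choice(user_ids), round(random.uniform(5, 500), 2)))

count = conn.execute(
    "SELECT count(*) FROM orders o JOIN users u ON o.user_id = u.id"
).fetchone()[0]
print(count)  # prints 500: every order joins to a real user
```

Two tables, and the script is already stateful. Each additional level of the FK graph adds another capture step, in the right order, forever.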
It's not that this approach fails — it's that it requires ongoing maintenance that rarely gets budgeted for explicitly.
## Where the hard problem actually lives
The core issue with hand-rolled test data for PostgreSQL is satisfying FK constraints across the whole schema.
A non-trivial schema has tables that reference other tables, which reference other tables. Some of those references are circular. Some are optional. Some have been added over time and the original seed script predates them.
To insert valid test data into a real schema, you need to:
- Know which tables depend on which across the whole schema
- Insert in the right order — with handling for any circular references
- Generate FK column values that actually exist in the referenced tables
- Update all of this whenever a migration changes anything
This is solvable. Database engineers do it all the time. But it's also the kind of problem that has no elegant solution in a script — you end up with increasingly elaborate scaffolding that's specific to your schema at a moment in time.
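Here is the skeleton of that scaffolding — just the insert-order step, in Python, with a hand-maintained dependency map standing in for real schema introspection. The table names are illustrative; the maintenance burden is that this map must be updated by hand after every migration.

```python
# Sketch: computing a safe insert order from a hand-maintained FK map.
# In a real script, this dict drifts out of sync with the schema unless
# someone updates it after every migration.
from graphlib import TopologicalSorter

# table -> set of tables it references via FK (which must be seeded first)
fk_deps = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

insert_order = list(TopologicalSorter(fk_deps).static_order())
print(insert_order)  # referenced tables always come before referencing ones

# Circular references raise graphlib.CycleError here and need special
# handling (e.g. insert one side with a NULL FK, then UPDATE it).
```

And this is only step one of the four: it says nothing about generating FK values that exist, or about what happens when a cycle appears.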
Seedfast reads your live PostgreSQL schema and handles this automatically. Every time you run it, it works from the current schema, so migrations don't break anything. You describe what dataset you want:
```bash
seedfast seed --scope "an e-commerce database with 500 products, realistic order history, and a mix of customer account ages"
```
That's the entire instruction. Seedfast handles the relationships automatically — every generated row is valid across your full schema. Free tier is available — see pricing.
This is the part of the problem where scripts hit a wall. The generation part is tractable. The relationship-satisfaction part, across a real schema that's actively being migrated, is where the maintenance cost compounds.
## A note on volume
For large datasets — hundreds of thousands or millions of rows — the realism of the data matters as much as the volume. Query plans, index behavior, and cache hit rates all depend on how the data is actually distributed, not just how many rows exist.
Hand-rolled scripts can produce volume quickly, but they typically generate uniform distributions — the same price range, the same order sizes, the same account ages. Real user behavior isn't uniform, and tests against uniform data miss the edge cases that only surface with realistic distributions.
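The difference is easy to see in miniature. This sketch contrasts a uniform order-size generator with a heavy-tailed one; the shape parameter and cutoff are arbitrary, chosen only to illustrate the contrast.

```python
# Uniform vs heavy-tailed order sizes: uniform data never produces the
# outliers that dominate real query plans. Parameters are arbitrary.
import random

random.seed(42)  # deterministic for illustration

uniform_sizes = [random.randint(1, 10) for _ in range(10_000)]
skewed_sizes = [min(int(random.paretovariate(1.5)), 50) for _ in range(10_000)]

print(max(uniform_sizes))  # never exceeds 10
print(max(skewed_sizes))   # long tail: some orders are far larger
```

At identical row counts, an index or query plan can behave very differently under these two datasets — which is why volume alone isn't enough.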
Seedfast handles large volumes with realistic distributions and valid relationships across all tables. For production-scale test environments, that combination is difficult to replicate with a script.
## Related guides
- Database Seeding: From Your First seed.sql to Fully Automated — the broader context for how test data fits into seeding workflows
- Seed File Maintenance: When Your Seeds Fall Out of Sync — what happens to hand-rolled scripts over time
- Get started with Seedfast — connect to your PostgreSQL database and run your first schema-aware seed