Review SQL Migrations in 30 Seconds: Seed, Migrate, Compare
By Mikhail Shytsko, Founder at Seedfast · · Updated
You review application code with tests. You review migrations by reading SQL and hoping for the best. Here's how to actually validate what a migration does before you approve it.
It's Wednesday morning. You have three PRs to review before standup. Two are application code — you check the tests, read the logic, leave a comment about a missing null check, approve the rest. The third PR adds a migration: ALTER TABLE users ALTER COLUMN email TYPE TEXT. You read the SQL. It looks fine. You approve.
On Thursday, the migration runs against production. The users table has 4 million rows. The column type change triggers a full table rewrite. That's not the problem — the team expected that. The problem is what the change quietly removes: the old VARCHAR(255) column capped every email at 255 characters, and a downstream billing export to a partner system relies on that cap to fit its own fixed-width email field. Once the column becomes TEXT with no length limit, the next batch of sign-ups with longer addresses flows straight through and the partner export starts rejecting payloads it was never built to hold.
The migration was syntactically correct. The schema change made sense. But the effect of that migration on real data was something nobody looked at. You approved a diff. You didn't approve an outcome.
- Reading migration SQL checks syntax and intent. It cannot show you what the migration does to real data — NULLs, duplicates, edge-case strings, and out-of-range values that only exist once a table has been in production a while.
- A migration review needs three things a diff can't give you: the before state, the after state, and a diff of the data, not just the schema.
- The bug is usually in the data, not the SQL. A syntactically valid
UPDATE ... = a || bstill produces NULLs, a validADD CONSTRAINT UNIQUEstill fails on existing duplicates, a valid type cast still rounds money. - Correctness testing (production-shaped data: NULLs, edge cases, ~500 rows) is a different question from performance testing (production-scale data: millions of rows). This article is about correctness, the part that belongs in code review. Its performance cousin is migration testing at scale.
- Seedfast seeds FK-valid, production-shaped data from your live schema in one command, so a reviewer can run any migration against realistic variety before approving it.
Code review has a well-established loop: read the change, check the tests, verify the behavior matches the intent. When someone adds a new endpoint, you can look at the test suite and see exactly what inputs produce what outputs. When someone refactors a service, the tests prove that existing behavior is preserved.
Migrations have none of this. A typical migration PR looks like:
-- migrations/20260226_consolidate_user_names.sql
ALTER TABLE users ADD COLUMN full_name VARCHAR(200);
UPDATE users SET full_name = first_name || ' ' || last_name;
ALTER TABLE users DROP COLUMN first_name;
ALTER TABLE users DROP COLUMN last_name;
As a reviewer, you read this and think: "Okay, concatenating first and last name into a single column. Makes sense." You check for obvious mistakes — missing WHERE clause, wrong column name, bad syntax. You approve.
What you didn't check:
- Are there rows where
first_nameis NULL? The concatenation producesNULL || ' ' || 'Smith'=NULL. Thefull_namecolumn now has NULLs where you expected names. - Are there rows where
first_nameorlast_namecontain leading or trailing spaces? You now have" John " || " " || "Smith"=" John Smith". - Does the combined length of
first_name + ' ' + last_nameever exceed 200 characters? If it does, theUPDATEsilently truncates or throws — depending on your database settings.
These aren't edge cases. They're the normal state of any database that's been in production for more than six months. But you can't see them by reading SQL. You can only see them by running the migration against data that has the same shape as production.
When you review application code, you don't just read the source — you read the test output. The tests tell you: "Given this input, this is what happens." You're reviewing behavior, not just syntax.
Migration reviews should work the same way. What a reviewer actually needs is:
- Before state: What does the data look like right now?
- After state: What does the data look like after the migration?
- Diff: What changed, and does the change match the stated intent?
If a migration says "consolidate first and last name into full_name," the reviewer should be able to see a sample of actual rows before and after. Not imagine them. See them.
The problem is that this requires data. Your local development database has 12 rows, all with perfectly formatted names like "John Smith" and "Jane Doe." That data will never reveal the NULL concatenation bug, the trailing spaces, or the truncation. It's too clean, too small, and too synthetic.
The fix is a three-step workflow that takes about 30 seconds of hands-on time:
Start with a database that has your current schema (before the migration) and fill it with data that has realistic variety — NULLs, edge-case strings, varying lengths, the kind of messiness that accumulates in production.
# Seed users with realistic variation
seedfast seed --scope "500 users with varied names, some with NULL first or last names, international characters"
Seedfast generates data that reflects real-world patterns: some names are short, some are long, some fields are NULL, unicode characters show up where they would in production. This is the critical difference from hand-crafted test fixtures — you get the variety you didn't think to test for. You describe the variety you want in a plain-English scope, and because it reads your live schema each run, the seed keeps producing FK-valid rows even after the migration you're reviewing changes the shape of a table.
Capture the before state, run the migration, capture the after state.
# Snapshot the before state
psql $DATABASE_URL -c "SELECT first_name, last_name FROM users LIMIT 20" > before.txt
# Run the migration
psql $DATABASE_URL -f migrations/20260226_consolidate_user_names.sql
# Snapshot the after state
psql $DATABASE_URL -c "SELECT full_name FROM users LIMIT 20" > after.txt
Now look at what actually happened:
# Check for NULLs that shouldn't be there
psql $DATABASE_URL -c "SELECT count(*) FROM users WHERE full_name IS NULL"
# Check for unexpected whitespace
psql $DATABASE_URL -c "SELECT full_name FROM users WHERE full_name LIKE '% %' OR full_name LIKE ' %' LIMIT 10"
# Check for truncation
psql $DATABASE_URL -c "SELECT full_name, length(full_name) FROM users ORDER BY length(full_name) DESC LIMIT 5"
If the migration is correct, these queries return clean results. If it's not, you just caught a production bug from your laptop in under a minute.
The migration:
UPDATE users SET full_name = first_name || ' ' || last_name;
The bug: Any row where first_name or last_name is NULL produces a NULL full_name. In PostgreSQL, NULL concatenated with anything is NULL — the string operators short-circuit to NULL the moment one operand is NULL.
What the reviewer sees after seeding and running:
count
-------
47
(1 row)
47 users just lost their names. The fix is obvious once you see it — wrap each column in COALESCE so a NULL becomes an empty string instead of poisoning the whole expression:
UPDATE users SET full_name = COALESCE(first_name, '') || ' ' || COALESCE(last_name, '');
But you'd never catch this by reading the original SQL — because the SQL is syntactically valid and logically reasonable. The bug is in the data, not the code.
The migration:
ALTER TABLE orders ALTER COLUMN shipping_address SET NOT NULL
The reviewer reads this and thinks: "Good, shipping address should be required." They approve. In production, 8% of orders are digital-only — no shipping address. The migration fails on deploy.
After seeding 500 orders with realistic variety:
psql $DATABASE_URL -c "SELECT count(*) FROM orders WHERE shipping_address IS NULL"
count
-------
41
(1 row)
The reviewer now knows: this migration needs a backfill step first, or it needs to handle digital orders differently.
The migration:
ALTER TABLE products ALTER COLUMN price TYPE INTEGER USING price::INTEGER;
The intent: "We don't need decimal precision, let's simplify." The reviewer checks the cast syntax — looks correct. Approved. But ALTER COLUMN ... TYPE rewrites every value through the USING cast, and casting numeric to integer rounds each price to a whole number, dropping the cents.
After seeding:
price
---------
29.99
14.50
99.95
7.25
149.99
Every one of those prices is rounded to the nearest whole number — $29.99 becomes $30, $7.25 becomes $7. The cents are gone. Across thousands of products, financial reports drift by thousands of dollars. The reviewer catches this instantly by looking at the data, not the DDL.
The migration:
ALTER TABLE users ADD COLUMN role VARCHAR(20) NOT NULL DEFAULT 'user'
Seems fine. But after seeding and inspecting existing rows:
role | count
----------+-------
user | 412
admin | 23
manager | 65
Wait — all existing rows now have role = 'user', including the 23 admins and 65 managers. If the application was previously storing role information in a different table or field, this migration just demoted every admin to a regular user. The DEFAULT applies to existing rows because the column is new, and every row gets the default.
The fix: add the column as nullable first, backfill from the authoritative source, then add the NOT NULL constraint.
The migration:
ALTER TABLE customers ADD CONSTRAINT unique_email UNIQUE (email)
After seeding:
email | count
------------------------+-------
john.smith@example.com | 3
info@company.org | 2
The migration will fail on deploy. Duplicate emails exist — maybe from a legacy import, maybe from a bulk account creation that didn't enforce uniqueness, maybe from a bug that was fixed years ago but left dirty data behind. The constraint is correct, but the data isn't ready for it.
That's five different bugs with five different fixes — a COALESCE, a backfill, a column redesign, a data cleanup. What they share isn't the fix; it's that none of them are visible in the SQL. The only way to find them is to seed realistic data, run the migration, and look at what actually changed before it reaches production.
Wrap the workflow into a script that any reviewer can run:
#!/bin/bash
# review-migration.sh — Validate a migration against realistic data
set -e
MIGRATION_FILE=$1
DATABASE_URL=${DATABASE_URL:-"postgres://dev:dev@localhost:5432/migration_review"}
if [ -z "$MIGRATION_FILE" ]; then
echo "Usage: ./review-migration.sh <migration-file>"
exit 1
fi
echo "=== Step 1: Reset database ==="
dropdb --if-exists migration_review
createdb migration_review
echo "=== Step 2: Apply base schema ==="
for f in migrations/*.sql; do
if [ "$f" != "$MIGRATION_FILE" ]; then
psql $DATABASE_URL -f "$f" 2>/dev/null || true
fi
done
echo "=== Step 3: Seed realistic data ==="
seedfast seed --scope "Realistic data across all tables, include NULLs and edge cases" --output plain
echo "=== Step 4: Capture before state ==="
pg_dump $DATABASE_URL --data-only --inserts > /tmp/before_data.sql
psql $DATABASE_URL -c "\d+" > /tmp/before_schema.txt
echo "=== Step 5: Run migration ==="
echo "Running: $MIGRATION_FILE"
if ! psql $DATABASE_URL -f "$MIGRATION_FILE" 2>/tmp/migration_errors.txt; then
echo "MIGRATION FAILED:"
cat /tmp/migration_errors.txt
exit 1
fi
echo "=== Step 6: Capture after state ==="
psql $DATABASE_URL -c "\d+" > /tmp/after_schema.txt
echo "=== Step 7: Show schema diff ==="
diff /tmp/before_schema.txt /tmp/after_schema.txt || true
echo "=== Step 8: Validate data integrity ==="
psql $DATABASE_URL -c "
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'public'
AND is_nullable = 'NO'
AND column_default IS NULL
ORDER BY table_name, column_name;
"
echo "=== Done. Review the output above. ==="
Now any reviewer can run ./review-migration.sh migrations/20260226_consolidate_user_names.sql and see the actual effect of the migration before approving the PR.
Don't rely on reviewers remembering to run the script. Make it part of your PR pipeline.
name: Migration Review Check
on:
pull_request:
paths:
- 'migrations/**'
jobs:
validate-migration:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16
env:
POSTGRES_DB: review
POSTGRES_USER: review
POSTGRES_PASSWORD: review
ports:
- 5432:5432
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Apply base schema
run: |
for f in migrations/*.sql; do
psql $DATABASE_URL -f "$f" 2>/dev/null || true
done
env:
DATABASE_URL: postgres://review:review@localhost:5432/review
- name: Seed realistic data
run: seedfast seed --scope "Realistic dataset across all tables, include NULLs and varied data" --output plain
env:
SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
DATABASE_URL: postgres://review:review@localhost:5432/review
- name: Apply new migration(s)
run: |
NEW_MIGRATIONS=$(git diff --name-only origin/main -- migrations/ | sort)
for f in $NEW_MIGRATIONS; do
echo "--- Applying: $f ---"
psql $DATABASE_URL -f "$f"
done
env:
DATABASE_URL: postgres://review:review@localhost:5432/review
- name: Validate post-migration state
run: |
echo "=== Row counts after migration ==="
for table in $(psql $DATABASE_URL -t -c "SELECT tablename FROM pg_tables WHERE schemaname='public'"); do
psql $DATABASE_URL -c "SELECT '$table' as table_name, count(*) as rows FROM $table"
done
echo "=== Check for unexpected NULLs ==="
psql $DATABASE_URL -c "
SELECT c.table_name, c.column_name, c.is_nullable
FROM information_schema.columns c
WHERE c.table_schema = 'public'
ORDER BY c.table_name, c.ordinal_position;
"
env:
DATABASE_URL: postgres://review:review@localhost:5432/review
When a PR adds a migration, this pipeline seeds realistic data, runs the migration, and reports the outcome. Reviewers get concrete evidence in the PR checks instead of relying on their ability to mentally execute SQL. The CI/CD database seeding guide covers the full pipeline setup, including ephemeral per-PR databases so each migration is reviewed against a fresh, isolated copy.
There's an important distinction here. Testing migrations against production-scale data (millions of rows) answers the question: "Will this migration be fast enough?" That's about lock duration, rewrite time, and index build speed — the failure mode behind the ALTER TABLE that locked an 8-million-row table for 40 minutes.
Testing migrations against production-shaped data (realistic variety, NULLs, edge cases, messy strings) answers a different question: "Will this migration produce correct results?" That's about data integrity, constraint violations, and silent data loss. It's the same volume-and-variety blind spot that hides the bugs only real test data catches — except here the cost isn't a slow query, it's wrong data written to every row.
You don't need 5 million rows to catch a NULL concatenation bug. You need 500 rows with realistic NULL patterns. You don't need production scale to discover that a unique constraint will fail on duplicates. You need a dataset that contains duplicates.
This article is about the second kind of testing. It's about giving reviewers the ability to see what a migration does to data, not just how long it takes. Both are important. Both require realistic data. But they answer different questions, and they fit into different parts of the development process.
Performance testing happens before deploy. Correctness testing happens during code review.
Before approving any PR that includes a schema migration:
- Seed realistic data and run the migration locally (or check the CI results)
- Verify no unexpected NULLs were introduced
- Verify no data was silently truncated or rounded
- Verify row counts are unchanged (unless rows are intentionally removed)
- Check that new DEFAULT values make sense for existing rows, not just new ones
- If adding NOT NULL: confirm no existing rows violate the constraint
- If adding UNIQUE: confirm no existing duplicates
- If adding a FOREIGN KEY: confirm all referenced rows exist and referential integrity holds across the data you already have
- If changing a column type: confirm all existing values fit the new type
- String operations handle NULLs (use COALESCE or explicit NULL checks)
- Backfill queries produce correct results on a sample of real-shaped data
- The migration is idempotent or has a working rollback
- PR includes what the migration does AND why (not just the SQL)
- If the migration is multi-step, each step's purpose is documented
- CI migration validation check passed
You review a migration by running it against realistic data and comparing the before state, the after state, and the diff — not just reading the SQL. Start a database on the current schema, seed it with production-shaped data (NULLs, edge-case strings, duplicates, varied lengths), snapshot the affected rows, run the migration, and snapshot again. The difference between the two snapshots is the thing you're actually approving. Reading the DDL only tells you the migration is syntactically valid; running it tells you what it does.
Because most migration bugs live in the data, not the syntax. UPDATE users SET full_name = first_name || ' ' || last_name is valid SQL that silently writes NULL for every row with a NULL name. ADD CONSTRAINT ... UNIQUE is valid SQL that fails on deploy if duplicates already exist. A type cast to INTEGER is valid SQL that rounds every price. The SQL reads correctly in all three cases; the defect only appears when the statement meets real-shaped data.
Performance testing uses production-scale data (millions of rows) to answer "will this migration be fast enough?"; correctness testing uses production-shaped data (realistic variety at small volume) to answer "will this migration produce the right data?" Lock duration and rewrite time scale with row count, so performance testing needs volume. Constraint violations and silent data loss depend on the shape of the data — a NULL, a duplicate, an out-of-range value — so correctness testing only needs a few hundred rows with realistic variety. The two are separate checks: performance belongs before deploy, correctness belongs in code review.
A few hundred rows with realistic variety is usually enough. You don't need 5 million rows to catch a NULL concatenation bug — you need ~500 rows that include the NULLs, duplicates, and edge-case strings production accumulates. The requirement is variety, not volume: a dataset that actually contains a duplicate email will surface a failing UNIQUE constraint, and one that contains NULL names will surface a NULL concatenation. Clean fixtures with 12 perfectly-formatted rows will surface neither.
You can, but copying production data into dev or CI carries PII and compliance exposure that often outweighs the benefit. Moving real production records into lower environments runs into data-minimization and access-control requirements under regimes like GDPR, HIPAA, PCI DSS, and SOC 2, and the dump drifts from the schema between refreshes. Generating production-shaped data from the schema gives you the same correctness signal — the NULLs, duplicates, and edge cases — without real records in the pipeline. One thing to know before you reach for it in a regulated codebase: Seedfast reads your schema (the table and column definitions, not the rows) and uses an external AI service to generate from it, so confirm that schema handling fits your own data-governance rules. See staging without production data for the full trade-off.
Seedfast reads your live schema, resolves foreign-key order, and seeds FK-valid, production-shaped data in one command, so a reviewer can run any migration against realistic variety before approving it. It connects to a database to read the schema — the structure, not your production rows — so you point it at a local review database or a throwaway CI database, not production. It's free to start with no credit card. Because it re-reads the schema on every run, the same command keeps working after the migration changes a table — you reseed, run the migration, and compare, without maintaining a fixture script that drifts.
- ALTER TABLE, 5 Million Rows, and the Deploy That Took Down the Site — the performance cousin of this article: testing migrations against production-scale data for lock duration and rewrite time
- Small Data, Big Lies: 6 Bugs Your Test Suite Will Never Catch — the volume-and-variety blind spot behind correctness bugs that 10-row test databases hide
- What is Referential Integrity? — why a migration that adds
NOT NULL REFERENCESfails on existing orphan rows, and the producer-side discipline that prevents it - Circular Foreign Key Seed — the FK-loop case that makes seeding realistic review data non-trivial, and three Postgres patterns that handle it
- Your Staging Database Is a Compliance Violation Waiting to Happen — generating realistic review data instead of copying and anonymizing production
- Get started with Seedfast — connect your database and seed production-shaped data for your next migration review
We don't approve application code by reading the source and imagining what it does. We run the tests. We look at the output. We verify behavior.
Migrations deserve the same rigor. The SQL might look correct. The intent might be sound. But the only way to know what a migration does to your data is to run it against data that looks like yours.
Seed. Migrate. Compare. It takes about 30 seconds of work, and it catches the data-shape bugs that reading the SQL alone keeps letting through.
Get Started | Documentation | Pricing
Seedfast generates realistic, FK-valid test data from your schema — so you can review what a migration does, not just what it says.