Review SQL Migrations in 30 Seconds: Seed, Migrate, Compare
You review application code with tests. You review migrations by reading SQL and hoping for the best. Here's how to actually validate what a migration does before you approve it.
It's Wednesday morning. You have three PRs to review before standup. Two are application code — you check the tests, read the logic, leave a comment about a missing null check, approve the rest. The third PR adds a migration: ALTER TABLE users ALTER COLUMN email TYPE TEXT. You read the SQL. It looks fine. You approve.
On Thursday, the migration runs against production. The users table has 4 million rows. The column type change triggers a full table rewrite. That's not the problem — the team expected that. The problem is that 23,000 rows have email addresses longer than 254 characters stored in the old VARCHAR(255) column (a data entry bug from 2023 that nobody caught). When the application code that validates email length on read suddenly gets TEXT fields with no length constraint, downstream services start choking on payloads they weren't built for.
The migration was syntactically correct. The schema change made sense. But the effect of that migration on real data was something nobody looked at. You approved a diff. You didn't approve an outcome.
The Code Review Blind Spot
Code review has a well-established loop: read the change, check the tests, verify the behavior matches the intent. When someone adds a new endpoint, you can look at the test suite and see exactly what inputs produce what outputs. When someone refactors a service, the tests prove that existing behavior is preserved.
Migrations have none of this. A typical migration PR looks like:
ALTER TABLE users ADD COLUMN full_name VARCHAR(200);
UPDATE users SET full_name = first_name || ' ' || last_name;
ALTER TABLE users DROP COLUMN first_name;
ALTER TABLE users DROP COLUMN last_name;
As a reviewer, you read this and think: "Okay, concatenating first and last name into a single column. Makes sense." You check for obvious mistakes — missing WHERE clause, wrong column name, bad syntax. You approve.
What you didn't check:
Are there rows where first_name is NULL? The concatenation produces NULL || ' ' || 'Smith' = NULL. The full_name column now has NULLs where you expected names.
Are there rows where first_name or last_name contain leading or trailing spaces? ' John ' || ' ' || 'Smith' produces ' John  Smith', with a stray leading space and a doubled separator.
Does the combined length of first_name || ' ' || last_name ever exceed 200 characters? If it does, the UPDATE silently truncates or throws an error, depending on your database and its settings.
These aren't edge cases. They're the normal state of any database that's been in production for more than six months. But you can't see them by reading SQL. You can only see them by running the migration against data that has the same shape as production.
What Reviewers Actually Need
When you review application code, you don't just read the source — you read the test output. The tests tell you: "Given this input, this is what happens." You're reviewing behavior, not just syntax.
Migration reviews should work the same way. What a reviewer actually needs is:
Before state: What does the data look like right now?
After state: What does the data look like after the migration?
Diff: What changed, and does the change match the stated intent?
If a migration says "consolidate first and last name into full_name," the reviewer should be able to see a sample of actual rows before and after. Not imagine them. See them.
The problem is that this requires data. Your local development database has 12 rows, all with perfectly formatted names like "John Smith" and "Jane Doe." That data will never reveal the NULL concatenation bug, the trailing spaces, or the truncation. It's too clean, too small, and too synthetic.
The Workflow: Seed, Migrate, Compare
The fix is a three-step workflow that takes about 30 seconds of hands-on time:
Step 1: Seed Realistic Data
Start with a database that has your current schema (before the migration) and fill it with data that has realistic variety — NULLs, edge-case strings, varying lengths, the kind of messiness that accumulates in production.
# Seed users with realistic variation
seedfast seed --scope "500 users with varied names, some with NULL first or last names, international characters"
Seedfast generates data that reflects real-world patterns: some names are short, some are long, some fields are NULL, unicode characters show up where they would in production. This is the critical difference from hand-crafted test fixtures — you get the variety you didn't think to test for.
Step 2: Snapshot, Migrate, Snapshot
Capture the before state, run the migration, capture the after state.
# Snapshot the before state
psql $DATABASE_URL -c "SELECT first_name, last_name FROM users LIMIT 20" > before.txt
# Run the migration
psql $DATABASE_URL -f migrations/20260226_consolidate_user_names.sql
# Snapshot the after state
psql $DATABASE_URL -c "SELECT full_name FROM users LIMIT 20" > after.txt
Step 3: Inspect the Result
Now look at what actually happened:
# Check for NULLs that shouldn't be there
psql $DATABASE_URL -c "SELECT count(*) FROM users WHERE full_name IS NULL"
# Check for unexpected whitespace
psql $DATABASE_URL -c "SELECT full_name FROM users WHERE full_name LIKE '%  %' OR full_name LIKE ' %' OR full_name LIKE '% ' LIMIT 10"
# Check for truncation
psql $DATABASE_URL -c "SELECT full_name, length(full_name) FROM users ORDER BY length(full_name) DESC LIMIT 5"
If the migration is correct, these queries return clean results. If it's not, you just caught a production bug from your laptop in under a minute.
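The snapshot-and-compare loop can also be scripted. Here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the review database; the table, its rows, and the placeholder "migration" UPDATE are all hypothetical, but the before/diff/after pattern is the same one you would run against Postgres.

```python
import sqlite3

def snapshot(conn, query):
    """Capture a query result as a set of rows for easy diffing."""
    return set(conn.execute(query).fetchall())

# Stand-in database; in the real workflow this is your seeded Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("John", "Smith"), (None, "Doe")])

before = snapshot(conn, "SELECT first_name, last_name FROM users")

# Placeholder "migration" so the diff has something to show.
conn.execute("UPDATE users SET first_name = COALESCE(first_name, '')")

after = snapshot(conn, "SELECT first_name, last_name FROM users")

# Any row appearing in only one snapshot was changed by the migration.
changed = before ^ after
print(changed)
```

The symmetric difference surfaces exactly the rows the migration touched, which is the evidence a reviewer needs.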
Five Scenarios Where This Catches Real Bugs
1. NULL Concatenation
The migration:
UPDATE users SET full_name = first_name || ' ' || last_name;
The bug: Any row where first_name or last_name is NULL produces a NULL full_name. In PostgreSQL, NULL || anything = NULL.
What the reviewer sees after seeding and running the NULL check:
psql $DATABASE_URL -c "SELECT count(*) FROM users WHERE full_name IS NULL"
 count
-------
    47
47 users just lost their names. The fix is obvious once you see it:
UPDATE users SET full_name = COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')
But you'd never catch this by reading the original SQL — because the SQL is syntactically valid and logically reasonable. The bug is in the data, not the code.
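The failure mode is easy to reproduce outside Postgres, since SQLite shares the same NULL-concatenation semantics. A self-contained sketch (the table and its rows are hypothetical, and the fix adds a TRIM on top of the COALESCE to avoid the stray space left when one side is NULL):

```python
import sqlite3

# SQLite, like PostgreSQL: NULL || anything = NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT, full_name TEXT)")
conn.executemany(
    "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
    [("John", "Smith"), (None, "Nguyen"), ("Ada", None)],
)

# The buggy backfill: a NULL in either column nulls the whole result.
conn.execute("UPDATE users SET full_name = first_name || ' ' || last_name")
nulls = conn.execute(
    "SELECT count(*) FROM users WHERE full_name IS NULL"
).fetchone()[0]
print(nulls)  # 2 of the 3 rows lost their names

# COALESCE preserves whatever data exists; TRIM drops the leftover space.
conn.execute(
    "UPDATE users SET full_name = "
    "TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))"
)
nulls = conn.execute(
    "SELECT count(*) FROM users WHERE full_name IS NULL"
).fetchone()[0]
print(nulls)  # 0
```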
2. NOT NULL Constraint on a Column With Existing NULLs
The migration:
ALTER TABLE orders ALTER COLUMN shipping_address SET NOT NULL
The reviewer reads this and thinks: "Good, shipping address should be required." They approve. In production, 8% of orders are digital-only — no shipping address. The migration fails on deploy.
After seeding 500 orders with realistic variety:
psql $DATABASE_URL -c "SELECT count(*) FROM orders WHERE shipping_address IS NULL"
The reviewer now knows: this migration needs a backfill step first, or it needs to handle digital orders differently.
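This check generalizes into a small pre-flight helper. A sketch with sqlite3 standing in for the review database (table contents are hypothetical; the table and column names are interpolated, so pass only trusted identifiers):

```python
import sqlite3

def not_null_preflight(conn, table, column):
    """Count rows that would violate a SET NOT NULL constraint."""
    # Identifiers come from code, not user input, so f-string interpolation
    # is acceptable for this sketch.
    return conn.execute(
        f"SELECT count(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]

# Hypothetical orders table: digital orders carry no shipping address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, shipping_address TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "1 Main St"), (2, None), (3, "9 Elm Ave"), (4, None)],
)

violations = not_null_preflight(conn, "orders", "shipping_address")
print(violations)  # 2 -> the migration would fail; backfill or rescope first
```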
3. Column Type Change That Loses Data
The migration:
ALTER TABLE products ALTER COLUMN price TYPE INTEGER USING price::INTEGER;
The intent: "We don't need decimal precision, let's simplify." The reviewer checks the cast syntax — looks correct. Approved.
After seeding:
 price
--------
  29.99
  14.50
  99.95
   7.25
 149.99
Every one of those prices gets rounded to a whole number: $29.99 becomes $30, $7.25 becomes $7, and the cents are gone for good. Across thousands of products, financial reports will be off by thousands of dollars. The reviewer catches this instantly by looking at the data, not the DDL.
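A pre-flight query can count how many stored values an integer cast would alter. Sketched here with sqlite3 and the hypothetical prices from the table above, plus one whole-dollar price for contrast (note that CAST truncates in SQLite and rounds in PostgreSQL, but either way any row where the price differs from its integer image loses information):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [(29.99,), (14.50,), (99.95,), (7.25,), (149.99,), (20.0,)],
)

# Count rows whose value would change under an integer type change.
lossy = conn.execute(
    "SELECT count(*) FROM products WHERE price <> CAST(price AS INTEGER)"
).fetchone()[0]
print(lossy)  # 5 of the 6 prices would be altered by the migration
```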
4. DEFAULT Value That Doesn't Match Business Logic
The migration:
ALTER TABLE users ADD COLUMN role VARCHAR(20) NOT NULL DEFAULT 'user'
Seems fine. But suppose role information currently lives in a separate permissions table, the authoritative source this column is meant to replace. After seeding, that source says the distribution should be:
  role   | count
---------+-------
 user    |   412
 admin   |    23
 manager |    65
Yet after the migration, every existing row in users has role = 'user', including the 23 admins and 65 managers. A brand-new column's DEFAULT applies to every existing row, so if the application was previously storing role information elsewhere, this migration just demoted every admin to a regular user.
The fix: add the column as nullable first, backfill from the authoritative source, then add the NOT NULL constraint.
5. Unique Constraint on Non-Unique Data
The migration:
ALTER TABLE customers ADD CONSTRAINT unique_email UNIQUE (email)
After seeding:
         email          | count
------------------------+-------
 john.smith@example.com |     3
 info@company.org       |     2
The migration will fail on deploy. Duplicate emails exist — maybe from a legacy import, maybe from a bulk account creation that didn't enforce uniqueness, maybe from a bug that was fixed years ago but left dirty data behind. The constraint is correct, but the data isn't ready for it.
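Duplicates like these are surfaced by a grouped count, which makes a good pre-flight check before any UNIQUE constraint. A sketch with sqlite3 (table contents hypothetical, mirroring the counts above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("john.smith@example.com",)] * 3
    + [("info@company.org",)] * 2
    + [("unique@example.com",)],
)

# Pre-flight for a UNIQUE constraint: any group with count > 1
# will make the ALTER TABLE fail on deploy.
dupes = conn.execute(
    "SELECT email, count(*) FROM customers GROUP BY email HAVING count(*) > 1"
).fetchall()
for email, n in dupes:
    print(email, n)
```

An empty result means the constraint is safe to add; anything else means the data needs cleanup first.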
Making It Repeatable: The Review Script
Wrap the workflow into a script that any reviewer can run:
#!/bin/bash
# review-migration.sh — Validate a migration against realistic data
set -e

MIGRATION_FILE=$1
DATABASE_URL=${DATABASE_URL:-"postgres://dev:dev@localhost:5432/migration_review"}

if [ -z "$MIGRATION_FILE" ]; then
  echo "Usage: ./review-migration.sh <migration-file>"
  exit 1
fi

echo "=== Step 1: Reset database ==="
dropdb --if-exists migration_review
createdb migration_review

echo "=== Step 2: Apply base schema (everything except the migration under review) ==="
for f in migrations/*.sql; do
  [ "$f" = "$MIGRATION_FILE" ] && continue
  psql "$DATABASE_URL" -f "$f"
done

echo "=== Step 3: Seed realistic data ==="
seedfast seed --scope "Realistic dataset across all tables, include NULLs and varied data"

echo "=== Step 4: Snapshot, migrate, snapshot ==="
pg_dump --data-only "$DATABASE_URL" > before.sql
psql "$DATABASE_URL" -f "$MIGRATION_FILE"
pg_dump --data-only "$DATABASE_URL" > after.sql

echo "=== Step 5: Diff ==="
diff before.sql after.sql || true

Now any reviewer can run ./review-migration.sh migrations/20260226_consolidate_user_names.sql and see the actual effect of the migration before approving the PR.
CI/CD: Automated Migration Validation in PR Checks
Don't rely on reviewers remembering to run the script. Make it part of your PR pipeline.
name: Migration Review Check

on:
  pull_request:
    paths:
      - 'migrations/**'

jobs:
  validate-migration:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: review
          POSTGRES_USER: review
          POSTGRES_PASSWORD: review
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Apply base schema (excluding migrations added in this PR)
        run: |
          NEW_MIGRATIONS=$(git diff --name-only origin/main -- migrations/)
          for f in migrations/*.sql; do
            echo "$NEW_MIGRATIONS" | grep -qx "$f" && continue
            psql $DATABASE_URL -f "$f"
          done
        env:
          DATABASE_URL: postgres://review:review@localhost:5432/review
      - name: Seed realistic data
        run: seedfast seed --scope "Realistic dataset across all tables, include NULLs and varied data" --output plain
        env:
          SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
          DATABASE_URL: postgres://review:review@localhost:5432/review
      - name: Apply new migration(s)
        run: |
          NEW_MIGRATIONS=$(git diff --name-only origin/main -- migrations/ | sort)
          for f in $NEW_MIGRATIONS; do
            echo "--- Applying: $f ---"
            psql $DATABASE_URL -f "$f"
          done
        env:
          DATABASE_URL: postgres://review:review@localhost:5432/review
      - name: Validate post-migration state
        run: |
          echo "=== Row counts after migration ==="
          for table in $(psql $DATABASE_URL -t -c "SELECT tablename FROM pg_tables WHERE schemaname='public'"); do
            psql $DATABASE_URL -c "SELECT '$table' AS table_name, count(*) AS rows FROM $table"
          done
          echo "=== Column nullability overview ==="
          psql $DATABASE_URL -c "
            SELECT table_name, column_name, is_nullable
            FROM information_schema.columns
            WHERE table_schema = 'public'
            ORDER BY table_name, ordinal_position;
          "
        env:
          DATABASE_URL: postgres://review:review@localhost:5432/review
When a PR adds a migration, this pipeline seeds realistic data, runs the migration, and reports the outcome. Reviewers get concrete evidence in the PR checks instead of relying on their ability to mentally execute SQL.
The Difference Between Performance Testing and Correctness Testing
There's an important distinction here. Testing migrations against production-scale data (millions of rows) answers the question: "Will this migration be fast enough?" That's about lock duration, rewrite time, and index build speed.
Testing migrations against production-shaped data (realistic variety, NULLs, edge cases, messy strings) answers a different question: "Will this migration produce correct results?" That's about data integrity, constraint violations, and silent data loss.
You don't need 5 million rows to catch a NULL concatenation bug. You need 500 rows with realistic NULL patterns. You don't need production scale to discover that a unique constraint will fail on duplicates. You need a dataset that contains duplicates.
This article is about the second kind of testing. It's about giving reviewers the ability to see what a migration does to data, not just how long it takes. Both are important. Both require realistic data. But they answer different questions, and they fit into different parts of the development process.
Performance testing happens before deploy. Correctness testing happens during code review.
A Migration Reviewer's Checklist
Before approving any PR that includes a schema migration:
Data integrity
Seed realistic data and run the migration locally (or check the CI results)
Verify no unexpected NULLs were introduced
Verify no data was silently truncated or rounded
Verify row counts are unchanged (unless rows are intentionally removed)
Check that new DEFAULT values make sense for existing rows, not just new ones
Constraint safety
If adding NOT NULL: confirm no existing rows violate the constraint
If adding UNIQUE: confirm no existing duplicates
If adding a FOREIGN KEY: confirm all referenced rows exist
If changing a column type: confirm all existing values fit the new type
Behavioral correctness
String operations handle NULLs (use COALESCE or explicit NULL checks)
Backfill queries produce correct results on a sample of real-shaped data
The migration is idempotent or has a working rollback
Review process
PR includes what the migration does AND why (not just the SQL)
If the migration is multi-step, each step's purpose is documented
CI migration validation check passed
Stop Approving Migrations You Haven't Tested
We don't approve application code by reading the source and imagining what it does. We run the tests. We look at the output. We verify behavior.
Migrations deserve the same rigor. The SQL might look correct. The intent might be sound. But the only way to know what a migration does to your data is to run it against data that looks like yours.
Seed. Migrate. Compare. It takes 30 seconds and it catches the bugs that code review alone never will.