ALTER TABLE, 5 Million Rows, and the Deploy That Took Down the Site
A migration that takes 50ms on your dev database can lock a production table for 20 minutes. Here's how to find out before your users do.
It was a Thursday afternoon deploy. The migration looked harmless: add a NOT NULL column with a default value to the orders table. It passed review. It ran fine on staging. The deploy pipeline showed green.
Then production went quiet. Not the good kind of quiet. The orders table — 8 million rows — was locked. Every API endpoint that touched orders started queuing. Load balancers began returning 502s. The on-call Slack channel lit up. Forty minutes later, the migration finished, the locks released, and the incident retro began.
The migration wasn't wrong. The schema change was exactly what the team needed. The problem was that nobody tested it against 8 million rows before running it against 8 million rows.
The Migration Time Bomb
Schema migrations have a property that most code changes don't: their execution time is a function of data volume. A migration that alters a table's structure doesn't just update metadata — on many operations, PostgreSQL has to rewrite or scan every row.
This creates a class of problems that are completely invisible in development:
| Operation | 100 rows | 1M rows | 5M rows |
| --- | --- | --- | --- |
| ADD COLUMN ... DEFAULT (pre-PG 11) | < 1ms | 8s | 42s |
| ALTER COLUMN TYPE (int to bigint) | < 1ms | 12s | 65s |
| CREATE INDEX (single column) | < 1ms | 4s | 22s |
| ADD COLUMN NOT NULL DEFAULT (PG 11+) | < 1ms | < 1ms | < 1ms |
| CREATE INDEX CONCURRENTLY | < 1ms | 6s | 35s |
That fourth row is interesting: PostgreSQL 11 made ADD COLUMN ... DEFAULT a metadata-only operation, as long as the default is non-volatile (a constant, not something like random()). But the other operations still rewrite or scan the table. And even "fast" operations have nuances that bite at scale.
The point: your dev database with 50 rows will never tell you which category your migration falls into.
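To see the scaling property concretely, here's a toy benchmark using Python's bundled sqlite3 as a stand-in for a real database (Postgres locking behavior is very different, but the time-grows-with-rows property is the same). The schema and the migration are illustrative:

```python
import sqlite3
import time

def time_backfill(n_rows: int) -> float:
    """Seed n_rows, then time an add-column-plus-backfill migration."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    db.executemany(
        "INSERT INTO orders (total) VALUES (?)",
        ((i * 0.5,) for i in range(n_rows)),
    )
    db.commit()

    start = time.perf_counter()
    # The "harmless" migration: add a column, then backfill every row.
    db.execute("ALTER TABLE orders ADD COLUMN region TEXT")
    db.execute("UPDATE orders SET region = 'us-east-1' WHERE region IS NULL")
    db.commit()
    return time.perf_counter() - start

for n in (100, 50_000, 500_000):
    print(f"{n:>7} rows: {time_backfill(n):.3f}s")
```

Run it and the same migration goes from effectively free to clearly measurable as the row count climbs; on a real Postgres table with indexes, triggers, and concurrent traffic, the curve is far steeper.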
5 Migrations That Look Harmless (Until They're Not)
1. Adding a NOT NULL Column With Backfill
-- Step 1: Looks fine
ALTER TABLE orders ADD COLUMN region VARCHAR(50);

-- Step 2: The backfill that locks the table
UPDATE orders SET region = 'us-east-1' WHERE region IS NULL;

-- Step 3: Now add the constraint
ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
Step 2 is the problem. A single UPDATE touching 5 million rows acquires row locks across the entire table. Concurrent writes block. Auto-vacuum can't run. If you have foreign key references, those tables might lock too.
At scale: A 5M-row backfill UPDATE can take 30+ seconds and block all concurrent writes to the table for the duration.
The safer pattern:
-- Batch the backfill
UPDATE orders SET region = 'us-east-1'
WHERE id IN (
  SELECT id FROM orders WHERE region IS NULL LIMIT 10000
);
-- Repeat until done
But you won't know you need the safer pattern unless you've tested the naive one against real data volumes.
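The batching loop itself is simple. Here's a minimal sketch using Python's sqlite3 as a stand-in (against Postgres you'd run each batch in its own short transaction, usually with a brief pause between batches so autovacuum and concurrent writers can keep up):

```python
import sqlite3

def batched_backfill(db: sqlite3.Connection, batch_size: int = 10_000) -> int:
    """Backfill orders.region in small batches; returns the batch count."""
    batches = 0
    while True:
        cur = db.execute(
            """
            UPDATE orders SET region = 'us-east-1'
            WHERE id IN (
                SELECT id FROM orders WHERE region IS NULL LIMIT ?
            )
            """,
            (batch_size,),
        )
        db.commit()  # end the transaction, releasing locks between batches
        if cur.rowcount == 0:
            break  # nothing left to backfill
        batches += 1
    return batches
```

Each iteration locks at most batch_size rows for a fraction of a second instead of the whole table for the whole duration. The total wall-clock time is longer, but concurrent traffic never stalls.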
2. Creating an Index on a Large Table
-- Acquires a SHARE lock — blocks writes for the entire build
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
On 100 rows, this is instant. On 10 million rows, this holds a SHARE lock for the duration of the index build. Every INSERT, UPDATE, and DELETE on that table queues behind it.
The safer pattern:
-- Builds the index without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
CONCURRENTLY doesn't block writes, but it takes roughly 2-3x longer and can fail under concurrent activity. A failed concurrent build leaves behind an INVALID index that you must drop manually before retrying. It also can't run inside a transaction block, which means most migration frameworks need special handling.
3. Changing a Column Type
-- "Just" widening a varchar fieldALTERTABLE users ALTERCOLUMN name TYPE VARCHAR(500)
-- "Just" widening a varchar fieldALTERTABLE users ALTERCOLUMN name TYPE VARCHAR(500)
-- "Just" widening a varchar fieldALTERTABLE users ALTERCOLUMN name TYPE VARCHAR(500)
This looks like a metadata change. It isn't. PostgreSQL rewrites the entire table when you change a column type — even when the new type is strictly wider. For a 5M-row table, that's a full table rewrite while holding an ACCESS EXCLUSIVE lock. Nothing reads. Nothing writes. The table is gone from the application's perspective until the rewrite finishes.
The safer pattern: Add a new column, backfill in batches, swap with a rename, drop the old column. More steps, but no extended lock.
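The choreography of that expand-and-swap pattern can be sketched end to end. This uses sqlite3 so it's runnable as-is; the table and column names are illustrative, and in production each step is a separate deploy with the backfill batched:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])
db.commit()

# Step 1: add the new column (cheap: no rewrite, no long lock)
db.execute("ALTER TABLE users ADD COLUMN name_v2 TEXT")

# Step 2: backfill (collapsed to one statement here; batch it for real)
db.execute("UPDATE users SET name_v2 = name WHERE name_v2 IS NULL")
db.commit()

# Step 3: swap via renames (each rename is a fast metadata change)
db.execute("ALTER TABLE users RENAME COLUMN name TO name_old")
db.execute("ALTER TABLE users RENAME COLUMN name_v2 TO name")
db.commit()

# Step 4 (a later cleanup deploy): DROP COLUMN name_old
```

More moving parts than a single ALTER, but no step holds an exclusive lock for longer than a metadata update.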
4. Adding a Foreign Key
Adding a foreign key requires PostgreSQL to validate every existing row. On 5 million orders, that's a full table scan while holding a lock on both orders and customers. If customers is also a large table, you've just locked two critical tables simultaneously.
The safer pattern:
-- Add the constraint without validating existing rows
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Validate in a separate step (holds a weaker lock)
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_customer;
NOT VALID adds the constraint for new rows immediately, then VALIDATE checks existing rows with a less aggressive lock. But again — you need real data to know whether the VALIDATE step takes 2 seconds or 20 minutes.
5. Dropping a Column (Yes, Really)
-- Column name illustrative
ALTER TABLE orders DROP COLUMN legacy_status;
In PostgreSQL, DROP COLUMN doesn't actually rewrite the table; it just marks the column as dropped in the catalog. Fast, right? Usually. But it still acquires an ACCESS EXCLUSIVE lock. If a long-running query is reading from the table, the DROP waits for it to finish, and while it waits, every new query queues behind it. A 1ms metadata operation can block the table for minutes because of one slow SELECT. The standard mitigation is to run SET lock_timeout = '2s'; first so the DDL fails fast instead of stalling all traffic, then retry once the long query is gone.
At scale: The lock acquisition time becomes unpredictable based on concurrent query workload — something you can only test with realistic data and realistic query patterns.
The Seedfast Workflow: Seed, Migrate, Measure
The fix is straightforward: run your migration against production-scale data before running it in production.
Step 1: Seed Production-Scale Data
# Match your production table sizes
seedfast seed --scope "seed 5 million orders with customers, order items, and payments"
Seedfast analyzes your schema, resolves foreign keys, and generates realistic data with proper distributions. You get the right proportions automatically:
Seeding Plan:
  public.customers   —    500,000 records
  public.orders      —  5,000,000 records
  public.order_items — 12,000,000 records
  public.payments    —  4,800,000 records

Total: 22,300,000 records across 4 tables
Approve? (Y/n)
Step 2: Run the Migration and Measure
# Time the migration
time psql $DATABASE_URL -f migrations/20260225_add_region_column.sql

# Or with your migration framework
time flyway migrate
time rails db:migrate
time alembic upgrade head
Step 3: Check for Lock Contention
While the migration runs, open another terminal and monitor locks:
-- See what's locked and what's waiting
SELECT
  blocked.pid AS blocked_pid,
  blocked_activity.query AS blocked_query,
  blocking.pid AS blocking_pid,
  blocking_activity.query AS blocking_query,
  now() - blocked_activity.query_start AS waiting_time
FROM pg_catalog.pg_locks blocked
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked.pid = blocked_activity.pid
JOIN pg_catalog.pg_locks blocking
  ON blocked.transactionid = blocking.transactionid
  AND blocked.pid != blocking.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking.pid = blocking_activity.pid
WHERE NOT blocked.granted;
If you see queries stacking up behind your migration, that's exactly what will happen in production — except with real user traffic behind them.
Step 4: Benchmark the Safe Alternative
If the naive migration locks the table for too long, implement the safer pattern and measure again:
# Reseed a fresh database (or use a separate test database)
seedfast seed --scope "seed 5 million orders with customers"

# Run the batched migration
time psql $DATABASE_URL -f migrations/20260225_add_region_column_safe.sql
Now you have concrete numbers: "The naive migration locks orders for 38 seconds. The batched version takes 2 minutes total but never holds a lock for more than 200ms."
That's the data your team needs to make the right call.
CI/CD: Automated Migration Benchmarking
Don't make this a manual exercise. Put it in your pipeline.
name: Migration Benchmark

on:
  pull_request:
    paths:
      - 'migrations/**'

jobs:
  benchmark-migration:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: bench
          POSTGRES_USER: bench
          POSTGRES_PASSWORD: bench
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v4

      - name: Apply base schema
        run: |
          for f in migrations/*.sql; do
            psql $DATABASE_URL -f "$f" 2>/dev/null || true
          done
        env:
          DATABASE_URL: postgres://bench:bench@localhost:5432/bench

      - name: Seed test data
        run: seedfast seed --scope "seed 1 million orders with customers and payments" --output plain
        env:
          SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
          DATABASE_URL: postgres://bench:bench@localhost:5432/bench

      - name: Benchmark new migration
        run: |
          NEW_MIGRATIONS=$(git diff --name-only origin/main -- migrations/ | sort)
          for f in $NEW_MIGRATIONS; do
            echo "--- Benchmarking: $f ---"
            time psql $DATABASE_URL -f "$f"
          done
        env:
          DATABASE_URL: postgres://bench:bench@localhost:5432/bench

      - name: Check migration duration
        run: |
          echo "Review migration timing above. Migrations over 30s need batching or CONCURRENTLY."
Now every PR that adds a migration gets benchmarked against realistic data. No more "it ran fine on staging" — because staging had 200 rows.
For Smaller Teams
Even without full CI integration, you can add a one-liner to your PR template:
## Migration Checklist
- [ ] Tested against production-scale data (`seedfast seed --scope "..."`)
- [ ] Migration completes in under 30 seconds on 1M+ rows
- [ ] No ACCESS EXCLUSIVE locks held for more than 5 seconds
- [ ] Uses CONCURRENTLY for index creation (if applicable)
- [ ] Backfills are batched (if applicable)
PostgreSQL Migration Gotchas Worth Knowing
A few additional things that catch teams off guard at scale:
Transaction-wrapped migrations lock longer than you think. Most frameworks run each migration inside a transaction. That means the lock acquired at the start isn't released until the entire migration file finishes — including any backfills, constraint validations, or index builds inside the same transaction.
CONCURRENTLY can't run in a transaction. If your framework wraps migrations in transactions (Rails, Flyway, Alembic by default), CREATE INDEX CONCURRENTLY will fail. You need to configure your framework to run specific migrations outside a transaction block.
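In Alembic, for instance, the usual recipe is an autocommit block. This is a sketch of a migration file, not something runnable on its own; the index and table names are illustrative:

```python
# Alembic migration sketch: build an index concurrently, outside the
# transaction Alembic normally wraps around each migration.
from alembic import op


def upgrade():
    # autocommit_block() temporarily exits the migration's transaction,
    # which CREATE INDEX CONCURRENTLY requires.
    with op.get_context().autocommit_block():
        op.create_index(
            "idx_orders_customer_id",
            "orders",
            ["customer_id"],
            postgresql_concurrently=True,
        )


def downgrade():
    with op.get_context().autocommit_block():
        op.drop_index("idx_orders_customer_id", table_name="orders")
```

Rails has an equivalent escape hatch (disable_ddl_transaction!), and Flyway skips the wrapping transaction for statements it detects as non-transactional.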
pg_repack is your friend for zero-downtime rewrites. When you need to rewrite a table (e.g., changing a column type), pg_repack can do it with minimal locking by rebuilding the table in the background. But you need to test it against real data volumes first — because pg_repack on a 50M-row table still takes significant time and I/O.
Autovacuum matters. A large UPDATE for backfill generates dead tuples. If autovacuum can't keep up, table bloat causes subsequent queries to slow down. Test the migration and then check table bloat with pg_stat_user_tables.
Before You Run That Migration in Production
Checklist for any migration touching a table with more than 100K rows:
Seed production-scale data locally
seedfast seed --scope "seed [your production row count] [table] with related records"
Time the migration. If it takes more than 10 seconds, consider batching or alternative approaches.
Monitor locks during execution. An ACCESS EXCLUSIVE lock held for more than a few seconds will impact production traffic.
Test the rollback too. A 30-second migration with a 5-minute rollback is a risky deploy.
Run it during low traffic if the lock duration is unavoidable. But know exactly how long "unavoidable" is — in seconds, not "it should be quick."
Check CONCURRENTLY support in your migration framework for index operations.
Validate constraints separately using NOT VALID + VALIDATE CONSTRAINT for foreign keys and check constraints.
Batch backfills — never UPDATE millions of rows in a single statement.
The common thread: every item on this list requires production-scale data to validate. You can't measure lock duration on 50 rows. You can't benchmark a backfill on an empty table. You can't discover that CONCURRENTLY fails inside your framework's transaction wrapper without actually running it.
Stop discovering migration problems in production. Discover them on your laptop.