Load Testing With an Empty Database? Here's Your Problem
Your k6 scripts are flawless. Your Gatling scenarios are tuned. Your Locust swarm is ready. But your database has 47 rows — and every result you're about to collect is fiction.
You spent two weeks building a load testing suite. The scenarios cover login, search, checkout, reporting. You've modeled think times, ramp-up curves, and error thresholds. You run it. The results look great: P99 under 200ms, zero errors, 3,000 requests per second. You present the numbers to the team, green-light the release, and deploy to production.
Monday morning, the dashboard is on fire. Search takes 8 seconds. The reporting endpoint is timing out. The checkout flow that handled 3,000 RPS in your test can barely sustain 400 in production.
Your load test wasn't wrong. Your database was.
The Empty Database Illusion
Here's a dirty secret of performance engineering: most load tests run against databases with trivially small datasets. The reasons are understandable — nobody wants to spend a week building a data generation pipeline when the actual goal is testing application performance. So teams take shortcuts:
Use the default dev database with 20 rows per table
Run migrations and insert a handful of fixture records
Copy a small CSV extract from production
Skip data setup entirely and hope the application layer is the bottleneck
The problem is that databases don't behave the same way at 50 rows and 5 million rows. Not approximately the same — fundamentally different. Query planners make different decisions. Indexes that were ignored become critical. Disk I/O patterns change. Buffer cache hit rates collapse. Connection pool behavior shifts. Join strategies flip from nested loops to hash joins.
When you load test against an empty database, you're measuring the performance of a system that doesn't exist in production.
What Changes at Scale: A Technical Walkthrough
Query Plans Are Data-Dependent
PostgreSQL's query planner uses table statistics — row counts, value distributions, null fractions — to choose execution plans. With 100 rows, almost every query gets a sequential scan because the entire table fits in a single page. The planner correctly decides that an index lookup would be slower.
At 10 million rows, the same query gets an index scan, a bitmap heap scan, or a parallel sequential scan — depending on selectivity. And if your indexes are missing, you get a sequential scan that reads gigabytes instead of kilobytes.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- With 100 rows:          Seq Scan,   0.2ms (entire table is one page)
-- With 10M rows, no index: Seq Scan,   4,200ms
-- With 10M rows, with index: Index Scan, 0.3ms
Your load test at 100 rows will show 0.2ms. Production with 10M rows and a missing index will show 4,200ms. You've measured nothing useful.
Buffer Cache Behavior Changes Completely
PostgreSQL's shared buffer cache works well when the working set fits in memory. With a small database, everything is cached. Every query hits memory. Latency is sub-millisecond. Throughput is limited only by CPU.
At production scale, the working set exceeds cache size. Queries start hitting disk. Latency spikes. Throughput drops. And it doesn't degrade linearly — it falls off a cliff when the cache hit rate drops below a threshold.
Database size: 50MB  → Cache hit rate: 99.9% → Avg query: 0.5ms
Database size: 5GB   → Cache hit rate: 97%   → Avg query: 2ms
Database size: 50GB  → Cache hit rate: 78%   → Avg query: 45ms
The 0.5ms you measured in your load test becomes 45ms in production. Your P99 of 200ms becomes P99 of 2,000ms. Your capacity model is off by an order of magnitude.
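The non-linear falloff follows directly from the weighted cost of a cache miss. A toy model (the latency constants below are illustrative, not measured) shows why a small drop in hit rate produces a large jump in average latency:

```python
def avg_latency_ms(hit_rate, mem_ms=0.1, disk_ms=8.0):
    """Expected per-query latency for a given buffer cache hit rate.

    Illustrative constants: ~0.1ms for a cached page, ~8ms for a disk
    read. Real systems degrade even faster, because disk queues build up
    as miss traffic grows.
    """
    return hit_rate * mem_ms + (1 - hit_rate) * disk_ms

for rate in (0.999, 0.97, 0.78):
    print(f"hit rate {rate:.1%}: {avg_latency_ms(rate):.2f}ms")
```

Dropping from a 99.9% to a 97% hit rate loses only three points of cache coverage, but it makes thirty times as many queries pay the disk penalty.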
Connection Pool Saturation
With a small database, every query completes in microseconds. Connections return to the pool almost instantly. A pool of 20 connections can handle thousands of concurrent requests because each connection is occupied for a tiny fraction of a second.
At production scale, queries take longer. Connections stay checked out for milliseconds instead of microseconds. The pool starts queuing. Latency increases. The longer latency causes more concurrent requests to stack up, which increases pool pressure further. The system enters a feedback loop that doesn't exist at small scale.
Small DB: Query time 0.5ms → 20 connections handle 40,000 req/s
Large DB: Query time 50ms  → 20 connections handle 400 req/s
→ Requests start queuing at 400 req/s
→ Queue adds latency → more queuing → cascade
Your load test showed the system handling 3,000 RPS with headroom. Production hits a wall at 400 RPS.
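The arithmetic behind those numbers is Little's Law applied to the connection pool: sustainable throughput is the number of connections divided by the time each query holds one. A quick sketch:

```python
def pool_capacity_rps(connections, query_seconds):
    """Max requests/sec a pool can sustain before requests start queuing.

    Little's Law: throughput = concurrency / service time.
    """
    return connections / query_seconds

# Small DB: 0.5ms queries, 20 connections -> roughly 40,000 req/s
print(pool_capacity_rps(20, 0.0005))
# Large DB: 50ms queries, 20 connections -> roughly 400 req/s
print(pool_capacity_rps(20, 0.05))
```

The formula also explains why fixing slow queries beats enlarging the pool: a 100x improvement in query time buys the same headroom as a 100x larger pool, without the extra load on the database.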
Index Usage and Join Strategies
With small tables, the query planner typically ignores indexes entirely. It doesn't matter — sequential scans on small tables are fast. But it means your load test never exercises index performance at all.
At scale, missing indexes become the dominant performance factor. And compound indexes matter — a query that filters on customer_id and status might use an index on customer_id but still scan thousands of rows for the status filter if there's no compound index.
Join strategies change too. Small tables get nested loop joins. Large tables get hash joins or merge joins. These have completely different memory requirements, CPU profiles, and performance characteristics.
-- Small tables: Nested Loop Join, 0.5ms
-- Both tables fit in cache, inner loop is trivial
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending';

-- Large tables: Hash Join, 180ms (or worse)
-- Hash table may spill to disk, changing the performance profile entirely
The Anti-Patterns
Every performance engineering team has hit at least one of these. Some are hitting all four.
Anti-Pattern 1: The Empty Database
The most common and most dangerous. The database has whatever the migration scripts created — empty tables with correct schemas. The load test runs. Everything is fast because there's nothing to query.
# "We tested it with k6, performance is fine"k6 run load-test.js
# ✓ P99 < 200ms
# ✓ Error rate < 0.1%
# ✓ Throughput > 2000RPS
# (against adatabase with0rowsinevery table)
# "We tested it with k6, performance is fine"k6 run load-test.js
# ✓ P99 < 200ms
# ✓ Error rate < 0.1%
# ✓ Throughput > 2000RPS
# (against adatabase with0rowsinevery table)
# "We tested it with k6, performance is fine"k6 run load-test.js
# ✓ P99 < 200ms
# ✓ Error rate < 0.1%
# ✓ Throughput > 2000RPS
# (against adatabase with0rowsinevery table)
This is not a load test. This is a test of how fast your application can return empty result sets.
Anti-Pattern 2: The Dozen Rows
Slightly better — someone ran the seed script from the development setup. Now there are 10-20 rows per table. The queries actually return data. But the performance profile is identical to an empty database because every table fits in a single disk page.
Anti-Pattern 3: The CSV Import Marathon
The responsible engineer exports a CSV from production, scrubs it for PII (hopefully), and writes an import script. This takes three days. The data is a snapshot from last month. Half the foreign keys are broken because the export didn't capture all related tables. The import script breaks on the next schema change and nobody updates it.
# The "proper"approach that nobody maintainspsql staging_db -c "\COPY users FROM 'users_sanitized.csv' CSV HEADER"psql staging_db -c "\COPY orders FROM 'orders_sanitized.csv' CSV HEADER"
# ERROR:insert or update on table "orders"violates foreign key constraint
# (because the users CSV was from Monday and orders CSV was from Wednesday)
# The "proper"approach that nobody maintainspsql staging_db -c "\COPY users FROM 'users_sanitized.csv' CSV HEADER"psql staging_db -c "\COPY orders FROM 'orders_sanitized.csv' CSV HEADER"
# ERROR:insert or update on table "orders"violates foreign key constraint
# (because the users CSV was from Monday and orders CSV was from Wednesday)
# The "proper"approach that nobody maintainspsql staging_db -c "\COPY users FROM 'users_sanitized.csv' CSV HEADER"psql staging_db -c "\COPY orders FROM 'orders_sanitized.csv' CSV HEADER"
# ERROR:insert or update on table "orders"violates foreign key constraint
# (because the users CSV was from Monday and orders CSV was from Wednesday)
Anti-Pattern 4: The "Application Layer Is the Bottleneck" Assumption
The most seductive anti-pattern. The reasoning goes: "Our bottleneck is CPU-bound application logic, not the database. So database size doesn't matter."
This is sometimes true — and it's always temporary. The moment you optimize the application layer, the database becomes the bottleneck. And by then your load tests have established baselines against an empty database, so you can't measure the improvement accurately.
What Realistic Data Actually Reveals
Teams that switch from empty-database load testing to realistic-data load testing consistently find the same things. These are not hypothetical — they are patterns we hear about repeatedly.
Finding 1: Your Actual Throughput Is 3-5x Lower
The most common discovery. The endpoint that handled 3,000 RPS with an empty database sustains 600-1,000 RPS with realistic data. The bottleneck was always the database — you just couldn't see it without data.
Finding 2: Your P99 Is 10-50x Higher
Average latency increases modestly, but tail latency explodes. The occasional query that hits an uncached page, takes a suboptimal execution plan, or waits for a connection adds seconds to the P99. Your SLA is on the P99, not the average.
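You can see the average/tail divergence with a few lines of arithmetic. In the sketch below, 1% of queries hit a slow path (uncached page, bad plan, pool wait) at 500ms while the rest stay at 2ms; the mean barely moves, but the P99 lands near the slow path (the latency values are illustrative):

```python
import statistics

# 990 fast queries at 2ms, 10 slow outliers at 500ms
latencies = [2.0] * 990 + [500.0] * 10

mean = statistics.fmean(latencies)
p99 = statistics.quantiles(latencies, n=100)[98]  # 99th percentile

print(f"mean: {mean:.1f}ms")  # single digits: looks healthy
print(f"p99:  {p99:.1f}ms")   # hundreds of ms: SLA is blown
```

A dashboard that only charts averages would show this system as healthy.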
Finding 3: You Have Missing Indexes
Nearly every schema with more than 20 tables has at least one missing index that only matters at scale. The load test with realistic data triggers slow query logs that were silent with 50 rows.
Finding 4: Your Connection Pool Is Undersized
Pool sizing calculated against sub-millisecond queries doesn't hold when queries take 10-50ms. You need 10-50x more connections — or more likely, you need to fix the queries first.
Finding 5: Background Jobs Compete With Request Traffic
At small scale, background jobs (report generation, data exports, analytics queries) complete instantly and don't interfere with request-handling queries. At production scale, they compete for the same connections, buffer cache, and I/O bandwidth. Your load test never surfaced this because the background jobs also ran against an empty database.
The Fix: Seed Before You Test
The workflow is straightforward. Before running your load test, populate the database with realistic data at the volume you expect in production.
Step 1: Seed Realistic Data
# Seed production-scale data for a load test
seedfast seed --scope "seed 500,000 users with profiles,
  2,000,000 orders spread across the last 12 months,
  payments for each order, and product reviews for 10% of orders"
Seedfast reads your schema, resolves foreign key dependencies, and generates data with realistic distributions — timestamps clustered during business hours, status fields with realistic ratios, amounts following natural distributions.
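To make "realistic distributions" concrete: the goal is data that mirrors production's skew rather than uniform randomness. This sketch is not Seedfast's implementation, just the general idea — the column values and ratios are made up:

```python
import random
from datetime import datetime, timedelta

def business_hours_timestamp(start: datetime, days: int) -> datetime:
    """Timestamps clustered in business hours, like real order traffic."""
    day = start + timedelta(days=random.randrange(days))
    if random.random() < 0.8:            # 80% of orders land 9am-5pm
        hour = random.randrange(9, 17)
    else:                                # the rest spread across the day
        hour = random.randrange(0, 24)
    return day.replace(hour=hour, minute=random.randrange(60))

def order_status() -> str:
    """Status values with production-like ratios, not a uniform pick."""
    return random.choices(
        ["completed", "pending", "cancelled", "refunded"],
        weights=[70, 15, 10, 5],
    )[0]

print(business_hours_timestamp(datetime(2024, 1, 1), days=365), order_status())
```

Skew matters because it drives index selectivity and cache locality: a uniform status column hides the hot-partition behavior that a 70/15/10/5 split exposes.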
Step 2: Run Your Existing Load Test
Now run the same k6/Gatling/Locust scripts you already have. No changes needed. The application hits the same endpoints, executes the same queries — but now the database has data in it.
# Same test, dramatically different results
k6 run load-test.js
Step 3: Compare Results
The numbers will be different. That's the point. The new numbers are closer to what production will experience.
                     Empty DB    500K Users
P50 Latency          12ms        45ms
P99 Latency          85ms        890ms
Throughput           3,200 RPS   780 RPS
Error Rate           0.01%       2.3%
DB Connections Used  4/20        20/20 (saturated)
Those errors at 2.3%? Connection timeouts from pool saturation. The P99 at 890ms? A missing index on the orders search query. You just found two production issues before deploying.
Integrating Into Your Load Testing Pipeline
One-off load tests are useful, but the real value comes from running them automatically. Seed the database as a pipeline step, then run the load test. The example below uses GitLab CI syntax, but the same pattern works in any CI system.
load-test:
  stage: performance
  services:
    - postgres:16
  variables:
    POSTGRES_DB: loadtest
    DATABASE_URL: postgresql://postgres:postgres@postgres:5432/loadtest
  script:
    - npm run migrate
    - seedfast seed --scope "seed 200,000 users with orders and payments" --output plain
    - k6 run load-tests/main.js
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - if: $CI_PIPELINE_SOURCE == "web"
Nightly Performance Regression
The most powerful pattern: seed fresh data and run load tests nightly. Compare results against previous runs. Catch performance regressions before they reach production.
#!/bin/bash
# nightly-load-test.sh

# Fresh database every run
dropdb --if-exists loadtest && createdb loadtest
npm run migrate

# Seed consistent volumes
seedfast seed \
  --scope "seed 100,000 users with orders, payments, and activity logs" \
  --output plain

# Run load test with consistent parameters
k6 run \
  --out json=results/$(date +%Y-%m-%d).json \
  load-tests/main.js

# Compare with yesterday's results
python scripts/compare-results.py \
  results/$(date -d yesterday +%Y-%m-%d).json \
  results/$(date +%Y-%m-%d).json \
  --regression-threshold 15
Sizing Your Test Data
How much data do you need? It depends on what you're testing.
Goal                        Minimum volume                      What it reveals
Basic query plans           10,000+ rows                        Whether indexes are being used
Cache behavior              Data > 25% of shared_buffers        Real-world cache hit rates
Connection pool sizing      Enough to make queries take >5ms    Pool saturation points
Join performance            100,000+ rows per joined table      Hash join vs. nested loop thresholds
Full production simulation  Match production row counts         Realistic end-to-end performance
Start with 10x your current test data. If your dev database has 100 rows per table, seed 1,000. Run the load test. Then try 10,000. Then 100,000. Watch where the performance profile changes — that's where your production issues are hiding.
# Progressive scaling approach
seedfast seed --scope "seed 1,000 users with orders and payments"
k6 run load-test.js   # Baseline

# Drop and re-seed at higher volume
seedfast seed --scope "seed 10,000 users with orders and payments"
k6 run load-test.js   # Look for non-linear degradation

seedfast seed --scope "seed 100,000 users with orders and payments"
k6 run load-test.js   # This is usually where things get interesting
The point where latency stops scaling linearly with data size is the point where your architecture has a bottleneck. Finding that inflection point is the entire purpose of load testing with realistic data.
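Finding the inflection point can be automated: feed the per-volume results into a small check that flags the first step where latency grew far faster than the data did. A sketch, with an illustrative threshold and made-up measurements:

```python
def first_inflection(runs, factor=3.0):
    """Return the first data volume where P99 latency jumped more than
    `factor`x versus the previous run, or None if scaling stayed sane."""
    for (rows_a, p99_a), (rows_b, p99_b) in zip(runs, runs[1:]):
        if p99_b / p99_a > factor:
            return rows_b
    return None

# (rows seeded, P99 in ms) from successive load test runs -- illustrative
runs = [(1_000, 85), (10_000, 95), (100_000, 140), (1_000_000, 1_900)]
print(first_inflection(runs))  # the volume where the cliff appears
```

Wiring a check like this into the nightly run turns "the numbers look different" into a concrete pass/fail signal.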
The Uncomfortable Truth
Load testing without realistic data is performance theater. It produces charts, it generates reports, it checks a compliance box — but it doesn't predict production behavior. The gap between "P99 under 200ms" on an empty database and "P99 at 2 seconds" in production is the gap between a successful launch and a 2AM incident.
You already invest significant effort in writing load test scenarios, tuning parameters, and analyzing results. The missing piece is the data. One command, five minutes, and your load tests start telling you the truth.
seedfast seed --scope "seed production-scale data across all tables"
Then run the test you already have. The results will be different. They'll be real.
Ready to see what your load tests are actually measuring?