Load Testing With an Empty Database? Here's Your Problem
By the Seedfast team
Your k6 scripts are flawless. Your Gatling scenarios are tuned. Your Locust swarm is ready. But your database has 47 rows — and every result you're about to collect is fiction.
You spent two weeks building a load testing suite. The scenarios cover login, search, checkout, reporting. You've modeled think times, ramp-up curves, and error thresholds. You run it. The results look great: P99 under 200ms, zero errors, 3,000 requests per second. You present the numbers to the team, green-light the release, and deploy to production.
Monday morning, the dashboard is on fire. Search takes 8 seconds. The reporting endpoint is timing out. The checkout flow that handled 3,000 RPS in your test can barely sustain 400 in production.
Your load test wasn't wrong. Your database was.
The Empty Database Illusion
Here's a dirty secret of performance engineering: most load tests run against databases with trivially small datasets. The reasons are understandable — nobody wants to spend a week building a data generation pipeline when the actual goal is testing application performance. So teams take shortcuts:
- Use the default dev database with 20 rows per table
- Run migrations and insert a handful of fixture records
- Copy a small CSV extract from production
- Skip data setup entirely and hope the application layer is the bottleneck
The problem is that databases don't behave the same way at 50 rows and 5 million rows. Not approximately the same — fundamentally different. Query planners make different decisions. Indexes that were ignored become critical. Disk I/O patterns change. Buffer cache hit rates collapse. Connection pool behavior shifts. Join strategies flip from nested loops to hash joins.
When you load test against an empty database, you're measuring the performance of a system that doesn't exist in production.
What Changes at Scale: A Technical Walkthrough
Query Plans Are Data-Dependent
PostgreSQL's query planner uses table statistics — row counts, value distributions, null fractions — to choose execution plans. With 100 rows, almost every query gets a sequential scan because the entire table fits in a single page. The planner correctly decides that an index lookup would be slower.
At 10 million rows, the same query gets an index scan, a bitmap heap scan, or a parallel sequential scan — depending on selectivity. And if your indexes are missing, you get a sequential scan that reads gigabytes instead of kilobytes.
-- With 100 rows: Seq Scan, 0.2ms (entire table is one page)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- With 10M rows, no index: Seq Scan, 4,200ms
-- With 10M rows, with index: Index Scan, 0.3ms
Your load test at 100 rows will show 0.2ms. Production with 10M rows and a missing index will show 4,200ms. You've measured nothing useful.
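Why the planner flips is clear from its cost arithmetic. Here is a deliberately simplified sketch of the trade-off; PostgreSQL's real cost model has many more terms, and the rows-per-page and B-tree descent figures below are assumptions for illustration only:

```python
# Simplified sketch of the planner's scan-choice trade-off.
# SEQ_PAGE_COST and RANDOM_PAGE_COST mirror PostgreSQL's default
# cost settings; ROWS_PER_PAGE and the B-tree descent cost are
# illustrative assumptions, not the real cost model.
SEQ_PAGE_COST = 1.0      # default seq_page_cost
RANDOM_PAGE_COST = 4.0   # default random_page_cost
ROWS_PER_PAGE = 100      # assumed tuples per 8KB heap page

def seq_scan_cost(total_rows):
    # A sequential scan reads every page once, in order.
    pages = max(1, total_rows // ROWS_PER_PAGE)
    return pages * SEQ_PAGE_COST

def index_scan_cost(matching_rows):
    # An index scan pays a few random reads to descend the B-tree,
    # then roughly one random heap fetch per matching row.
    btree_descent = 3 * RANDOM_PAGE_COST
    return btree_descent + matching_rows * RANDOM_PAGE_COST

# 100 rows: the whole table is one page, the seq scan is cheaper,
# and the planner correctly ignores the index.
print(seq_scan_cost(100), index_scan_cost(1))
# 10M rows, one matching row: the index scan wins by four orders
# of magnitude -- but only if the index exists.
print(seq_scan_cost(10_000_000), index_scan_cost(1))
```

The crossover depends entirely on table size and predicate selectivity, which is exactly why a 100-row load test exercises a different plan than production.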
Buffer Cache Behavior Changes Completely
PostgreSQL's shared buffer cache works well when the working set fits in memory. With a small database, everything is cached. Every query hits memory. Latency is sub-millisecond. Throughput is limited only by CPU.
At production scale, the working set exceeds cache size. Queries start hitting disk. Latency spikes. Throughput drops. And it doesn't degrade linearly — it falls off a cliff when the cache hit rate drops below a threshold.
Database size: 50MB → Cache hit rate: 99.9% → Avg query: 0.5ms
Database size: 5GB → Cache hit rate: 97% → Avg query: 2ms
Database size: 50GB → Cache hit rate: 78% → Avg query: 45ms
The 0.5ms you measured in your load test becomes 45ms in production. Your P99 of 200ms becomes P99 of 2,000ms. Your capacity model is off by an order of magnitude.
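The cliff is easy to see with a back-of-the-envelope model. This sketch assumes an in-cache read costs about 0.1ms and a cache miss costs about 10ms; both figures are illustrative assumptions, not measurements:

```python
# Effective average query latency as a weighted blend of cache hits
# and disk reads. The latency constants are illustrative assumptions.
CACHE_LATENCY_MS = 0.1   # assumed cost of a read served from shared buffers
DISK_LATENCY_MS = 10.0   # assumed cost of a read that misses the cache

def avg_latency_ms(cache_hit_rate):
    miss_rate = 1.0 - cache_hit_rate
    return cache_hit_rate * CACHE_LATENCY_MS + miss_rate * DISK_LATENCY_MS

for hit_rate in (0.999, 0.97, 0.78):
    print(f"hit rate {hit_rate:.1%} -> avg {avg_latency_ms(hit_rate):.2f}ms")
```

A 22-point drop in hit rate produces roughly a 20x jump in average latency, because the slow path dominates the average long before the hit rate looks alarming.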
Connection Pool Saturation
With a small database, every query completes in microseconds. Connections return to the pool almost instantly. A pool of 20 connections can handle thousands of concurrent requests because each connection is occupied for a tiny fraction of a second.
At production scale, queries take longer. Connections stay checked out for milliseconds instead of microseconds. The pool starts queuing. Latency increases. The longer latency causes more concurrent requests to stack up, which increases pool pressure further. The system enters a feedback loop that doesn't exist at small scale.
Small DB: Query time 0.5ms → 20 connections handle 40,000 req/s
Large DB: Query time 50ms → 20 connections handle 400 req/s
→ Requests start queuing at 400 req/s
→ Queue adds latency → more queuing → cascade
Your load test showed the system handling 3,000 RPS with headroom. Production hits a wall at 400 RPS.
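The arithmetic behind those ceilings is Little's law: a pool of N connections, each occupied for t seconds per query, completes at most N/t queries per second. A sketch using the figures above:

```python
def pool_max_throughput(pool_size, avg_query_seconds):
    # Little's law: concurrency = throughput x time-in-system,
    # so the ceiling is pool_size / average query duration.
    return pool_size / avg_query_seconds

# Small DB: 0.5ms queries -> a 20-connection pool tops out at 40,000 req/s
print(pool_max_throughput(20, 0.0005))
# Large DB: 50ms queries -> the same pool tops out at 400 req/s
print(pool_max_throughput(20, 0.05))
```

A 100x increase in query time cuts the pool's ceiling by exactly 100x, which is why the wall appears at 400 RPS and not gradually.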
Index Usage and Join Strategies
With small tables, the query planner ignores indexes entirely. It doesn't matter — sequential scans on small tables are fast. But this means your load test never exercises index performance at all.
At scale, missing indexes become the dominant performance factor. And compound indexes matter — a query that filters on `customer_id` and `status` might use an index on `customer_id` but still scan thousands of rows for the status filter if there's no compound index.
Join strategies change too. Small tables get nested loop joins. Large tables get hash joins or merge joins. These have completely different memory requirements, CPU profiles, and performance characteristics.
-- Small tables: Nested Loop Join, 0.5ms
-- Both tables fit in cache, inner loop is trivial
SELECT o.*, c.name
FROM orders o JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending';
-- Large tables: Hash Join, 180ms (or worse)
-- Hash table may spill to disk, changing performance profile entirely
The Anti-Patterns
Every performance engineering team has hit at least one of these. Some are hitting all four.
Anti-Pattern 1: The Empty Database
The most common and most dangerous. The database has whatever the migration scripts created — empty tables with correct schemas. The load test runs. Everything is fast because there's nothing to query.
# "We tested it with k6, performance is fine"
k6 run load-test.js
# ✓ P99 < 200ms
# ✓ Error rate < 0.1%
# ✓ Throughput > 2000 RPS
# (against a database with 0 rows in every table)
This is not a load test. This is a test of how fast your application can return empty result sets.
Anti-Pattern 2: The Dozen Rows
Slightly better — someone ran the seed script from the development setup. Now there are 10-20 rows per table. The queries actually return data. But the performance profile is identical to an empty database because every table fits in a single disk page.
Anti-Pattern 3: The CSV Import Marathon
The responsible engineer exports a CSV from production, scrubs it for PII (hopefully), and writes an import script. This takes three days. The data is a snapshot from last month. Half the foreign keys are broken because the export didn't capture all related tables. The import script breaks on the next schema change and nobody updates it.
# The "proper" approach that nobody maintains
psql staging_db -c "\COPY users FROM 'users_sanitized.csv' CSV HEADER"
psql staging_db -c "\COPY orders FROM 'orders_sanitized.csv' CSV HEADER"
# ERROR: insert or update on table "orders" violates foreign key constraint
# (because the users CSV was from Monday and orders CSV was from Wednesday)
Anti-Pattern 4: The "Application Layer Is the Bottleneck" Assumption
The most seductive anti-pattern. The reasoning goes: "Our bottleneck is CPU-bound application logic, not the database. So database size doesn't matter."
This is sometimes true — and it's always temporary. The moment you optimize the application layer, the database becomes the bottleneck. And by then your load tests have established baselines against an empty database, so you can't measure the improvement accurately.
What Realistic Data Actually Reveals
Teams that switch from empty-database load testing to realistic-data load testing consistently find the same things. These are not hypothetical — they are patterns we hear about repeatedly.
Finding 1: Your Actual Throughput Is 3-5x Lower
The most common discovery. The endpoint that handled 3,000 RPS with an empty database sustains 600-1,000 RPS with realistic data. The bottleneck was always the database — you just couldn't see it without data.
Finding 2: Your P99 Is 10-50x Higher
Average latency increases modestly, but tail latency explodes. The occasional query that hits an uncached page, takes a suboptimal execution plan, or waits for a connection adds seconds to the P99. Your SLA is on the P99, not the average.
Finding 3: You Have Missing Indexes
Nearly every schema with more than 20 tables has at least one missing index that only matters at scale. The load test with realistic data triggers slow query logs that were silent with 50 rows.
Finding 4: Your Connection Pool Is Undersized
Pool sizing calculated against sub-millisecond queries doesn't hold when queries take 10-50ms. You need 10-50x more connections — or more likely, you need to fix the queries first.
Finding 5: Background Jobs Compete With Request Traffic
At small scale, background jobs (report generation, data exports, analytics queries) complete instantly and don't interfere with request-handling queries. At production scale, they compete for the same connections, buffer cache, and I/O bandwidth. Your load test never surfaced this because the background jobs also ran against an empty database.
The Fix: Seed Before You Test
The workflow is straightforward. Before running your load test, populate the database with realistic data at the volume you expect in production.
Step 1: Seed Realistic Data
# Seed production-scale data for a load test
seedfast seed --scope "seed 500,000 users with profiles,
  2,000,000 orders spread across the last 12 months,
  payments for each order, and product reviews for 10% of orders"
Seedfast reads your schema, resolves foreign key dependencies, and generates data with realistic distributions — timestamps clustered during business hours, status fields with realistic ratios, amounts following natural distributions.
Seeding Plan:
public.users — 500,000 records
public.profiles — 500,000 records
public.products — 5,000 records
public.orders — 2,000,000 records
public.order_items — 5,800,000 records
public.payments — 2,000,000 records
public.reviews — 200,000 records
Total: 11,005,000 records across 7 tables
Approve? (Y/n)
Step 2: Run Your Load Test
Now run the same k6/Gatling/Locust scripts you already have. No changes needed. The application hits the same endpoints, executes the same queries — but now the database has data in it.
# Same test, dramatically different results
k6 run load-test.js
Step 3: Compare Results
The numbers will be different. That's the point. The new numbers are closer to what production will experience.
| Metric | Empty DB | 500K Users |
|---|---|---|
| P50 Latency | 12ms | 45ms |
| P99 Latency | 85ms | 890ms |
| Throughput | 3,200 RPS | 780 RPS |
| Error Rate | 0.01% | 2.3% |
| DB Connections Used | 4/20 | 20/20 (saturated) |
Those errors at 2.3%? Connection timeouts from pool saturation. The P99 at 890ms? A missing index on the orders search query. You just found two production issues before deploying.
Integrating Into Your Load Testing Pipeline
One-off load tests are useful, but the real value comes from running them automatically. Seed the database as a pipeline step, then run the load test.
GitHub Actions
name: Load Test
on:
schedule:
- cron: '0 2 * * 1' # Weekly Monday 2AM
workflow_dispatch:
jobs:
load-test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16
env:
POSTGRES_DB: loadtest
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
ports:
- 5432:5432
steps:
- uses: actions/checkout@v4
- name: Run migrations
run: npm run migrate
env:
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest
- name: Seed realistic data
run: |
seedfast seed \
--scope "seed 200,000 users with orders and payments" \
--output plain
env:
SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest
- name: Run load test
run: k6 run --out json=results.json load-tests/main.js
env:
BASE_URL: http://localhost:3000
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest
- name: Check thresholds
run: |
# Fail the pipeline if P99 exceeds threshold
python scripts/check-load-test-results.py results.json \
--p99-max 500 \
--error-rate-max 1.0
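The `scripts/check-load-test-results.py` script referenced above is not shown in this article; here is a minimal sketch of what it could look like, assuming k6's `--out json` format, which emits one JSON object per line with `"Point"` records for each metric sample:

```python
import json

# Hypothetical sketch of a threshold checker for k6's `--out json`
# output: newline-delimited JSON with Point records for each sample
# of http_req_duration (ms) and http_req_failed (0 or 1 per request).
def check_results(lines, p99_max_ms, error_rate_max_pct):
    durations, failures = [], []
    for line in lines:
        point = json.loads(line)
        if point.get("type") != "Point":
            continue
        if point["metric"] == "http_req_duration":
            durations.append(point["data"]["value"])   # ms per request
        elif point["metric"] == "http_req_failed":
            failures.append(point["data"]["value"])    # 1 = failed
    if not durations:
        return 0.0, 0.0, True
    durations.sort()
    p99 = durations[min(len(durations) - 1, int(len(durations) * 0.99))]
    error_rate = 100.0 * sum(failures) / len(failures) if failures else 0.0
    passed = p99 <= p99_max_ms and error_rate <= error_rate_max_pct
    return p99, error_rate, passed
```

In the pipeline the script would read the results file, call something like `check_results`, and exit non-zero on failure so the job fails.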
GitLab CI
load-test:
stage: performance
services:
- postgres:16
variables:
POSTGRES_DB: loadtest
DATABASE_URL: postgresql://postgres:postgres@postgres:5432/loadtest
script:
- npm run migrate
- seedfast seed --scope "seed 200,000 users with orders and payments" --output plain
- k6 run load-tests/main.js
rules:
- if: $CI_PIPELINE_SOURCE == "schedule"
- if: $CI_PIPELINE_SOURCE == "web"
Nightly Performance Regressions
The most powerful pattern: seed fresh data and run load tests nightly. Compare results against previous runs. Catch performance regressions before they reach production.
#!/bin/bash
# nightly-load-test.sh
# Fresh database every run
dropdb --if-exists loadtest && createdb loadtest
npm run migrate
# Seed consistent volumes
seedfast seed \
--scope "seed 100,000 users with orders, payments, and activity logs" \
--output plain
# Run load test with consistent parameters
k6 run \
--out json=results/$(date +%Y-%m-%d).json \
load-tests/main.js
# Compare with yesterday's results
python scripts/compare-results.py \
results/$(date -d yesterday +%Y-%m-%d).json \
results/$(date +%Y-%m-%d).json \
--regression-threshold 15
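The `compare-results.py` script in that pipeline is hypothetical; its core is a one-line percentage comparison against the regression threshold. A sketch:

```python
def regression_pct(yesterday_p99_ms, today_p99_ms):
    # Percentage increase in P99 latency between two runs.
    return 100.0 * (today_p99_ms - yesterday_p99_ms) / yesterday_p99_ms

def has_regressed(yesterday_p99_ms, today_p99_ms, threshold_pct=15.0):
    # True when today's P99 is more than threshold_pct slower.
    return regression_pct(yesterday_p99_ms, today_p99_ms) > threshold_pct

print(has_regressed(200.0, 220.0))  # 10% slower: within threshold
print(has_regressed(200.0, 260.0))  # 30% slower: regression
```

Because the data volume is identical every night, any P99 movement is attributable to code changes, not to data drift.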
Sizing Your Test Data
| Goal | Minimum volume | What it reveals |
|---|---|---|
| Basic query plans | 10,000+ rows | Whether indexes are being used |
| Cache behavior | Data > 25% of shared_buffers | Real-world cache hit rates |
| Connection pool sizing | Enough to make queries take >5ms | Pool saturation points |
| Join performance | 100,000+ rows per joined table | Hash join vs. nested loop thresholds |
| Full production realism | Match production row counts | Realistic end-to-end performance |
Start with 10x your current test data. If your dev database has 100 rows per table, seed 1,000. Run the load test. Then try 10,000. Then 100,000. Watch where the performance profile changes — that's where your production issues are hiding.
# Progressive scaling approach
seedfast seed --scope "seed 1,000 users with orders and payments"
k6 run load-test.js # Baseline
# Drop and re-seed at higher volume
seedfast seed --scope "seed 10,000 users with orders and payments"
k6 run load-test.js # Look for non-linear degradation
seedfast seed --scope "seed 100,000 users with orders and payments"
k6 run load-test.js # This is usually where things get interesting
The point where latency stops scaling linearly with data size is the point where your architecture has a bottleneck. Finding that inflection point is the entire purpose of load testing with realistic data.
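Detecting that inflection point can be automated: under linear scaling, a 10x increase in rows should mean at most roughly 10x latency. A sketch, assuming you have recorded (row count, P99) pairs from the progressive runs above:

```python
def find_inflection(runs, tolerance=2.0):
    # runs: list of (row_count, p99_ms) pairs in increasing row order.
    # Flags the first step where latency grows `tolerance` times faster
    # than the data does -- a cliff, not linear scaling.
    for (rows_a, p99_a), (rows_b, p99_b) in zip(runs, runs[1:]):
        data_growth = rows_b / rows_a
        latency_growth = p99_b / p99_a
        if latency_growth > tolerance * data_growth:
            return rows_b  # row count where scaling broke down
    return None

runs = [(1_000, 20.0), (10_000, 35.0), (100_000, 2_400.0)]
print(find_inflection(runs))  # flags the 100,000-row run
```

The `tolerance` factor is a judgment call; 2.0 simply means "latency grew more than twice as fast as the data did," which is a reasonable first alarm threshold.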
The Uncomfortable Truth
Load testing without realistic data is performance theater. It produces charts, it generates reports, it checks a compliance box — but it doesn't predict production behavior. The gap between "P99 under 200ms" on an empty database and "P99 at 2 seconds" in production is the gap between a successful launch and a 2AM incident.
You already invest significant effort in writing load test scenarios, tuning parameters, and analyzing results. The missing piece is the data. One command, five minutes, and your load tests start telling you the truth.
seedfast seed --scope "seed production-scale data across all tables"
Then run the test you already have. The results will be different. They'll be real.
Ready to see what your load tests are actually measuring?
Get Started | Documentation | Pricing
Seedfast generates production-scale test data from your schema. Realistic volumes, valid relationships, zero maintenance.