Load Testing With an Empty Database? Here's Your Problem
By the Seedfast team
Your k6 scripts are flawless. Your Gatling scenarios are tuned. Your Locust swarm is ready. But your database has 47 rows — and every result you're about to collect is fiction.
You spent two weeks building a load testing suite. The scenarios cover login, search, checkout, reporting. You've modeled think times, ramp-up curves, and error thresholds. You run it. The results look great: P99 under 200ms, zero errors, 3,000 requests per second. You present the numbers to the team, green-light the release, and deploy to production.
Monday morning, the dashboard is on fire. Search takes 8 seconds. The reporting endpoint is timing out. The checkout flow that handled 3,000 RPS in your test can barely sustain 400 in production.
Your load test wasn't wrong. Your database was.
The Empty Database Illusion
Here's a dirty secret of performance engineering: most load tests run against databases with trivially small datasets. The reasons are understandable — nobody wants to spend a week building a data generation pipeline when the actual goal is testing application performance. So teams take shortcuts:
- Use the default dev database with 20 rows per table
- Run migrations and insert a handful of fixture records
- Copy a small CSV extract from production
- Skip data setup entirely and hope the application layer is the bottleneck
The problem is that databases don't behave the same way at 50 rows and 5 million rows. Not approximately the same — fundamentally different. Query planners make different decisions. Indexes that were ignored become critical. Disk I/O patterns change. Buffer cache hit rates collapse. Connection pool behavior shifts. Join strategies flip from nested loops to hash joins.
When you load test against an empty database, you're measuring the performance of a system that doesn't exist in production.
What Changes at Scale: A Technical Walkthrough
Query Plans Are Data-Dependent
PostgreSQL's query planner uses table statistics — row counts, value distributions, null fractions — to choose execution plans. With 100 rows, almost every query gets a sequential scan because the entire table fits in a single page. The planner correctly decides that an index lookup would be slower.
At 10 million rows, the same query gets an index scan, a bitmap heap scan, or a parallel sequential scan — depending on selectivity. And if your indexes are missing, you get a sequential scan that reads gigabytes instead of kilobytes.
-- With 100 rows: Seq Scan, 0.2ms (entire table is one page)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- With 10M rows, no index: Seq Scan, 4,200ms
-- With 10M rows, with index: Index Scan, 0.3ms
Your load test at 100 rows will show 0.2ms. Production with 10M rows and a missing index will show 4,200ms. You've measured nothing useful.
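Why the planner flips is clear from its cost arithmetic. Here is a deliberately simplified sketch of the trade-off; PostgreSQL's real cost model has many more terms, and the rows-per-page and B-tree descent figures below are assumptions for illustration only:

```python
# Simplified sketch of the planner's scan-choice trade-off.
# SEQ_PAGE_COST and RANDOM_PAGE_COST mirror PostgreSQL's default
# cost settings; ROWS_PER_PAGE and the B-tree descent cost are
# illustrative assumptions, not the real cost model.
SEQ_PAGE_COST = 1.0      # default seq_page_cost
RANDOM_PAGE_COST = 4.0   # default random_page_cost
ROWS_PER_PAGE = 100      # assumed tuples per 8KB heap page

def seq_scan_cost(total_rows):
    # A sequential scan reads every page once, in order.
    pages = max(1, total_rows // ROWS_PER_PAGE)
    return pages * SEQ_PAGE_COST

def index_scan_cost(matching_rows):
    # An index scan pays a few random reads to descend the B-tree,
    # then roughly one random heap fetch per matching row.
    btree_descent = 3 * RANDOM_PAGE_COST
    return btree_descent + matching_rows * RANDOM_PAGE_COST

# 100 rows: the whole table is one page, the seq scan is cheaper,
# and the planner correctly ignores the index.
print(seq_scan_cost(100), index_scan_cost(1))
# 10M rows, one matching row: the index scan wins by four orders
# of magnitude -- but only if the index exists.
print(seq_scan_cost(10_000_000), index_scan_cost(1))
```

The crossover depends entirely on table size and predicate selectivity, which is exactly why a 100-row load test exercises a different plan than production.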
Buffer Cache Behavior Changes Completely
PostgreSQL's shared buffer cache works well when the working set fits in memory. With a small database, everything is cached. Every query hits memory. Latency is sub-millisecond. Throughput is limited only by CPU.
At production scale, the working set exceeds cache size. Queries start hitting disk. Latency spikes. Throughput drops. And it doesn't degrade linearly — it falls off a cliff when the cache hit rate drops below a threshold.
Database size: 50MB → Cache hit rate: 99.9% → Avg query: 0.5ms
Database size: 5GB → Cache hit rate: 97% → Avg query: 2ms
Database size: 50GB → Cache hit rate: 78% → Avg query: 45ms
The 0.5ms you measured in your load test becomes 45ms in production. Your P99 of 200ms becomes P99 of 2,000ms. Your capacity model is off by an order of magnitude.
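The cliff is easy to see with a back-of-the-envelope model. This sketch assumes an in-cache read costs about 0.1ms and a cache miss costs about 10ms; both figures are illustrative assumptions, not measurements:

```python
# Effective average query latency as a weighted blend of cache hits
# and disk reads. The latency constants are illustrative assumptions.
CACHE_LATENCY_MS = 0.1   # assumed cost of a read served from shared buffers
DISK_LATENCY_MS = 10.0   # assumed cost of a read that misses the cache

def avg_latency_ms(cache_hit_rate):
    miss_rate = 1.0 - cache_hit_rate
    return cache_hit_rate * CACHE_LATENCY_MS + miss_rate * DISK_LATENCY_MS

for hit_rate in (0.999, 0.97, 0.78):
    print(f"hit rate {hit_rate:.1%} -> avg {avg_latency_ms(hit_rate):.2f}ms")
```

A 22-point drop in hit rate produces roughly a 20x jump in average latency, because the slow path dominates the average long before the hit rate looks alarming.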
Connection Pool Saturation
With a small database, every query completes in microseconds. Connections return to the pool almost instantly. A pool of 20 connections can handle thousands of concurrent requests because each connection is occupied for a tiny fraction of a second.
At production scale, queries take longer. Connections stay checked out for milliseconds instead of microseconds. The pool starts queuing. Latency increases. The longer latency causes more concurrent requests to stack up, which increases pool pressure further. The system enters a feedback loop that doesn't exist at small scale.
Small DB: Query time 0.5ms → 20 connections handle 40,000 req/s
Large DB: Query time 50ms → 20 connections handle 400 req/s
→ Requests start queuing at 400 req/s
→ Queue adds latency → more queuing → cascade
Your load test showed the system handling 3,000 RPS with headroom. Production hits a wall at 400 RPS.
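The arithmetic behind those ceilings is Little's law: a pool of N connections, each occupied for t seconds per query, completes at most N/t queries per second. A sketch using the figures above:

```python
def pool_max_throughput(pool_size, avg_query_seconds):
    # Little's law: concurrency = throughput x time-in-system,
    # so the ceiling is pool_size / average query duration.
    return pool_size / avg_query_seconds

# Small DB: 0.5ms queries -> a 20-connection pool tops out at 40,000 req/s
print(pool_max_throughput(20, 0.0005))
# Large DB: 50ms queries -> the same pool tops out at 400 req/s
print(pool_max_throughput(20, 0.05))
```

A 100x increase in query time cuts the pool's ceiling by exactly 100x, which is why the wall appears at 400 RPS and not gradually.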
Index Usage and Join Strategies
With small tables, the query planner ignores indexes entirely. It doesn't matter — sequential scans on small tables are fast. But this means your load test never exercises index performance at all.
At scale, missing indexes become the dominant performance factor. And compound indexes matter — a query that filters on `customer_id` and `status` might use an index on `customer_id` but still scan thousands of rows for the status filter if there's no compound index.
Join strategies change too. Small tables get nested loop joins. Large tables get hash joins or merge joins. These have completely different memory requirements, CPU profiles, and performance characteristics.
-- Small tables: Nested Loop Join, 0.5ms
-- Both tables fit in cache, inner loop is trivial
SELECT o.*, c.name
FROM orders o JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending';
-- Large tables: Hash Join, 180ms (or worse)
-- Hash table may spill to disk, changing performance profile entirely
The Anti-Patterns
Every performance engineering team has hit at least one of these. Some are hitting all four.
Anti-Pattern 1: The Empty Database
The most common and most dangerous. The database has whatever the migration scripts created — empty tables with correct schemas. The load test runs. Everything is fast because there's nothing to query.
# "We tested it with k6, performance is fine"
k6 run load-test.js
# ✓ P99 < 200ms
# ✓ Error rate < 0.1%
# ✓ Throughput > 2000 RPS
# (against a database with 0 rows in every table)
This is not a load test. This is a test of how fast your application can return empty result sets.
Anti-Pattern 2: The Dozen Rows
Slightly better — someone ran the seed script from the development setup. Now there are 10-20 rows per table. The queries actually return data. But the performance profile is identical to an empty database because every table fits in a single disk page.
Anti-Pattern 3: The CSV Import Marathon
The responsible engineer exports a CSV from production, scrubs it for PII (hopefully), and writes an import script. This takes three days. The data is a snapshot from last month. Half the foreign keys are broken because the export didn't capture all related tables. The import script breaks on the next schema change and nobody updates it.
# The "proper" approach that nobody maintains
psql staging_db -c "\COPY users FROM 'users_sanitized.csv' CSV HEADER"
psql staging_db -c "\COPY orders FROM 'orders_sanitized.csv' CSV HEADER"
# ERROR: insert or update on table "orders" violates foreign key constraint
# (because the users CSV was from Monday and orders CSV was from Wednesday)
Anti-Pattern 4: The "Application Layer Is the Bottleneck" Assumption
The most seductive anti-pattern. The reasoning goes: "Our bottleneck is CPU-bound application logic, not the database. So database size doesn't matter."
This is sometimes true — and it's always temporary. The moment you optimize the application layer, the database becomes the bottleneck. And by then your load tests have established baselines against an empty database, so you can't measure the improvement accurately.
What Realistic Data Actually Reveals
Teams that switch from empty-database load testing to realistic-data load testing consistently find the same things. These are not hypothetical — they are patterns we hear about repeatedly.
Finding 1: Your Actual Throughput Is 3-5x Lower
The most common discovery. The endpoint that handled 3,000 RPS with an empty database sustains 600-1,000 RPS with realistic data. The bottleneck was always the database — you just couldn't see it without data.
Finding 2: Your P99 Is 10-50x Higher
Average latency increases modestly, but tail latency explodes. The occasional query that hits an uncached page, takes a suboptimal execution plan, or waits for a connection adds seconds to the P99. Your SLA is on the P99, not the average.
Finding 3: You Have Missing Indexes
Nearly every schema with more than 20 tables has at least one missing index that only matters at scale. The load test with realistic data triggers slow query logs that were silent with 50 rows.
Finding 4: Your Connection Pool Is Undersized
Pool sizing calculated against sub-millisecond queries doesn't hold when queries take 10-50ms. You need 10-50x more connections — or more likely, you need to fix the queries first.
Finding 5: Background Jobs Compete With Request Traffic
At small scale, background jobs (report generation, data exports, analytics queries) complete instantly and don't interfere with request-handling queries. At production scale, they compete for the same connections, buffer cache, and I/O bandwidth. Your load test never surfaced this because the background jobs also ran against an empty database.
The Fix: Seed Before You Test
The workflow is straightforward. Before running your load test, populate the database with realistic data at the volume you expect in production.
Step 1: Seed Realistic Data
# Seed production-scale data for a load test
seedfast seed --scope "seed 500,000 users with profiles,
  2,000,000 orders spread across the last 12 months,
  payments for each order, and product reviews for 10% of orders"
Seedfast reads your schema, resolves foreign key dependencies, and generates data with realistic distributions — timestamps clustered during business hours, status fields with realistic ratios, amounts following natural distributions.
Seeding Plan:
public.users — 500,000 records
public.profiles — 500,000 records
public.products — 5,000 records
public.orders — 2,000,000 records
public.order_items — 5,800,000 records
public.payments — 2,000,000 records
public.reviews — 200,000 records
Total: 11,005,000 records across 7 tables
Approve? (Y/n)
Step 2: Run Your Load Test
Now run the same k6/Gatling/Locust scripts you already have. No changes needed. The application hits the same endpoints, executes the same queries — but now the database has data in it.
# Same test, dramatically different results
k6 run load-test.js
Step 3: Compare Results
The numbers will be different. That's the point. The new numbers are closer to what production will experience.
| Metric | Empty DB | 500K Users |
|---|---|---|
| P50 Latency | 12ms | 45ms |
| P99 Latency | 85ms | 890ms |
| Throughput | 3,200 RPS | 780 RPS |
| Error Rate | 0.01% | 2.3% |
| DB Connections Used | 4/20 | 20/20 (saturated) |
Those errors at 2.3%? Connection timeouts from pool saturation. The P99 at 890ms? A missing index on the orders search query. You just found two production issues before deploying.
Integrating Into Your Load Testing Pipeline
One-off load tests are useful, but the real value comes from running them automatically. Seed the database as a pipeline step, then run the load test.
GitHub Actions
name: Load Test
on:
schedule:
- cron: '0 2 * * 1' # Weekly Monday 2AM
workflow_dispatch:
jobs:
load-test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16
env:
POSTGRES_DB: loadtest
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
ports:
- 5432:5432
steps:
- uses: actions/checkout@v4
- name: Run migrations
run: npm run migrate
env:
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest
- name: Seed realistic data
run: |
seedfast seed \
--scope "seed 200,000 users with orders and payments" \
--output plain
env:
SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest
- name: Run load test
run: k6 run --out json=results.json load-tests/main.js
env:
BASE_URL: http://localhost:3000
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest
- name: Check thresholds
run: |
# Fail the pipeline if P99 exceeds threshold
python scripts/check-load-test-results.py results.json \
--p99-max 500 \
--error-rate-max 1.0
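The `scripts/check-load-test-results.py` script referenced above is not shown in this article; here is a minimal sketch of what it could look like, assuming k6's `--out json` format, which emits one JSON object per line with `"Point"` records for each metric sample:

```python
import json

# Hypothetical sketch of a threshold checker for k6's `--out json`
# output: newline-delimited JSON with Point records for each sample
# of http_req_duration (ms) and http_req_failed (0 or 1 per request).
def check_results(lines, p99_max_ms, error_rate_max_pct):
    durations, failures = [], []
    for line in lines:
        point = json.loads(line)
        if point.get("type") != "Point":
            continue
        if point["metric"] == "http_req_duration":
            durations.append(point["data"]["value"])   # ms per request
        elif point["metric"] == "http_req_failed":
            failures.append(point["data"]["value"])    # 1 = failed
    if not durations:
        return 0.0, 0.0, True
    durations.sort()
    p99 = durations[min(len(durations) - 1, int(len(durations) * 0.99))]
    error_rate = 100.0 * sum(failures) / len(failures) if failures else 0.0
    passed = p99 <= p99_max_ms and error_rate <= error_rate_max_pct
    return p99, error_rate, passed
```

In the pipeline the script would read the results file, call something like `check_results`, and exit non-zero on failure so the job fails.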
GitLab CI
load-test:
stage: performance
services:
- postgres:16
variables:
POSTGRES_DB: loadtest
DATABASE_URL: postgresql://postgres:postgres@postgres:5432/loadtest
script:
- npm run migrate
- seedfast seed --scope "seed 200,000 users with orders and payments" --output plain
- k6 run load-tests/main.js
rules:
- if: $CI_PIPELINE_SOURCE == "schedule"
- if: $CI_PIPELINE_SOURCE == "web"
Nightly Performance Regressions
The most powerful pattern: seed fresh data and run load tests nightly. Compare results against previous runs. Catch performance regressions before they reach production.
#!/bin/bash
# nightly-load-test.sh
# Fresh database every run
dropdb --if-exists loadtest && createdb loadtest
npm run migrate
# Seed consistent volumes
seedfast seed \
--scope "seed 100,000 users with orders, payments, and activity logs" \
--output plain
# Run load test with consistent parameters
k6 run \
--out json=results/$(date +%Y-%m-%d).json \
load-tests/main.js
# Compare with yesterday's results
python scripts/compare-results.py \
results/$(date -d yesterday +%Y-%m-%d).json \
results/$(date +%Y-%m-%d).json \
--regression-threshold 15
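The `compare-results.py` script in that pipeline is hypothetical; its core is a one-line percentage comparison against the regression threshold. A sketch:

```python
def regression_pct(yesterday_p99_ms, today_p99_ms):
    # Percentage increase in P99 latency between two runs.
    return 100.0 * (today_p99_ms - yesterday_p99_ms) / yesterday_p99_ms

def has_regressed(yesterday_p99_ms, today_p99_ms, threshold_pct=15.0):
    # True when today's P99 is more than threshold_pct slower.
    return regression_pct(yesterday_p99_ms, today_p99_ms) > threshold_pct

print(has_regressed(200.0, 220.0))  # 10% slower: within threshold
print(has_regressed(200.0, 260.0))  # 30% slower: regression
```

Because the data volume is identical every night, any P99 movement is attributable to code changes, not to data drift.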
Sizing Your Test Data
| Goal | Minimum volume | What it reveals |
|---|---|---|
| Basic query plans | 10,000+ rows | Whether indexes are being used |
| Cache behavior | Data > 25% of shared_buffers | Real-world cache hit rates |
| Connection pool sizing | Enough to make queries take >5ms | Pool saturation points |
| Join performance | 100,000+ rows per joined table | Hash join vs. nested loop thresholds |
| Full production realism | Match production row counts | Realistic end-to-end performance |
Start with 10x your current test data. If your dev database has 100 rows per table, seed 1,000. Run the load test. Then try 10,000. Then 100,000. Watch where the performance profile changes — that's where your production issues are hiding.
# Progressive scaling approach
seedfast seed --scope "seed 1,000 users with orders and payments"
k6 run load-test.js # Baseline
# Drop and re-seed at higher volume
seedfast seed --scope "seed 10,000 users with orders and payments"
k6 run load-test.js # Look for non-linear degradation
seedfast seed --scope "seed 100,000 users with orders and payments"
k6 run load-test.js # This is usually where things get interesting
The point where latency stops scaling linearly with data size is the point where your architecture has a bottleneck. Finding that inflection point is the entire purpose of load testing with realistic data.
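Detecting that inflection point can be automated: under linear scaling, a 10x increase in rows should mean at most roughly 10x latency. A sketch, assuming you have recorded (row count, P99) pairs from the progressive runs above:

```python
def find_inflection(runs, tolerance=2.0):
    # runs: list of (row_count, p99_ms) pairs in increasing row order.
    # Flags the first step where latency grows `tolerance` times faster
    # than the data does -- a cliff, not linear scaling.
    for (rows_a, p99_a), (rows_b, p99_b) in zip(runs, runs[1:]):
        data_growth = rows_b / rows_a
        latency_growth = p99_b / p99_a
        if latency_growth > tolerance * data_growth:
            return rows_b  # row count where scaling broke down
    return None

runs = [(1_000, 20.0), (10_000, 35.0), (100_000, 2_400.0)]
print(find_inflection(runs))  # flags the 100,000-row run
```

The `tolerance` factor is a judgment call; 2.0 simply means "latency grew more than twice as fast as the data did," which is a reasonable first alarm threshold.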
The Uncomfortable Truth
Load testing without realistic data is performance theater. It produces charts, it generates reports, it checks a compliance box — but it doesn't predict production behavior. The gap between "P99 under 200ms" on an empty database and "P99 at 2 seconds" in production is the gap between a successful launch and a 2AM incident.
You already invest significant effort in writing load test scenarios, tuning parameters, and analyzing results. The missing piece is the data. One command, five minutes, and your load tests start telling you the truth.
seedfast seed --scope "seed production-scale data across all tables"
Then run the test you already have. The results will be different. They'll be real.
Ready to see what your load tests are actually measuring?
Get Started | Documentation | Pricing
Seedfast generates production-scale test data from your schema. Realistic volumes, valid relationships, zero maintenance.