All posts

Load Testing With an Empty Database? Here's Your Problem

By Mikhail Shytsko, Founder at Seedfast · · Updated

Your k6 scripts are flawless. Your Gatling scenarios are tuned. Your Locust swarm is ready. But your database has 47 rows — and every result you're about to collect is fiction.

You spent two weeks building a load testing suite. The scenarios cover login, search, checkout, reporting. You've modeled think times, ramp-up curves, and error thresholds. You run it. The results look great: P99 under 200ms, zero errors, 3,000 requests per second. You present the numbers to the team, green-light the release, and deploy to production.

Monday morning, the dashboard is on fire. Search takes 8 seconds. The reporting endpoint is timing out. The checkout flow that handled 3,000 RPS in your test can barely sustain 400 in production.

Your load test wasn't wrong. Your database was.

  • An empty or near-empty database makes every load-test number fiction. Query plans, buffer cache behavior, and connection-pool dynamics are all data-dependent, so a system measured at 50 rows behaves nothing like the same system at 5 million.
  • The four numbers that move most: throughput typically drops 3–5x, P99 latency rises 10–50x, missing indexes surface, and the connection pool saturates — none of which an empty database reveals.
  • The fix isn't a better load-test script — it's realistic data volume before the test runs, so your existing k6/Gatling/Locust scenarios hit a production-shaped database.
  • Seedfast seeds foreign-key-valid data at production volume from your live schema in one command, so you can drop it in as a pipeline step before the load test with no script changes.

Here's a dirty secret of performance engineering: most load tests run against databases with trivially small datasets. The reasons are understandable — nobody wants to spend a week building a data generation pipeline when the actual goal is testing application performance. So teams take shortcuts:

  • Use the default dev database with 20 rows per table
  • Run migrations and insert a handful of fixture records
  • Copy a small CSV extract from production
  • Skip data setup entirely and hope the application layer is the bottleneck

The problem is that databases don't behave the same way at 50 rows and 5 million rows. Not approximately the same — fundamentally different. Query planners make different decisions. Indexes that were ignored become critical. Disk I/O patterns change. Buffer cache hit rates collapse. Connection pool behavior shifts. Join strategies flip from nested loops to hash joins.

When you load test against an empty database, you're measuring the performance of a system that doesn't exist in production.

PostgreSQL's query planner uses table statistics — row counts, value distributions, null fractions — to choose execution plans. With 100 rows, almost every query gets a sequential scan because reading the whole table sequentially costs less than an index lookup's random I/O. The planner correctly decides the index is not worth it.

At 10 million rows, the same query gets an index scan, a bitmap heap scan, or a parallel sequential scan — depending on selectivity. And if your indexes are missing, you get a sequential scan that reads gigabytes instead of kilobytes.

-- Illustrative timings (the shape of the change, not a benchmark of any hardware):
-- With 100 rows: Seq Scan, 0.2ms (whole table fits in cache)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- With 10M rows, no index: Seq Scan, 4,200ms
-- With 10M rows, with index: Index Scan, 0.3ms

Your load test at 100 rows will show 0.2ms. Production with 10M rows and a missing index will show 4,200ms. You've measured nothing useful.

PostgreSQL's shared buffer cache works well when the working set fits in memory. With a small database, everything is cached. Every query hits memory. Latency is sub-millisecond. Throughput is limited only by CPU.

At production scale, the working set exceeds cache size. Queries start hitting disk. Latency spikes. Throughput drops. And it doesn't degrade linearly — it falls off a cliff when the cache hit rate drops below a threshold. The mapping below is illustrative of that cliff shape, not a benchmark of any specific hardware:

Database size: 50MB   → Cache hit rate: 99.9%  → Avg query: 0.5ms
Database size: 5GB    → Cache hit rate: 97%    → Avg query: 2ms
Database size: 50GB   → Cache hit rate: 78%    → Avg query: 45ms

The 0.5ms you measured in your load test becomes 45ms in production. Your P99 of 200ms becomes P99 of 2,000ms. Your capacity model is off by an order of magnitude.

With a small database, every query completes in microseconds. Connections return to the pool almost instantly. A pool of 20 connections can handle thousands of concurrent requests because each connection is occupied for a tiny fraction of a second.

At production scale, queries take longer. Connections stay checked out for milliseconds instead of microseconds. The pool starts queuing. Latency increases. The longer latency causes more concurrent requests to stack up, which increases pool pressure further. The system enters a feedback loop that doesn't exist at small scale. The figures below assume query time dominates the connection hold time — a simplification that shows the direction, not a literal throughput ceiling:

Small DB:  Query time 0.5ms  → 20 connections handle 40,000 req/s
Large DB:  Query time 50ms   → 20 connections handle 400 req/s
                              → Requests start queuing at 400 req/s
                              → Queue adds latency → more queuing → cascade

Your load test showed the system handling 3,000 RPS with headroom. Production hits a wall at 400 RPS.

With small tables, the query planner ignores indexes entirely. It doesn't matter, because sequential scans on small tables are fast. But this means your load test never exercises index performance at all.

At scale, missing indexes become the dominant performance factor. And compound indexes matter — a query that filters on customer_id and status might use an index on customer_id but still scan thousands of rows for the status filter if there's no compound index. The bugs that only real test data catches walk through this same index-at-scale failure from the application side.

Join strategies change too. Small tables get nested loop joins. Large tables get hash joins or merge joins. These have completely different memory requirements, CPU profiles, and performance characteristics.

-- Small tables: Nested Loop Join, 0.5ms
-- Both tables fit in cache, inner loop is trivial
SELECT o.*, c.name
FROM orders o JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending';

-- Large tables: Hash Join, 180ms (or worse)
-- Hash table may spill to disk, changing performance profile entirely

Every performance engineering team has hit at least one of these. Some are hitting all four.

The most common and most dangerous. The database has whatever the migration scripts created — empty tables with correct schemas. The load test runs. Everything is fast because there's nothing to query.

# "We tested it with k6, performance is fine"
k6 run load-test.js
# ✓ P99 < 200ms
# ✓ Error rate < 0.1%
# ✓ Throughput > 2000 RPS
# (against a database with 0 rows in every table)

This is not a load test. This is a test of how fast your application can return empty result sets.

Slightly better — someone ran the seed script from the development setup. Now there are 10-20 rows per table. The queries actually return data. But the performance profile is identical to an empty database because every table fits in a single disk page.

The responsible engineer exports a CSV from production, scrubs it for PII (hopefully), and writes an import script. This takes three days. The data is a snapshot from last month. Half the foreign keys are broken because the export didn't capture all related tables. The import script breaks on the next schema change and nobody updates it.

# The "proper" approach that nobody maintains
psql staging_db -c "\COPY users FROM 'users_sanitized.csv' CSV HEADER"
psql staging_db -c "\COPY orders FROM 'orders_sanitized.csv' CSV HEADER"
# ERROR: insert or update on table "orders" violates foreign key constraint
# (because the users CSV was from Monday and orders CSV was from Wednesday)

The most seductive anti-pattern. The reasoning goes: "Our bottleneck is CPU-bound application logic, not the database. So database size doesn't matter."

This is sometimes true, and it's always temporary. The moment you optimize the application layer, the database becomes the bottleneck. And by then your load tests have established baselines against an empty database, so you can't measure the improvement accurately.

Teams that switch to realistic-data load testing keep hitting the same five issues. These are patterns we hear about repeatedly, not benchmarks from a lab.

The most common discovery. The endpoint that handled 3,000 RPS with an empty database sustains 600-1,000 RPS with realistic data. The bottleneck was always the database — you just couldn't see it without data.

Average latency increases modestly, but tail latency explodes. The occasional query that hits an uncached page, takes a suboptimal execution plan, or waits for a connection adds seconds to the P99. Your SLA is on the P99, not the average.

Nearly every schema with more than 20 tables has at least one missing index that only matters at scale. The load test with realistic data triggers slow query logs that were silent with 50 rows. Missing indexes are one of six bug classes that only real test data catches — alongside N+1 queries, pagination off-by-ones, and timeout cascades.

Pool sizing calculated against sub-millisecond queries doesn't hold when queries take 10-50ms. The real fix is usually the queries themselves, not a bigger pool — adding connections often just moves the contention into PostgreSQL's backends.

At small scale, background jobs (report generation, data exports, analytics queries) complete instantly and don't interfere with request-handling queries. At production scale, they compete for the same connections, buffer cache, and I/O bandwidth. Your load test never surfaced this because the background jobs also ran against an empty database.

The workflow is straightforward. Before running your load test, populate the database with realistic data at the volume you expect in production.

# Seed production-scale data for a load test
seedfast seed --scope "seed 500,000 users with profiles, 2,000,000 orders spread across the last 12 months, payments for each order, and product reviews for 10% of orders"

Seedfast reads your schema, resolves foreign key dependencies into a valid insert order, and generates data with realistic distributions — timestamps clustered during business hours, status fields with realistic ratios, amounts following natural distributions. Because it re-reads the schema on every run, it adapts to migrations instead of breaking like the hand-written CSV importer above.

Seeding Plan:
  public.users            — 500,000 records
  public.profiles         — 500,000 records
  public.products         — 5,000 records
  public.orders           — 2,000,000 records
  public.order_items      — 5,800,000 records
  public.payments         — 2,000,000 records
  public.reviews          — 200,000 records

Total: 11,005,000 records across 7 tables

Approve? (Y/n)

Now run the same k6/Gatling/Locust scripts you already have. No changes needed. The application hits the same endpoints, executes the same queries — but now the database has data in it.

# Same test, dramatically different results
k6 run load-test.js

The numbers will be different. That's the point. The new numbers are closer to what production will experience. The figures below are illustrative of the shape of the change — throughput down, tail latency up, pool saturated — not a benchmark of any specific application:

                    Empty DB        500K Users
P50 Latency         12ms            45ms
P99 Latency         85ms            890ms
Throughput           3,200 RPS      780 RPS
Error Rate           0.01%          2.3%
DB Connections Used  4/20           20/20 (saturated)

An error rate like that is the kind of thing pool saturation produces; a P99 that climbs like that is what a missing index on a hot query tends to look like. Load tests exist to surface those failures, but they only appear against a realistically sized database. Seedfast generates one from your own schema, so the test runs against production-shaped data instead of an empty table — run your first seed.

One-off load tests are useful, but the real value comes from running them automatically. Seed the database as a pipeline step, then run the load test. The CI/CD database seeding guide covers the full setup, including ephemeral per-PR databases and non-interactive mode.

name: Load Test
on:
  schedule:
    - cron: '0 2 * * 1'  # Weekly Monday 2AM
  workflow_dispatch:

jobs:
  load-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: loadtest
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432

    steps:
      - uses: actions/checkout@v4

      - name: Run migrations
        run: npm run migrate
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest

      - name: Seed realistic data
        run: |
          seedfast seed \
            --scope "seed 200,000 users with orders and payments" \
            --output plain
        env:
          SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest

      - name: Run load test
        run: k6 run --out json=results.json load-tests/main.js
        env:
          BASE_URL: http://localhost:3000
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/loadtest

      - name: Check thresholds
        run: |
          # Fail the pipeline if P99 exceeds threshold
          python scripts/check-load-test-results.py results.json \
            --p99-max 500 \
            --error-rate-max 1.0
load-test:
  stage: performance
  services:
    - postgres:16
  variables:
    POSTGRES_DB: loadtest
    DATABASE_URL: postgresql://postgres:postgres@postgres:5432/loadtest
  script:
    - npm run migrate
    - seedfast seed --scope "seed 200,000 users with orders and payments" --output plain
    - k6 run load-tests/main.js
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - if: $CI_PIPELINE_SOURCE == "web"

The most powerful pattern: seed fresh data and run load tests nightly. Compare results against previous runs. Catch performance regressions before they reach production.

#!/bin/bash
# nightly-load-test.sh

# Fresh database every run
dropdb --if-exists loadtest && createdb loadtest
npm run migrate

# Seed consistent volumes
seedfast seed \
  --scope "seed 100,000 users with orders, payments, and activity logs" \
  --output plain

# Run load test with consistent parameters
k6 run \
  --out json=results/$(date +%Y-%m-%d).json \
  load-tests/main.js

# Compare with yesterday's results
python scripts/compare-results.py \
  results/$(date -d yesterday +%Y-%m-%d).json \
  results/$(date +%Y-%m-%d).json \
  --regression-threshold 15
GoalMinimum volumeWhat it reveals
Basic query plans10,000+ rowsWhether indexes are being used
Cache behaviorData > 25% of shared_buffersReal-world cache hit rates
Connection pool sizingEnough to make queries take >5msPool saturation points
Join performance100,000+ rows per joined tableHash join vs. nested loop thresholds
End-to-end realismMatch production row countsRealistic full-pipeline performance

Start with 10x your current test data. If your dev database has 100 rows per table, seed 1,000. Run the load test. Then try 10,000. Then 100,000. Watch where the performance profile changes — that's where your production issues are hiding.

# Progressive scaling approach
seedfast seed --scope "seed 1,000 users with orders and payments"
k6 run load-test.js  # Baseline

# Drop and re-seed at higher volume
seedfast seed --scope "seed 10,000 users with orders and payments"
k6 run load-test.js  # Look for non-linear degradation

seedfast seed --scope "seed 100,000 users with orders and payments"
k6 run load-test.js  # This is usually where things get interesting

The point where latency stops scaling linearly with data size is the point where your architecture has a bottleneck. Finding that inflection point is the entire purpose of load testing with realistic data. Re-seeding between volume tiers takes minutes, not the three-day CSV marathon: in our internal runs, Seedfast generated about 1M foreign-key-valid rows on a 20-table SaaS schema in roughly 3.5 minutes (local Postgres, single run). For the highest volumes, the large-volume seeding guide covers batching and scope tuning.

Load testing without realistic data is performance theater. It produces charts, it generates reports, it checks a compliance box — but it doesn't predict production behavior. The gap between "P99 under 200ms" on an empty database and "P99 at 2 seconds" in production is the gap between a successful launch and a 2AM incident.

You already invest significant effort in writing load test scenarios, tuning parameters, and analyzing results. The missing piece is the data. One command, a few minutes, and your load tests start telling you the truth.

seedfast seed --scope "seed production-scale data across all tables"

Then run the test you already have. The results will be different. They'll be real.

Database performance is data-dependent. With a near-empty table, PostgreSQL's planner picks a sequential scan because the table is cheaper to read whole, every read is served from the buffer cache, and connections return to the pool in microseconds. At production volume the same query may need an index scan or hash join, reads start hitting disk, and connections stay checked out long enough to saturate the pool. So an empty-database load test really measures how fast your application returns near-empty result sets, not how it handles production query load.

Match production row counts when you can; when you cannot, start at roughly 10x your current test volume and scale up in tiers (1,000 → 10,000 → 100,000 → 1,000,000), running the load test at each step. The thresholds differ by effect. Index usage shows up around 10,000 rows, cache behavior once the data exceeds about 25% of shared_buffers, and join-strategy changes around 100,000 rows per joined table. Watch for the volume where latency stops scaling linearly — that inflection is where your bottleneck lives.

Yes. The planner chooses a plan from table statistics — row counts, value distributions, null fractions. At 100 rows it favors sequential scans; at 10 million rows the same query may use an index scan, a bitmap heap scan, or a parallel scan depending on selectivity, and a missing index turns into a scan that reads gigabytes. Joins flip too: small tables get nested-loop joins, large tables get hash or merge joins with entirely different memory and CPU profiles. Run EXPLAIN ANALYZE on realistic volumes to see the plan production will actually run.

It's a common shortcut with two separate problems. First, whether you're even allowed to move real production records into a test environment is a compliance question — under HIPAA, PCI-DSS, GDPR, or SOC 2 it is often restricted — so check with your own compliance team, not a load-test guide. Second, it's brittle: production dumps drift from the schema and break on the next migration, and partial exports leave foreign keys dangling. Seedfast addresses the second problem by generating foreign-key-valid data from your schema, so no real customer records enter the load-test pipeline. It is not an anonymization or compliance tool — it doesn't make copying prod data safe, it removes the reason to copy it at all.

To add database seeding to a load-testing CI pipeline, add a seed step after migrations and before the load test. Set SEEDFAST_API_KEY in the environment to run the CLI non-interactively (no confirmation prompt), pass your volume as a plain-English --scope, then run your existing k6, Gatling, or Locust scenario against the seeded database. The CI/CD database seeding guide shows the full GitHub Actions and GitLab CI setup.

Get Started | Documentation | Pricing

Seedfast generates production-scale test data from your schema. Realistic volumes, valid relationships, no seed scripts to maintain.