Small Data, Big Lies: 6 Bugs Your Test Suite Will Never Catch
The bugs that only show up when your database has real data in it — and how to catch them before production does.
Your test suite is green. Coverage is 94%. The PR gets approved, merged, deployed. Two hours later, the on-call gets paged: the orders page takes 47 seconds to load. The customer export endpoint is OOM-killing pods. Pagination skips page 7 entirely.
Nothing changed in the code. What changed is that production has 2 million orders, and your test database had 12.
This isn't a testing failure. It's a data volume blind spot — and almost every team has one.
The Volume Blind Spot
Most test databases contain between 5 and 50 rows per table. Just enough to verify that the happy path works. This is fine for unit tests, but it creates a dangerous assumption: if it works with 10 rows, it works with 10 million.
It doesn't. Entire categories of bugs are invisible at low volume and catastrophic at scale. Here are the six most common ones.
1. Pagination Off-by-One
The classic. Your API returns paginated results. With 10 rows and a page size of 10, there's one page. Everything works. With 10,001 rows, page 1001 is either empty or duplicates page 1000 — depending on whether your offset calculation uses > or >=.
The fix is usually cursor-based pagination, but you'll never discover the bug until you have enough rows to fill multiple pages — with realistic timestamp distributions that create collisions.
How to catch it:
Seed enough rows to fill many pages, then walk your pagination endpoint from page 1 to the last page and count the items returned. If the total doesn't match `SELECT COUNT(*)`, you have a bug.
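The walk-and-count check can be sketched as a short script. This is a minimal sketch using an in-memory SQLite database in place of your real API and database; the `orders` table and page size are assumptions for illustration:

```python
import sqlite3

def paginate(conn, page, page_size=10):
    # OFFSET-based pagination: the pattern under test
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

def walk_all_pages(conn, page_size=10):
    """Walk every page and sum the items returned."""
    seen, page = 0, 1
    while True:
        rows = paginate(conn, page, page_size)
        if not rows:
            break
        seen += len(rows)
        page += 1
    return seen

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)",
                 [(i,) for i in range(1, 10002)])  # 10,001 rows

total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
walked = walk_all_pages(conn)
assert walked == total, f"pagination lost or duplicated rows: {walked} != {total}"
```

Point `paginate` at your actual endpoint and the same assertion catches both the empty-last-page and the duplicated-page variants of the bug.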
2. N+1 Queries
Your ORM loads a list of orders. For each order, it lazily loads the customer. With 5 orders, that's 6 queries — nobody notices. With 5,000 orders, that's 5,001 queries. The endpoint that responded in 80ms now takes 12 seconds.
N+1 bugs are particularly dangerous because they scale linearly with data. Every new row adds one more query. The performance degradation is proportional and predictable — but only if you have enough data to notice it.
How to catch it:
Enable query logging, hit your endpoints, and count the queries. Any endpoint executing more than ~10 queries for a list view has a problem.
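The query count can be measured directly. A self-contained sketch using SQLite's statement tracing as the query log; the `orders`/`customers` schema is an assumption, and a real ORM's lazy loading is simulated with an explicit per-row lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, (i % 50) + 1) for i in range(1, 5001)])

queries = []
conn.set_trace_callback(queries.append)  # log every statement executed

# The N+1 pattern: one query for the list, one more per row.
orders = conn.execute("SELECT id, customer_id FROM orders").fetchall()
for _, customer_id in orders:
    conn.execute("SELECT name FROM customers WHERE id = ?",
                 (customer_id,)).fetchone()
n_plus_one = len(queries)

queries.clear()
# The fix: a single JOIN loads orders and customers together.
conn.execute("""
    SELECT o.id, c.name FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchall()
joined = len(queries)

print(n_plus_one, joined)  # → 5001 1
```

With 5,000 rows the difference is impossible to miss in the log; with 5 rows it never is.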
3. Missing Indexes
PostgreSQL can scan 100 rows in under a millisecond without an index. With 1 million rows, that same scan takes seconds. The query planner switches from sequential scan to... still sequential scan, because there's no index to use.
The insidious part: your test suite passes at full speed. Your development database has 50 users. The missing index is invisible until production traffic arrives.
How to catch it:
Seed the table to a realistic size, then run `EXPLAIN ANALYZE` on your critical queries. Any sequential scan on a table over 10K rows is a red flag.
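The check can be sketched with SQLite's `EXPLAIN QUERY PLAN`, the rough analog of Postgres's `EXPLAIN ANALYZE` (on Postgres, look for `Seq Scan` in the plan instead). The `users` table and index name are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(50_000)])

def plan(conn, sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable detail string
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

query = "SELECT * FROM users WHERE email = 'user42@example.com'"

before = plan(conn, query)   # full table scan: "SCAN users"
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, query)    # index lookup: "SEARCH users USING INDEX ..."

assert "SCAN" in before and "SEARCH" in after
```

The assertion flips from scan to search only because the data is there; against a 50-row table, both plans finish too fast for anyone to look.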
4. Memory Blowups
Your ORM loads query results into memory. With 100 rows, that's a few kilobytes. With 100,000 rows, the endpoint allocates 500MB and gets OOM-killed.
This pattern hides in export endpoints, report generators, batch jobs, and admin dashboards. Any code path that collects results into a slice, list, or array without a limit is a time bomb that detonates at scale.
How to catch it:
Hit your export and report endpoints and monitor memory usage. If RSS climbs linearly with row count, the endpoint is materializing the full result set in memory instead of streaming it.
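The usual fix is to stream in fixed-size batches rather than calling `fetchall()`. A minimal sketch, assuming an `events` table as the export source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [("x" * 100,) for _ in range(100_000)])

def stream_rows(conn, sql, batch_size=1_000):
    """Yield rows in fixed-size batches so peak memory stays bounded,
    instead of fetchall(), which materializes every row at once."""
    cursor = conn.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch

# The export loop touches every row but never holds more than one batch.
count = 0
for row in stream_rows(conn, "SELECT id, payload FROM events"):
    count += 1

assert count == 100_000
```

Most ORMs have an equivalent (server-side cursors, `yield_per`, chunked iteration); the point is that the unbounded `fetchall` only reveals itself once the table is big enough to hurt.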
5. Timeout Cascades
Service A calls Service B, which queries the database. With small tables, the query runs in 5ms. The total request takes 50ms. With large tables, the query takes 3 seconds. Service A's 2-second timeout fires. The retry hits Service B again. Service B is now handling two slow queries. The third request times out too. The circuit breaker opens. The dashboard goes red.
Timeout cascades don't happen with 50 rows. They happen with 500,000 rows, when one slow query is enough to breach a timeout boundary.
How to catch it:
Run your integration test suite and watch for any request that approaches your timeout thresholds. A request at 80% of the timeout limit today is a timeout tomorrow when the table grows.
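The headroom rule above can be automated as a small check over latencies collected from an integration run. The endpoint names, timeout budget, and numbers below are illustrative assumptions:

```python
TIMEOUT_S = 2.0
HEADROOM = 0.8  # alert once a request uses 80% of the timeout budget

def at_risk(latencies_s, timeout_s=TIMEOUT_S, headroom=HEADROOM):
    """Return the latencies dangerously close to the timeout."""
    threshold = timeout_s * headroom
    return [t for t in latencies_s if t >= threshold]

# Hypothetical per-endpoint latencies from an integration run.
observed = {"GET /orders": 0.25, "GET /export": 1.7, "GET /report": 1.9}
risky = {path: t for path, t in observed.items() if at_risk([t])}

print(risky)  # endpoints likely to breach the timeout as tables grow
```

Failing CI on a non-empty `risky` set turns "a timeout tomorrow" into a red build today.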
6. Unique Constraint Collisions
Your test fixtures use carefully crafted values: user1@test.com, user2@test.com. No collisions. In production, the email generation logic produces duplicates at scale — two users sign up with the same normalized email, or a batch import contains near-duplicates that pass validation individually but violate constraints together.
AI-generated test data produces realistic distributions — including the edge cases that hand-crafted fixtures miss. When Seedfast generates 10,000 email addresses, it follows realistic patterns that surface constraint issues before production does.
How to catch it:
Seed your tables with realistic values and watch for constraint violations. If your constraints hold under 10K realistic rows, they'll likely hold in production. If they don't, better to find out now.
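The collision pattern can be sketched in a few lines: two addresses that pass validation individually violate the constraint after normalization. The `users` table and the normalization rule are illustrative assumptions:

```python
import sqlite3

def normalize(email: str) -> str:
    # Example normalization: lowercase and strip "+tag" aliases.
    local, _, domain = email.lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")

signups = ["Ada@example.com", "ada+newsletter@example.com"]
collisions = 0
for raw in signups:
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)",
                     (normalize(raw),))
    except sqlite3.IntegrityError:
        collisions += 1  # distinct inputs, same normalized email

print(collisions)  # → 1
```

Hand-crafted fixtures like `user1@test.com` can never produce this collision; realistic distributions do.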
The Pattern
Every bug in this list shares the same pattern:
- **Invisible at small scale:** test suite passes, code review looks fine
- **Proportional to data volume:** gets worse as tables grow
- **Discovered in production:** where the data is, and the users are
- **Expensive to fix after the fact:** incident response, hotfixes, post-mortems
The fix is also the same: test with realistic data volumes before deploying.
Shift Left With One Command
You don't need a production database copy. You don't need to write fixture factories for 50 tables. You don't need to maintain SQL dump files that drift from your schema.
Seedfast analyzes your schema, resolves foreign key dependencies, and generates the right proportions automatically. You describe what you need, review the plan, and approve:
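An illustrative session might look like the following. The exact command name and output shape are assumptions, not the real CLI; check the Seedfast documentation for your version:

```shell
# Illustrative only -- command and output shape may differ.
seedfast seed "10,000 orders across 2,000 customers over the last year"
# Prints a generation plan (tables, row counts, FK insertion order)
# and waits for your approval before writing anything.
```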
If the scope exceeds your plan limits, Seedfast asks you to refine — right there in the terminal, no restart needed. If tables already have data, they're skipped automatically.
In CI/CD
Add a seeding step to your pipeline. Run your test suite against real data volumes on every PR:
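A CI step could look like the fragment below. This is an illustrative GitHub Actions sketch; the step names and the `DATABASE_URL` secret are assumptions, and only the `--scope` flag is taken from this document:

```yaml
# Illustrative pipeline fragment -- adapt to your CI system.
- name: Seed realistic test data
  run: seedfast seed --scope "50,000 orders, 10,000 customers"
  env:
    DATABASE_URL: ${{ secrets.TEST_DATABASE_URL }}

- name: Run tests against seeded data
  run: make test
```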
The `--scope` flag auto-approves the plan, making the run fully non-interactive. Table skipping makes it idempotent: safe to re-run.
Start Small, Then Scale
You don't have to jump to a million rows. Start with enough to surface the first category of bugs:
| Goal | Suggested scope | What it catches |
|---|---|---|
| Pagination bugs | 1,000+ rows in paginated tables | Off-by-one, cursor issues |
| N+1 queries | 500+ rows with relationships | Lazy loading performance |
| Missing indexes | 50,000+ rows | Sequential scan bottlenecks |
| Memory issues | 100,000+ rows | Unbounded collection growth |
| Timeout cascades | 500,000+ rows | Cross-service timeout breaches |
Once you find (and fix) the first bug, you'll want to run every PR against realistic volumes. That's the point — make it a habit, not a one-time exercise.
Ready to find the bugs hiding in your small test database?
Get Started | Documentation | Pricing
Seedfast generates realistic test data from your schema description. No fixtures, no dumps, no maintenance.