How to Fill 5 Databases That Reference Each Other
By the Seedfast team
The monolith had one seed script. Your microservices have five databases, three implicit ID contracts, and a prayer that someone seeds them in the right order.
You split the monolith. Services are independent. Each team owns their schema, their migrations, their deploy cadence. The architecture diagrams look clean.
Then someone on the team tries to set up a local development environment. They spin up the user service, seed its database, and start the order service. Immediately: ERROR: relation "users" does not exist. Of course it doesn't — that's in a different database now. So they seed the order service database too. But the order records reference user IDs that don't exist in the user service. The payment service expects order IDs that were generated by a different seed run. The notification service tries to look up user preferences from a user ID that maps to nothing.
Five databases. Zero data consistency. Every service works in isolation, and nothing works together.
This is the microservice data problem, and it's one of the most underestimated costs of distributed architecture.
The Monolith Had It Easy
In a monolith, seeding is straightforward. One database, one seed.sql, one transaction:
-- seed.sql: everything in one place
INSERT INTO users (id, name, email) VALUES
  (1, 'Alice Chen', 'alice@example.com'),
  (2, 'Bob Martinez', 'bob@example.com');

INSERT INTO products (id, name, price) VALUES
  (101, 'Widget Pro', 29.99),
  (102, 'Widget Lite', 9.99);

INSERT INTO orders (id, user_id, product_id, status) VALUES
  (1001, 1, 101, 'completed'),
  (1002, 2, 102, 'pending');

INSERT INTO payments (id, order_id, amount, status) VALUES
  (5001, 1001, 29.99, 'captured'),
  (5002, 1002, 9.99, 'authorized');

INSERT INTO notifications (id, user_id, type, message) VALUES
  (9001, 1, 'order_shipped', 'Your Widget Pro has shipped'),
  (9002, 2, 'payment_pending', 'Complete your payment for Widget Lite');
Foreign keys enforce consistency. One psql command, and the entire application has coherent data. Every join works. Every API response makes sense.
Now distribute that across five services:
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│ User Service  │  │Product Service│  │ Order Service │
│   users_db    │  │  products_db  │  │   orders_db   │
│               │  │               │  │               │
│  users        │  │  products     │  │  orders       │
│  preferences  │  │  categories   │  │  order_items  │
└───────┬───────┘  └───────┬───────┘  └───────┬───────┘
        │                  │                  │
        │        ┌─────────┴────────┐         │
        └───────►│ Payment Service  │◄────────┘
                 │   payments_db    │
                 │                  │
                 │  payments        │
                 │  refunds         │
                 └────────┬─────────┘
                          │
                 ┌────────▼─────────┐
                 │   Notification   │
                 │     Service      │
                 │    notifs_db     │
                 │                  │
                 │  notifications   │
                 │  templates       │
                 └──────────────────┘
There are no foreign keys between these databases. The order service stores a user_id column, but nothing enforces that the user actually exists. The payment service stores an order_id, but there's no constraint linking it to the orders database. These are *implicit references* — contracts that exist in application code, not in database schemas.
Seed one database without the others, and you get orphan records. Seed them all independently, and the IDs don't match. Seed them in the wrong order, and your application logic breaks in ways that look like bugs but are actually data inconsistency.
The Cross-Service Reference Problem
Microservice databases reference each other through several patterns, and each one creates a seeding challenge.
Shared User IDs
Almost every service stores a user ID. The user service is the source of truth, but every other service has a user_id column pointing back to it. Seed the order service with user IDs 1-100, and the user service with user IDs 500-600, and every order belongs to a nonexistent user.
orders_db.orders.user_id = 42 -> users_db.users.id = ???
payments_db.payments.user_id = 42 -> users_db.users.id = ???
notifs_db.notifications.user_id = 42 -> users_db.users.id = ???
Service-to-Service ID Contracts
The payment service doesn't just reference users. It references orders. The notification service references both users and orders. Some services reference products by ID, others by SKU. These implicit contracts form a dependency graph that's invisible to any single service's schema:
notification_service.notifications:
  - user_id     -> user_service.users.id
  - order_id    -> order_service.orders.id
  - product_sku -> product_service.products.sku

payment_service.payments:
  - user_id  -> user_service.users.id
  - order_id -> order_service.orders.id

order_service.orders:
  - user_id    -> user_service.users.id
  - product_id -> product_service.products.id
Event-Sourced State
Some microservice architectures use event sourcing or event-driven communication. The order service doesn't call the payment service directly — it emits an OrderCreated event, and the payment service builds its state from that event stream. Seeding the payment database directly, without replaying events, produces state that could never exist in a real system.
# What actually happens in production:
OrderCreated { order_id: 1001, user_id: 42, total: 29.99 }
-> PaymentService creates payment { order_id: 1001, amount: 29.99, status: "pending" }
-> NotificationService sends "Order received" to user 42
# What happens when you seed databases independently:
payments_db has payment { order_id: 9999 } -- order 9999 doesn't exist
notifs_db has notification for user 7777 -- user 7777 doesn't exist
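Replay-style seeding avoids this: instead of inserting into each store directly, emit the same events production would, and let each consumer build its own state. A minimal in-memory sketch — the event shape matches the `OrderCreated` example above, but the handler names and stores are illustrative stand-ins for your actual consumers:

```python
# Illustrative sketch: seed downstream services by replaying events, so
# their state is derived the same way it is in production.

def handle_order_created(event, payments_db, notifs_db):
    # What a payment consumer and a notification consumer would each do
    # with an OrderCreated event (hypothetical handler logic).
    payments_db.append({
        "order_id": event["order_id"],
        "amount": event["total"],
        "status": "pending",
    })
    notifs_db.append({
        "user_id": event["user_id"],
        "type": "order_received",
    })

def seed_by_replay(events):
    """Build downstream state purely from the event stream."""
    payments_db, notifs_db = [], []
    for event in events:
        if event["type"] == "OrderCreated":
            handle_order_created(event, payments_db, notifs_db)
    return payments_db, notifs_db

payments, notifs = seed_by_replay([
    {"type": "OrderCreated", "order_id": 1001, "user_id": 42, "total": 29.99},
])
```

Because every downstream row traces back to a real event, the orphan-reference problem can't occur by construction — at the cost of needing a seedable event stream.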
Anti-Patterns That Teams Actually Use
When confronted with cross-service seeding, teams typically reach for one of these approaches. They all look reasonable at first. They all break.
1. "Just Seed Service A First, Then B, Then C"
The ordering approach. Seed users first (they have no dependencies), then products, then orders (depends on users and products), then payments (depends on orders), then notifications (depends on everything).
#!/bin/bash
# seed_all.sh -- the script that someone wrote at 2 AM
echo "Seeding user service..."
psql "$USERS_DB" < seeds/users.sql
echo "Seeding product service..."
psql "$PRODUCTS_DB" < seeds/products.sql
echo "Seeding order service..."
psql "$ORDERS_DB" < seeds/orders.sql
echo "Seeding payment service..."
psql "$PAYMENTS_DB" < seeds/payments.sql
echo "Seeding notification service..."
psql "$NOTIFS_DB" < seeds/notifications.sql
This works until the product team changes their ID generation from sequential integers to UUIDs. Or until the user service adds a required tenant_id column. Or until a new service appears that nobody adds to the script. The seed files reference each other by hardcoded IDs, and any change to one file requires updating all downstream files.
Maintenance cost: proportional to the square of the number of services.
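Part of that cost is avoidable: the ordering itself doesn't have to be hardcoded. A sketch using Python's standard `graphlib` to derive the order from declared dependencies — the service names are the ones from this article, and the dependency map is an assumption you'd maintain alongside your services:

```python
from graphlib import TopologicalSorter

# Each service declares which services' IDs it references.
# Adding a new service means adding one entry here, not editing a script.
dependencies = {
    "user_service": set(),
    "product_service": set(),
    "order_service": {"user_service", "product_service"},
    "payment_service": {"order_service", "user_service"},
    "notification_service": {"user_service", "order_service"},
}

# static_order() yields a valid seeding order: roots first,
# every service after everything it depends on.
seeding_order = list(TopologicalSorter(dependencies).static_order())
```

This doesn't fix the hardcoded-ID coupling between seed files, but it does mean a new service can't silently fall outside the script.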
2. Copying Production Data Subsets
"Let's just dump 1% of production data from each service."
# "Clever" approach: dump a consistent slice
# (pseudocode -- stock pg_dump has no --where flag; in practice teams
#  approximate this with COPY (SELECT ...) TO STDOUT or custom scripts)
pg_dump --where="id < 1000" users_db > users_slice.sql
pg_dump --where="user_id < 1000" orders_db > orders_slice.sql
pg_dump --where="order_id IN (SELECT id FROM ...)" payments_db > payments_slice.sql
You now have real production data sitting in every developer's laptop. Multiple compliance frameworks have opinions about this. The subsets are also nearly impossible to keep consistent — if you dump users with id < 1000 from the user service, you need to find all orders belonging to those users, all payments for those orders, all notifications for those users. That's a cross-database join across five databases. Someone writes a script that approximates this, and it works 90% of the time. The other 10% produces orphan references that cause subtle, intermittent test failures.
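The "find everything those users touch" step is a transitive closure across databases. A pure-Python sketch of the closure logic — the in-memory rows stand in for the per-service queries you'd actually run against each database:

```python
def consistent_slice(users, orders, payments, max_user_id):
    """Select users below the cutoff, then keep only the orders and
    payments that stay inside that slice -- the closure the shell
    pipeline above only approximates."""
    user_ids = {u["id"] for u in users if u["id"] < max_user_id}
    kept_orders = [o for o in orders if o["user_id"] in user_ids]
    order_ids = {o["id"] for o in kept_orders}
    kept_payments = [p for p in payments if p["order_id"] in order_ids]
    return user_ids, kept_orders, kept_payments

users = [{"id": 1}, {"id": 2}, {"id": 1500}]
orders = [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 1500}]
payments = [{"id": 100, "order_id": 10}, {"id": 101, "order_id": 11}]

uids, kept_o, kept_p = consistent_slice(users, orders, payments, max_user_id=1000)
# order 11 and payment 101 are dropped: their user falls outside the slice
```

The logic is simple when the rows fit in memory; the hard part in real systems is that each set lives behind a different connection, which is exactly why approximating scripts drift.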
3. Shared Test Fixtures in a Repo
A single repository with JSON or SQL fixtures that every service reads:
test-fixtures/
  users.json     # { "users": [{ "id": 1, ... }, ...] }
  products.json
  orders.json    # references user IDs from users.json
  payments.json  # references order IDs from orders.json
Every service imports the relevant fixtures during testing. This works for small datasets with stable schemas, but it has the same coupling problem as the ordered script — changing the users fixture requires updating every fixture that references user IDs. It also forces every service to depend on a shared repository, which undermines the independence that microservices are supposed to provide.
And the fixtures are always tiny. 10 users, 20 orders. Nobody maintains a fixture set with 50,000 users and realistic distributions across five services.
What Actually Works
There's no silver bullet for cross-service seeding. But there are strategies that hold up better than the anti-patterns above.
Strategy 1: Dependency-Ordered Seeding with Shared ID Ranges
Define explicit ID ranges or conventions that all services agree on. Seed services in dependency order, using IDs from the agreed-upon ranges.
# seed-config.yaml -- shared convention
id_ranges:
  users: 1-10000
  products: 100001-110000
  orders: 200001-300000
  payments: 400001-500000

seeding_order:
  - user_service          # no dependencies
  - product_service       # no dependencies
  - order_service         # depends on users, products
  - payment_service       # depends on orders, users
  - notification_service  # depends on users, orders
This works if you enforce the convention and every team respects it. The downside is rigidity: the ID ranges are arbitrary constraints that don't exist in production, and they can mask bugs related to ID collision or generation strategy.
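If you adopt shared ranges, a guard that fails fast when a seed run drifts outside its lane pays for itself. A small sketch — the ranges are copied from the YAML above, and the checker itself is hypothetical tooling you'd run after seeding:

```python
# Shared ID ranges from seed-config.yaml, as Python ranges
# (range(a, b) is half-open, so the upper bound is b - 1).
ID_RANGES = {
    "users": range(1, 10_001),
    "products": range(100_001, 110_001),
    "orders": range(200_001, 300_001),
    "payments": range(400_001, 500_001),
}

def check_ids(entity, ids):
    """Return the IDs that violate the shared convention for this entity."""
    allowed = ID_RANGES[entity]
    return sorted(i for i in ids if i not in allowed)

violations = check_ids("orders", [200_001, 299_999, 42])
# 42 falls in the users range, so it is flagged as a violation
```

Wired into CI after each seed run, a check like this turns "someone broke the convention" from a debugging session into a failed build.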
Strategy 2: API-Driven Seeding
Instead of inserting directly into each database, use each service's API to create data. Seed users through the user service API. Use the returned user IDs to create orders through the order service API. Use the returned order IDs to create payments.
# seed_via_apis.py
import requests

# Create users
users = []
for i in range(100):
    resp = requests.post("http://user-service/api/users", json={
        "name": f"Test User {i}",
        "email": f"user{i}@test.com"
    })
    users.append(resp.json())

# Create orders using real user IDs
orders = []
for user in users[:50]:
    resp = requests.post("http://order-service/api/orders", json={
        "user_id": user["id"],  # real ID from user service
        "product_id": "prod-101",
        "quantity": 2
    })
    orders.append(resp.json())
This guarantees consistency — you're using the actual IDs that each service generates. It also triggers events, so downstream services (payments, notifications) get their data through the normal event flow.
The downside: it's slow. Creating 10,000 orders through an API that creates them one at a time takes minutes. Creating 100,000 takes an unacceptable amount of time. You're also limited by API capabilities — if there's no bulk creation endpoint, you're making N HTTP requests. And if any service is down during seeding, the entire chain breaks.
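The slowness is partly self-inflicted: sequential requests leave the service idle between calls, and a thread pool recovers much of that. A sketch — the `post` callable is injected so the same logic works against a real HTTP client or a stub, and here it is assumed to return the created record as a dict; the endpoint is the hypothetical one from the example above:

```python
from concurrent.futures import ThreadPoolExecutor

def create_users(post, count, workers=16):
    """Fire user-creation requests concurrently instead of one at a time.
    `post` is any callable with a (url, json=...) shape that returns the
    created record."""
    def create(i):
        return post("http://user-service/api/users", json={
            "name": f"Test User {i}",
            "email": f"user{i}@test.com",
        })

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with indices
        return list(pool.map(create, range(count)))

# Stub standing in for the real HTTP call, so the sketch is runnable:
def fake_post(url, json):
    return {"id": json["email"], **json}

created = create_users(fake_post, count=5)
```

Concurrency narrows the gap but doesn't close it — 100,000 rows through a one-at-a-time API is still orders of magnitude slower than bulk inserts, and the other failure modes (no bulk endpoint, a service down mid-run) remain.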
Strategy 3: Contract-Based Seeding with Scope
This is where describing relationships in plain language becomes powerful. Instead of hardcoding IDs or chaining API calls, you describe what the data should look like and let the seeding tool resolve the references.
For each service database, you seed with a scope that describes its role in the broader system:
# Seed user service -- the root of the dependency graph
DATABASE_URL="$USERS_DB_URL" seedfast seed \
  --scope "1,000 users with varied profiles, addresses, and preferences"

# Seed product service -- independent root
DATABASE_URL="$PRODUCTS_DB_URL" seedfast seed \
  --scope "200 products across 10 categories with pricing tiers"

# Seed order service -- references users and products
DATABASE_URL="$ORDERS_DB_URL" seedfast seed \
  --scope "5,000 orders referencing user IDs 1-1000 and product IDs 1-200,
           with realistic status distribution across the last 6 months"

# Seed payment service -- references orders and users
DATABASE_URL="$PAYMENTS_DB_URL" seedfast seed \
  --scope "payments for orders with IDs matching the order service,
           mix of completed, pending, and refunded statuses"

# Seed notification service -- references everything
DATABASE_URL="$NOTIFS_DB_URL" seedfast seed \
  --scope "notifications for users 1-1000, referencing recent orders,
           including order confirmations, shipping updates, and payment receipts"
Each database is seeded independently, but the scope description creates implicit coordination. The user IDs in the order service match the user IDs in the user service. The order IDs in the payment service match the order IDs in the order service. The seeding tool handles referential integrity within each database; the scope descriptions handle cross-service consistency.
This isn't magic. You still need to think about which ID ranges overlap. But the maintenance burden drops dramatically — when the user service adds a new column, you don't need to update five fixture files. You re-run the same scope description, and the tool adapts to the new schema.
A Practical Example: E-Commerce Platform
Let's walk through seeding a complete e-commerce platform with five services. This is a real topology that many teams operate.
The Services
- Users (users_db): users, addresses, preferences — No dependencies (root)
- Products (products_db): products, categories, inventory — No dependencies (root)
- Orders (orders_db): orders, order_items — References user_id, product_id
- Payments (payments_db): payments, refunds — References user_id, order_id
- Notifications (notifs_db): notifications, templates — References user_id, order_id
The Dependency Graph
users_service ──────┐
                    ├──► order_service ───► payment_service
product_service ────┘          │                   │
                               └─────────┬─────────┘
                                         ▼
                                notification_service
Users and products are roots — they can be seeded first, in any order. Orders depend on both. Payments depend on orders (and transitively on users). Notifications depend on everything.
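The phases fall out of the graph mechanically. A sketch using `graphlib` from Python's standard library, where each batch of "ready" nodes is one phase that can be seeded in parallel (the dependency map mirrors the graph above):

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services whose IDs it references.
dependencies = {
    "user_service": set(),
    "product_service": set(),
    "order_service": {"user_service", "product_service"},
    "payment_service": {"order_service", "user_service"},
    "notification_service": {"user_service", "order_service"},
}

ts = TopologicalSorter(dependencies)
ts.prepare()
phases = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # everything seedable in parallel right now
    phases.append(ready)
    ts.done(*ready)

# phases == [['product_service', 'user_service'],
#            ['order_service'],
#            ['notification_service', 'payment_service']]
```

The three batches correspond exactly to the three phases of the script below; if a team adds a service, the phase plan updates itself.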
Seeding Script
#!/bin/bash
set -euo pipefail

# Phase 1: Seed root services (no cross-service dependencies)
echo "Phase 1: Seeding root services..."
DATABASE_URL="$USERS_DB_URL" seedfast seed \
  --scope "1,000 users with names, emails, phone numbers,
           billing and shipping addresses, and notification preferences" \
  --output plain &
DATABASE_URL="$PRODUCTS_DB_URL" seedfast seed \
  --scope "500 products across 15 categories including electronics,
           clothing, and home goods, with prices ranging from 5 to 500 dollars,
           and inventory counts" \
  --output plain &
wait
echo "Phase 1 complete."

# Phase 2: Seed services that depend on roots
echo "Phase 2: Seeding order service..."
DATABASE_URL="$ORDERS_DB_URL" seedfast seed \
  --scope "8,000 orders for user IDs 1-1000 referencing product IDs 1-500,
           with 1-5 items per order, status distribution of 60% completed,
           25% shipped, 10% processing, 5% cancelled,
           spread across the last 12 months" \
  --output plain
echo "Phase 2 complete."

# Phase 3: Seed services that depend on orders
echo "Phase 3: Seeding downstream services..."
DATABASE_URL="$PAYMENTS_DB_URL" seedfast seed \
  --scope "payments for order IDs 1-8000, with amounts matching order totals,
           90% captured, 5% authorized, 3% refunded, 2% failed" \
  --output plain &
DATABASE_URL="$NOTIFS_DB_URL" seedfast seed \
  --scope "notifications for user IDs 1-1000 about order IDs 1-8000,
           including order confirmation, shipping update, and delivery
           confirmation types, with timestamps after the corresponding order dates" \
  --output plain &
wait
echo "Phase 3 complete. All services seeded."
Phase 1 seeds roots in parallel. Phase 2 seeds services that depend on roots. Phase 3 seeds services that depend on phase 2 — again in parallel. Total wall clock time is roughly the time of the slowest service in each phase, not the sum of all services.
Docker Compose for Local Development
Most microservice teams use Docker Compose for local development. Here's how the seeding integrates:
# docker-compose.yml
services:
  users-db:
    image: postgres:16
    environment:
      POSTGRES_DB: users_db
      POSTGRES_PASSWORD: localdev
    ports:
      - "5433:5432"
  products-db:
    image: postgres:16
    environment:
      POSTGRES_DB: products_db
      POSTGRES_PASSWORD: localdev
    ports:
      - "5434:5432"
  orders-db:
    image: postgres:16
    environment:
      POSTGRES_DB: orders_db
      POSTGRES_PASSWORD: localdev
    ports:
      - "5435:5432"
  payments-db:
    image: postgres:16
    environment:
      POSTGRES_DB: payments_db
      POSTGRES_PASSWORD: localdev
    ports:
      - "5436:5432"
  notifs-db:
    image: postgres:16
    environment:
      POSTGRES_DB: notifs_db
      POSTGRES_PASSWORD: localdev
    ports:
      - "5437:5432"
#!/bin/bash
# dev-setup.sh -- the script every new developer runs
set -euo pipefail

docker compose up -d

echo "Waiting for databases..."
for port in 5433 5434 5435 5436 5437; do
  until pg_isready -h localhost -p "$port" -q; do sleep 1; done
done

echo "Running migrations..."
(cd services/users && npm run db:migrate)
(cd services/products && npm run db:migrate)
(cd services/orders && npm run db:migrate)
(cd services/payments && npm run db:migrate)
(cd services/notifications && npm run db:migrate)

echo "Seeding databases..."
./scripts/seed-all.sh  # the phased script from above

echo "Done. All services ready with consistent test data."
A new developer clones the repo, runs ./dev-setup.sh, and has five databases with coherent, cross-referenced data. No manual coordination. No "ask Alice for a copy of her database dump."
CI/CD Pipeline
In CI, you need the same consistency but with clean databases every run:
# .github/workflows/e2e.yml
name: E2E Tests
on: [push]

jobs:
  e2e:
    runs-on: ubuntu-latest
    services:
      users-db:
        image: postgres:16
        env:
          POSTGRES_DB: users_db
          POSTGRES_PASSWORD: test
        ports: ["5433:5432"]
      products-db:
        image: postgres:16
        env:
          POSTGRES_DB: products_db
          POSTGRES_PASSWORD: test
        ports: ["5434:5432"]
      orders-db:
        image: postgres:16
        env:
          POSTGRES_DB: orders_db
          POSTGRES_PASSWORD: test
        ports: ["5435:5432"]
      payments-db:
        image: postgres:16
        env:
          POSTGRES_DB: payments_db
          POSTGRES_PASSWORD: test
        ports: ["5436:5432"]
      notifs-db:
        image: postgres:16
        env:
          POSTGRES_DB: notifs_db
          POSTGRES_PASSWORD: test
        ports: ["5437:5432"]
    steps:
      - uses: actions/checkout@v4
      - name: Run all migrations
        run: ./scripts/migrate-all.sh
      - name: Seed root services
        run: |
          DATABASE_URL="postgres://postgres:test@localhost:5433/users_db" \
            seedfast seed --scope "100 users with addresses" --output plain &
          DATABASE_URL="postgres://postgres:test@localhost:5434/products_db" \
            seedfast seed --scope "50 products in 5 categories" --output plain &
          wait
        env:
          SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
      - name: Seed dependent services
        run: |
          DATABASE_URL="postgres://postgres:test@localhost:5435/orders_db" \
            seedfast seed --scope "500 orders for users 1-100 with products 1-50" --output plain
        env:
          SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
      - name: Seed downstream services
        run: |
          DATABASE_URL="postgres://postgres:test@localhost:5436/payments_db" \
            seedfast seed --scope "payments for orders 1-500" --output plain &
          DATABASE_URL="postgres://postgres:test@localhost:5437/notifs_db" \
            seedfast seed --scope "notifications for users 1-100 about orders 1-500" --output plain &
          wait
        env:
          SEEDFAST_API_KEY: ${{ secrets.SEEDFAST_API_KEY }}
      - name: Run E2E tests
        run: npm run test:e2e
Note the smaller volumes in CI — 100 users instead of 1,000. CI seeding should be fast. Save the large volumes for staging and load testing.
Lessons From Teams Who've Done This
After talking to teams running 5-20 microservices, a few patterns consistently emerge.
Document Your Cross-Service ID Contracts
Most teams discover their implicit ID dependencies when seeding breaks. Document them explicitly:
# Cross-Service Data Contracts
## user_id
- Source: user-service (users.id, auto-increment)
- Referenced by: order-service, payment-service, notification-service
- Format: integer, 1-based
## order_id
- Source: order-service (orders.id, UUID v4)
- Referenced by: payment-service, notification-service
- Format: UUID
## product_id
- Source: product-service (products.id, auto-increment)
- Referenced by: order-service (order_items.product_id)
- Format: integer, 1-based
This document becomes the source of truth for seeding, testing, and debugging. It also helps new developers understand how data flows across services — something that's notoriously opaque in microservice architectures.
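Once the contract is written down, verifying a seeded environment against it is mechanical. A sketch of an orphan check — the contract dict mirrors the document above, and fetching the actual ID sets from each database is left as a per-service query:

```python
def find_orphans(contracts, id_sets, references):
    """Return every cross-service reference pointing at a missing ID.

    contracts:  column name -> source-of-truth service
    id_sets:    source service -> set of IDs it actually contains
    references: (referencing service, column, value) triples found in
                the other databases
    """
    orphans = []
    for service, column, value in references:
        source = contracts[column]
        if value not in id_sets[source]:
            orphans.append((service, column, value))
    return orphans

orphans = find_orphans(
    contracts={"user_id": "user-service", "order_id": "order-service"},
    id_sets={"user-service": {1, 2, 3}, "order-service": {1001}},
    references=[
        ("payment-service", "order_id", 1001),
        ("notification-service", "user_id", 7777),  # seeded, but no such user
    ],
)
# only the notification row is flagged: user 7777 does not exist
```

Run after seeding (or nightly against staging), a check like this catches broken contracts before they surface as intermittent test failures.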
Seed Fewer Services Than You Think
Not every E2E test needs all five databases seeded. If you're testing the checkout flow, you need users, products, and orders. You don't need the notification service's database — the notification service can tolerate missing user profiles gracefully (or it should). Seed only the services that your test scenario actually exercises.
# Checkout flow test: only seed what the checkout touches
DATABASE_URL="$USERS_DB_URL" seedfast seed --scope "10 users with addresses"
DATABASE_URL="$PRODUCTS_DB_URL" seedfast seed --scope "20 products with inventory"
# orders and payments will be created by the test itself
Accept That Some Inconsistency Is OK
In production, microservice databases are eventually consistent. There are windows where the order service has an order that the payment service hasn't processed yet. Your test data can reflect this reality. Not every cross-service reference needs to be perfect — some tests specifically need to verify how services handle missing references.
# Deliberately seed some orphan references to test error handling
DATABASE_URL="$ORDERS_DB_URL" seedfast seed \
--scope "100 orders, 10% with user IDs that don't exist in the user service,
to test the order service's graceful degradation"
This is a feature, not a bug. Your services should handle missing cross-service data, and your tests should verify that they do.
The Uncomfortable Truth
Microservice data seeding is genuinely hard. There's no tool — including Seedfast — that makes it trivial. The fundamental challenge is that microservices trade data consistency for operational independence, and seeding is where that tradeoff becomes viscerally obvious.
What you can do is reduce the manual coordination. Define your cross-service contracts. Seed in dependency order. Use scope descriptions to maintain loose coupling between seed runs. And accept that perfect consistency across five independently-seeded databases requires deliberate effort.
The monolith's seed.sql was simple because the monolith was simple. The microservice equivalent is a phased seeding pipeline with explicit ID contracts — more complex, but manageable if you treat it as a first-class engineering problem instead of an afterthought.
Struggling with cross-service test data consistency?
Get Started | Documentation | Pricing
Seedfast seeds each database independently while respecting the cross-service relationships you describe. No shared fixtures, no coordination scripts, no production data copies.