All posts

How to Fill 5 Databases That Reference Each Other

By the Seedfast team · Updated

The monolith had one seed script. Your microservices have five databases, three implicit ID contracts, and a prayer that someone seeds them in the right order.

Microservice database seeding is not just "run your seed script five times". You split the monolith. Services are independent. Each team owns their schema, their migrations, their deploy cadence. The architecture diagrams look clean.

Then someone on the team tries to set up a local development environment. They spin up the user service, seed its database, and start the order service. Immediately: ERROR: relation "users" does not exist. Of course it doesn't: that table lives in a different database now. So they seed the order service database too. But the order records reference user IDs that don't exist in the user service. The payment service expects order IDs that were generated by a different seed run. The notification service tries to look up user preferences for a user ID that maps to nothing.

Five databases. Zero data consistency. Every service works in isolation, and nothing works together.

That's the microservice data problem, and it's one of the most underestimated costs of distributed architecture.


One note before we dig in. If your "microservices" share a single PostgreSQL instance and just own different tables, this article is overkill: Seedfast seeds the full FK graph in one command (see the database seeding guide, or the per-ORM database seeder reference if you're choosing between Prisma, Drizzle, or TypeORM seeders). This article is for teams with genuinely separate databases per service, where nothing auto-coordinates IDs across service boundaries.


In a monolith, seeding is straightforward. One database, one seed.sql, one transaction:

-- seed.sql: everything in one place
INSERT INTO users (id, name, email) VALUES
  (1, 'Alice Chen', 'alice@example.com'),
  (2, 'Bob Martinez', 'bob@example.com');

INSERT INTO products (id, name, price) VALUES
  (101, 'Widget Pro', 29.99),
  (102, 'Widget Lite', 9.99);

INSERT INTO orders (id, user_id, product_id, status) VALUES
  (1001, 1, 101, 'completed'),
  (1002, 2, 102, 'pending');

INSERT INTO payments (id, order_id, amount, status) VALUES
  (5001, 1001, 29.99, 'captured'),
  (5002, 1002, 9.99, 'authorized');

INSERT INTO notifications (id, user_id, type, message) VALUES
  (9001, 1, 'order_shipped', 'Your Widget Pro has shipped'),
  (9002, 2, 'payment_pending', 'Complete your payment for Widget Lite');

Foreign keys enforce consistency. One psql command, and the entire application has coherent data. Every join works. Every API response makes sense.

Now distribute that across five services:

      ┌──────────────┐  ┌─────────────────┐  ┌───────────────┐
      │ User Service │  │ Product Service │  │ Order Service │
      │  users_db    │  │  products_db    │  │  orders_db    │
      │              │  │                 │  │               │
      │  users       │  │  products       │  │  orders       │
      │  preferences │  │  categories     │  │  order_items  │
      └──────┴───────┘  └────────┴────────┘  └───────┴───────┘
             │                   │                   │
             │          ┌────────┴────────┐          │
             └─────────►│ Payment Service │◄─────────┘
                        │  payments_db    │
                        │                 │
                        │  payments       │
                        │  refunds        │
                        └────────┴────────┘
                                 │
                        ┌────────▼───────┐
                        │ Notification   │
                        │  Service       │
                        │  notifs_db     │
                        │                │
                        │  notifications │
                        │  templates     │
                        └────────────────┘

There are no foreign keys between these databases. The order service stores a user_id column, but nothing enforces that the user actually exists. The payment service stores an order_id, but there's no constraint linking it to the orders database. These are implicit references: contracts that exist in application code, not in database schemas.

Seed one database without the others, and you get orphan records. Seed them all independently, and the IDs don't match. Seed them in the wrong order, and your application logic breaks in ways that look like bugs but are actually data inconsistency.

Microservice databases reference each other through several patterns, and each one creates a seeding challenge.

Almost every service stores a user ID. The user service is the source of truth, but every other service has a user_id column pointing back to it. Seed the order service with user IDs 1-100, and the user service with user IDs 500-600, and every order belongs to a nonexistent user.

orders_db.orders.user_id        = 42  ->  users_db.users.id = ???
payments_db.payments.user_id    = 42  ->  users_db.users.id = ???
notifs_db.notifications.user_id = 42  ->  users_db.users.id = ???

The payment service doesn't just reference users. It references orders. The notification service references both users and orders. Some services reference products by ID, others by SKU. These implicit contracts form a dependency graph that's invisible to any single service's schema:

notification_service.notifications:
  - user_id     ->  user_service.users.id
  - order_id    ->  order_service.orders.id
  - product_sku ->  product_service.products.sku

payment_service.payments:
  - user_id     ->  user_service.users.id
  - order_id    ->  order_service.orders.id

order_service.orders:
  - user_id     ->  user_service.users.id
  - product_id  ->  product_service.products.id

Before choosing a seeding strategy, choose an ID type. It shapes every seed scope you write for the rest of the system.

With integer IDs, seed scopes can reference ranges: "orders for user IDs 1-1000". Ranges are human-readable, auditable, and stable across runs as long as the seed order is deterministic. The downside is that services must avoid overlapping ranges, and the moment one service switches to UUIDs (common when moving to distributed ID generation) every downstream seed file breaks.

With UUIDs, cross-service collision is impossible by design, but scopes can't be expressed as ranges. The pattern instead is: seed the service that owns the ID first, query the actual values back, pass them explicitly to the next scope.

# Seed the user service first
SEEDFAST_DSN="$USERS_DB_URL" seedfast seed \
  --scope "50 users with profiles and addresses"

# Query the IDs back; this works identically for integer or UUID keys
# (the trailing "-" makes paste read stdin on both GNU and BSD systems)
USER_IDS=$(psql "$USERS_DB_URL" -t -A -c \
  "SELECT id FROM users ORDER BY created_at DESC LIMIT 50" | paste -sd, -)

# Pass them explicitly to the next service's scope
SEEDFAST_DSN="$ORDERS_DB_URL" seedfast seed \
  --scope "200 orders for users with IDs: $USER_IDS"

If you're building a new system and can choose, UUID v7 is the pragmatic default: collision-safe like UUID v4, but time-ordered, so range-style reasoning still works.

The cross-service ID contracts doc your team maintains (see the Lessons section below) should record the ID type for every entity. That document is more durable than any individual seed script; it's the schema contract every seed scope depends on.

First time seeing SEEDFAST_DSN? It's the Postgres connection string Seedfast's CLI reads from the environment, falling back to DATABASE_URL if unset. Using a dedicated variable keeps Seedfast from conflicting with other tools in your CI environment that also read DATABASE_URL.
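The same fallback is easy to replicate in your own wrapper scripts with standard shell parameter expansion; a sketch (the connection string is a made-up example):

```shell
# Mirror Seedfast's fallback behavior in a wrapper script:
# if SEEDFAST_DSN is unset or empty, use DATABASE_URL instead.
DATABASE_URL="postgres://localhost:5432/app"   # example value
unset SEEDFAST_DSN   # simulate a job that only sets DATABASE_URL

SEEDFAST_DSN="${SEEDFAST_DSN:-$DATABASE_URL}"
echo "$SEEDFAST_DSN"   # postgres://localhost:5432/app
```

The `${VAR:-fallback}` form expands to the fallback when the variable is unset or empty, which matches the CLI's documented behavior.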

If your services use direct REST or gRPC calls rather than events, skim this section; the warnings apply only to event-driven architectures.

Some microservice architectures use event sourcing or event-driven communication. The order service doesn't call the payment service directly โ€” it emits an OrderCreated event, and the payment service builds its state from that event stream. Seeding the payment database directly, without replaying events, produces state that could never exist in a real system.

# What actually happens in production:
OrderCreated { order_id: 1001, user_id: 42, total: 29.99 }
  -> PaymentService creates payment { order_id: 1001, amount: 29.99, status: "pending" }
  -> NotificationService sends "Order received" to user 42

# What happens when you seed databases independently:
payments_db has payment { order_id: 9999 }  -- order 9999 doesn't exist
notifs_db has notification for user 7777    -- user 7777 doesn't exist

Inserting directly into the read-model database produces aggregates that are inconsistent with the event log: state that could never exist if the system were running normally. The projection version is wrong, downstream replays diverge, and the first production-shaped test that reads the event log fails to reconstruct the aggregate correctly.

For Postgres-based event stores, the fix is to seed the event log directly and let the projection rebuild:

# Seed the event log, not the read model
SEEDFAST_DSN="$EVENTS_DB_URL" seedfast seed \
  --scope "1000 domain events: OrderCreated, PaymentCaptured, OrderShipped,
           with realistic aggregate_id grouping and monotonic timestamps"

# Replay events into the read model via your service's projection worker.
# The exact command is service-specific; replace it with whatever rebuilds
# your projections (a rake task, a Go CLI subcommand, etc.):
docker compose run --rm order-service project-events --from-genesis

For Kafka-backed stores the same idea applies, but injection happens through a producer script (kafka-console-producer or equivalent); this article is already long, so we won't duplicate Kafka-specific tooling here. The principle is identical: seed the event stream, let the projection build the read model.
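As a sketch of that pattern: generate a batch of events as JSON lines, then pipe them into the topic. The event shape and topic name below are made up, and the producer invocation is commented out because it needs a running broker:

```shell
# Generate 100 OrderCreated events, one JSON object per line.
# Field names and values are illustrative, not a real Seedfast schema.
seq 1 100 | while read -r i; do
  printf '{"type":"OrderCreated","order_id":%d,"user_id":%d,"total":29.99}\n' \
    "$i" "$(( (i % 50) + 1 ))"
done > /tmp/order-events.jsonl

# Inject them with the stock console producer (requires a running broker):
# kafka-console-producer --bootstrap-server localhost:9092 \
#   --topic order-events < /tmp/order-events.jsonl
```

Once the events are in the topic, the projection worker builds the read model exactly as it would in production.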

Direct read-model seeding is acceptable only when the read model is rebuilt on every restart (test containers with --rm) and you never need to inspect event history.

When confronted with cross-service seeding, teams typically reach for one of these approaches. They all look reasonable at first. They all break.

The ordering approach. Seed users first (they have no dependencies), then products, then orders (depends on users and products), then payments (depends on orders), then notifications (depends on everything).

#!/bin/bash
# seed_all.sh -- the script that someone wrote at 2 AM
set -euo pipefail

echo "Seeding user service..."
psql "$USERS_DB_URL" < seeds/users.sql

echo "Seeding product service..."
psql "$PRODUCTS_DB_URL" < seeds/products.sql

echo "Seeding order service..."
psql "$ORDERS_DB_URL" < seeds/orders.sql

echo "Seeding payment service..."
psql "$PAYMENTS_DB_URL" < seeds/payments.sql

echo "Seeding notification service..."
psql "$NOTIFS_DB_URL" < seeds/notifications.sql

It works until the product team changes their ID generation from sequential integers to UUIDs. Or until the user service adds a required tenant_id column. Or until a new service appears that nobody adds to the script. The seed files reference each other by hardcoded IDs, and any change to one file requires updating all downstream files.

Maintenance cost: proportional to the square of the number of services.

"Let's just dump 1% of production data from each service."

# "Clever" approach: dump a consistent slice

psql "$USERS_DB_URL" -c "\copy (SELECT * FROM users WHERE id < 1000) TO 'users_slice.sql'"
psql "$ORDERS_DB_URL" -c "\copy (SELECT * FROM orders WHERE user_id < 1000) TO 'orders_slice.sql'"
# The third query needs a cross-database join, which is already a smell:
psql "$PAYMENTS_DB_URL" -c "\copy (SELECT * FROM payments WHERE order_id IN (...)) TO 'payments_slice.sql'"

You now have real production data sitting on every developer's laptop. GDPR, HIPAA, and SOC 2 each restrict where personal data may reside, and developer laptops routinely fail those checks. The subsets are also nearly impossible to keep consistent: if you dump users with id < 1000 from the user service, you need to find all orders belonging to those users, all payments for those orders, and all notifications for those users. That's a cross-database join across five databases. Someone writes a script that approximates this, and it works 90% of the time. The other 10% produces orphan references that cause subtle, intermittent test failures. For the full rationale on avoiding this pattern, see staging without production data.

A single repository with JSON or SQL fixtures that every service reads:

test-fixtures/
  users.json       # { "users": [{ "id": 1, ... }, ...] }
  products.json
  orders.json      # references user IDs from users.json
  payments.json    # references order IDs from orders.json

Every service imports the relevant fixtures during testing. This works for small datasets with stable schemas, but it has the same coupling problem as the ordered script โ€” changing the users fixture requires updating every fixture that references user IDs. It also forces every service to depend on a shared repository, which undermines the independence that microservices are supposed to provide.

And the fixtures are always tiny. 10 users, 20 orders. Nobody maintains a fixture set with 50,000 users and realistic distributions across five services.

There's no silver bullet for cross-service seeding. But there are strategies that hold up better than the anti-patterns above.

Define explicit ID ranges or conventions that all services agree on. Seed services in dependency order, using IDs from the agreed-upon ranges.

# seed-config.yaml -- shared convention
id_ranges:
  users: 1-10000
  products: 100001-110000
  orders: 200001-300000
  payments: 400001-500000

seeding_order:
  - user_service     # no dependencies
  - product_service  # no dependencies
  - order_service    # depends on users, products
  - payment_service  # depends on orders, users
  - notification_service  # depends on users, orders

This works if you enforce the convention and every team respects it. The downside is rigidity: the ID ranges are arbitrary constraints that don't exist in production, and they can mask bugs related to ID collision or generation strategy.
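Enforcement can be automated rather than left to convention. A sketch of a post-seed check that fails when any seeded ID escapes its agreed range (the helper name is ours; the ranges mirror seed-config.yaml):

```shell
# check_range TABLE MIN MAX DB_URL
# Counts rows whose id falls outside the agreed range for that entity.
check_range() {
  local table="$1" min="$2" max="$3" db_url="$4"
  local stray
  stray=$(psql "$db_url" -t -A -c \
    "SELECT count(*) FROM $table WHERE id < $min OR id > $max")
  if [ "$stray" -gt 0 ]; then
    echo "FAIL: $stray rows in $table outside $min-$max"
    return 1
  fi
  echo "OK: $table IDs within $min-$max"
}

# One call per entity in seed-config.yaml:
# check_range users  1      10000  "$USERS_DB_URL"
# check_range orders 200001 300000 "$ORDERS_DB_URL"
```

Run it in CI right after seeding, and a team that drifts out of its range breaks the build instead of breaking three other teams' tests.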

Instead of inserting directly into each database, use each service's API to create data. Seed users through the user service API. Use the returned user IDs to create orders through the order service API. Use the returned order IDs to create payments.

# seed_via_apis.py
import requests

# Create users
users = []
for i in range(100):
    resp = requests.post("http://user-service/api/users", json={
        "name": f"Test User {i}",
        "email": f"user{i}@test.com"
    })
    resp.raise_for_status()  # fail fast instead of collecting error bodies
    users.append(resp.json())

# Create orders using real user IDs
orders = []
for user in users[:50]:
    resp = requests.post("http://order-service/api/orders", json={
        "user_id": user["id"],  # real ID from the user service
        "product_id": "prod-101",
        "quantity": 2
    })
    resp.raise_for_status()
    orders.append(resp.json())

The approach guarantees consistency: you're using the actual IDs that each service generates. It also triggers events, so downstream services (payments, notifications) get their data through the normal event flow.

The downside: it's slow. Creating 10,000 orders through an API that creates them one at a time takes minutes. Creating 100,000 takes an unacceptable amount of time. You're also limited by API capabilities โ€” if there's no bulk creation endpoint, you're making N HTTP requests. And if any service is down during seeding, the entire chain breaks.

Plain-language scope descriptions are where cross-service coordination becomes powerful. Instead of hardcoding IDs or chaining API calls, you describe what the data should look like and let Seedfast resolve the references.

For each service database, you seed with a scope that describes its role in the broader system:

# Seed user service -- the root of the dependency graph
SEEDFAST_DSN="$USERS_DB_URL" seedfast seed \
  --scope "1,000 users with varied profiles, addresses, and preferences"

# Seed product service -- independent root
SEEDFAST_DSN="$PRODUCTS_DB_URL" seedfast seed \
  --scope "200 products across 10 categories with pricing tiers"

# Seed order service -- references users and products
SEEDFAST_DSN="$ORDERS_DB_URL" seedfast seed \
  --scope "5,000 orders referencing user IDs 1-1000 and product IDs 1-200,
           with realistic status distribution across the last 6 months"

# Seed payment service -- references orders and users
SEEDFAST_DSN="$PAYMENTS_DB_URL" seedfast seed \
  --scope "payments for orders with IDs matching the order service,
           mix of completed, pending, and refunded statuses"

# Seed notification service -- references everything
SEEDFAST_DSN="$NOTIFS_DB_URL" seedfast seed \
  --scope "notifications for users 1-1000, referencing recent orders,
           including order confirmations, shipping updates, and payment receipts"

Each database is seeded independently, but the scope description creates implicit coordination. The user IDs in the order service match the user IDs in the user service. The order IDs in the payment service match the order IDs in the order service. Seedfast handles referential integrity within each database; your scope descriptions handle cross-service consistency.

It isn't magic. You still need to think about which ID ranges overlap. But the maintenance burden drops dramatically: when the user service adds a new column, you don't need to update five fixture files. Re-run the same scope description, and Seedfast adapts to the updated schema.

The scope examples in the rest of this article use integer ranges for readability. For UUID-based systems, compose this with the query-back pattern from the UUID section above: seed the root service first, fetch the generated UUIDs via psql, and pass them into downstream scopes as a comma-separated list.

Let's walk through seeding a complete e-commerce platform with five services. This is a real topology that many teams operate.

The Services

  • Users (users_db): users, addresses, preferences - no dependencies (root)
  • Products (products_db): products, categories, inventory - no dependencies (root)
  • Orders (orders_db): orders, order_items - references user_id, product_id
  • Payments (payments_db): payments, refunds - references user_id, order_id
  • Notifications (notifs_db): notifications, templates - references user_id, order_id

users_service   ──┐
                  ├──►  order_service  ──┬──►  payment_service  ──┐
product_service ──┘                      │                        │
                                         └────────────────────────┴──►  notification_service

Users and products are roots: they can be seeded first, in any order. Orders depend on both. Payments depend on orders (and transitively on users). Notifications depend on everything.
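With five services you can eyeball the phases, but the standard tsort(1) utility computes a dependency-respecting order for you from "prerequisite dependent" pairs. A sketch with this article's services:

```shell
# Each line reads "seed LEFT before RIGHT"; tsort prints a valid
# topological order of the five services, roots first.
tsort <<'EOF'
user_service order_service
product_service order_service
user_service payment_service
order_service payment_service
user_service notification_service
order_service notification_service
EOF
```

Any service in the output can be seeded once everything printed before it is done, and services with no path between them (users and products here) can still run in parallel within a phase.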

#!/bin/bash
set -euo pipefail

# Phase 1: Seed root services (no cross-service dependencies)
echo "Phase 1: Seeding root services..."

SEEDFAST_DSN="$USERS_DB_URL" seedfast seed \
  --scope "1,000 users with names, emails, phone numbers,
           billing and shipping addresses, and notification preferences" \
  --output plain &
PID_USERS=$!

SEEDFAST_DSN="$PRODUCTS_DB_URL" seedfast seed \
  --scope "500 products across 15 categories including electronics,
           clothing, and home goods, with prices ranging from 5 to 500 dollars,
           and inventory counts" \
  --output plain &
PID_PRODUCTS=$!

# A bare `wait` reports success even when a background job failed;
# waiting on each PID lets set -e abort the script on a failed seed.
wait "$PID_USERS"
wait "$PID_PRODUCTS"
echo "Phase 1 complete."

# Phase 2: Seed services that depend on roots
echo "Phase 2: Seeding order service..."

SEEDFAST_DSN="$ORDERS_DB_URL" seedfast seed \
  --scope "8,000 orders for user IDs 1-1000 referencing product IDs 1-500,
           with 1-5 items per order, status distribution of 60% completed
           25% shipped 10% processing 5% cancelled,
           spread across the last 12 months" \
  --output plain

echo "Phase 2 complete."

# Phase 3: Seed services that depend on orders
echo "Phase 3: Seeding downstream services..."

SEEDFAST_DSN="$PAYMENTS_DB_URL" seedfast seed \
  --scope "payments for order IDs 1-8000, with amounts matching order totals,
           90% captured 5% authorized 3% refunded 2% failed" \
  --output plain &
PID_PAYMENTS=$!

SEEDFAST_DSN="$NOTIFS_DB_URL" seedfast seed \
  --scope "notifications for user IDs 1-1000 about order IDs 1-8000,
           including order confirmation, shipping update, and delivery
           confirmation types, with timestamps after the corresponding order dates" \
  --output plain &
PID_NOTIFS=$!

# Wait on each PID so a failed seed fails the script under set -e.
wait "$PID_PAYMENTS"
wait "$PID_NOTIFS"
echo "Phase 3 complete. All services seeded."

Phase 1 seeds roots in parallel. Phase 2 seeds services that depend on roots. Phase 3 seeds services that depend on phase 2, again in parallel. Total wall-clock time is roughly the time of the slowest service in each phase, not the sum of all services.

See it run on your setup. Seedfast's phased seeding works against any Postgres-compatible database: Supabase, RDS, Neon, self-hosted. Try it with the getting-started guide.

A successful seed run doesn't always mean the data is consistent across separate databases. Seedfast exits with an error on constraint violations within a single database, but cross-service references are invisible to it: the order service can successfully insert 8,000 orders referencing users 1-1000, while the user service ended up with only 800 rows because a scope quirk hit a unique constraint at row 801. A one-page verification script catches this before your tests run:

#!/bin/bash
# verify-seed-consistency.sh -- runs after all phases complete
set -euo pipefail

# Dump IDs from each service to temp files
psql "$USERS_DB_URL"  -t -A -c "SELECT id FROM users"        | sort -u > /tmp/users.ids
psql "$ORDERS_DB_URL" -t -A -c "SELECT user_id FROM orders"  | sort -u > /tmp/order-users.ids

# Any user_id in orders that doesn't exist in the user service?
ORPHANS=$(comm -23 /tmp/order-users.ids /tmp/users.ids | wc -l)

if [ "$ORPHANS" -gt 0 ]; then
  echo "FAIL: $ORPHANS orders reference non-existent users"
  comm -23 /tmp/order-users.ids /tmp/users.ids | head -5
  exit 1
fi

echo "OK: orders ↔ users cross-service references are consistent"

The check uses only standard psql and shell tools: no dblink, no extensions, no superuser. It works on Supabase, RDS, Neon, and anywhere else your five microservice databases might live. Add a block per cross-service reference you care about (payments ↔ orders, notifications ↔ users, etc.) and the whole check runs in two or three seconds. The script assumes a POSIX shell; on Windows, run it in WSL or Git Bash.
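If you find yourself copy-pasting that block per reference pair, the set-difference logic factors into a function; a sketch (the function name is ours):

```shell
# check_refs LABEL SRC_URL SRC_SQL REF_URL REF_SQL
# Fails when REF_SQL yields any value that SRC_SQL does not contain,
# using the same sort/comm set difference as the script above.
check_refs() {
  local label="$1" src_url="$2" src_sql="$3" ref_url="$4" ref_sql="$5"
  local src refs orphans
  src=$(mktemp); refs=$(mktemp)
  psql "$src_url" -t -A -c "$src_sql" | sort -u > "$src"
  psql "$ref_url" -t -A -c "$ref_sql" | sort -u > "$refs"
  orphans=$(comm -23 "$refs" "$src" | wc -l)
  rm -f "$src" "$refs"
  if [ "$orphans" -gt 0 ]; then
    echo "FAIL: $label has $orphans orphan references"
    return 1
  fi
  echo "OK: $label"
}

# check_refs "payments -> orders" \
#   "$ORDERS_DB_URL"   "SELECT id FROM orders" \
#   "$PAYMENTS_DB_URL" "SELECT order_id FROM payments"
```

One call per row in your cross-service contracts doc keeps the verification script in lockstep with the contracts themselves.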

The alternative is discovering the inconsistency when the third integration test fails with a 404 for an entity that "definitely should exist". A few seconds of verification here beats hours of chasing a phantom test failure later.

Plugging phased seeding into Docker Compose or CI/CD is mechanical: one environment variable per database, seeds after migrations, parallelize within phases. The patterns are the same ones Seedfast uses everywhere else. For the full walkthrough (ephemeral Postgres services in GitHub Actions, SEEDFAST_DSN per service, exit-code handling, Docker Compose templates), see CI/CD database seeding.

After talking to teams running 5-20 microservices, a few patterns consistently emerge.

Most teams discover their implicit ID dependencies when seeding breaks. Document them explicitly:

# Cross-Service Data Contracts

## user_id
- Source: user-service (users.id, auto-increment)
- Referenced by: order-service, payment-service, notification-service
- Format: integer, 1-based

## order_id
- Source: order-service (orders.id, UUID v4)
- Referenced by: payment-service, notification-service
- Format: UUID

## product_id
- Source: product-service (products.id, auto-increment)
- Referenced by: order-service (order_items.product_id)
- Format: integer, 1-based

That document becomes the source of truth for seeding, testing, and debugging. It also helps new developers understand how data flows across services โ€” something that's notoriously opaque in microservice architectures.

Not every E2E test needs all five databases seeded. If you're testing the checkout flow, you need users, products, and orders. You don't need the notification service's database; the notification service can tolerate missing user profiles gracefully (or it should). Seed only the services that your test scenario actually exercises, the same minimalism behind scoped E2E test fixtures.

# Checkout flow test: only seed what the checkout touches
SEEDFAST_DSN="$USERS_DB_URL" seedfast seed --scope "10 users with addresses"
SEEDFAST_DSN="$PRODUCTS_DB_URL" seedfast seed --scope "20 products with inventory"
# orders and payments will be created by the test itself

In production, microservice databases are eventually consistent. There are windows where the order service has an order that the payment service hasn't processed yet. Your test data can reflect this reality. Not every cross-service reference needs to be perfect โ€” some tests specifically need to verify how services handle missing references.

# Deliberately seed some orphan references to test error handling
SEEDFAST_DSN="$ORDERS_DB_URL" seedfast seed \
  --scope "100 orders, 10% with user IDs that don't exist in the user service,
           to test the order service's graceful degradation"

Treat it as a feature, not a bug. Your services should handle missing cross-service data, and your tests should verify that they do.

Microservice data seeding is genuinely hard. There's no tool, including Seedfast, that makes it trivial. The fundamental challenge is that microservices trade data consistency for operational independence, and seeding is where that tradeoff becomes viscerally obvious.

What you can do is reduce the manual coordination. Define your cross-service contracts. Seed in dependency order. Use scope descriptions to maintain loose coupling between seed runs. And accept that perfect consistency across five independently-seeded databases requires deliberate effort.

The monolith's seed.sql was simple because the monolith was simple. The microservice equivalent is a phased seeding pipeline with explicit ID contracts: more complex, but manageable if you treat it as a first-class engineering problem instead of an afterthought.

Seed services with no cross-service dependencies first (typically users and products), then services that reference those IDs (orders), then downstream services (payments, notifications). The phased seeding script above runs each phase in parallel to minimize total wall-clock time.

Either define explicit ID ranges per service in a shared config (users 1-10,000; orders 200,001-300,000) and honor them in every seed scope, or use query-back seeding where you seed the user service first and pass the returned IDs into downstream scopes. Both approaches are covered under Strategies 1 and 3.

UUIDs eliminate collision risk when services generate IDs independently, but they make seed scopes harder to write as ranges. If your services already use integers, keep them and coordinate ranges. If you're starting fresh, UUID v7 gives you collision safety plus time-ordering: the best of both.

Not safely. Consistent cross-database subsets require joins across separate hosts, which is nearly impossible to keep referentially clean, and you put real PII on every developer's laptop. The anti-patterns section above covers the 10% edge-case failure rate this strategy produces.

You produce state that could never exist in production: aggregates inconsistent with the event log. The correct approach is to seed the event stream and let the projection rebuild from it. See the Event-Sourced State section above.

CI seeds should finish in under 30 seconds, which typically means datasets about 10× smaller than local dev. The phased seeding script in this article uses 1,000 users and 8,000 orders for local development; in CI, drop the same scopes to roughly 100 users and 500 orders. The scope descriptions are identical; only the counts change. See also large-volume seeding for tuning Postgres itself when you do need big runs.
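One way to keep a single seed script for both environments is to inject the counts from environment variables; a sketch (the SEED_* variable names are ours, not Seedfast's):

```shell
# CI job: shrink the same scopes by exporting smaller counts first.
export SEED_USER_COUNT=100 SEED_ORDER_COUNT=500

# Defaults are the local-dev sizes from the phased script.
USER_COUNT="${SEED_USER_COUNT:-1000}"
ORDER_COUNT="${SEED_ORDER_COUNT:-8000}"

echo "Seeding $USER_COUNT users and $ORDER_COUNT orders"

# Same scope text, with the count substituted (needs the seedfast CLI):
# SEEDFAST_DSN="$USERS_DB_URL" seedfast seed \
#   --scope "$USER_COUNT users with profiles and addresses"
```

Local runs leave the variables unset and get the full-size dataset; CI exports the small values and reuses the identical script.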

Get Started | Documentation | Pricing

Seedfast seeds each database independently while respecting the cross-service relationships you describe. No shared fixtures, no coordination scripts, no production data copies.