Data Realism

Seedfast doesn't produce random placeholder values. It reads your schema and generates data that fits the domain your product serves.

How domain inference works

Seedfast analyzes table names, column names, data types, and constraints to understand the domain before generating anything. This analysis happens automatically — no configuration required.

A table named orders with columns like status, total_amount, and placed_at tells Seedfast enough to generate plausible order statuses (pending, shipped, delivered), realistic monetary amounts, and timestamps distributed across a reasonable time window.

A table named patients with columns like date_of_birth, diagnosis_code, and attending_physician_id gets a different treatment — Seedfast generates data that fits a medical context, not an e-commerce one.
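
To make the idea concrete, here is a toy sketch of name-based domain inference. It is not Seedfast's implementation: the keyword sets, the guess_domain function, and the overlap-counting rule are all assumptions made for illustration, and a real analysis would also weigh data types and constraints.

```python
# Toy illustration only: hypothetical keyword lists, not Seedfast's actual logic.
DOMAIN_HINTS = {
    "e-commerce": {"orders", "products", "customers", "total_amount", "placed_at"},
    "medical": {"patients", "diagnosis_code", "date_of_birth", "attending_physician_id"},
}


def guess_domain(table: str, columns: list[str]) -> str:
    """Return the domain whose hint set overlaps most with the schema names."""
    names = {table.lower(), *(c.lower() for c in columns)}
    # Score each domain by how many table/column names match its hints.
    scores = {domain: len(names & hints) for domain, hints in DOMAIN_HINTS.items()}
    best = max(scores, key=scores.get)
    # With no matching names at all, report that no domain was recognized.
    return best if scores[best] > 0 else "unknown"


orders_domain = guess_domain("orders", ["status", "total_amount", "placed_at"])
patients_domain = guess_domain(
    "patients", ["date_of_birth", "diagnosis_code", "attending_physician_id"]
)
```

Under these assumed hints, the orders table scores highest for e-commerce and the patients table for medical, mirroring the two examples above.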

Examples by domain

E-commerce

Tables like products, orders, customers, and order_items get:

Product names and descriptions that match the product category
Order statuses drawn from a realistic lifecycle
Pricing with plausible distribution (not uniform random)
Timestamps that reflect realistic purchase patterns
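
As an illustration of what "plausible distribution (not uniform random)" can mean for prices, the sketch below draws from a log-normal distribution, which clusters values around a typical price and leaves a long tail of expensive items. The choice of log-normal, the sample_prices function, and its median and sigma parameters are assumptions for this example. The page does not state which distribution Seedfast uses.

```python
import math
import random
import statistics


def sample_prices(n: int, median: float = 25.0, sigma: float = 0.8, seed: int = 0) -> list[float]:
    """Draw n right-skewed prices whose theoretical median is `median`."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    mu = math.log(median)      # a log-normal's median equals exp(mu)
    return [round(rng.lognormvariate(mu, sigma), 2) for _ in range(n)]


prices = sample_prices(1000)
# Right skew: a few expensive items pull the mean above the median.
skewed = statistics.median(prices) < statistics.mean(prices)
```

A uniform draw such as random.uniform(1, 500) would instead make a 450.00 item as likely as a 20.00 one, which rarely matches a real catalog.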

SaaS

Tables like users, workspaces, subscriptions, activity_logs get:

Email addresses that look real (not user1@test.com)
Workspace names that fit a B2B context
Subscription plans matched to your plan column values
Activity events in logical sequence

HR and internal tools

Tables like employees, departments, salaries, performance_reviews get:

Full names with realistic distributions
Job titles and department names that make sense together
Salary ranges calibrated to the role and seniority that Seedfast infers from column context

Default quality vs. guided quality

By default, Seedfast gets as much quality as it can from your schema alone. The more signal your schema carries (descriptive column names, meaningful enum values, well-named foreign keys), the better the results.

When you describe what you need in the scope, results improve further:

# Seedfast infers what it can from the schema
seedfast seed --scope "seed 100 users"

# More context → better data quality
seedfast seed --scope "seed 100 enterprise customers, mostly US-based, with active subscriptions"

The second command produces customers with company names, US addresses, and subscription records that reflect an active state — because you told Seedfast what "customers" means in your context.

Refining realism in interactive mode

When running seedfast seed without --scope, Seedfast proposes a plan and shows you what it intends to generate. At this point you can refine not just the volume but the character of the data:

Approve? (Y/n): make the orders use European addresses and EUR currency

Seedfast incorporates the instruction and replans. See Scoping for the full interactive workflow.

What stays consistent

Across all generated data:

Foreign keys are always valid — Seedfast resolves insert order and generates parent records before dependent ones
Relationships are coherent — an order belongs to a real customer; a subscription belongs to a real workspace
Enum and check constraints are respected — status columns only contain values your schema allows
Data types match exactly — UUIDs, timestamps, integers, and text fields are generated in the correct format
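
The first guarantee, generating parent records before dependent ones, amounts to ordering tables by their foreign-key dependencies. The sketch below shows one common way to do that with a topological sort from Python's standard library. The table names and the fk_dependencies mapping are hypothetical, and this is an illustration of the general technique rather than Seedfast's internal code.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
fk_dependencies = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}


def insert_order(deps: dict[str, set[str]]) -> list[str]:
    """Return tables ordered so that every parent precedes its dependents."""
    # TopologicalSorter treats each mapped set as the node's predecessors.
    return list(TopologicalSorter(deps).static_order())


order = insert_order(fk_dependencies)
```

In this ordering, customers always comes before orders, and both orders and products come before order_items. If the foreign keys form a cycle, static_order raises graphlib.CycleError, so a real tool needs a separate strategy for circular references.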

Privacy

Seedfast generates data from scratch. Your production records are never used as source material or training input. The AI receives only schema metadata — table names, column types, and constraints. See Privacy & Data Handling for details.
