Data Realism
Seedfast doesn't produce random placeholder values. It reads your schema and generates data that fits what your product is actually for.
How domain inference works
Seedfast analyzes table names, column names, data types, and constraints to understand the domain before generating anything. This analysis happens automatically — no configuration required.
A table named orders with columns like status, total_amount, and placed_at tells Seedfast enough to generate plausible order statuses (pending, shipped, delivered), realistic monetary amounts, and timestamps distributed across a reasonable time window.
A table named patients with columns like date_of_birth, diagnosis_code, and attending_physician_id gets a different treatment — Seedfast generates data that fits a medical context, not an e-commerce one.
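To make the idea concrete, here is a deliberately simplified sketch of name-based domain detection. It is not Seedfast's implementation, which is not described here. The `DOMAIN_HINTS` table and the `guess_domain` function are hypothetical names invented for illustration, and the sketch only shows how table and column names alone can separate an e-commerce schema from a medical one.

```python
# Illustrative only: a toy keyword heuristic, NOT how Seedfast infers domains.
# DOMAIN_HINTS and guess_domain are hypothetical names made up for this sketch.
DOMAIN_HINTS = {
    "e-commerce": {"orders", "products", "order_items", "total_amount", "placed_at"},
    "medical": {"patients", "date_of_birth", "diagnosis_code", "attending_physician_id"},
}


def guess_domain(table: str, columns: list[str]) -> str:
    """Return the domain whose hint set overlaps most with the schema names."""
    names = {table.lower(), *(c.lower() for c in columns)}
    scores = {domain: len(names & hints) for domain, hints in DOMAIN_HINTS.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the heuristic has nothing to go on.
    return best if scores[best] > 0 else "unknown"


print(guess_domain("orders", ["status", "total_amount", "placed_at"]))
print(guess_domain("patients", ["date_of_birth", "diagnosis_code", "attending_physician_id"]))
```

A fixed keyword table like this breaks down quickly on real schemas. That is one reason descriptive table and column names matter: they are the only signal any inference step has to work with.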
Examples by domain
E-commerce
Tables like products, orders, customers, and order_items get:
- Product names and descriptions that match the product category
- Order statuses drawn from a realistic lifecycle
- Pricing with a plausible distribution (not uniform random)
- Timestamps that reflect realistic purchase patterns
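The difference between uniform random prices and a plausible price distribution can be sketched in a few lines of Python. This is an assumption-laden illustration, not Seedfast's generator: the log-normal shape, the median of 30, and the function names `uniform_prices` and `skewed_prices` are all choices made for this example.

```python
import math
import random
import statistics


def uniform_prices(n, low=1.0, high=500.0, seed=0):
    """Every price in [low, high] is equally likely, which rarely matches a real catalog."""
    rng = random.Random(seed)
    return [round(rng.uniform(low, high), 2) for _ in range(n)]


def skewed_prices(n, median=30.0, sigma=0.8, seed=0):
    """Log-normal prices: many inexpensive items plus a long tail of expensive ones."""
    rng = random.Random(seed)
    mu = math.log(median)  # the median of a log-normal distribution is exp(mu)
    return [round(rng.lognormvariate(mu, sigma), 2) for _ in range(n)]


for label, prices in [("uniform", uniform_prices(10_000)), ("skewed", skewed_prices(10_000))]:
    print(label, "median:", statistics.median(prices), "mean:", round(statistics.mean(prices), 2))
```

In the skewed sample the mean sits above the median because a small number of expensive items pull it up. Real catalogs tend to look like this, and a uniform sample does not.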
SaaS
Tables like users, workspaces, subscriptions, and activity_logs get:
- Email addresses that look real (not user1@test.com)
- Workspace names that fit a B2B context
- Subscription plans matched to your plan column values
- Activity events in logical sequence
HR and internal tools
Tables like employees, departments, salaries, and performance_reviews get:
- Full names with realistic distributions
- Job titles and department names that make sense together
- Salary ranges calibrated to the role and seniority that Seedfast infers from column context
Default quality vs. guided quality
By default, Seedfast squeezes the maximum quality it can from your schema alone. The more signal available in your schema — descriptive column names, meaningful enum values, well-named foreign keys — the better the results.
When you describe what you need in the scope, results improve further:
# Seedfast infers what it can from the schema
seedfast seed --scope "seed 100 users"
# More context → better data quality
seedfast seed --scope "seed 100 enterprise customers, mostly US-based, with active subscriptions"
The second command produces customers with company names, US addresses, and subscription records that reflect an active state — because you told Seedfast what "customers" means in your context.
Refining realism in interactive mode
When running seedfast seed without --scope, Seedfast proposes a plan and shows you what it intends to generate. At this point you can refine not just the volume but the character of the data:
Approve? (Y/n): make the orders use European addresses and EUR currency
Seedfast incorporates the instruction and replans. See Scoping for the full interactive workflow.
What stays consistent
Across all generated data:
- Foreign keys are always valid — Seedfast resolves insert order and generates parent records before dependent ones
- Relationships are coherent — an order belongs to a real customer; a subscription belongs to a real workspace
- Enum and check constraints are respected — status columns only contain values your schema allows
- Data types match exactly — UUIDs, timestamps, integers, and text fields are generated in the correct format
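Resolving insert order so that parents come before dependents is a topological sort over the foreign-key graph. The sketch below shows the general technique with Python's standard `graphlib` module. It is not Seedfast's code, and the `FK_DEPENDENCIES` schema and the `insert_order` function are hypothetical examples.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
FK_DEPENDENCIES = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}


def insert_order(dependencies):
    """Return tables ordered so that every referenced table precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


order = insert_order(FK_DEPENDENCIES)
print(order)

# Every parent table appears before each table that references it.
for table, parents in FK_DEPENDENCIES.items():
    for parent in parents:
        assert order.index(parent) < order.index(table)
```

Tables with no dependency between them (here, customers and products) may appear in either order. A circular foreign-key chain makes `static_order()` raise `graphlib.CycleError`, so a real tool needs a separate strategy for cycles, such as deferring one of the constraints.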
Privacy
Seedfast generates data from scratch. Your production records are never used as source material or training input. The AI receives only schema metadata — table names, column types, and constraints. See Privacy & Data Handling for details.