Seedfast

Data Realism

Seedfast doesn't produce random placeholder values. It reads your schema and generates data that fits your product's domain.

How domain inference works

Seedfast analyzes table names, column names, data types, and constraints to understand the domain before generating anything. This analysis happens automatically — no configuration required.

A table named orders with columns like status, total_amount, and placed_at tells Seedfast enough to generate plausible order statuses (pending, shipped, delivered), realistic monetary amounts, and timestamps distributed across a reasonable time window.

A table named patients with columns like date_of_birth, diagnosis_code, and attending_physician_id gets a different treatment — Seedfast generates data that fits a medical context, not an e-commerce one.
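
To make the idea concrete, here is a deliberately simplified sketch of inferring a domain from schema names alone. It is not how Seedfast works internally; the `DOMAIN_HINTS` table and the `guess_domain` function are hypothetical, and a keyword-overlap score stands in for whatever analysis Seedfast actually performs.

```python
# Illustrative only: a keyword-overlap heuristic, NOT Seedfast's actual inference.
# DOMAIN_HINTS and guess_domain are hypothetical names invented for this sketch.
DOMAIN_HINTS = {
    "e-commerce": {"orders", "products", "customers", "status", "total_amount", "placed_at"},
    "medical": {"patients", "date_of_birth", "diagnosis_code", "attending_physician_id"},
}


def guess_domain(table: str, columns: list[str]) -> str:
    """Return the domain whose hint set overlaps most with the schema names."""
    names = {table.lower(), *(c.lower() for c in columns)}
    scores = {domain: len(names & hints) for domain, hints in DOMAIN_HINTS.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the schema gives this heuristic nothing to work with.
    return best if scores[best] > 0 else "unknown"


print(guess_domain("orders", ["status", "total_amount", "placed_at"]))
print(guess_domain("patients", ["date_of_birth", "diagnosis_code", "attending_physician_id"]))
```

The sketch only shows why descriptive table and column names matter: they are the signal that separates an orders table from a patients table.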

Examples by domain

E-commerce

Tables like products, orders, customers, and order_items get:

  • Product names and descriptions that match the product category
  • Order statuses drawn from a realistic lifecycle
  • Pricing with plausible distribution (not uniform random)
  • Timestamps that reflect realistic purchase patterns
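
The difference between uniform random pricing and a plausible distribution can be illustrated with a short sketch. The log-normal shape and its parameters are assumptions chosen for illustration; the source does not say which distribution Seedfast uses.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Uniform: every price between 1 and 500 is equally likely.
uniform_prices = [round(rng.uniform(1, 500), 2) for _ in range(10_000)]

# Log-normal (an assumed shape): most prices cluster at the low end,
# with a long tail of expensive items. mu=3.5 and sigma=0.8 are arbitrary.
lognormal_prices = [round(rng.lognormvariate(3.5, 0.8), 2) for _ in range(10_000)]


def summarize(prices):
    """Return (median, mean) so the skew of a price list is easy to compare."""
    return statistics.median(prices), statistics.mean(prices)


print("uniform   median/mean:", summarize(uniform_prices))
print("lognormal median/mean:", summarize(lognormal_prices))
```

In the skewed sample the mean sits above the median, as it does in many real catalogs where a few expensive items pull the average up. In the uniform sample the two stay close together.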

SaaS

Tables like users, workspaces, subscriptions, and activity_logs get:

  • Email addresses that look real (not user1@test.com)
  • Workspace names that fit a B2B context
  • Subscription plans matched to your plan column values
  • Activity events in logical sequence
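
As a rough illustration of what "logical sequence" means for activity events, the sketch below emits events in a fixed lifecycle order with strictly increasing timestamps. The event names and the `generate_events` helper are hypothetical, not part of Seedfast.

```python
import random
from datetime import datetime, timedelta

# Hypothetical lifecycle: these event names are illustrative, not a Seedfast vocabulary.
LIFECYCLE = ["signed_up", "created_workspace", "invited_member", "upgraded_plan"]


def generate_events(user_id, start, rng):
    """Return (user_id, event, timestamp) rows in lifecycle order."""
    events = []
    ts = start
    # A user may stop anywhere in the lifecycle but never skips a step.
    steps = rng.randint(1, len(LIFECYCLE))
    for event in LIFECYCLE[:steps]:
        # Each event happens between one minute and one day after the previous one.
        ts += timedelta(minutes=rng.randint(1, 60 * 24))
        events.append((user_id, event, ts))
    return events


rng = random.Random(7)
for row in generate_events(1, datetime(2024, 1, 1), rng):
    print(row)
```

Rows generated this way never show a plan upgrade before the signup, which independently randomized rows cannot guarantee.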

HR and internal tools

Tables like employees, departments, salaries, and performance_reviews get:

  • Full names with realistic distributions
  • Job titles and department names that make sense together
  • Salary ranges calibrated to the role and seniority that Seedfast infers from column context

Default quality vs. guided quality

By default, Seedfast extracts as much quality as it can from your schema alone. The more signal your schema provides (descriptive column names, meaningful enum values, well-named foreign keys), the better the results.

When you describe what you need in the scope, results improve further:

# Seedfast infers what it can from the schema
seedfast seed --scope "seed 100 users"

# More context → better data quality
seedfast seed --scope "seed 100 enterprise customers, mostly US-based, with active subscriptions"

The second command produces customers with company names, US addresses, and subscription records that reflect an active state — because you told Seedfast what "customers" means in your context.

Refining realism in interactive mode

When running seedfast seed without --scope, Seedfast proposes a plan and shows you what it intends to generate. At this point you can refine not just the volume but the character of the data:

Approve? (Y/n): make the orders use European addresses and EUR currency

Seedfast incorporates the instruction and replans. See Scoping for the full interactive workflow.

What stays consistent

Across all generated data:

  • Foreign keys are always valid — Seedfast resolves insert order and generates parent records before dependent ones
  • Relationships are coherent — an order belongs to a real customer; a subscription belongs to a real workspace
  • Enum and check constraints are respected — status columns only contain values your schema allows
  • Data types match exactly — UUIDs, timestamps, integers, and text fields are generated in the correct format
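
Resolving insert order so that parent records exist before dependent ones is, in general, a topological sort over the foreign-key graph. The sketch below shows that general technique with Python's standard `graphlib` module. The schema is hypothetical, and this is not a description of Seedfast's internals.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables its foreign keys reference.
foreign_keys = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}


def insert_order(deps):
    """Return table names ordered so every referenced table comes before its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = insert_order(foreign_keys)
print(order)
```

Any order this returns places customers before orders, and both orders and products before order_items. If the foreign keys form a cycle, `static_order()` raises `graphlib.CycleError`, so a real tool needs a separate strategy for circular references.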

Privacy

Seedfast generates data from scratch. Your production records are never used as source material or training input. The AI receives only schema metadata — table names, column types, and constraints. See Privacy & Data Handling for details.