Referential Integrity, Explained for Working Developers
By the Seedfast team ·
Key Takeaways#
- Referential integrity is the property that every foreign key in a row points to a row that actually exists in the parent table — nothing more elaborate than that
- The database enforces referential integrity through
FOREIGN KEYconstraints; anything that produces data — your application, your migrations, your seed scripts — has to honor it on the way in. Same property, two responsibilities - The five constraint actions (
CASCADE,NO ACTION,RESTRICT,SET NULL,SET DEFAULT) decide what the database does when a parent row is updated or deleted while child rows still reference it - When referential integrity breaks, the symptoms are quiet: orphaned rows, joins that drop records, dashboards that under-count
- For test data, referential integrity is mostly a producer-side problem: insert order, optional vs. required relations, and cyclic foreign keys are where developers actually hit it
What is referential integrity?#
Referential integrity is the property that every foreign key in a row points to a row that actually exists in the parent table.
That's the whole definition. The textbook version uses "tuple" instead of "row" and defines it over relations rather than tables, but for a developer working in Postgres or MySQL, the working definition is the one above.
A small example. You have a customers table and an orders table:
CREATE TABLE customers (
id bigserial PRIMARY KEY,
email text NOT NULL UNIQUE
);
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id bigint NOT NULL REFERENCES customers(id),
placed_at timestamptz NOT NULL DEFAULT now()
);
The REFERENCES customers(id) clause is what turns customer_id into a foreign key. From that line forward, the database holds you to one rule: every value in orders.customer_id must match the id of some row in customers. That rule, applied across every row at every moment, is referential integrity.
What it isn't: data integrity in general. Data integrity is a broader umbrella that also covers domain constraints (age >= 0), uniqueness (email UNIQUE), and entity integrity (no row without a primary key). Referential integrity is specifically about the relationship between a child row and its parent. People conflate the two, but they're separate properties — a row can satisfy every domain constraint and still violate referential integrity by referencing a parent that was deleted last Tuesday.
The other distinction worth making early: the database is on the enforcing side of this property. Anything that produces data — an application, an ORM migration, a fixture loader, a schema-aware generator like Seedfast — operates on the producer side. They have to honor the same property the database is enforcing. That split runs through the rest of the article, and it's the source of most confusion when "referential integrity" shows up in an error message.
How databases enforce referential integrity#
The database checks three operations against every foreign key:
INSERTinto the child — the database rejects an insert if the foreign-key value doesn't exist in the parent.UPDATEon the parent or child — if you change a child's foreign-key value, it must still resolve to a parent row. If you change a parent's primary key (rare, but legal in Postgres), the change has to propagate to the children, or the update is rejected.DELETEon the parent — you can't delete a parent row that still has children pointing at it, unless you've configured the constraint to do something else.
That last clause matters. The "do something else" is configured through the constraint's referential actions, declared on the foreign key:
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id bigint NOT NULL
REFERENCES customers(id)
ON DELETE RESTRICT
ON UPDATE CASCADE,
placed_at timestamptz NOT NULL DEFAULT now()
);
Postgres, MySQL/InnoDB, and SQL Server all support the same five actions, with minor syntax variations:
NO ACTION(the default) — the database raises an error and rolls back the transaction. Postgres defers the check until the end of the statement, which lets you do multi-row updates that are momentarily inconsistent and resolve them before commit.RESTRICT— same outcome asNO ACTIONin most cases, but the check fires immediately rather than at end-of-statement. Useful when you want failures to surface early.CASCADE— the database applies the same change to the children.ON DELETE CASCADEdeletes the orders when their customer is deleted;ON UPDATE CASCADErewrites the foreign-key value when the parent's key changes.SET NULL/SET DEFAULT— the foreign key on the child rows is set toNULLor to the column's default, leaving the children orphaned-but-valid. Requires the column to be nullable, or to have a default that points to a real parent row.
Picking the right action is design work, not configuration. CASCADE fits when the child has no independent meaning — a pageviews row makes no sense without its session. RESTRICT fits when the child carries history you can't lose — an order belongs to a customer for accounting purposes long after the customer record is closed. SET NULL fits when the parent is metadata that may go away — an assigned_to user on a ticket. The default NO ACTION is the right starting point when you haven't decided yet.
What goes wrong when referential integrity breaks#
The dramatic version is "your data becomes corrupt". The accurate version is quieter:
- Orphaned rows. A child row points to a parent that no longer exists. The row is still in the table, still queryable, still backed up. It just has nowhere to go on the next
JOIN. Reports filter it out without saying so. Dashboards under-count. - Broken joins.
INNER JOINsilently drops rows whose foreign key resolves to nothing. The query runs fine; the result is just incomplete. The teams that catch this are usually the ones who notice their numbers don't add up across two reports. - Re-importable, but not really. A backup of a child table won't load into a constrained schema if its parent rows aren't loaded first. Referential integrity also constrains your import paths.
- Migrations that fail at the worst moment. A migration that adds
NOT NULL REFERENCESto an existing column will fail on any row whose value doesn't already point to a real parent. Most teams find this out in staging by accident, two days before a release.
In Postgres the symptom you'll see during seeding or testing is usually:
ERROR: insert or update on table "orders" violates foreign key constraint "orders_customer_id_fkey"
DETAIL: Key (customer_id)=(42) is not present in table "customers".
Worth noticing what this means: the database is doing its job. It's the producer of the data — your seed script, your migration, your fixture loader — that put a child in the table before its parent. Which brings us to the part most articles skip.
Referential integrity in test data#
Most developers don't first encounter the phrase "referential integrity" in production. They encounter it in test environments, during seeding or fixture loading, in the form of violates foreign key constraint errors. And that's where the framing usually goes wrong.
The instinct is to treat the error as a referential-integrity bug — to ask whether the constraint is "right". It usually is. The database is enforcing referential integrity correctly. The bug is on the producer side: the part of the stack writing rows in the wrong order, or with values that don't yet have parents. Referential integrity isn't something you turn on or off in a seeder; it's something the producer has to honor on the way in. The same is true for database seeding generally and for the broader practice of test data management.
That distinction has practical consequences:
Insert order is the actual hard problem. If a schema has 30 tables and 60 foreign keys, a seed script that inserts in alphabetical or migration order will hit a constraint error on the first child whose parent hasn't been written yet. The fix is a topological sort over the foreign-key dependency graph: parents first, children after. For a small schema you can write this by hand. For anything past a tutorial app, it grows fragile fast — every schema change re-orders the graph. This is the failure mode behind most foreign key constraint errors during seeding.
Optional vs. required relations matter. A nullable foreign-key column means "this row may or may not have a parent". A posts table with author_id bigint NULL REFERENCES users(id) allows posts with no author — useful for system-generated posts, awkward for queries that assume a join. The choice of nullable vs. non-nullable is design work; once it's made, the producer has to populate the column accordingly.
Cyclic foreign keys exist. If companies.primary_contact_id REFERENCES employees(id) and employees.company_id REFERENCES companies(id), neither table can be inserted first without violating referential integrity. Postgres supports DEFERRABLE constraints to defer the check until the end of the transaction, but the producer still has to know to use them. We cover this case directly under circular foreign keys during seeding; it's a small enough corner that most articles skip it.
This producer-side work is what schema-aware test-data generators address. Seedfast reads your live schema (point seedfa.st at your database), builds the foreign-key dependency graph, and inserts parent rows before child rows — including for cyclic and self-referential cases. That isn't the same statement as "Seedfast preserves referential integrity"; the database does that. Seedfast generates rows that satisfy the schema's constraints, in an order the database accepts, so you don't see the violates foreign key constraint error on the way in. If you've already moved on from hand-rolled fixtures to synthetic test data, this is the producer-side discipline the practice rests on.
For teams in regulated industries, the producer-side framing is a compliance question as much as a correctness one — generating from the schema rather than copying production helps keep PII out of dev environments. The same insert-order discipline shows up in test data for HIPAA workflows and GDPR-aligned test data.
Edge cases worth knowing#
A few corners the constraint-action table doesn't cover:
- Many-to-many bridge tables. A
posts_tagsrow holds two foreign keys, one topostsand one totags. Both must resolve. The producer has to insert both parents before any bridge row, and the bridge row is the canonical place where insert-order bugs surface in tests. - Self-referential foreign keys. An
employees.manager_id REFERENCES employees(id)column is legal and common (org charts, threaded comments). Producers have to insert the root rows first and order their descendants by depth — or use deferred constraints. - Soft deletes vs.
CASCADE. If you mark rows deleted by setting adeleted_attimestamp instead of issuingDELETE,ON DELETE CASCADEnever fires. Children remain pointing at "deleted" parents. That's a separate design choice worth being explicit about; it isn't a referential-integrity failure, but it interacts with how you reason about orphans. - Deferrable constraints. Postgres lets you mark a foreign-key constraint
DEFERRABLE INITIALLY IMMEDIATE(orDEFERRED) so the check is postponed to commit. Useful for cyclic graphs and bulk loads. The constraint is still in force; you've just changed when it's evaluated. - Audit and compliance contexts. Industries with audit constraints — PCI DSS test data for cardholder data, SOC 2 test data for service-org controls — push the producer-side reasoning further: the producer has to honor referential integrity and avoid leaking real records into the schema at the same time.
These edge cases share a pattern: the database is doing the simple thing, and the producer of the data needs more nuance to feed it correctly.
Related guides#
- Test Data Management — the broader practice of producing data that satisfies a schema without copying production
- Synthetic Test Data — schema-aware generation as the producer-side discipline this article describes
- Get started with Seedfast — point Seedfast at your database and watch the foreign-key graph resolve end to end