What is Referential Integrity? Definition and Examples

By Mikhail Shytsko, Founder at Seedfast · May 1, 2026 · Updated July 10, 2026

What is referential integrity?

Referential integrity is a relational-database property that ensures every foreign key value in a child row points to a primary key value that actually exists in the parent table. It is enforced by FOREIGN KEY constraints, and it is the rule that keeps the relationships between tables from drifting into orphan rows over time.

Put plainly, no child row may reference a parent that doesn't exist — a orders.user_id value of 42 is only valid while users.id = 42 is still in the database.

Defined more formally, for every foreign key constraint the set of values in the child column must be a subset of the values in the referenced parent column, at every moment the database is queryable. The textbook version uses "tuple" instead of "row" and defines it over relations rather than tables, but for a developer working in Postgres or MySQL, the working definition is the one above.

Summary

A foreign key that doesn't match a live row in the parent table is the one thing referential integrity rules out — nothing more elaborate than that
FOREIGN KEY constraints are how the database enforces the property, but every writer of data — app code, migrations, seed scripts alike — is on the hook for satisfying it before a row ever reaches the table
The five constraint actions (CASCADE, NO ACTION, RESTRICT, SET NULL, SET DEFAULT) decide what the database does when a parent row is updated or deleted while child rows still reference it
When referential integrity breaks, the symptoms are quiet: orphaned rows, joins that drop records, dashboards that under-count
For test data, referential integrity is mostly a producer-side problem: insert order, optional vs. required relations, and cyclic foreign keys are where developers actually hit it

A small example. You have a users table and an orders table:

CREATE TABLE users (
    id           bigserial PRIMARY KEY,
    email        text NOT NULL UNIQUE
);

CREATE TABLE orders (
    id           bigserial PRIMARY KEY,
    user_id      bigint NOT NULL REFERENCES users(id),
    placed_at    timestamptz NOT NULL DEFAULT now()
);

The REFERENCES users(id) clause is what turns user_id into a foreign key. From that line forward, the database holds you to one rule: every value in orders.user_id must match the id of some row in users. That rule, applied across every row at every moment, is referential integrity.

Referential integrity isn't the same thing as data integrity in general, which is the broader umbrella covering domain constraints (age >= 0), uniqueness (email UNIQUE), and entity integrity (no row without a primary key). Referential integrity is specifically about the relationship between a child row and its parent. People conflate the two, but they're separate properties — a row can satisfy every domain constraint and still violate referential integrity by referencing a parent that was deleted last Tuesday.

The other distinction worth making early: the database is on the enforcing side of this property. Anything that produces data — an application, an ORM migration, a fixture loader, a schema-aware generator like Seedfast — operates on the producer side. They have to honor the same property the database is enforcing. That split runs through the rest of the article, and it's the source of most confusion when "referential integrity" shows up in an error message.

How databases enforce referential integrity

The database checks three operations against every foreign key:

INSERT into the child — the database rejects an insert if the foreign-key value doesn't exist in the parent.
UPDATE on the parent or child — if you change a child's foreign-key value, it must still resolve to a parent row. If you change a parent's primary key (rare, but legal in Postgres), the change has to propagate to the children, or the update is rejected.
DELETE on the parent — you can't delete a parent row that still has children pointing at it, unless you've configured the constraint to do something else.

That last clause matters. The "do something else" is configured through the constraint's referential actions, declared on the foreign key:

CREATE TABLE orders (
    id           bigserial PRIMARY KEY,
    user_id      bigint NOT NULL
                 REFERENCES users(id)
                 ON DELETE RESTRICT
                 ON UPDATE CASCADE,
    placed_at    timestamptz NOT NULL DEFAULT now()
);

Postgres, MySQL/InnoDB, and SQL Server all support the same five actions, with minor syntax variations:

NO ACTION (the default) — the database raises an error and rolls back the transaction. Postgres defers the check until the end of the statement, which lets you do multi-row updates that are momentarily inconsistent and resolve them before commit.
RESTRICT — same outcome as NO ACTION in most cases, but the check fires immediately rather than at end-of-statement. Useful when you want failures to surface early.
CASCADE — the database applies the same change to the children. ON DELETE CASCADE deletes the orders when their user is deleted; ON UPDATE CASCADE rewrites the foreign-key value when the parent's key changes.
SET NULL / SET DEFAULT — the foreign key on the child rows is set to NULL or to the column's default, leaving the children orphaned-but-valid. Requires the column to be nullable, or to have a default that points to a real parent row.

Picking the right action is design work, not configuration. CASCADE fits when the child has no independent meaning — a pageviews row makes no sense without its session. RESTRICT fits when the child carries history you can't lose — an order belongs to a user for accounting purposes long after the user record is closed. SET NULL fits when the parent is metadata that may go away — an assigned_to user on a ticket. The default NO ACTION is the right starting point when you haven't decided yet.

Referential integrity in PostgreSQL: FOREIGN KEY, ON DELETE, DEFERRABLE

Yes, PostgreSQL enforces referential integrity by default through FOREIGN KEY constraints, and the check fires at the end of each statement, not at the end of the transaction. That detail is the source of most surprises when teams move bulk inserts or seed scripts onto Postgres: a multi-row INSERT ... SELECT can momentarily reference rows that the same statement is about to create, and the constraint still resolves cleanly — but a sequence of separate INSERTs cannot.

A complete FOREIGN KEY declaration with cascading updates and deletes:

CREATE TABLE users (
    id           bigserial PRIMARY KEY,
    email        text NOT NULL UNIQUE
);

CREATE TABLE orders (
    id           bigserial PRIMARY KEY,
    user_id      bigint NOT NULL
                 REFERENCES users(id)
                 ON DELETE CASCADE
                 ON UPDATE CASCADE,
    placed_at    timestamptz NOT NULL DEFAULT now()
);

All five `ON DELETE` referential actions in Postgres

Postgres supports five ON DELETE referential actions: NO ACTION (the default), RESTRICT, CASCADE, SET NULL, and SET DEFAULT. They control what happens when a DELETE would orphan a child row.

NO ACTION — the default. Raises a foreign-key violation if any child still references the parent; the check is deferred to end-of-statement so multi-row updates can resolve themselves.
RESTRICT — same outcome as NO ACTION, but the check fires immediately and cannot be deferred.
CASCADE — deletes the matching child rows along with the parent; the relationship continues to satisfy referential integrity automatically.
SET NULL — sets the child's foreign-key column to NULL, leaving the child row in place without a parent. Requires the column to be nullable.
SET DEFAULT — sets the child's foreign-key column to its declared DEFAULT. Requires that default to itself reference a real parent row, or you've just moved the violation one step downstream.

ON UPDATE accepts the same five actions and applies the same logic when a parent's primary key changes.

`DEFERRABLE INITIALLY DEFERRED` for transactions where order can't be controlled

DEFERRABLE INITIALLY DEFERRED makes Postgres hold every foreign-key check on the constraint until COMMIT, so children and parents can be inserted in any order inside a single transaction. That's the escape hatch for seed scripts, bulk imports, and any transaction where parent and child are written together but the order isn't fixed — which describes most test-data generation.

CREATE TABLE orders (
    id           bigserial PRIMARY KEY,
    user_id      bigint NOT NULL
                 REFERENCES users(id)
                 ON DELETE CASCADE
                 DEFERRABLE INITIALLY DEFERRED,
    placed_at    timestamptz NOT NULL DEFAULT now()
);

With INITIALLY DEFERRED, Postgres holds the foreign-key check until COMMIT. You can insert children before parents inside the transaction; as long as every reference resolves by commit time, the database accepts the lot. If the constraint is declared DEFERRABLE INITIALLY IMMEDIATE instead, you can opt into deferral per transaction:

BEGIN;
SET CONSTRAINTS ALL DEFERRED;
-- insert orders and users in any order
COMMIT;

The constraint is still in force; you've only changed when it's evaluated. Constraints declared without DEFERRABLE cannot be deferred at all — SET CONSTRAINTS will reject them.

The practical consequence: bulk seeds in Postgres either need a topological sort of the foreign-key graph (parents first, children after) or DEFERRABLE constraints. There is no third option. This shows up the first time a fixture loader grows past a handful of tables — for the working-through, see database seeding and the circular-foreign-key seed case.

Referential integrity vs entity integrity vs domain integrity

The three "integrity" properties travel together in textbooks and get conflated in practice. They are separate, and a row can satisfy any two while violating the third.

Integrity type	What it constrains	Example violation
Referential integrity	Every foreign key in a child row points to a row that actually exists in the parent table.	`orders.user_id = 42` while no row in `users` has `id = 42`.
Entity integrity	Every row has a non-null, unique primary key, so the row can be referenced unambiguously.	Two rows in `users` with `id = 7`, or a row with `id = NULL`.
Domain integrity	Every column value falls inside the column's declared domain — type, `CHECK`, `NOT NULL`, `UNIQUE` on non-key columns, and so on.	`users.age = -3` under a `CHECK (age >= 0)`, or `users.email = NULL` under `email NOT NULL`.

Referential integrity is specifically the one that depends on the existence of another row. Entity and domain integrity are properties of a single row in isolation.

What goes wrong when referential integrity breaks

Data doesn't dramatically "become corrupt" so much as develop quiet symptoms:

Orphaned rows. A child row points to a parent that no longer exists, still sitting in the table, still queryable, still backed up, just with nowhere to go on the next JOIN — reports filter it out without saying so, and dashboards quietly under-count.
Broken joins. INNER JOIN silently drops rows whose foreign key resolves to nothing. The query runs fine; the result is just incomplete. The teams that catch this are usually the ones who notice their numbers don't add up across two reports.
Restores hit the same wall. A backup of a child table won't load into a constrained schema unless its parent rows are loaded first — referential integrity constrains your import paths just as much as live writes.
Migrations that fail at the worst moment. A migration that adds NOT NULL REFERENCES to an existing column will fail on any row whose value doesn't already point to a real parent. Most teams find this out in staging by accident, two days before a release — unless they review the migration against realistic data first.

In Postgres the symptom you'll see during seeding or testing is usually:

ERROR: insert or update on table "orders" violates foreign key constraint "orders_user_id_fkey"
DETAIL: Key (user_id)=(42) is not present in table "users".

What this error actually means is that the database is doing its job, while the producer of the data — the seed script, the migration, the fixture loader — is what put a child in the table before its parent.

Referential integrity in test data

Most developers don't first encounter the phrase "referential integrity" in production. They encounter it in test environments, during seeding or fixture loading, in the form of violates foreign key constraint errors. And that's where the framing usually goes wrong.

The instinct is to treat the error as a referential-integrity bug — to ask whether the constraint is "right". It usually is. But the database is enforcing referential integrity correctly, and the actual bug sits on the producer side, in the part of the stack writing rows in the wrong order or with values that don't yet have parents. Referential integrity isn't something you turn on or off in a seeder — it's a standing obligation the producer has to meet before rows land. The same is true for database seeding generally — and the database seeder tools that resolve FK order automatically are the ones that escape this whole class of error — and for the broader practice of test data management.

That distinction has practical consequences:

Insert order is the actual hard problem. If a schema has 30 tables and 60 foreign keys, a seed script that inserts in alphabetical or migration order will hit a constraint error on the first child whose parent hasn't been written yet. The fix is a topological sort over the foreign-key dependency graph: parents first, children after. For a small schema you can write this by hand. For anything past a tutorial app, it grows fragile fast — every schema change re-orders the graph. This is the failure mode behind most foreign key constraint errors during seeding.

Optional vs. required relations matter. A nullable foreign-key column means "this row may or may not have a parent". A posts table with author_id bigint NULL REFERENCES users(id) allows posts with no author — useful for system-generated posts, awkward for queries that assume a join. The choice of nullable vs. non-nullable is design work; once it's made, the producer has to populate the column accordingly.

Cyclic foreign keys exist. If companies.primary_contact_id REFERENCES employees(id) and employees.company_id REFERENCES companies(id), there's no insert order that satisfies both constraints at once — Postgres handles it through DEFERRABLE constraints that defer the check to end of transaction, but only if the producer knows to declare them. We cover this case directly under circular foreign keys during seeding.

This producer-side work is what schema-aware test-data generators address. Seedfast reads your live schema and produces valid, correctly-ordered inserts — including for cyclic and self-referential cases. That isn't the same statement as "Seedfast preserves referential integrity"; the database does that. Seedfast generates rows that satisfy the schema's constraints, in an order the database accepts, so you don't see the violates foreign key constraint error on the way in. If you've already moved on from hand-rolled fixtures to generated test data, this is the producer-side discipline the practice rests on.

For teams in regulated industries, the producer-side framing is a compliance question as much as a correctness one — generating from the schema rather than copying production helps keep PII out of dev environments. The same insert-order discipline shows up in test data for HIPAA workflows and GDPR-aligned test data.

How to get test data that stays referentially intact

Test data is referentially intact when every foreign key in the generated rows resolves to a parent row that already exists at the moment the row lands, whether that row came from a seed script, a fixture loader, or a schema-aware generator. That makes it a producer-side property rather than a test-only variant of referential integrity — the same rule the database already enforces, now met on the way in by whatever code writes the rows. Meeting it comes down to writing rows in an order the database accepts and handling any cyclic reference before the transaction commits, and the mechanics of both are what the next section covers.

How seeders preserve referential integrity

Seeders maintain referential integrity through three techniques: topologically sorting the foreign-key graph so parents insert before children, declaring cyclic foreign keys DEFERRABLE INITIALLY DEFERRED so checks run at COMMIT, and generating primary keys (typically UUIDs) before any INSERT so foreign-key values are known on both sides of the relationship.

A working seeder doesn't ask the database to be lenient. It hands the database rows in an order, and with values, that already satisfy every foreign key. The three techniques in detail:

Topological sort of the foreign-key graph. Treat tables as nodes and foreign keys as edges pointing from child to parent, then sort the graph so every parent is inserted before its children. At the scale where this stops being optional — dozens of tables, twice as many foreign keys — it's the difference between a seed that runs cleanly and one that fails on the first child, and the sort has to be re-run whenever the schema changes, since adding one REFERENCES clause can re-order a quarter of the graph.
DEFERRABLE constraints for cycles. Some foreign-key graphs aren't acyclic — companies references employees(id) for a primary contact, and employees references companies(id) for an employer. No topological order exists. The fix is DEFERRABLE INITIALLY DEFERRED so both inserts can land in the same transaction and the constraint check waits until commit.
Generating primary keys upfront. If you generate UUIDs in the seeder before issuing any INSERT, you know the foreign-key value before either row is written. The parent insert uses the same UUID the child insert will reference, and the dependency on database-assigned bigserial IDs disappears. This is what makes cross-table consistency in database seeding tractable at scale.

Seedfast handles this automatically across your live schema, including cyclic cases — the producer-side discipline this section describes is work the tool does on your behalf, not work you have to script by hand.

Edge cases worth knowing

Foreign keys are only one of several schema features a data producer has to honor. CHECK constraints, generated columns, and partial unique indexes can all make a row type-correct yet still invalid — this breakdown of six Postgres schema features generators skip walks through the ones that bite most often.

A few corners the constraint-action table doesn't cover:

Many-to-many bridge tables. A posts_tags row holds two foreign keys, one to posts and one to tags. Both must resolve. The producer has to insert both parents before any bridge row, and the bridge row is the canonical place where insert-order bugs surface in tests.
Self-referential foreign keys. A self-referential foreign key is one where the parent and the child are the same table — for example, employees.manager_id REFERENCES employees(id) for an org chart, or comments.parent_id REFERENCES comments(id) for threaded discussion. Producers have to insert the root rows first (the ones whose foreign key is NULL) and order their descendants by depth. The alternative is a DEFERRABLE INITIALLY DEFERRED constraint, so the entire tree can be inserted in any order inside one transaction.
Circular foreign keys between two tables. When companies.primary_contact_id REFERENCES employees(id) and employees.company_id REFERENCES companies(id), neither table can be inserted first without violating referential integrity. There is no topological order. Both constraints have to be declared DEFERRABLE and the inserts wrapped in a transaction that runs SET CONSTRAINTS ALL DEFERRED. The full walkthrough lives in circular foreign keys during seeding.
Composite foreign keys. A composite (multi-column) foreign key references a composite primary key — FOREIGN KEY (tenant_id, user_id) REFERENCES users(tenant_id, id). Referential integrity holds across the tuple, not on either column alone. A child row is valid only when the whole tuple matches a row in the parent. Producers that generate columns independently break this without noticing.
MATCH FULL vs MATCH SIMPLE. For composite foreign keys with nullable columns, MATCH SIMPLE (the default in Postgres) lets the row skip the check whenever any column in the foreign key is NULL. MATCH FULL requires either all columns to be non-null and resolving to a parent, or all columns to be NULL. MATCH PARTIAL is in the SQL standard but not implemented in Postgres. Switch to MATCH FULL when "partially-null" foreign keys would be a data bug.
Soft deletes vs. CASCADE. If you mark rows deleted by setting a deleted_at timestamp instead of issuing DELETE, ON DELETE CASCADE never fires. Children remain pointing at "deleted" parents. That's a separate design choice worth being explicit about; it isn't a referential-integrity failure, but it interacts with how you reason about orphans.
Deferrable constraints. Postgres lets you mark a foreign-key constraint DEFERRABLE INITIALLY IMMEDIATE (or DEFERRED) so the check is postponed to commit. Useful for cyclic graphs and bulk loads. The constraint is still in force; you've just changed when it's evaluated.
Audit and compliance contexts. Industries with audit constraints — PCI DSS test data for cardholder data, SOC 2 test data for service-org controls — push the producer-side reasoning further: the producer has to honor referential integrity and avoid leaking real records into the schema at the same time.

Across all of these, the database keeps doing the simple thing, while the producer of the data is the one who needs the extra nuance to feed it correctly.

Frequently asked questions

What is referential integrity in a database?

Referential integrity in a database is the guarantee that every foreign key in a child table resolves to an existing primary key in the parent table. It's the property that keeps relational data from accumulating orphan rows — children pointing at parents that no longer exist. Relational databases enforce it through FOREIGN KEY constraints; producers of data (applications, migrations, seed scripts) have to honor it on the way in.

What does referential integrity mean?

Referential integrity means that for every row that holds a reference to another row, that other row actually exists. Same idea expressed two ways: no orphan foreign keys, and no dangling references between tables. The database checks the property on every INSERT, UPDATE, and DELETE that touches a constrained column.

What is the difference between a foreign key and referential integrity?

A foreign key is the mechanism; referential integrity is the property it enforces. A FOREIGN KEY is a SQL clause declaring "this column references that one". Referential integrity is the rule that follows from it, requiring every value in the referencing column to match a value in the referenced column. Applications can enforce the property without explicit foreign keys, and foreign keys can be temporarily relaxed with DEFERRABLE, but in normal use the two travel together.

Does PostgreSQL enforce referential integrity by default?

Yes. PostgreSQL enforces referential integrity by default through FOREIGN KEY constraints, and the check fires at the end of each statement, not at the end of the transaction. Bulk inserts whose foreign-key targets are written in the same transaction need either topological ordering or a DEFERRABLE INITIALLY DEFERRED constraint. The enforcement is on by default; only the timing is configurable.

How do you maintain referential integrity in seed data?

You maintain referential integrity in seed data by inserting parent rows before children (via a topological sort of the foreign-key graph), deferring the constraint check to commit time for cycles, or generating primary keys upfront so foreign-key values are known on both sides before either INSERT runs. Schema-aware database seeder tools automate all three; hand-rolled fixture scripts have to re-encode the ordering on every schema change.

Test Data Management — the broader practice of producing data that satisfies a schema without copying production
Generate Test Data with AI — why AI coding agents break referential integrity on deep schemas, and the delegation pattern that fixes it
Test Data Generation — schema-aware generation as the producer-side discipline this article describes
Get started with Seedfast — point Seedfast at your database and get valid, referentially intact data out, end to end