What is Referential Integrity? Definition and Examples
By Mikhail Shytsko, Founder at Seedfast · · Updated
Referential integrity is a relational-database property that ensures every foreign key value in a child row points to a primary key value that actually exists in the parent table. It is enforced by FOREIGN KEY constraints, and it is the rule that keeps the relationships between tables from drifting into orphan rows over time.
In one line, referential integrity means no child row may reference a parent that doesn't exist. That's the entire property. A orders.user_id value of 42 is only valid while users.id = 42 is still in the database.
To define referential integrity more formally: for every foreign key constraint, the set of values in the child column must be a subset of the values in the referenced parent column, at every moment the database is queryable. The textbook version uses "tuple" instead of "row" and defines it over relations rather than tables, but for a developer working in Postgres or MySQL, the working definition is the one above.
- Referential integrity is the property that every foreign key in a row points to a row that actually exists in the parent table — nothing more elaborate than that
- The database enforces referential integrity through
FOREIGN KEYconstraints; anything that produces data — your application, your migrations, your seed scripts — has to honor it on the way in. Same property, two responsibilities - The five constraint actions (
CASCADE,NO ACTION,RESTRICT,SET NULL,SET DEFAULT) decide what the database does when a parent row is updated or deleted while child rows still reference it - When referential integrity breaks, the symptoms are quiet: orphaned rows, joins that drop records, dashboards that under-count
- For test data, referential integrity is mostly a producer-side problem: insert order, optional vs. required relations, and cyclic foreign keys are where developers actually hit it
A small example. You have a users table and an orders table:
CREATE TABLE users (
id bigserial PRIMARY KEY,
email text NOT NULL UNIQUE
);
CREATE TABLE orders (
id bigserial PRIMARY KEY,
user_id bigint NOT NULL REFERENCES users(id),
placed_at timestamptz NOT NULL DEFAULT now()
);
The REFERENCES users(id) clause is what turns user_id into a foreign key. From that line forward, the database holds you to one rule: every value in orders.user_id must match the id of some row in users. That rule, applied across every row at every moment, is referential integrity.
What it isn't: data integrity in general. Data integrity is a broader umbrella that also covers domain constraints (age >= 0), uniqueness (email UNIQUE), and entity integrity (no row without a primary key). Referential integrity is specifically about the relationship between a child row and its parent. People conflate the two, but they're separate properties — a row can satisfy every domain constraint and still violate referential integrity by referencing a parent that was deleted last Tuesday.
The other distinction worth making early: the database is on the enforcing side of this property. Anything that produces data — an application, an ORM migration, a fixture loader, a schema-aware generator like Seedfast — operates on the producer side. They have to honor the same property the database is enforcing. That split runs through the rest of the article, and it's the source of most confusion when "referential integrity" shows up in an error message.
The database checks three operations against every foreign key:
INSERTinto the child — the database rejects an insert if the foreign-key value doesn't exist in the parent.UPDATEon the parent or child — if you change a child's foreign-key value, it must still resolve to a parent row. If you change a parent's primary key (rare, but legal in Postgres), the change has to propagate to the children, or the update is rejected.DELETEon the parent — you can't delete a parent row that still has children pointing at it, unless you've configured the constraint to do something else.
That last clause matters. The "do something else" is configured through the constraint's referential actions, declared on the foreign key:
CREATE TABLE orders (
id bigserial PRIMARY KEY,
user_id bigint NOT NULL
REFERENCES users(id)
ON DELETE RESTRICT
ON UPDATE CASCADE,
placed_at timestamptz NOT NULL DEFAULT now()
);
Postgres, MySQL/InnoDB, and SQL Server all support the same five actions, with minor syntax variations:
NO ACTION(the default) — the database raises an error and rolls back the transaction. Postgres defers the check until the end of the statement, which lets you do multi-row updates that are momentarily inconsistent and resolve them before commit.RESTRICT— same outcome asNO ACTIONin most cases, but the check fires immediately rather than at end-of-statement. Useful when you want failures to surface early.CASCADE— the database applies the same change to the children.ON DELETE CASCADEdeletes the orders when their user is deleted;ON UPDATE CASCADErewrites the foreign-key value when the parent's key changes.SET NULL/SET DEFAULT— the foreign key on the child rows is set toNULLor to the column's default, leaving the children orphaned-but-valid. Requires the column to be nullable, or to have a default that points to a real parent row.
Picking the right action is design work, not configuration. CASCADE fits when the child has no independent meaning — a pageviews row makes no sense without its session. RESTRICT fits when the child carries history you can't lose — an order belongs to a user for accounting purposes long after the user record is closed. SET NULL fits when the parent is metadata that may go away — an assigned_to user on a ticket. The default NO ACTION is the right starting point when you haven't decided yet.
Yes, PostgreSQL enforces referential integrity by default through FOREIGN KEY constraints, and the check fires at the end of each statement, not at the end of the transaction. That detail is the source of most surprises when teams move bulk inserts or seed scripts onto Postgres: a multi-row INSERT ... SELECT can momentarily reference rows that the same statement is about to create, and the constraint still resolves cleanly — but a sequence of separate INSERTs cannot.
A complete FOREIGN KEY declaration with cascading updates and deletes:
CREATE TABLE users (
id bigserial PRIMARY KEY,
email text NOT NULL UNIQUE
);
CREATE TABLE orders (
id bigserial PRIMARY KEY,
user_id bigint NOT NULL
REFERENCES users(id)
ON DELETE CASCADE
ON UPDATE CASCADE,
placed_at timestamptz NOT NULL DEFAULT now()
);
Postgres supports five ON DELETE referential actions: NO ACTION (the default), RESTRICT, CASCADE, SET NULL, and SET DEFAULT. They control what happens when a DELETE would orphan a child row.
NO ACTION— the default. Raises a foreign-key violation if any child still references the parent; the check is deferred to end-of-statement so multi-row updates can resolve themselves.RESTRICT— same outcome asNO ACTION, but the check fires immediately and cannot be deferred.CASCADE— deletes the matching child rows along with the parent; the relationship continues to satisfy referential integrity automatically.SET NULL— sets the child's foreign-key column toNULL, leaving the child row in place without a parent. Requires the column to be nullable.SET DEFAULT— sets the child's foreign-key column to its declaredDEFAULT. Requires that default to itself reference a real parent row, or you've just moved the violation one step downstream.
ON UPDATE accepts the same five actions and applies the same logic when a parent's primary key changes.
DEFERRABLE INITIALLY DEFERRED makes Postgres hold every foreign-key check on the constraint until COMMIT, so children and parents can be inserted in any order inside a single transaction. That's the escape hatch for seed scripts, bulk imports, and any transaction where parent and child are written together but the order isn't fixed — which describes most test-data generation.
CREATE TABLE orders (
id bigserial PRIMARY KEY,
user_id bigint NOT NULL
REFERENCES users(id)
ON DELETE CASCADE
DEFERRABLE INITIALLY DEFERRED,
placed_at timestamptz NOT NULL DEFAULT now()
);
With INITIALLY DEFERRED, Postgres holds the foreign-key check until COMMIT. You can insert children before parents inside the transaction; as long as every reference resolves by commit time, the database accepts the lot. If the constraint is declared DEFERRABLE INITIALLY IMMEDIATE instead, you can opt into deferral per transaction:
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
-- insert orders and users in any order
COMMIT;
The constraint is still in force; you've only changed when it's evaluated. Constraints declared without DEFERRABLE cannot be deferred at all — SET CONSTRAINTS will reject them.
The practical consequence: bulk seeds in Postgres either need a topological sort of the foreign-key graph (parents first, children after) or DEFERRABLE constraints. There is no third option. This shows up the first time a fixture loader grows past a handful of tables — for the working-through, see database seeding and the circular-foreign-key seed case.
The three "integrity" properties travel together in textbooks and get conflated in practice. They are separate, and a row can satisfy any two while violating the third.
| Integrity type | What it constrains | Example violation |
|---|---|---|
| Referential integrity | Every foreign key in a child row points to a row that actually exists in the parent table. | orders.user_id = 42 while no row in users has id = 42. |
| Entity integrity | Every row has a non-null, unique primary key, so the row can be referenced unambiguously. | Two rows in users with id = 7, or a row with id = NULL. |
| Domain integrity | Every column value falls inside the column's declared domain — type, CHECK, NOT NULL, UNIQUE on non-key columns, and so on. | users.age = -3 under a CHECK (age >= 0), or users.email = NULL under email NOT NULL. |
Referential integrity is specifically the one that depends on the existence of another row. Entity and domain integrity are properties of a single row in isolation.
The dramatic version is "your data becomes corrupt". The accurate version is quieter:
- Orphaned rows. A child row points to a parent that no longer exists. The row is still in the table, still queryable, still backed up. It just has nowhere to go on the next
JOIN. Reports filter it out without saying so. Dashboards under-count. - Broken joins.
INNER JOINsilently drops rows whose foreign key resolves to nothing. The query runs fine; the result is just incomplete. The teams that catch this are usually the ones who notice their numbers don't add up across two reports. - Re-importable, but not really. A backup of a child table won't load into a constrained schema if its parent rows aren't loaded first. Referential integrity also constrains your import paths.
- Migrations that fail at the worst moment. A migration that adds
NOT NULL REFERENCESto an existing column will fail on any row whose value doesn't already point to a real parent. Most teams find this out in staging by accident, two days before a release — unless they review the migration against realistic data first.
In Postgres the symptom you'll see during seeding or testing is usually:
ERROR: insert or update on table "orders" violates foreign key constraint "orders_user_id_fkey"
DETAIL: Key (user_id)=(42) is not present in table "users".
Worth noticing what this means: the database is doing its job. It's the producer of the data — your seed script, your migration, your fixture loader — that put a child in the table before its parent. Which brings us to the part most articles skip.
Most developers don't first encounter the phrase "referential integrity" in production. They encounter it in test environments, during seeding or fixture loading, in the form of violates foreign key constraint errors. And that's where the framing usually goes wrong.
The instinct is to treat the error as a referential-integrity bug — to ask whether the constraint is "right". It usually is. The database is enforcing referential integrity correctly. The bug is on the producer side: the part of the stack writing rows in the wrong order, or with values that don't yet have parents. Referential integrity isn't something you turn on or off in a seeder; it's something the producer has to honor on the way in. The same is true for database seeding generally — and the database seeder tools that resolve FK order automatically are the ones that escape this whole class of error — and for the broader practice of test data management.
That distinction has practical consequences:
Insert order is the actual hard problem. If a schema has 30 tables and 60 foreign keys, a seed script that inserts in alphabetical or migration order will hit a constraint error on the first child whose parent hasn't been written yet. The fix is a topological sort over the foreign-key dependency graph: parents first, children after. For a small schema you can write this by hand. For anything past a tutorial app, it grows fragile fast — every schema change re-orders the graph. This is the failure mode behind most foreign key constraint errors during seeding.
Optional vs. required relations matter. A nullable foreign-key column means "this row may or may not have a parent". A posts table with author_id bigint NULL REFERENCES users(id) allows posts with no author — useful for system-generated posts, awkward for queries that assume a join. The choice of nullable vs. non-nullable is design work; once it's made, the producer has to populate the column accordingly.
Cyclic foreign keys exist. If companies.primary_contact_id REFERENCES employees(id) and employees.company_id REFERENCES companies(id), neither table can be inserted first without violating referential integrity. Postgres supports DEFERRABLE constraints to defer the check until the end of the transaction, but the producer still has to know to use them. We cover this case directly under circular foreign keys during seeding; it's a small enough corner that most articles skip it.
This producer-side work is what schema-aware test-data generators address. Seedfast reads your live schema, builds the foreign-key dependency graph, and inserts parent rows before child rows — including for cyclic and self-referential cases. That isn't the same statement as "Seedfast preserves referential integrity"; the database does that. Seedfast generates rows that satisfy the schema's constraints, in an order the database accepts, so you don't see the violates foreign key constraint error on the way in. If you've already moved on from hand-rolled fixtures to synthetic test data, this is the producer-side discipline the practice rests on.
For teams in regulated industries, the producer-side framing is a compliance question as much as a correctness one — generating from the schema rather than copying production helps keep PII out of dev environments. The same insert-order discipline shows up in test data for HIPAA workflows and GDPR-aligned test data.
Seeders maintain referential integrity through three techniques: topologically sorting the foreign-key graph so parents insert before children, declaring cyclic foreign keys DEFERRABLE INITIALLY DEFERRED so checks run at COMMIT, and generating primary keys (typically UUIDs) before any INSERT so foreign-key values are known on both sides of the relationship.
A working seeder doesn't ask the database to be lenient. It hands the database rows in an order, and with values, that already satisfy every foreign key. The three techniques in detail:
- Topological sort of the foreign-key graph. Treat tables as nodes and foreign keys as edges pointing from child to parent. Sort the graph so every parent is inserted before its children. For a schema with 30 tables and 60 foreign keys, this is the difference between a seed that runs cleanly and one that fails on the first child. The sort has to be re-run whenever the schema changes — adding one
REFERENCESclause can re-order a quarter of the graph. DEFERRABLEconstraints for cycles. Some foreign-key graphs aren't acyclic —companiesreferencesemployees(id)for a primary contact, andemployeesreferencescompanies(id)for an employer. No topological order exists. The fix isDEFERRABLE INITIALLY DEFERREDso both inserts can land in the same transaction and the constraint check waits until commit.- Generating primary keys upfront. If you generate UUIDs in the seeder before issuing any
INSERT, you know the foreign-key value before either row is written. The parent insert uses the same UUID the child insert will reference, and the dependency on database-assignedbigserialIDs disappears. This is what makes cross-table consistency in database seeding tractable at scale.
Seedfast does the topological sort over the live schema's foreign-key graph automatically, and falls back to deferred constraints for cycles — the producer-side discipline this section describes is the work the tool is doing on your behalf.
Foreign keys are only one of several schema features a data producer has to honor. CHECK constraints, generated columns, and partial unique indexes can all make a row type-correct yet still invalid — this breakdown of six Postgres schema features generators skip walks through the ones that bite most often.
A few corners the constraint-action table doesn't cover:
- Many-to-many bridge tables. A
posts_tagsrow holds two foreign keys, one topostsand one totags. Both must resolve. The producer has to insert both parents before any bridge row, and the bridge row is the canonical place where insert-order bugs surface in tests. - Self-referential foreign keys. A self-referential foreign key is one where the parent and the child are the same table — for example,
employees.manager_id REFERENCES employees(id)for an org chart, orcomments.parent_id REFERENCES comments(id)for threaded discussion. Producers have to insert the root rows first (the ones whose foreign key isNULL) and order their descendants by depth. The alternative is aDEFERRABLE INITIALLY DEFERREDconstraint, so the entire tree can be inserted in any order inside one transaction. - Circular foreign keys between two tables. When
companies.primary_contact_id REFERENCES employees(id)andemployees.company_id REFERENCES companies(id), neither table can be inserted first without violating referential integrity. There is no topological order. Both constraints have to be declaredDEFERRABLEand the inserts wrapped in a transaction that runsSET CONSTRAINTS ALL DEFERRED. The full walkthrough lives in circular foreign keys during seeding. - Composite foreign keys. A composite (multi-column) foreign key references a composite primary key —
FOREIGN KEY (tenant_id, user_id) REFERENCES users(tenant_id, id). Referential integrity holds across the tuple, not on either column alone. A child row is valid only when the whole tuple matches a row in the parent. Producers that generate columns independently break this without noticing. MATCH FULLvsMATCH SIMPLE. For composite foreign keys with nullable columns,MATCH SIMPLE(the default in Postgres) lets the row skip the check whenever any column in the foreign key isNULL.MATCH FULLrequires either all columns to be non-null and resolving to a parent, or all columns to beNULL.MATCH PARTIALis in the SQL standard but not implemented in Postgres. Switch toMATCH FULLwhen "partially-null" foreign keys would be a data bug.- Soft deletes vs.
CASCADE. If you mark rows deleted by setting adeleted_attimestamp instead of issuingDELETE,ON DELETE CASCADEnever fires. Children remain pointing at "deleted" parents. That's a separate design choice worth being explicit about; it isn't a referential-integrity failure, but it interacts with how you reason about orphans. - Deferrable constraints. Postgres lets you mark a foreign-key constraint
DEFERRABLE INITIALLY IMMEDIATE(orDEFERRED) so the check is postponed to commit. Useful for cyclic graphs and bulk loads. The constraint is still in force; you've just changed when it's evaluated. - Audit and compliance contexts. Industries with audit constraints — PCI DSS test data for cardholder data, SOC 2 test data for service-org controls — push the producer-side reasoning further: the producer has to honor referential integrity and avoid leaking real records into the schema at the same time.
These edge cases share a pattern: the database is doing the simple thing, and the producer of the data needs more nuance to feed it correctly.
Referential integrity in a database is the guarantee that every foreign key in a child table resolves to an existing primary key in the parent table. It's the property that keeps relational data from accumulating orphan rows — children pointing at parents that no longer exist. Relational databases enforce it through FOREIGN KEY constraints; producers of data (applications, migrations, seed scripts) have to honor it on the way in.
Referential integrity means that for every row that holds a reference to another row, that other row actually exists. Same idea expressed two ways: no orphan foreign keys, and no dangling references between tables. The database checks the property on every INSERT, UPDATE, and DELETE that touches a constrained column.
A foreign key is the mechanism; referential integrity is the property it enforces. A FOREIGN KEY is a SQL clause declaring "this column references that one". Referential integrity is the rule that follows: every value in the referencing column must match a value in the referenced column. Applications can enforce the property without explicit foreign keys, and foreign keys can be temporarily relaxed with DEFERRABLE, but in normal use the two travel together.
Yes. PostgreSQL enforces referential integrity by default through FOREIGN KEY constraints, and the check fires at the end of each statement, not at the end of the transaction. Bulk inserts whose foreign-key targets are written in the same transaction need either topological ordering or a DEFERRABLE INITIALLY DEFERRED constraint. The enforcement is on by default; only the timing is configurable.
You maintain referential integrity in seed data by inserting parent rows before children (via a topological sort of the foreign-key graph), deferring the constraint check to commit time for cycles, or generating primary keys upfront so foreign-key values are known on both sides before either INSERT runs. Schema-aware database seeder tools automate all three; hand-rolled fixture scripts have to re-encode the ordering on every schema change.
- Test Data Management — the broader practice of producing data that satisfies a schema without copying production
- Generate Test Data with AI — why AI coding agents break referential integrity on deep schemas, and the delegation pattern that fixes it
- Synthetic Test Data — schema-aware generation as the producer-side discipline this article describes
- Get started with Seedfast — point Seedfast at your database and watch the foreign-key graph resolve end to end