Changing a database schema without taking production down is a discipline of its own. Let's look at why plain ALTER TABLE statements are dangerous under live traffic, and how to roll out changes correctly.
Why ALTER TABLE blocks everything
Imagine you want to add a column to a table with tens of millions of rows. You write ALTER TABLE orders ADD COLUMN priority integer NOT NULL DEFAULT 0 — and production freezes for several minutes.
The reason: for most schema changes PostgreSQL takes an ACCESS EXCLUSIVE lock. This lock is the strictest one: it lets no one in — not SELECT, not INSERT, not UPDATE. Worse still, it queues up behind all current queries. If a long-running report is in flight at the moment of the migration, the ALTER TABLE waits for it, and behind it a queue of all new queries piles up.
Hence the first rule: every migration starts with SET LOCAL lock_timeout = '3s'. If the lock can't be acquired within 3 seconds, the migration fails with an error instead of hanging production.
BEGIN;
SET LOCAL lock_timeout = '3s';
ALTER TABLE orders ADD COLUMN priority integer;
COMMIT;
Breaking change and N-1 compatibility
Not all schema changes are equally dangerous.
Safe changes (can be applied without special measures):
ADD COLUMN ... NULL— a column without theNOT NULLconstraintADD COLUMN ... NOT NULL DEFAULT 'x'(PostgreSQL 11+, a constant value)CREATE TABLEADD CONSTRAINT ... NOT VALID
Dangerous changes (require a special approach):
DROP COLUMN,RENAME COLUMN- Changing a column's type (
ALTER TYPE) ADD COLUMN NOT NULLwithout a default value on a large table- Removing a value from an enum type
Dangerous changes are also called a breaking change — they can break code that is already running. And the problem is not only the lock, but also the timing of the deploy. A typical deploy: first the migration is applied, then the new version of the application starts. Between them there is a window where the old version of the code runs against the new schema. If the schema changed incompatibly, the old code will crash.
This rule is called N-1 compatibility: a migration must be compatible with the previous version of the code.
Expand-Contract: three releases instead of one
To make a dangerous change safely, you split it across several releases. This pattern is called Expand-Contract (expand — migrate — contract).
Release 1 — expand: add the new structure (column, table)
code writes to the old place, optionally to the new one too
Release 2 — migrate: move data from old to new (backfill)
code writes to both places, reads from the old one
Release 3 — read new: code reads from the new one, writes to both
Release 4 — contract: remove the old one, drop it from the code
Each release can be rolled back independently. If something goes wrong in release 2, you can return to release 1 without losing data.
How to add a NOT NULL column
PostgreSQL 11 and newer can add a column with a constant default value instantly — the value is stored "virtually" in the metadata rather than written into every row:
-- Instant even on 100M rows
ALTER TABLE orders ADD COLUMN priority integer NOT NULL DEFAULT 0;
But if the default value is dynamic (for example, NOW() or gen_random_uuid()), PostgreSQL can't store it virtually and will go rewrite every row. Here you need expand-contract:
- Add the column without
NOT NULL - Fill existing rows in batches (see the section on backfill)
- Set the constraint the safe way:
-- Step 3a: add a CHECK constraint without validating old rows
ALTER TABLE orders ADD CONSTRAINT ck_orders_priority_not_null
CHECK (priority IS NOT NULL) NOT VALID;
-- Step 3b: validate existing rows (does not block writes)
ALTER TABLE orders VALIDATE CONSTRAINT ck_orders_priority_not_null;
-- Step 3c: now SET NOT NULL is cheap — PG knows there are no NULLs
ALTER TABLE orders ALTER COLUMN priority SET NOT NULL;
-- Step 3d: drop the temporary CHECK
ALTER TABLE orders DROP CONSTRAINT ck_orders_priority_not_null;
CHECK NOT VALID executes instantly. VALIDATE takes a gentle SHARE UPDATE EXCLUSIVE lock, which does not interfere with normal queries.
How to rename a column
RENAME COLUMN can't be done in a single release. The old code will break immediately, because it will reference a column that no longer exists.
The safe sequence:
- Add the new column (
ADD COLUMN new_name <type>) - Code starts writing to both columns
- Move existing data in batches
- Code starts reading from the new column
- Code stops writing to the old one
- Drop the old column (
DROP COLUMN old_name)
For read-only scenarios there is a simpler option — a view wrapper:
ALTER TABLE orders RENAME COLUMN old_name TO new_name;
CREATE VIEW orders_v AS SELECT *, new_name AS old_name FROM orders;
The old code reads from the view and sees the old name, while the new code works directly against the table.
How to change a column's type
ALTER COLUMN ... SET DATA TYPE with a type cast rewrites the whole table — the result is the same as ALTER TABLE on millions of rows. An ACCESS EXCLUSIVE lock held for a long time.
Exceptions that work instantly (without a rewrite):
varchar→text(widening without a cast)varchar(50)→varchar(100)(widening only)
For everything else (for example, integer → bigint) you need expand-contract: add a shadow column of the new type, fill it with data, switch the code over.
How to add a foreign key
A plain ADD FOREIGN KEY locks both tables and validates all existing rows. On large tables, that's slow.
The safe way is two-step:
-- Step 1: create the FK without validating existing rows (instant)
ALTER TABLE order_item
ADD CONSTRAINT fk_order_item_order_id
FOREIGN KEY (order_id) REFERENCES orders(id)
NOT VALID;
-- Step 2: validate existing rows (does not block writes)
ALTER TABLE order_item VALIDATE CONSTRAINT fk_order_item_order_id;
NOT VALID means: new rows will be validated immediately, old ones at VALIDATE time.
How to add an index
CREATE INDEX without extra keywords takes a SHARE lock, which blocks INSERT/UPDATE/DELETE. On a large table, that's several minutes with no writes.
The solution is CREATE INDEX CONCURRENTLY. It builds the index in several passes without locking:
CREATE INDEX CONCURRENTLY ix_orders_status ON orders (status);
DROP INDEX also takes a hard lock — use DROP INDEX CONCURRENTLY.
If CREATE INDEX CONCURRENTLY was interrupted, it leaves an INVALID index behind. You need to drop it and create it again:
DROP INDEX CONCURRENTLY IF EXISTS ix_orders_status;
CREATE INDEX CONCURRENTLY ix_orders_status ON orders (status);
An important detail for migration tools (Liquibase, Flyway): CONCURRENTLY can't run inside a transaction. Changesets with indexes must have runInTransaction="false":
<changeSet id="20260507-add-status-index" runInTransaction="false">
<sql>CREATE INDEX CONCURRENTLY ix_orders_status ON orders (status);</sql>
<rollback>DROP INDEX CONCURRENTLY IF EXISTS ix_orders_status;</rollback>
</changeSet>
How to remove a value from an enum
PostgreSQL has no REMOVE VALUE FROM ENUM command. You can't remove a value natively.
The only way is to create a new type without the unwanted value and switch the column over to it:
CREATE TYPE order_status_v2 AS ENUM ('NEW', 'PAID', 'SHIPPED')(without the value being removed)ADD COLUMN status_v2 order_status_v2 NULL- Move data in batches:
UPDATE orders SET status_v2 = status::text::order_status_v2 - Switch the code over to reading from and writing to
status_v2 DROP COLUMN status,RENAME COLUMN status_v2 TO status,DROP TYPE order_status
Adding a value (ADD VALUE) is instant (PostgreSQL 12+), but you can't use the new value in the same transaction. It must be a separate changeset.
Moving data in batches
A large UPDATE in a single transaction is a bad idea: the transaction holds locks for its entire duration, accumulates WAL, and interferes with autovacuum.
The right approach is to update in small chunks with a commit between them:
DO $$
DECLARE rows_updated integer := 1;
BEGIN
WHILE rows_updated > 0 LOOP
UPDATE orders SET priority = 0
WHERE id IN (
SELECT id FROM orders
WHERE priority IS NULL
LIMIT 10000
);
GET DIAGNOSTICS rows_updated = ROW_COUNT;
COMMIT;
PERFORM pg_sleep(0.1);
END LOOP;
END $$;
For very large tables an SQL batch inside the migration still occupies a single connection and holds up the deploy. A more reliable option is a background job in the application code: it runs independently of the deploy, and it can be stopped and restarted. Use FOR UPDATE SKIP LOCKED to avoid conflicting with user queries:
@Component
public class BackfillPriorityJob {
private final JdbcTemplate jdbc;
public BackfillPriorityJob(JdbcTemplate jdbc) {
this.jdbc = jdbc;
}
@Scheduled(cron = "0 * * * * *")
public void backfillPriority() {
int updated;
do {
updated = jdbc.update("""
UPDATE orders SET priority = 0
WHERE id IN (
SELECT id FROM orders
WHERE priority IS NULL
LIMIT 10000
FOR UPDATE SKIP LOCKED
)
""");
} while (updated == 10000);
}
}
Rolling back migrations barely works
The <rollback> section in Liquibase or Flyway creates an illusion of safety. In practice, rolling back a migration on a production database is almost impossible:
- If a column was dropped, the data is lost
- If data integrity was violated, a rollback won't restore the state
- A single migration in an expand-contract chain can't be rolled back without breaking the whole chain
The real options when there's a problem:
- Forward-fix — write a new migration that corrects the situation
- Restore from a backup — if data was lost
This is exactly why expand-contract matters so much: each step is safe on its own and doesn't break the previous state.
squawk — a linter for migrations
squawk is a static analysis tool for SQL migrations. It finds dangerous patterns before they reach production:
ADD COLUMNwithDEFAULTon PostgreSQL below 11CREATE INDEXwithoutCONCURRENTLYADD FOREIGN KEYwithoutNOT VALIDALTER TYPEwith a table rewrite- column and table renames
squawk db/changelog/v0042.sql
It's worth adding squawk to a pre-commit hook and to CI — that way dangerous migrations won't pass review unnoticed.
In short
ALTER TABLEtakesACCESS EXCLUSIVE— it blocks everything. On a large table that's minutes with no service.- Start every migration with
SET LOCAL lock_timeout = '3s'— better to fail fast than to hang for a long time. - N-1 compatibility: a migration must also work with the old code.
- Expand-Contract — a way to make a dangerous change over 3–4 releases without stopping the service.
NOT NULLon an existing column — viaCHECK NOT VALID+VALIDATE+SET NOT NULL.ADD FOREIGN KEY— viaNOT VALID+VALIDATEseparately.- Indexes — always
CONCURRENTLYwithrunInTransaction="false". - You can't remove an enum value natively — you need a new type and a shadow column.
- A large
UPDATE— in batches of 10K rows withSKIP LOCKED, ideally as a background job. - Rolling back a migration in production doesn't work — design for a forward-fix.
What to read next
- PostgreSQL indexes — index types, GIN, BRIN, partial indexes.
- PostgreSQL locks — how
ACCESS EXCLUSIVEand other lock levels work. - PostgreSQL data types — enum, jsonb, uuid, and when to choose what.