PostgreSQL Backups: pg_dump, pg_basebackup and PITR

Java · Go · Node · Python

What logical and physical backups are in PostgreSQL, when to use pg_dump and when to use pg_basebackup, how point-in-time recovery (PITR) works, and which rules you must follow so that a backup can actually save you.

Published 27 июня 2026 ~7 min read Author: Вадим Викулин

← Back to the section

Picture this: on Friday evening someone accidentally drops the orders table. Or a migration ships that wipes out data. The only thing standing between you and disaster is a backup. Let's look at how it works in PostgreSQL.

Two approaches: logical and physical backup

PostgreSQL has two fundamentally different ways to create a backup.

Logical backup is a dump of your data as SQL commands or a structured format. The tool is pg_dump. You get a file from which you can recreate a specific database, schema or table. It is slower, but flexible: you can grab only the tables you need, or move data to a different PostgreSQL version.

Physical backup is a copy of the on-disk files that make up a PostgreSQL cluster. The tool is pg_basebackup. It is faster and suited for production, because it supports recovery to an exact point in time (more on that below). The downside — the PostgreSQL version at restore time must match.

	`pg_dump` (logical)	`pg_basebackup` (physical)
What it copies	tables, schemas, one DB	the whole cluster
Size	more compact (no bloated blocks)	same as on disk
Creation speed	slow	fast
Restore speed	slow (parsing SQL)	fast
Move to another PG version	yes	no
Point-in-time recovery	no	yes (with WAL archive)
When to use	debugging, migration, one-off tasks	production, replication

For production you usually set up a physical backup plus WAL archiving — this lets you recover to any moment. Logical dumps are used for development and debugging.

pg_dump: basic usage

The simplest option is to dump a database into a SQL file:

pg_dump -U user -h host mydb > mydb.sql
psql -U user -h host mydb_new < mydb.sql

This is handy for small databases. The file is human-readable and can be opened in a text editor.

Custom format — the recommended way

For serious use it is better to take the custom format (-Fc). It is more compact and allows parallel restore:

pg_dump -Fc -U user -h host mydb > mydb.dump
pg_restore -j 4 -d mydb_new mydb.dump   # 4 parallel workers

The -j 4 flag during restore noticeably speeds up the process on large databases — each worker handles its own tables.

Useful flags

pg_dump --schema=public              # only one schema
pg_dump --table=order_doc            # only one table
pg_dump --schema-only                # structure only, no data
pg_dump --data-only                  # data only, no structure
pg_dump --exclude-table=audit_log    # exclude a table
pg_dump --no-owner --no-acl          # for restore into another cluster

Roles and global settings

pg_dump copies only a single database. Roles, tablespaces and global settings are stored at the cluster level — they are copied by pg_dumpall:

pg_dumpall --globals-only > globals.sql   # roles, tablespaces
pg_dump mydb > mydb.sql                    # database data

Restore a database locally for debugging

A typical scenario: a bug reproduces for a user, and you need a dump of the production database for local debugging.

createdb -U postgres customer_debug
pg_restore -d customer_debug -j 4 customer-prod-2026-05-01.dump
psql customer_debug

Important: a production dump must be anonymized before handing it over — strip out personal data. This is a requirement of data-protection laws such as GDPR. Tools for this are postgresql_anonymizer and manual UPDATE scripts.

Never hand over a raw dump with real user data.

pg_basebackup: physical backup of a cluster

A physical backup creates an exact copy of the cluster files:

pg_basebackup -U replicator -h master -D /backup/2026-05-07 -Ft -z -P

-Ft — pack into tar.
-z — compress with gzip.
-P — show progress.

To restore, you unpack the archive into data_directory and start PostgreSQL. The PostgreSQL version must match the one the backup was made with — this is the limitation of the physical approach.

PITR: point-in-time recovery

The most powerful capability for production is Point-In-Time Recovery. The idea: take a physical backup and "replay" the WAL journal files on top of it up to the moment you need.

If someone accidentally deleted all orders at 14:30, and the backup was taken at 02:00 — PITR lets you recover to 14:29, losing just one minute rather than half a day of work.

For this you need to set up WAL archiving on the server:

archive_mode = on
archive_command = 'rsync %p backup-host:/wal-archive/%f'

This means: every filled WAL segment is copied to a separate server. That forms a continuous chain of changes.

When you need to recover to a specific moment:

# recovery.signal in data_directory + postgresql.conf:
restore_command = 'rsync backup-host:/wal-archive/%f %p'
recovery_target_time = '2026-05-07 14:29:00 MSK'

PostgreSQL takes the base backup, then applies the WAL files one by one and stops exactly at 14:29. This is a DBA operation — but it is important for a developer to understand the principle.

How long to keep backups

A typical retention scheme:

daily copies — 7 days;
weekly — 4 weeks;
monthly — 12 months;
continuous WAL archive — the last 7 days.

All backups must live separately from the server that holds the data — on S3, a NAS, or cloud storage. A copy on the same machine is no protection: if the disk dies, everything dies with it.

And the golden rule: a backup you have never tested is not a backup. Regularly verify that you can actually restore from it. Otherwise, at the moment of disaster, you will discover the file is corrupted.

Backups in a multi-tenant architecture

When a single database serves several customers, the strategy depends on how the data is separated:

All customers in shared tables (tenant_id column) — back up the whole database, and filter via COPY ... WHERE tenant_id = ? when needed.
Schema per customer (schema-per-tenant) — pg_dump --schema=tenant_X gives you a backup of a specific customer.
Database per customer (db-per-tenant) — each database has its own backup schedule.

What to do when "everything is broken"

The action plan for lost or corrupted data:

Stop the changes — no new INSERT, UPDATE, DELETE. Every change after the disaster makes recovery harder.
Call the DBA or the infrastructure owner.
Make a copy of the current state before restoring anything.
Recover via PITR to a point before the problem — if WAL archiving is set up.
If there is no PITR — the last good backup plus manual recovery of the lost data.

An important nuance: if a table was dropped inside an open transaction and it has not been committed yet — check pg_stat_activity. If the transaction is still alive, a ROLLBACK brings the data back without any backup at all.

Common mistakes

Backup on the same machine. If the server fails, you lose both the data and the backup. Always to separate storage.

Not testing restore. A backup only exists once you have actually restored from it. Without regular testing it is just a file.

A production dump without anonymization. Handing over real user data breaks the law. Before any transfer — strip out personal data.

--inserts for large tables. The --inserts mode generates one INSERT per row — that is tens of times slower than the standard COPY-based mode. Use it only when you truly need to (for example, compatibility with other database systems).

Restoring a physical backup into a different major PostgreSQL version. This does not work. To move between versions — only pg_dump (logical).

In short

Two kinds of backup: logical (pg_dump) — flexible, for debugging; physical (pg_basebackup) — fast, for production.
pg_dump -Fc produces a compact file; pg_restore -j 4 restores in parallel.
Physical backup + WAL archive = PITR — recovery to any moment in the past.
pg_dumpall --globals-only copies cluster roles and global settings.
Backups must go to separate storage — a local backup is no protection.
A backup without a tested restore is not a backup.
Anonymize a production dump before handing it over.