When a table grows to tens of gigabytes, problems begin: queries slow down, VACUUM runs for hours, and deleting old data turns into a multi-hour operation. In this article we'll look at two tools — partitioning and sharding — what they are, how they differ, and when to use each.
How PostgreSQL stores data and why a large table is a pain
When a query hits a table, PostgreSQL works with pages (8 KB each). An index helps find the needed pages quickly. But if the table takes up 50–100 GB, even the index becomes large — it doesn't fit entirely in RAM (shared_buffers), and reading turns into constant disk access.
Another pain point is cleaning up outdated row versions. PostgreSQL doesn't delete old versions right away; that's done by the background VACUUM process. On a large table it can run around the clock, unable to keep up with the flow of changes.
There are two strategies for solving these problems:
- Partitioning — cut one large table into several physical pieces within a single database on a single server.
- Sharding — spread data across several physical servers.
These are fundamentally different things. Partitioning is optimization within a single machine. Sharding is scaling across several machines.
How to find out how much space a table takes
Before optimizing anything, you need to measure. PostgreSQL has handy functions:
-- Size of the entire database
SELECT pg_size_pretty(pg_database_size('shop'));
-- → 42 GB
-- Size of the table together with indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('product'));
-- → 12 GB
-- Just the rows themselves (without indexes)
SELECT pg_size_pretty(pg_relation_size('product'));
-- → 7 GB
-- Indexes only
SELECT pg_size_pretty(pg_indexes_size('product'));
-- → 5 GB
TOAST is separate storage for long values (texts, JSON). If pg_total_relation_size is much larger than the sum of pg_relation_size and pg_indexes_size, it means the table stores a lot of long fields.
To see all tables sorted by size:
SELECT
relname,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || relname)) AS total,
pg_size_pretty(pg_relation_size(schemaname || '.' || relname)) AS table_only,
pg_size_pretty(pg_indexes_size(schemaname || '.' || relname)) AS indexes,
n_live_tup AS live_rows,
n_dead_tup AS dead_rows
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(schemaname || '.' || relname) DESC
LIMIT 20;
If dead_rows is comparable to live_rows — VACUUM isn't keeping up with deletions. If indexes is larger than table_only — you may have too many indexes.
Partitioning
Partitioning is when one logical table is physically cut into several parts called partitions. The application still writes to and reads from the product table, and PostgreSQL itself figures out which physical partition to put a row into and which one to read from.
This is called partition pruning — the planner "cuts off" unnecessary partitions and reads only those where the needed rows might be.
Partitioning solves the problems of a large table on a single server:
- indexes on each partition are smaller and fit into memory better;
- VACUUM works on partitions separately — faster and lighter;
- dropping an entire partition (
DROP TABLE product_2023) is an instant operation instead of a multi-hourDELETE; - a query for a specific month reads only the needed partitions, not the whole table.
RANGE — by range
The most popular option is to cut by dates. Logs, orders, transactions — anything that accumulates over time fits well into RANGE partitions.
CREATE TABLE product (
id BIGSERIAL,
category_id BIGINT,
price NUMERIC(10, 2) NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
PRIMARY KEY (id, created_at) -- the partition key must be part of the PRIMARY KEY
) PARTITION BY RANGE (created_at);
CREATE TABLE product_2026_q1 PARTITION OF product
FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE product_2026_q2 PARTITION OF product
FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
The query SELECT * FROM product WHERE created_at >= '2026-04-01' reads only product_2026_q2. The other partitions aren't even opened.
LIST — by specific values
When data is divided by a finite set of values — for example, by country, region, or status.
CREATE TABLE product (
id BIGSERIAL,
category_id BIGINT NOT NULL,
price NUMERIC(10, 2) NOT NULL,
name TEXT NOT NULL,
PRIMARY KEY (id, category_id)
) PARTITION BY LIST (category_id);
CREATE TABLE product_sweets PARTITION OF product FOR VALUES IN (1);
CREATE TABLE product_meat PARTITION OF product FOR VALUES IN (2);
CREATE TABLE product_dairy PARTITION OF product FOR VALUES IN (3);
CREATE TABLE product_other PARTITION OF product DEFAULT;
The query WHERE category_id = 1 reads only product_sweets. The planner skips the rest.
HASH — even distribution
When there's no natural scale and you just need to evenly distribute the load across partitions. PostgreSQL takes a hash of the key value and distributes rows by the formula hash(id) % N.
CREATE TABLE product (
id BIGSERIAL,
category_id BIGINT,
price NUMERIC(10, 2) NOT NULL,
name TEXT NOT NULL,
PRIMARY KEY (id)
) PARTITION BY HASH (id);
CREATE TABLE product_p0 PARTITION OF product FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE product_p1 PARTITION OF product FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... product_p7
The query WHERE id = 12345 reads only one partition. But a query without a filter on id still reads all eight.
The DEFAULT partition
A typical failure with RANGE partitioning: a new quarter arrives, the partition hasn't been created yet, and the INSERT fails with the error no partition of relation found for row. Data is lost, and the application spews errors.
The safeguard is the DEFAULT partition: it catches all rows that didn't land in any declared partition.
CREATE TABLE product_default PARTITION OF product DEFAULT;
Now the INSERT won't fail. But DEFAULT is insurance, not a place to store data. If a lot of rows accumulate there, then when you try to ATTACH PARTITION a new range, PostgreSQL will first scan the DEFAULT partition under a lock, and that can take a long time.
Working with DEFAULT correctly:
- Create new partitions in advance — several periods ahead (via cron or
pg_partman). - Set up an alert if anything shows up in DEFAULT — that means the auto-creation broke.
- Before each
ATTACH, make sure DEFAULT is empty.
How to choose a partition key
Three mandatory conditions:
- The key is present in most queries — otherwise the planner can't cut off the extra partitions and will read all of them.
- Data is distributed evenly — if 95% of rows land in one partition, there's no point.
- The key almost never changes — changing the key physically moves the row from one partition to another, and on hot updates that's expensive.
One more restriction: the partition key must be part of the PRIMARY KEY and every UNIQUE index. That's exactly why the examples above use PRIMARY KEY (id, created_at), not just PRIMARY KEY (id).
Sharding
Sharding is when data is spread across several physical servers. Each server (shard) stores only part of the data. The application or a proxy layer knows which shard to go to for each specific query.
Sharding is needed when a single server can no longer cope: it's short on RAM, IOPS, or disk throughput. Partitioning won't solve this — the data is still physically on one machine.
Sharding strategies
By hash — the simplest. shard = hash(id) % N. Data is distributed evenly. The problem: when adding a new server, you need to move a significant portion of the data. This is solved with consistent hashing, which minimizes the number of rows moved.
By range — data with id from 0 to 10M on one shard, from 10M to 20M on another. It's easy to add new servers, but the last range is always "hot" — all new records go there.
By tenant — each client (tenant) gets its own shard. Convenient for SaaS: a large client gets a dedicated server, and data isolation is simple. Requires a routing table.
By geography — data for users in Europe, Asia, and the US is stored on servers in the corresponding regions. This reduces latency and helps with requirements to store data in specific countries.
How to choose a shard key
Everything that applies to the partition key, plus two additional requirements:
- Related data should live on the same shard — if
productis sharded bycategory_idandcategoryis also sharded bycategory_id, then a JOIN between them stays local within the shard. Otherwise every JOIN turns into a distributed query. - Transactions should fit within a single shard — distributed transactions across several servers are slow and cope poorly with network failures.
The cost of sharding
Sharding adds complexity that partitioning doesn't have:
- Distributed transactions — slow and unreliable. In practice they're avoided through design: one operation = one shard.
- JOINs across shards — expensive, and almost always a sign of a wrong key choice. You have to denormalize data or synchronize it asynchronously.
- Global uniqueness —
BIGSERIALdoesn't work: counters on different shards diverge. You need UUIDs or special identifier generators. - Adding a new server — requires moving data. This is slow and needs special infrastructure (Citus, Vitess).
- Routing — the application or proxy must know which shard to send a query to.
Partitioning vs sharding
| Partitioning | Sharding | |
|---|---|---|
| Where the data is | Different tables in one database | Different servers |
| When it's needed | Table > 50–100 GB, index and VACUUM slowdowns | A single server can no longer handle it |
| Transparent to the application | Yes — one logical table | No — routing is needed |
| Transactions across segments | Ordinary local ones | Distributed — expensive |
| JOINs across segments | Cheap | Expensive |
| Changing the key | Expensive but possible | Practically impossible |
| Tools | pg_partman, scripts | Citus, Vitess, manual work |
Practical rule: partition first, shard only when you've hit the ceiling of a single server. Partitioning solves 80% of large-table problems and is incomparably easier to manage.
A useful trick for those planning to shard in the future: partition by the same key you later want to shard by. Then the move to sharding is a matter of transferring ready-made partitions to other servers, without rewriting queries.
In short
- Partitioning — one table, several physical pieces on a single server. Sharding — data on several servers.
- RANGE — by range (dates, numbers). LIST — by specific values. HASH — even distribution.
- The partition key must be part of the PRIMARY KEY and all UNIQUE indexes.
- The partition key should be present in most queries, otherwise the planner reads all partitions.
- The DEFAULT partition is insurance against data loss and is normally empty. New partitions are created in advance (via cron or
pg_partman). - Sharding makes JOINs across shards and transactions across several shards expensive — this is avoided through design.
- A global
BIGSERIALdoesn't work with sharding — you need UUIDs or special generators.
What to read next
- ACID and isolation levels in PostgreSQL — how the transactional model changes with sharding.
- Covering Index in PostgreSQL — how indexes are structured and why index size matters.