Partitioning and Sharding in PostgreSQL

Java · Go · Node · Python

In plain terms: when a table no longer fits in memory or on disk and what to do about it. How partitioning works (RANGE, LIST, HASH), why you need a DEFAULT partition, and how sharding differs.

Published 18 мая 2026 updated 27 июня 2026 ~9 min read Author: Вадим Викулин

← Back to the section

When a table grows to tens of gigabytes, problems begin: queries slow down, VACUUM runs for hours, and deleting old data turns into a multi-hour operation. In this article we'll look at two tools — partitioning and sharding — what they are, how they differ, and when to use each.

How PostgreSQL stores data and why a large table is a pain

When a query hits a table, PostgreSQL works with pages (8 KB each). An index helps find the needed pages quickly. But if the table takes up 50–100 GB, even the index becomes large — it doesn't fit entirely in RAM (shared_buffers), and reading turns into constant disk access.

Another pain point is cleaning up outdated row versions. PostgreSQL doesn't delete old versions right away; that's done by the background VACUUM process. On a large table it can run around the clock, unable to keep up with the flow of changes.

There are two strategies for solving these problems:

Partitioning — cut one large table into several physical pieces within a single database on a single server.
Sharding — spread data across several physical servers.

These are fundamentally different things. Partitioning is optimization within a single machine. Sharding is scaling across several machines.

How to find out how much space a table takes

Before optimizing anything, you need to measure. PostgreSQL has handy functions:

-- Size of the entire database
SELECT pg_size_pretty(pg_database_size('shop'));
-- → 42 GB

-- Size of the table together with indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('product'));
-- → 12 GB

-- Just the rows themselves (without indexes)
SELECT pg_size_pretty(pg_relation_size('product'));
-- → 7 GB

-- Indexes only
SELECT pg_size_pretty(pg_indexes_size('product'));
-- → 5 GB

TOAST is separate storage for long values (texts, JSON). If pg_total_relation_size is much larger than the sum of pg_relation_size and pg_indexes_size, it means the table stores a lot of long fields.

To see all tables sorted by size:

SELECT
    relname,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || relname)) AS total,
    pg_size_pretty(pg_relation_size(schemaname || '.' || relname))       AS table_only,
    pg_size_pretty(pg_indexes_size(schemaname || '.' || relname))        AS indexes,
    n_live_tup   AS live_rows,
    n_dead_tup   AS dead_rows
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(schemaname || '.' || relname) DESC
LIMIT 20;

If dead_rows is comparable to live_rows — VACUUM isn't keeping up with deletions. If indexes is larger than table_only — you may have too many indexes.

Partitioning

Partitioning is when one logical table is physically cut into several parts called partitions. The application still writes to and reads from the product table, and PostgreSQL itself figures out which physical partition to put a row into and which one to read from.

This is called partition pruning — the planner "cuts off" unnecessary partitions and reads only those where the needed rows might be.

Partitioning solves the problems of a large table on a single server:

indexes on each partition are smaller and fit into memory better;
VACUUM works on partitions separately — faster and lighter;
dropping an entire partition (DROP TABLE product_2023) is an instant operation instead of a multi-hour DELETE;
a query for a specific month reads only the needed partitions, not the whole table.

RANGE — by range

The most popular option is to cut by dates. Logs, orders, transactions — anything that accumulates over time fits well into RANGE partitions.

CREATE TABLE product (
    id          BIGSERIAL,
    category_id BIGINT,
    price       NUMERIC(10, 2) NOT NULL,
    name        TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (id, created_at)   -- the partition key must be part of the PRIMARY KEY
) PARTITION BY RANGE (created_at);

CREATE TABLE product_2026_q1 PARTITION OF product
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

CREATE TABLE product_2026_q2 PARTITION OF product
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');

The query SELECT * FROM product WHERE created_at >= '2026-04-01' reads only product_2026_q2. The other partitions aren't even opened.

LIST — by specific values

When data is divided by a finite set of values — for example, by country, region, or status.

CREATE TABLE product (
    id          BIGSERIAL,
    category_id BIGINT NOT NULL,
    price       NUMERIC(10, 2) NOT NULL,
    name        TEXT NOT NULL,
    PRIMARY KEY (id, category_id)
) PARTITION BY LIST (category_id);

CREATE TABLE product_sweets PARTITION OF product FOR VALUES IN (1);
CREATE TABLE product_meat   PARTITION OF product FOR VALUES IN (2);
CREATE TABLE product_dairy  PARTITION OF product FOR VALUES IN (3);
CREATE TABLE product_other  PARTITION OF product DEFAULT;

The query WHERE category_id = 1 reads only product_sweets. The planner skips the rest.

HASH — even distribution

When there's no natural scale and you just need to evenly distribute the load across partitions. PostgreSQL takes a hash of the key value and distributes rows by the formula hash(id) % N.

CREATE TABLE product (
    id          BIGSERIAL,
    category_id BIGINT,
    price       NUMERIC(10, 2) NOT NULL,
    name        TEXT NOT NULL,
    PRIMARY KEY (id)
) PARTITION BY HASH (id);

CREATE TABLE product_p0 PARTITION OF product FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE product_p1 PARTITION OF product FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... product_p7

The query WHERE id = 12345 reads only one partition. But a query without a filter on id still reads all eight.

The DEFAULT partition

A typical failure with RANGE partitioning: a new quarter arrives, the partition hasn't been created yet, and the INSERT fails with the error no partition of relation found for row. Data is lost, and the application spews errors.

The safeguard is the DEFAULT partition: it catches all rows that didn't land in any declared partition.

CREATE TABLE product_default PARTITION OF product DEFAULT;

Now the INSERT won't fail. But DEFAULT is insurance, not a place to store data. If a lot of rows accumulate there, then when you try to ATTACH PARTITION a new range, PostgreSQL will first scan the DEFAULT partition under a lock, and that can take a long time.

Working with DEFAULT correctly:

Create new partitions in advance — several periods ahead (via cron or pg_partman).
Set up an alert if anything shows up in DEFAULT — that means the auto-creation broke.
Before each ATTACH, make sure DEFAULT is empty.

How to choose a partition key

Three mandatory conditions:

The key is present in most queries — otherwise the planner can't cut off the extra partitions and will read all of them.
Data is distributed evenly — if 95% of rows land in one partition, there's no point.
The key almost never changes — changing the key physically moves the row from one partition to another, and on hot updates that's expensive.

One more restriction: the partition key must be part of the PRIMARY KEY and every UNIQUE index. That's exactly why the examples above use PRIMARY KEY (id, created_at), not just PRIMARY KEY (id).

Sharding

Sharding is when data is spread across several physical servers. Each server (shard) stores only part of the data. The application or a proxy layer knows which shard to go to for each specific query.

Sharding is needed when a single server can no longer cope: it's short on RAM, IOPS, or disk throughput. Partitioning won't solve this — the data is still physically on one machine.

Sharding strategies

By hash — the simplest. shard = hash(id) % N. Data is distributed evenly. The problem: when adding a new server, you need to move a significant portion of the data. This is solved with consistent hashing, which minimizes the number of rows moved.

By range — data with id from 0 to 10M on one shard, from 10M to 20M on another. It's easy to add new servers, but the last range is always "hot" — all new records go there.

By tenant — each client (tenant) gets its own shard. Convenient for SaaS: a large client gets a dedicated server, and data isolation is simple. Requires a routing table.

By geography — data for users in Europe, Asia, and the US is stored on servers in the corresponding regions. This reduces latency and helps with requirements to store data in specific countries.

How to choose a shard key

Everything that applies to the partition key, plus two additional requirements:

Related data should live on the same shard — if product is sharded by category_id and category is also sharded by category_id, then a JOIN between them stays local within the shard. Otherwise every JOIN turns into a distributed query.
Transactions should fit within a single shard — distributed transactions across several servers are slow and cope poorly with network failures.

The cost of sharding

Sharding adds complexity that partitioning doesn't have:

Distributed transactions — slow and unreliable. In practice they're avoided through design: one operation = one shard.
JOINs across shards — expensive, and almost always a sign of a wrong key choice. You have to denormalize data or synchronize it asynchronously.
Global uniqueness — BIGSERIAL doesn't work: counters on different shards diverge. You need UUIDs or special identifier generators.
Adding a new server — requires moving data. This is slow and needs special infrastructure (Citus, Vitess).
Routing — the application or proxy must know which shard to send a query to.

Partitioning vs sharding

	Partitioning	Sharding
Where the data is	Different tables in one database	Different servers
When it's needed	Table > 50–100 GB, index and VACUUM slowdowns	A single server can no longer handle it
Transparent to the application	Yes — one logical table	No — routing is needed
Transactions across segments	Ordinary local ones	Distributed — expensive
JOINs across segments	Cheap	Expensive
Changing the key	Expensive but possible	Practically impossible
Tools	`pg_partman`, scripts	Citus, Vitess, manual work

Practical rule: partition first, shard only when you've hit the ceiling of a single server. Partitioning solves 80% of large-table problems and is incomparably easier to manage.

A useful trick for those planning to shard in the future: partition by the same key you later want to shard by. Then the move to sharding is a matter of transferring ready-made partitions to other servers, without rewriting queries.

In short

Partitioning — one table, several physical pieces on a single server. Sharding — data on several servers.
RANGE — by range (dates, numbers). LIST — by specific values. HASH — even distribution.
The partition key must be part of the PRIMARY KEY and all UNIQUE indexes.
The partition key should be present in most queries, otherwise the planner reads all partitions.
The DEFAULT partition is insurance against data loss and is normally empty. New partitions are created in advance (via cron or pg_partman).
Sharding makes JOINs across shards and transactions across several shards expensive — this is avoided through design.
A global BIGSERIAL doesn't work with sharding — you need UUIDs or special generators.