An index helps you find the rows you need quickly. But after PostgreSQL has found them in the index, it usually goes one more time — to the table itself — to read the column values. A covering index lets you avoid that: all the data you need is right there in the index, and there's no need to touch the table.
Why PostgreSQL fetches data twice
A standard index in PostgreSQL is built as a tree (B-tree). The leaves of the tree store only the key columns and the row's address in the table (ctid). When PostgreSQL finds a row through the index, it reads that address and goes to the table for the remaining values.
This is called a heap fetch — an access to the heap (the heap is what the row storage in PostgreSQL is called). If the table is large and the data is scattered across many pages, each such access is a separate I/O operation.
An example scenario: a product listing. The query often looks like this:
SELECT name, price FROM product WHERE category_id = 1;
There's an index on category_id, PostgreSQL quickly finds the right rows, but then it goes to the table for name and price. On millions of rows, this is noticeable.
What an index-only scan is
An index-only scan is a mode in which PostgreSQL answers the query entirely from the index, without touching the table. There's one condition: every column the query needs (both in the WHERE and in the SELECT) must be in the index.
The problem with the naive approach: if you add name and price to the index key, they'll end up in every level of the tree, including the internal nodes. The tree becomes physically wider, takes up more space, and gets slower to search.
INCLUDE solves this elegantly: the additional columns are placed only in the leaves of the tree, not in the tree itself.
INCLUDE vs an extended key
Let's look at two versions of the same index:
-- Option 1: name and price go into the key — into every level of the tree
CREATE INDEX product_category_name_price_idx
ON product (category_id, name, price);
-- Option 2: name and price only in the leaves — the tree stays narrow
CREATE INDEX product_category_idx
ON product (category_id) INCLUDE (name, price);
The difference is fundamental:
Key columns (category_id, name, price) | Covering (category_id) INCLUDE (name, price) | |
|---|---|---|
| Where the extra columns live | Tree + leaves | Leaves only |
| Tree size | Larger | Smaller |
| Affects sorting | Yes — ORDER BY category_id, name with no sort | No |
Participates in UNIQUE | Yes — over the whole set | No — only over the key part |
Filtering by name | Index Cond | Filter (slower) |
Conclusion: if name and price are needed only in the SELECT, and not in the WHERE or the ORDER BY — use INCLUDE. If you also filter or sort by them, add them to the key.
How to check that the index-only scan works
EXPLAIN (ANALYZE, BUFFERS)
SELECT name, price FROM product WHERE category_id = 1;
A good result looks like this:
Index Only Scan using product_category_idx on product
Index Cond: (category_id = 1)
Heap Fetches: 0
Heap Fetches: 0 — PostgreSQL didn't go to the table. That's the goal.
Why Heap Fetches can be non-zero
There's an important subtlety here. An index-only scan works only when PostgreSQL is certain that the data in the index is up to date. That certainty comes from a special structure — the visibility map: it marks the table pages in which all rows are visible to any transaction.
The problem: the visibility map is updated by VACUUM. On tables with frequent updates and deletes, autovacuum may not keep up. Then PostgreSQL is forced to peek at the table anyway — to check whether the row is visible to the current transaction. As a result, Heap Fetches ≠ 0, and the whole point of the covering index is lost.
Diagnosis is simple: look at the number in the Heap Fetches line. If it's comparable to the number of rows returned, autovacuum isn't keeping up.
The fix is to make autovacuum more aggressive for that specific table:
ALTER TABLE product SET (
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_vacuum_insert_scale_factor = 0.1
);
UNIQUE + INCLUDE — the most common case
A typical task: you need name to be unique within a category, but you always want to return price too when selecting.
Without INCLUDE, you have to choose:
-- Bad option: extend the uniqueness — but then the same name with a different price
-- becomes two different records. The logic is broken.
CREATE UNIQUE INDEX ON product (category_id, name, price);
-- Awkward option: two indexes instead of one — double the write load
CREATE UNIQUE INDEX ON product (category_id, name);
CREATE INDEX ON product (category_id, name, price);
With INCLUDE, one index does both jobs:
CREATE UNIQUE INDEX product_category_name_uq
ON product (category_id, name) INCLUDE (price);
Uniqueness is checked over (category_id, name). price is available for an index-only scan and doesn't participate in the uniqueness check. This is the most indisputable use of INCLUDE.
Limitations
INCLUDEis supported by B-tree, GiST, SP-GiST. GIN does not support it — and GIN is used for full-text search and JSONB.- You can't put expressions or functions in
INCLUDE:INCLUDE (lower(name))won't work. Only plain table columns. - If the table is small and fits entirely in PostgreSQL's memory — the heap pages are hot anyway, and the gain from a covering index is minimal.
SELECT *makes an index-only scan pointless: you can't put every column into the index.
In short
- A regular index finds rows, but for the column values it goes to the table.
INCLUDElets you put the needed columns in the index leaves and avoid that trip. INCLUDEplaces columns only in the leaves of the B-tree — the tree stays narrow and searches stay fast.- Key columns participate in sorting, uniqueness, and filtering. Columns in
INCLUDEare only for returning in theSELECT. - The goal is an index-only scan (
Heap Fetches: 0). It works when the visibility map marks pages as fully visible. - If
Heap Fetchesgrows — autovacuum isn't keeping up; tuneautovacuum_vacuum_scale_factorfor the table. UNIQUE (a, b) INCLUDE (c)is the only way to returncwithout extending the uniqueness.- GIN indexes don't support
INCLUDE.
What to read next
- ACID and isolation levels — about MVCC and the visibility map that an index-only scan depends on.
- Full-text search in PostgreSQL — GIN indexes, which covering doesn't support.
- Partitioning in PostgreSQL — how indexes work in partitioned tables.