A query runs slowly, but it's not clear why. Add an index? Rewrite the JOIN? Increase memory? One command answers all these questions — EXPLAIN ANALYZE. It shows exactly what PostgreSQL did while running the query and how much time it spent on each step.
How to run it
The minimal option is to just add EXPLAIN ANALYZE before the query:
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'PAID';
The best option is to add BUFFERS so you can see how much data came from cache and how much from disk:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE status = 'PAID';
An important point: EXPLAIN ANALYZE actually runs the query. This is needed for an accurate measurement. If you want to check the plan for an UPDATE or DELETE without applying the changes, wrap it in a transaction:
BEGIN;
EXPLAIN ANALYZE UPDATE orders SET status = 'X' WHERE id = 123;
ROLLBACK;
How to read the output
Here is a sample output:
Hash Join (cost=12.34..567.89 rows=1000 width=64) (actual time=1.234..56.789 rows=987 loops=1)
Hash Cond: (a.id = b.a_id)
Buffers: shared hit=1234 read=56
-> Seq Scan on a
-> Hash
-> Seq Scan on b
What each part means:
cost=START..TOTAL— PostgreSQL's estimate in arbitrary units. These are not milliseconds.rows=N— how many rows PostgreSQL expected to get.actual time=START..TOTAL— the real time in milliseconds.actual rows=N loops=M— how many rows the node actually returned and how many times it ran.Buffers: shared hit=X read=Y—hitis data from cache,readis disk access.
You need to read the plan bottom-up. The lower nodes are the inner operations that run first. The root is the final result.
The first thing to look at is the discrepancy between rows= (estimate) and actual rows= (fact). If the estimate is ten or more times smaller than the real number of rows, PostgreSQL is building the plan on wrong data. The ANALYZE command on the table helps — it updates the statistics.
The second important point is loops. If a node ran many times, the real time needs to be multiplied: loops=10000 × actual time=0.5..1.2 means 12 seconds inside a single Nested Loop.
Table scan nodes
PostgreSQL chooses how to read a table based on the size of the result set, the presence of indexes and the cost settings.
Seq Scan — a full pass
Seq Scan on orders
Filter: (status = 'PAID')
Rows Removed by Filter: 850000
PostgreSQL reads the whole table and discards the unneeded rows. This is fine for small tables and for queries that select most of the rows. It's bad when Rows Removed by Filter is millions of times larger than the result — that's a sign you need an index on the filter field.
Index Scan — a pass over the index
Index Scan using ix_orders_status on orders
Index Cond: (status = 'PAID')
PostgreSQL walks the B-tree index, then reads the needed rows from the table. Pay attention to the difference between Index Cond: and Filter:. If a condition ended up in Filter: rather than Index Cond:, that means the index on this field is not being used as a search key — and the rows are still being scanned one by one.
Index Only Scan — index only, no table
Index Only Scan using ix_orders_status_created on orders
Index Cond: (status = 'PAID')
Heap Fetches: 12
If all the needed columns are in the index, PostgreSQL may not touch the table at all. This is the fastest option.
If Heap Fetches is greater than zero, PostgreSQL still goes to the table — the visibility map is stale and doesn't allow skipping the check. VACUUM on the table helps.
Bitmap Scan — a two-phase read
Bitmap Heap Scan on orders
Recheck Cond: (status = 'PAID')
-> Bitmap Index Scan on ix_orders_status
A two-phase process: first a bitmap of the pages that contain the needed rows is built, then those pages are read in order. This is more efficient than random reads for a medium-sized result set. A Bitmap also lets you combine several indexes (BitmapAnd, BitmapOr).
Recheck Cond: means the condition is re-checked — this happens when the bitmap stores only pages, not specific rows (a lossy bitmap due to lack of memory).
Joining tables
Nested Loop — a loop inside a loop
Nested Loop (rows=100 loops=1)
-> Index Scan on a (rows=100)
-> Index Scan on b (rows=1 loops=100)
Index Cond: (b.a_id = a.id)
For each row from the outer table, a match is looked up in the inner one. This works well when the outer result is small and the inner table has an index on the join key.
A common trap: loops=1000000 at actual time=0.5 — that's 500 seconds inside a single node. If the inner table is large and has no index, a Nested Loop degrades.
Hash Join — a hash table in memory
Hash Join
Hash Cond: (a.id = b.a_id)
-> Seq Scan on a
-> Hash
-> Seq Scan on b
The smaller table is loaded into a hash table in memory, then for each row from the larger one a fast lookup is done. Good for joining two large tables without suitable indexes.
If the output shows Batches: 2 or more — the hash table didn't fit in memory and PostgreSQL spilled to disk. Increasing work_mem for the session helps:
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
Merge Join — merging two sorted streams
Merge Join
Merge Cond: (a.id = b.a_id)
-> Sort ...
-> Index Scan on ix_b_a_id
Both streams are sorted by the join key and merged like a zipper. Efficient for large volumes of data when both streams are already sorted (there's a suitable index) or the sort cost is justified.
Sorting and grouping
Sort — how to tell that data is going to disk
Sort
Sort Key: created_at DESC
Sort Method: external merge Disk: 8192kB
external merge Disk: means the sort didn't fit in memory and went to disk. This is slow. There are two solutions: increase work_mem, or add an index with the needed sort order so the Sort node disappears from the plan.
If the sort is in memory, you'll see quicksort or top-N heapsort (for ORDER BY ... LIMIT N).
Grouping
Aggregate— simple aggregates:count(*),sum().HashAggregate—GROUP BYvia a hash table in memory. Fast, as long as the data fits.GroupAggregate—GROUP BYafter sorting. Used whenHashAggregatedoesn't fit inwork_mem.
Parallel plan
Gather
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on orders
PostgreSQL can split the work across several parallel workers. The count is set by the max_parallel_workers_per_gather parameter (2 by default). Small tables are not parallelized — the threshold is controlled by min_parallel_table_scan_size (8 MB by default).
Tools for complex plans
When a plan has five or more levels, reading it in text form is inconvenient. There are specialized tools for this:
- explain.depesz.com — color-highlights the nodes, shows the most expensive steps.
- explain.dalibo.com (PEV2) — a graphical plan tree right in the browser.
pg_stat_statements— an extension for collecting query statistics in production, without a manualEXPLAIN.
For PEV2 it's convenient to get the plan in JSON format:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;
In short
EXPLAIN (ANALYZE, BUFFERS)— the standard option: you can see the real time and IO.EXPLAIN ANALYZEruns the query; for DML useBEGIN ... ROLLBACK.- Read the plan bottom-up — the leaves run first.
- A node's real time =
actual time × loops— don't forget to multiply. - A discrepancy of 10x or more between
rows=andactual rows=→ runANALYZEon the table. - A large
Buffers: read=→ data is coming from disk, you need an index or more cache. - A condition in
Filter:instead ofIndex Cond:→ the index isn't working as a search key. Heap Fetches > 0on an Index Only Scan → runVACUUM.Batches > 1in a Hash Join orexternal merge Disk:in a Sort → increasework_mem.- A Nested Loop with a large
loops→ check for an index on the join key.
What to read next
- PostgreSQL index types — what should appear in the plan with the right index.
- Composite indexes — how an Index Only Scan depends on the index composition.
- Selectivity — why
rows=differs fromactual rows=. - VACUUM and autovacuum — how to eliminate
Heap Fetches. - Monitoring and slow queries —
pg_stat_statementsfor production.