When you need to search over text, the first impulse is to reach for Elasticsearch. But PostgreSQL can search over text out of the box, without a separate service. Let's look at how it works and when it's good enough.
Why plain LIKE doesn't work
The most obvious approach is WHERE body LIKE '%buyer%'. It has two drawbacks.
First — speed. When searching with a leading %, PostgreSQL can't use a regular index and has to scan every row. On a thousand records it's unnoticeable; on a million it's a disaster.
Second — grammar. The query LIKE '%buyer%' won't find the row «buying products», because the word form is different. Users type words in various forms, and LIKE doesn't account for that.
Full-text search solves both problems: it works through an index and understands stemming — reducing words to their root.
tsvector and tsquery — two key types
PostgreSQL stores processed text in a special type called tsvector. It's not just text but a list of lexemes — words reduced to their root form with positions recorded.
SELECT to_tsvector('english', 'Buyers choose products in the catalog');
-- 'buyer':1 'catalog':6 'choos':2 'product':3
«Buyers» became «buyer» and «choose» became «choos» — that's the root that any form of the word maps to. The stop-words «in» and «the» are dropped as insignificant.
A search query is stored in the tsquery type:
SELECT to_tsquery('english', 'buyer & catalog');
-- 'buyer' & 'catalog'
Match checking is done with the @@ operator:
SELECT to_tsvector('english', 'Buyers choose products in the catalog')
@@ to_tsquery('english', 'buyer & catalog');
-- true
How to store tsvector in a table
Computing to_tsvector() on every query is slow and has no index. The right approach is to store the vector separately and index it.
Generated column (PostgreSQL 12+)
CREATE TABLE article (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
title text NOT NULL,
body text NOT NULL,
search_doc tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED
);
CREATE INDEX ix_article_search_doc ON article USING gin (search_doc);
GENERATED ALWAYS AS ... STORED means PostgreSQL recomputes the column on every INSERT and UPDATE. You don't need to do it by hand.
setweight('A') marks tokens from the title as more important — this affects ranking. Weights A, B, C, D are available, from highest to lowest.
Trigger (PostgreSQL before 12)
CREATE TRIGGER article_search_update
BEFORE INSERT OR UPDATE ON article
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_doc, 'pg_catalog.english', title, body);
How to search
A basic query with ranking:
SELECT id, title, ts_rank(search_doc, q) AS rank
FROM article, to_tsquery('english', 'buyer & product') q
WHERE search_doc @@ q
ORDER BY rank DESC
LIMIT 20;
ts_rank returns a number — the higher, the better the match. With an equal number of matches, documents where the words appeared in high-weight fields rank higher (A > B > C > D).
plainto_tsquery — for user input
to_tsquery requires proper syntax with the operators &, |, !. If arbitrary user input lands there, the query may fail with an error.
For user input, use plainto_tsquery — it interprets words joined by AND without any special syntax:
WHERE search_doc @@ plainto_tsquery('english', $1);
-- 'buy a new product' → 'buy' & 'new' & 'product'
websearch_to_tsquery — Google-style
Available since PostgreSQL 11. It understands a minus sign for excluding words and quotes for phrase search:
WHERE search_doc @@ websearch_to_tsquery('english', $1);
-- 'buyer -child "new catalog"'
Stemming configuration
PostgreSQL ships with several full-text search configurations:
SELECT cfgname FROM pg_ts_config;
-- simple, english, russian, german, ...
english— English-language stemming. «buyers», «buying», «buy» become one token, and search finds all forms.simple— lowercasing only, no stemming applied. Useful for codes, identifiers, SKUs.- other languages have their own configurations (
russian,german, and so on).
For English-language content you almost always want the english configuration.
Synonyms
If you want a search for «postgres» to also find «postgresql» and «pg», you can create a synonym dictionary:
CREATE TEXT SEARCH DICTIONARY my_synonyms (
template = synonym,
synonyms = 'my_synonyms'
);
CREATE TEXT SEARCH CONFIGURATION en_extended (COPY = english);
ALTER TEXT SEARCH CONFIGURATION en_extended
ALTER MAPPING FOR word, asciiword
WITH my_synonyms, english_stem;
In the file $SHAREDIR/tsearch_data/my_synonyms.syn:
postgresql postgres
postgres postgres pg
GIN vs GiST
Two index types are suitable for full-text search:
| GIN | GiST | |
|---|---|---|
| Size on disk | larger | smaller |
| Search speed | faster | slower |
| Update speed | slower | faster |
| False positives | none | yes (rechecks) |
In the vast majority of cases GIN is the choice — it's faster on search, and that usually matters more.
GIN accumulates changes in a buffer and flushes them in batches (fastupdate = on by default). For systems with rare inserts and frequent reads, the buffer can be turned off:
ALTER INDEX ix_article_search_doc SET (fastupdate = off);
Match highlighting
ts_headline highlights the found words directly in the text:
SELECT
id,
title,
ts_headline('english', body, q,
'StartSel=<mark>, StopSel=</mark>, MaxFragments=2, MaxWords=20')
AS snippet
FROM article, websearch_to_tsquery('english', $1) q
WHERE search_doc @@ q
ORDER BY ts_rank(search_doc, q) DESC
LIMIT 20;
MaxFragments=2 — show no more than two fragments; MaxWords=20 — the length of each fragment in words.
pg_trgm — fuzzy search and substrings
Full-text search won't help if you need to:
- find a product by part of its SKU (
LIKE '%ABC%'), - or correct a typo in a name (for example, «Robertsen» instead of «Robertson»).
For that there's the pg_trgm extension. It breaks a string into three-letter groups (trigrams) and builds an index over them:
CREATE EXTENSION pg_trgm;
CREATE INDEX ix_customer_name_trgm
ON customer USING gin (full_name gin_trgm_ops);
-- substring search with an index
SELECT * FROM customer WHERE full_name ILIKE '%robert%';
-- search with tolerance for typos
SELECT * FROM customer
WHERE similarity(full_name, 'robertsen') > 0.4
ORDER BY similarity(full_name, 'robertsen') DESC
LIMIT 10;
A good combination: FTS for long text (article body, description), pg_trgm for short fields with typos (names, brands, codes). On fields longer than 100 characters pg_trgm loses its value — FTS is better there.
Pagination via keyset
OFFSET + LIMIT is slow on deep pages: PostgreSQL still computes and sorts all matches, then discards the requested count. On the 10th page of 50 results that's wasted work.
Instead of OFFSET, use keyset pagination on the pair (rank, id):
SELECT id, title, ts_rank(search_doc, q) AS rank
FROM article, websearch_to_tsquery('english', $1) q
WHERE search_doc @@ q
AND (ts_rank(search_doc, q), id) < ($prev_rank, $prev_id)
ORDER BY rank DESC, id DESC
LIMIT 20;
$prev_rank and $prev_id are the values of the last record from the previous page.
When PostgreSQL FTS is enough, and when you need Elasticsearch
PostgreSQL FTS handles searching over articles, products, comments, and tickets well — up to about 10 million documents and a load of up to 100 queries per second.
Elasticsearch is worth considering if you need:
- volumes significantly larger than 10 million documents with complex ranking,
- automatic language detection in multilingual content,
- facets, aggregations, analytics over search results,
- sophisticated typo tolerance.
Common mistakes
Computing on the fly. Writing WHERE to_tsvector('english', body) @@ ... without a stored column means the query doesn't use an index and scans the whole table.
The simple configuration for natural-language text. Without stemming «buyer» and «buyers» are different words, and search won't find what you need.
No weights. If you don't use setweight, the title and body have the same priority — a document where the word appears only in the body will rank above a document with it in the title.
to_tsquery for user input. If a user enters a special character, the query fails with an error. Use plainto_tsquery or websearch_to_tsquery.
OFFSET on deep pages. Slow and inefficient — use keyset pagination.
In short
- PostgreSQL FTS works through an index and understands grammar — unlike
LIKE. tsvectoris the indexable representation of text,tsqueryis the search query,@@is the match operator.- The
englishconfiguration provides stemming: different forms of the same word count as one token. - Store
tsvectorin a generated column (GENERATED ALWAYS AS ... STORED) and index it with GIN. setweightsets field priorities — the title matters more than the body, and that's factored into ranking.- For user input, use
plainto_tsqueryorwebsearch_to_tsquery, notto_tsquery. ts_headlinehighlights matches in the text.pg_trgmcomplements FTS: fuzzy search and substring search on short fields.- For pagination, use keyset on
(rank, id), not OFFSET. - FTS is enough up to ~10M documents and ~100 RPS; for larger volumes, use Elasticsearch.
What to read next
- Search: PostgreSQL FTS or Elasticsearch — a detailed comparison and the criteria for choosing.
- Index types in PostgreSQL — GIN, GiST, and others in detail.
- JSONB in PostgreSQL — the GIN index works on jsonb too.