← Back to the section

Sometimes the shape of the data is not known in advance. Different products in a catalog have different attributes. Different events in an audit log have different fields. Adding a separate column for each case is inconvenient. It is exactly for cases like these that PostgreSQL can store JSON right inside a column.

json and jsonb — what's the difference

PostgreSQL offers two types: json and jsonb. The difference is fundamental.

json stores the text as-is — literally a string with curly braces. On every query PostgreSQL re-parses that string, which is slow. On the upside, json preserves the order of keys and any duplicates (rarely needed, but occasionally important).

jsonb stores the data in a parsed binary form — as a ready-made structure. Reading and searching through it is fast. Key order is not preserved, and duplicates are dropped (the last one wins).

In practice you almost always want jsonb. The exception is when it is essential to preserve the input string bit-for-bit (for example, for a cryptographic signature).

The main rule: column or JSON key

Before you tuck a field away in JSONB, it is worth asking one question: will this field be used for filtering, sorting, or JOINs?

If yes — it is a column, not a JSON key. A column is indexed, typed, and fast. A JSON key without a special index means a full table scan.

A common mistake is to dump everything into JSONB and then wonder why the queries are slow:

-- bad: email and status are fields that get searched constantly
CREATE TABLE customer (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    data jsonb NOT NULL
);

SELECT * FROM customer WHERE data->>'email' = 'ivan@example.com'; -- full scan
SELECT * FROM customer WHERE data->>'status' = 'ACTIVE';          -- full scan

The right approach is to keep the main fields as columns, and put only rarely-used-in-conditions data into JSONB:

-- good: main fields as columns, flexible attributes in jsonb
CREATE TABLE customer (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      citext      NOT NULL UNIQUE,
    status     varchar(20) NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    metadata   jsonb       NOT NULL DEFAULT '{}'::jsonb
);

When JSONB really helps

Event log

The ORDER_PAID event carries one set of fields, USER_REGISTERED — another. You are not going to create hundreds of columns for every possible event type.

CREATE TABLE event_log (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    aggregate_id uuid        NOT NULL,
    event_type   varchar(50) NOT NULL,
    occurred_at  timestamptz NOT NULL,
    payload      jsonb       NOT NULL
);

The event_type column is for filtering and indexing. The polymorphic content goes into payload.

Attributes of products in different categories

Sneakers have a size and a shoe-last width. Laptops have memory capacity and a screen diagonal. None of this needs to be stored as columns — 200 nullable fields are awkward both for queries and for reading the code.

CREATE TABLE product (
    id            bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sku           varchar(50) NOT NULL UNIQUE,
    name          text NOT NULL,
    price         numeric(15, 2) NOT NULL,
    attributes    jsonb NOT NULL DEFAULT '{}'::jsonb
);

-- {"size": 42, "material": "leather"}
-- {"diagonal": 15.6, "ram_gb": 16}

Integration configurations

Every notification channel has its own settings. Storing them in a single JSONB column is much simpler than building a separate table for each type.

CREATE TABLE integration_channel (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    kind   varchar(20) NOT NULL,   -- 'SMTP', 'TELEGRAM', 'WEBHOOK'
    config jsonb NOT NULL
);

Snapshot of an external API response

The structure is dictated by an external service — and it can change. JSONB captures "what the API returned" without forcing a schema change every time something changes.

Core operators

Working with JSONB in queries is convenient thanks to a handful of operators:

SELECT
    payload -> 'customer'           AS customer_obj,   -- returns jsonb
    payload ->> 'event_type'        AS type,           -- returns text
    payload #> '{order, items, 0}'  AS first_item,     -- path → jsonb
    payload #>> '{order, total}'    AS total_str       -- path → text
FROM event_log;

The -> operator gives a nested object (type jsonb), ->> gives a string (type text). For a nested path, use #> and #>>.

Check whether a document contains the required keys or a subset:

-- the document contains the subset
SELECT * FROM event_log WHERE payload @> '{"event_type": "ORDER_PAID"}'::jsonb;

-- the column contains a key
SELECT * FROM product WHERE attributes ? 'material';

-- contains at least one of the keys
SELECT * FROM product WHERE attributes ?| array['material', 'size'];

-- contains all of the listed keys
SELECT * FROM product WHERE attributes ?& array['material', 'size'];

Without an index, a JSONB search is a full scan. There are two options.

A GIN index is a good fit when you filter by different keys or nested objects via the @> operator. The jsonb_path_ops variant is smaller and faster than the standard GIN, but only works with @>:

CREATE INDEX event_log_payload_gin
    ON event_log USING gin (payload jsonb_path_ops);

-- now this runs fast
SELECT * FROM event_log WHERE payload @> '{"event_type": "ORDER_PAID"}';

A functional index is cheaper than GIN and better when you always search by one specific key:

CREATE INDEX event_log_event_type
    ON event_log ((payload ->> 'event_type'));

-- speeds up point lookups
SELECT * FROM event_log WHERE payload ->> 'event_type' = 'ORDER_PAID';

If you constantly filter by the same key, a functional index is preferable. GIN is needed when the conditions are varied or the structure is nested.

What you should not store in JSONB

JSONB is for small structured documents. A few things should not go in there:

  • Binary data (images, files) — use bytea or external storage for those.
  • Long text (articles, descriptions) — better use text; PostgreSQL will place large values efficiently on its own.
  • base64 data — this is binary content encoded into text with extra overhead.

If a JSONB field grows to tens of kilobytes per row, that is a signal to reconsider the design.

Working with JSONB in application code

The driver serializes the object to JSON before sending it to the database and deserializes it on read. It is better to work with typed objects — that way the structure is visible and it is clear what is required.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.jooq.Converter;
import org.jooq.JSONB;

public class JsonbConverter implements Converter<JSONB, JsonNode> {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public JsonNode from(JSONB db) {
        try { return db == null ? null : mapper.readTree(db.data()); }
        catch (Exception e) { throw new IllegalStateException(e); }
    }

    @Override
    public JSONB to(JsonNode node) {
        try { return node == null ? null : JSONB.valueOf(mapper.writeValueAsString(node)); }
        catch (Exception e) { throw new IllegalStateException(e); }
    }
}

public record EventPayload(UUID customerId, String eventType, Instant occurredAt, Map<String, Object> details) {}
import (
    "context"
    "encoding/json"
    "github.com/jackc/pgx/v5/pgxpool"
)

type EventPayload struct {
    CustomerID string         `json:"customerId"`
    EventType  string         `json:"eventType"`
    OccurredAt string         `json:"occurredAt"`
    Details    map[string]any `json:"details"`
}

func insertEvent(ctx context.Context, pool *pgxpool.Pool, p EventPayload) error {
    raw, err := json.Marshal(p)
    if err != nil {
        return err
    }
    _, err = pool.Exec(ctx,
        `INSERT INTO event_log (aggregate_id, event_type, occurred_at, payload)
         VALUES ($1, $2, now(), $3)`,
        p.CustomerID, p.EventType, raw,
    )
    return err
}

func loadPayload(ctx context.Context, pool *pgxpool.Pool, id int64) (EventPayload, error) {
    var raw []byte
    err := pool.QueryRow(ctx, `SELECT payload FROM event_log WHERE id = $1`, id).Scan(&raw)
    if err != nil {
        return EventPayload{}, err
    }
    var p EventPayload
    return p, json.Unmarshal(raw, &p)
}
import { Pool } from 'pg';

interface EventPayload {
  customerId: string;
  eventType: string;
  occurredAt: string;
  details: Record<string, unknown>;
}

const pool = new Pool();

async function insertEvent(p: EventPayload): Promise<void> {
  await pool.query(
    `INSERT INTO event_log (aggregate_id, event_type, occurred_at, payload)
     VALUES ($1, $2, now(), $3)`,
    [p.customerId, p.eventType, JSON.stringify(p)],
  );
}

async function loadPayload(id: number): Promise<EventPayload> {
  const { rows } = await pool.query<{ payload: EventPayload }>(
    `SELECT payload FROM event_log WHERE id = $1`,
    [id],
  );
  return rows[0].payload; // pg deserializes jsonb into an object automatically
}
from dataclasses import dataclass, asdict
from typing import Any
import psycopg
from psycopg.rows import dict_row


@dataclass
class EventPayload:
    customer_id: str
    event_type: str
    occurred_at: str
    details: dict[str, Any]


async def insert_event(conn: psycopg.AsyncConnection, p: EventPayload) -> None:
    await conn.execute(
        """
        INSERT INTO event_log (aggregate_id, event_type, occurred_at, payload)
        VALUES (%s, %s, now(), %s)
        """,
        (p.customer_id, p.event_type, asdict(p)),  # psycopg serializes the dict automatically
    )


async def load_payload(conn: psycopg.AsyncConnection, event_id: int) -> EventPayload:
    async with conn.cursor(row_factory=dict_row) as cur:
        await cur.execute("SELECT payload FROM event_log WHERE id = %s", (event_id,))
        row = await cur.fetchone()
        return EventPayload(**row["payload"])

Common mistakes

Everything in a single data jsonb column. It seems flexible, but in practice it means the loss of typing, indexes, and readability. A year later nobody knows what keys are in there and what is required.

A field that gets searched sits in JSONB without an index. Every such query is a full scan. Email, status, identifier — those are columns.

JSONB instead of migrations. Adding new "fields" through JSON keys to avoid writing ALTER TABLE is tempting but dangerous. The schema diverges between services, the data stops being validated, and nobody knows the schema.

Documents that are too large. JSONB is not meant for megabyte-sized objects. If a document grows, it is time to split it into columns or move it to separate storage.

In short

  • You almost always want jsonb, not json — binary storage, fast search, GIN support.
  • If a field is used for filtering, sorting, or JOINs — it is a column, not a JSON key.
  • JSONB is a good fit for: event logs, polymorphic attributes, integration configurations, external API snapshots.
  • -> returns jsonb, ->> returns text; @> checks for subset containment.
  • GIN with jsonb_path_ops speeds up @>; a functional index is cheaper for a single specific key.
  • Binary data, long text, and base64 do not belong in JSONB.
  • A document should not grow beyond a few kilobytes per row.
  • Arrays and range types in PostgreSQL — when an array beats JSONB for lists.
  • Indexes in PostgreSQL — more on GIN and other index types.
  • String types in PostgreSQL — when to use text instead of JSONB.