← Back to the section

If you're moving to PostgreSQL from MySQL or Oracle, the first thing that surprises you is that here people almost always just write text, with no number in parentheses. Let's figure out why.

Where varchar(255) came from

In MySQL and older versions of Oracle, string types really did differ in how they were stored. varchar(255) physically took up a different amount of space on disk than text. That's why developers got used to specifying a length "just in case".

In PostgreSQL it doesn't work that way. Here text, varchar(n), and char(n) are stored identically — through the same internal mechanism. varchar(255) is just text with an extra check "no longer than 255 characters" before every write. The speed is the same, the disk space is the same.

Bottom line: varchar(255) in PostgreSQL is an outdated habit that adds a constraint without any technical benefit.

By default — text

For most string fields, the right type is text:

CREATE TABLE customer (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    full_name text NOT NULL,
    email     text NOT NULL,
    bio       text
);

Why text is better than varchar(255):

  • If tomorrow you need to store a string longer than 255 characters — no table migration is required.
  • In older PostgreSQL versions, extending varchar(n) rewrote the whole table. With text there's no such problem.
  • It explicitly signals: the length is not governed by a domain rule.

When you do need to specify a length

It makes sense to set a length on a column when it's dictated by a real standard, not by a feeling that "there shouldn't be more than a thousand characters here".

Examples of justified constraints:

phone_e164   varchar(15)  NOT NULL,  -- E.164: at most 15 digits
country_code char(2)      NOT NULL,  -- ISO 3166-1 alpha-2: exactly 2 letters
currency     char(3)      NOT NULL,  -- ISO 4217: exactly 3 letters
inn          varchar(12)  NOT NULL,  -- INN: 10 digits (legal entity) or 12 (individual)

A counterexample — when the length is picked "out of thin air":

full_name    varchar(255),   -- why 255? what standard?
description  varchar(1000)   -- why 1000?

For full_name and description, the right choice is text, and the length constraint should be moved into the application code, where it's easy to change and test:

// Jakarta Validation
public record CreateCustomerCommand(
    @NotBlank @Size(max = 200) String fullName,
    @Size(max = 5000)          String description
) {}

That way the business constraint stays where it belongs — in the application logic. If the business decides that 300 characters are now allowed, you change one line of code, not the database schema.

char(n) — for fixed standards

char(n) is a fixed-length type. If the string is shorter than n, PostgreSQL pads it with spaces on the right.

CREATE TABLE foo (code char(5));
INSERT INTO foo VALUES ('AB');
SELECT '[' || code || ']' FROM foo;  -- '[AB   ]'

This creates unexpected problems:

  • length(code) returns 5, not 2.
  • When serialized to JSON, you get "AB " with trailing spaces.
  • Comparisons may behave in ways you don't expect.

When char(n) is appropriate:

  • char(2) — a country code per ISO 3166-1 (always exactly 2 letters).
  • char(3) — a currency code per ISO 4217 (always exactly 3 letters).

In all other cases — text or varchar(n).

Suppose you need to store an email and search by it case-insensitively — IVAN@EXAMPLE.COM and ivan@example.com should find the same record.

A regular index on email won't help here: the query WHERE LOWER(email) = LOWER(...) doesn't use the index on the original field and will scan the entire table.

There are two working approaches.

Approach 1: the citext extension

citext is a special PostgreSQL type that automatically ignores case when comparing:

CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE account (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email citext NOT NULL UNIQUE
);

INSERT INTO account (email) VALUES ('ivan@example.com');
SELECT * FROM account WHERE email = 'IVAN@EXAMPLE.COM';  -- will find it

Pro: the code is cleaner, the index works automatically. Con: some drivers can't map citext and return it as text — you need to check the behavior of your specific driver.

Approach 2: text + a functional index

The field stays text, but the index is built on the result of lower():

CREATE TABLE account (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL
);

CREATE UNIQUE INDEX account_email_lower_uk ON account (lower(email));

Then the query has to be written explicitly:

SELECT * FROM account WHERE lower(email) = lower('IVAN@EXAMPLE.COM');

Pro: works without extensions, portable. Con: every query must use lower() — if you forget, the index won't be used.

The UTF8 encoding

A PostgreSQL cluster has an encoding that is set when the database is created. To check the current one:

SHOW server_encoding;  -- we expect UTF8

If the encoding is not UTF8 (for example, SQL_ASCII or WIN1251), you run into problems with multilingual content and emoji. This happens on old installations created by outdated instructions.

UTF8 is the only correct encoding for a new cluster.

TOAST: long strings are stored automatically

If a table has a field with long text — an article, a product description, a biography — you don't need to move it into a separate table by hand.

PostgreSQL does this automatically through the TOAST mechanism: values longer than ~2 KB are physically stored separately from the main table row. For a query that doesn't touch the large field, PostgreSQL doesn't read it — and that speeds things up.

CREATE TABLE article (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    slug    text NOT NULL UNIQUE,
    title   text NOT NULL,
    body    text NOT NULL    -- long text — TOAST handles it on its own
);

Splitting into article + article_body only makes sense if measurements reveal a real bottleneck. Without measurements — trust TOAST.

In short

  • In PostgreSQL, text, varchar(n), and char(n) are stored identically — only the semantics differ.
  • varchar(255) in PostgreSQL is an outdated habit from MySQL/Oracle; by default, write text.
  • Set a length (varchar(n)) when there's a domain standard: E.164, ISO 3166-1, and the like.
  • char(n) pads the string with spaces — use it only for strictly fixed codes from standards.
  • Case-insensitive search: citext or a functional index (lower(field)).
  • A plain LOWER(field) in WHERE without a functional index means a full table scan.
  • The cluster encoding should be UTF8.
  • TOAST moves long text into separate storage automatically — don't do it by hand without measurements.
  • Numbers and precision — bigint, numeric, float.
  • Time and time zones — timestamptz.
  • UUID and identifiers — the uuid type, not char(36).
  • JSONB — when it's justified, when it's not — for structured data.