← Back to the section

The database schema is the foundation. When the wrong types are chosen, problems pile up unnoticed: data loses its meaning after a server change, money drifts by a few cents, indexes stop working, and adding a value to an enumeration turns into a multi-release task.

Below are the mistakes that show up most often in real production schemas.

varchar(255) out of habit

In MySQL and Oracle the string length affects storage. Developers who came from those databases bring varchar(255) with them into PostgreSQL.

In PostgreSQL it buys you nothing. text and varchar(n) are stored identically — the length in varchar(n) is just a CHECK constraint, not an optimization.

The right way: use text for strings without a business limit on length. If there is a limit — write it explicitly and meaningfully: varchar(20) for a country code, not a magic 255.

timestamp without a time zone for business time

timestamp stores a "bare" time with no attachment to a time zone. A year later nobody remembers which zone the server was running in at the moment of the write. An order placed at 23:30 on Friday can turn into Sunday after the service is moved to a different host.

The right way: timestamptz — PostgreSQL converts the value to UTC on write and back on read. The zone is pinned down, and the data stays correct across any migration.

varchar(36) for UUID

A UUID as a string takes 36 bytes (with hyphens), doesn't work case-insensitively, and has no format check. You can store 'ABCD-...' and 'abcd-...' side by side.

The right way: the uuid type in PostgreSQL is 16 bytes, with no hyphens in its internal representation, and validates automatically on write.

float / real / double for money

Floating-point numbers can't precisely represent most decimal fractions. 0.1 + 0.2 in them is not equal to 0.3. On a single operation it's unnoticeable, but over a long chain of calculations the error accumulates — and shows up during reconciliation with the bank.

The right way: numeric(precision, scale) — exact decimal arithmetic with no rounding errors. For example, numeric(19, 4) for an amount in rubles.

serial / bigserial in a new schema

serial and bigserial are syntactic sugar from early versions of PostgreSQL. They create a sequence and attach it to the column in a non-obvious way: with pg_dump the link can be lost, and privileges on the sequence have to be granted separately.

The right way: since PostgreSQL 10+ use GENERATED ALWAYS AS IDENTITY. The behavior is the same, but the link to the column is explicit and is carried over correctly on dump.

-- deprecated variant
id bigserial PRIMARY KEY

-- modern variant
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY

smallint 0/1 or char(1) Y/N instead of boolean

This shows up in schemas migrated from databases where boolean wasn't supported. PostgreSQL has it — it takes 1 byte, is convenient in SQL expressions, and reads well in queries.

The right way: use boolean. A query like WHERE is_active = true reads without decoding.

PG ENUM for a frequently changing list

An enumeration via CREATE TYPE ... AS ENUM works fine while the list of values is stable. But you cannot remove a value from a PG ENUM — only add one. Renaming is hard. Adding attributes (for example, a display name or a sort order) is impossible.

The right way: if the enumeration can grow, be renamed, or require attributes — use a reference table with an FK. It's a bit more code, but the flexibility pays off.

JSONB as a "flexible schema" for core fields

JSONB is convenient, so it sometimes gets used where plain columns are needed: metadata->>'email' instead of just email. The result is lost type safety, harder indexes, and no way to enforce required fields.

The right way: if a field is filtered or sorted on — it's a column. JSONB is justified for polymorphic data (different sets of attributes on different rows), optional fields, or rarely used JSON documents.

An array where a table should be

Storing order line items in jsonb[] or int[] seems convenient: everything in one row. But that's a table flipped on its side. You can't create an FK on array elements, you can't add attributes to an element, and updating a single element is hard.

The right way: objects with identity (line items, participants, attachments) belong in a separate table with an FK. An array is appropriate only for scalar values without identity: tags, locale codes, lists of strings.

valid_from / valid_to instead of a range type

Storing an interval as two separate columns seems obvious, but it's a trap: you can't add a non-overlap constraint at the database level without complex triggers, and a race condition on insert remains.

The right way: PostgreSQL supports range types — tstzrange, daterange, int4range, and others. Together with EXCLUDE USING gist they guarantee non-overlap at the database level:

ALTER TABLE price_periods
  ADD CONSTRAINT no_overlap
  EXCLUDE USING gist (product_id WITH =, valid_period WITH &&);

The money type

money in PostgreSQL is tied to the session locale — the same value reads differently depending on the settings. There is no currency code in it.

The right way: numeric(p, s) for the amount plus a separate currency char(3) column (ISO 4217 code, for example RUB, USD).

A type without a time zone on the application side

Even if the schema uses timestamptz, the driver may send a value with no time-zone attachment — and PostgreSQL interprets it in the zone of the current session. On a UTC server and on a developer's machine with a local zone, the very same line of code will produce different values in the database.

The right way: use a type with a time zone on the application side:

  • Java: Instant or OffsetDateTime (not LocalDateTime)
  • Go: time.Time (it always carries a zone)
  • Node.js: pass the date through pg with parseInputDatesAsUTC: true
  • Python: datetime with tzinfo (not a "naive" datetime)

Direct calls to system time and the UUID generator in code

If production code calls time.Now(), Instant.now(), uuid.New(), and the like directly, tests become non-deterministic: pinning down the time or the identifier without platform-level mocks is impossible.

The right way: wrap them in a service layer — ClockService, UuidGenerator, or an equivalent. In tests, substitute a deterministic implementation that returns fixed values.

UUID v4 for a primary key

UUID v4 is fully random. On insert, PostgreSQL is forced to find the right B-tree page, which may already have been evicted from the cache. On large tables this turns into constant random I/O and poor page packing.

The right way: UUID v7 is monotonic — it starts with a timestamp. Inserts go sequentially, pages pack densely, and the cache works more efficiently.

In short

  • text instead of varchar(255) — in PostgreSQL they are stored identically.
  • timestamptz for any business time — not timestamp.
  • numeric(p, s) for money — not float, not money.
  • uuid as a type — not varchar(36).
  • boolean — not smallint / char(1).
  • GENERATED ALWAYS AS IDENTITY in new schemas — not serial.
  • Enumerations that can grow — a reference table, not a PG ENUM.
  • JSONB — for polymorphic and optional data, not for core fields.
  • Objects with identity — a separate table, not an array.
  • Intervals — range types with EXCLUDE, not a pair of valid_from / valid_to columns.
  • A type with a time zone on the application side for timestamptz columns.
  • System time and UUID — through an isolatable service layer, not directly.
  • UUID v7 for primary keys — not v4.
  • String types in PostgreSQL
  • Time and time zones in PostgreSQL
  • UUID and identifiers in PostgreSQL
  • Numbers and precision in PostgreSQL
  • Enum, boolean, and enumerations in PostgreSQL
  • JSONB in PostgreSQL
  • Arrays and range types in PostgreSQL
  • Composite indexes in PostgreSQL
  • Zero-downtime migrations