Metrics: prometheus/client_golang, RED/USE, and standard labels

Based on rules R-OBS-MTR-1 … R-OBS-MTR-7 and R-OBS-MTR-X1 … R-OBS-MTR-X4 from the Observability Style Guide → section 2. Metrics.

Key points

prometheus/client_golang + promauto: the only metrics toolkit; promauto registers collectors automatically in prometheus.DefaultRegisterer.

Separate management port: /metrics, /health/live, and /health/ready are served by a separate *http.Server; the business port is not exposed to the scraper.

Standard labels service/env/version: defined once at startup via prometheus.Labels, not repeated on every metric.

RED for HTTP: via chi middleware with a CounterVec and a HistogramVec; path is the chi route pattern, not the raw URL.

USE for resources: via collectors.NewGoCollector() + collectors.NewProcessCollector(...).

Business metrics: promauto.NewCounterVec / promauto.NewHistogram in the domain package.

Labels: low cardinality. status_class (3 values) and payment_method (CARD/SBP) are fine; user_id/order_id lead to OOM in Prometheus.

/metrics reachable publicly without auth is a violation; expose it only on the internal network, enforced by a network policy.

Metrics are the basis for understanding how a service behaves under load. Use RED (Rate, Errors, Duration) for request-driven flows, USE (Utilization, Saturation, Errors) for resources, plus business metrics. In the Go stack the toolkit is prometheus/client_golang with promauto, with no Micrometer-style layer in between.

Setup

R-OBS-MTR-1: the management server with /metrics is started separately from the business router:

// internal/platform/metrics/server.go
package metrics

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

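// StartManagement builds the management server on its own mux, separate from the business router.
// It does not start listening: the caller runs the returned server with ListenAndServe.
// liveHandler and readyHandler are expected to be defined elsewhere in this package.
func StartManagement(addr string) *http.Server {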
    mux := http.NewServeMux()
    mux.Handle("/metrics", promhttp.Handler())
    mux.HandleFunc("/health/live", liveHandler)
    mux.HandleFunc("/health/ready", readyHandler)
    return &http.Server{Addr: addr, Handler: mux}
}

In main.go both servers are run through errgroup:

// cmd/server/main.go
businessSrv := &http.Server{Addr: cfg.Addr, Handler: otelhttp.NewHandler(router, "order-service")}
managementSrv := metrics.StartManagement(cfg.ManagementAddr) // e.g. :9090

g, ctx := errgroup.WithContext(ctx)
g.Go(func() error { return businessSrv.ListenAndServe() })
g.Go(func() error { return managementSrv.ListenAndServe() })
if err := g.Wait(); err != nil {
    log.ErrorContext(ctx, "server_stopped", slog.String("error", err.Error()))
}
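The snippet above does not show how the two servers stop. As a hedged illustration, the sketch below runs both servers and shuts them down when the context is cancelled, treating http.ErrServerClosed as a normal exit. It uses only the standard library in place of errgroup; serveUntilDone, runServers, and demo are hypothetical names, and the 5-second shutdown timeout is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serveUntilDone serves on ln until ctx is cancelled, then shuts down gracefully.
// http.ErrServerClosed is what Serve returns after Shutdown, so it is not an error here.
func serveUntilDone(ctx context.Context, srv *http.Server, ln net.Listener) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // the server stopped on its own
	case <-ctx.Done():
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		if err := srv.Shutdown(shutdownCtx); err != nil {
			return err
		}
		if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
			return err
		}
		return nil
	}
}

// runServers starts every server and returns the first real error.
// A failure of one server cancels the shared context, which stops the others.
func runServers(ctx context.Context, servers ...*http.Server) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	errs := make(chan error, len(servers))
	for _, srv := range servers {
		ln, err := net.Listen("tcp", srv.Addr)
		if err != nil {
			return err // deferred cancel stops servers that already started
		}
		go func(srv *http.Server, ln net.Listener) {
			errs <- serveUntilDone(ctx, srv, ln)
		}(srv, ln)
	}

	var first error
	for i := 0; i < len(servers); i++ {
		if err := <-errs; err != nil && first == nil {
			first = err
			cancel()
		}
	}
	return first
}

// demo runs a business and a management server on ephemeral loopback ports
// and lets the context expire shortly after start.
func demo() error {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	business := &http.Server{Addr: "127.0.0.1:0", Handler: http.NotFoundHandler()}
	management := &http.Server{Addr: "127.0.0.1:0", Handler: http.NotFoundHandler()}
	return runServers(ctx, business, management)
}

func main() {
	fmt.Println("servers stopped, error:", demo())
}
```

In a real service the context would typically come from signal.NotifyContext, so that SIGTERM triggers the same shutdown path.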

The Prometheus scraper pulls /metrics at its configured interval (every 15 seconds here) and stores the samples in its TSDB. The business port is not exposed to the scraper.

Standard labels

R-OBS-MTR-2: the service/env/version labels are defined once at initialization via prometheus.Labels, not by hand on every metric:

// internal/platform/metrics/common.go
package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

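// commonLabels is built once; env is assumed to be the project's config package (its import is not shown).
var commonLabels = prometheus.Labels{
    "service": env.ServiceName, // read from config at startup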
    "env":     env.AppEnv,
    "version": env.Version,
}

A CounterVec with commonLabels applied through MustCurryWith:

var ordersCreatedTotal = promauto.NewCounterVec(prometheus.CounterOpts{
    Name: "orders_created_total",
    Help: "Orders successfully created",
}, []string{"service", "env", "version", "payment_method"})

// curry once with the fixed common labels:
var ordersCreated = ordersCreatedTotal.MustCurryWith(commonLabels)

// at call sites only the business label is supplied:
ordersCreated.With(prometheus.Labels{"payment_method": string(cmd.PaymentMethod)}).Inc()

This way no With call has to repeat service/env/version: they are bound once at initialization, and the call sites stay clean.

RED for HTTP: chi middleware

R-OBS-MTR-3: Rate/Errors/Duration are recorded in a middleware; path is taken from the chi route context rather than the raw URL, otherwise /orders/123 and /orders/456 would produce separate time series:

// internal/platform/middleware/metrics.go
package middleware

import (
    "net/http"
    "time"

    "github.com/go-chi/chi/v5"
    chimiddleware "github.com/go-chi/chi/v5/middleware"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests by method, path and status class",
    }, []string{"method", "path", "status_class"})

    httpRequestDurationSeconds = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "HTTP request latency",
        Buckets: prometheus.DefBuckets,
    }, []string{"method", "path", "status_class"})
)

func Metrics(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ww := chimiddleware.NewWrapResponseWriter(w, r.ProtoMajor)
        start := time.Now()
        next.ServeHTTP(ww, r)

        // Read the pattern only after next.ServeHTTP: chi fills it in while routing the request.
        path := chi.RouteContext(r.Context()).RoutePattern()
        status := statusClass(ww.Status())
        httpRequestsTotal.WithLabelValues(r.Method, path, status).Inc()
        httpRequestDurationSeconds.WithLabelValues(r.Method, path, status).Observe(time.Since(start).Seconds())
    })
}

func statusClass(code int) string {
    switch {
    case code < 400:
        return "success"
    case code < 500:
        return "client_error"
    default:
        return "server_error"
    }
}

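Mount the middleware on the chi router with r.Use. RoutePattern() is filled in while chi routes the request, so it has to be read after next.ServeHTTP returns, as Metrics does; read earlier, it returns an empty string:

r := chi.NewRouter()
r.Use(RequestID)
r.Use(otelhttp.NewMiddleware("order-service"))
r.Use(Metrics) // after otelhttp, so the span already exists
r.Use(chimiddleware.Logger)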

PromQL queries for the dashboard:

# Rate: RPS by path
sum(rate(http_requests_total[5m])) by (path, method)

# Errors: share of 5xx
sum(rate(http_requests_total{status_class="server_error"}[5m])) by (path)
  / sum(rate(http_requests_total[5m])) by (path)

# Duration p95
histogram_quantile(0.95,
  sum by (le, path) (rate(http_request_duration_seconds_bucket[5m]))
)
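To make the p95 query less of a black box, here is a simplified sketch of the idea behind histogram_quantile: find the bucket that contains the target rank and interpolate linearly inside it. The bucket type, the quantile function, and the sample counts are illustrative assumptions; the real PromQL function handles more edge cases than this.

```go
package main

import (
	"fmt"
	"math"
)

// bucket mirrors one `le` series of a histogram.
type bucket struct {
	upperBound float64 // the le label; math.Inf(1) for the last bucket
	count      float64 // cumulative number of observations <= upperBound
}

// quantile approximates the q-quantile (0 <= q <= 1) from cumulative buckets
// sorted by upperBound, where the last bucket is +Inf.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total

	// Find the first bucket whose cumulative count reaches the rank.
	i := 0
	for i < len(buckets)-1 && buckets[i].count < rank {
		i++
	}
	if i == len(buckets)-1 {
		// The rank falls into +Inf: report the highest finite bound.
		return buckets[len(buckets)-2].upperBound
	}

	lower, below := 0.0, 0.0 // the first bucket is assumed to start at 0
	if i > 0 {
		lower, below = buckets[i-1].upperBound, buckets[i-1].count
	}
	inBucket := buckets[i].count - below
	if inBucket == 0 {
		return buckets[i].upperBound
	}
	// Linear interpolation between the bucket's lower and upper bound.
	return lower + (buckets[i].upperBound-lower)*(rank-below)/inBucket
}

// sampleBuckets is made-up data: 100 requests, 40 of them under 0.25 s, 80 under 0.5 s.
func sampleBuckets() []bucket {
	return []bucket{
		{0.25, 40}, {0.5, 80}, {1, 100}, {math.Inf(1), 100},
	}
}

func main() {
	fmt.Printf("p75 is roughly %.5f s\n", quantile(0.75, sampleBuckets()))
}
```

The result can only be as precise as the bucket layout: a quantile that lands in a wide bucket is an interpolation, not a measurement, which is why bucket bounds should bracket the latencies that matter for the SLO.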

USE for resources

R-OBS-MTR-4: Go runtime and process metrics via the standard collectors:

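// internal/platform/metrics/setup.go
package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/collectors"
)

// RegisterCollectors adds the Go runtime and process collectors to reg.
// prometheus.DefaultRegisterer already registers both in its package init,
// so registering them there a second time panics on the duplicate.
// Call this only with a custom registry, e.g. prometheus.NewRegistry().
func RegisterCollectors(reg prometheus.Registerer) {
    reg.MustRegister(
        collectors.NewGoCollector(),      // goroutines, GC pauses, heap
        collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}), // CPU, open FDs
    )
}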

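For the pgx pool, use a custom Collector built on pgxpool.Pool.Stat():

// internal/platform/metrics/pgx_collector.go
package metrics

import (
    "github.com/jackc/pgx/v5/pgxpool" // assumes pgx v5; adjust the path for another major version
    "github.com/prometheus/client_golang/prometheus"
)

// pgxPoolCollector reads pool statistics at scrape time and exposes them as gauges.
type pgxPoolCollector struct {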
    pool       *pgxpool.Pool
    acquired   *prometheus.Desc
    idle       *prometheus.Desc
    totalConns *prometheus.Desc
}

func NewPgxPoolCollector(pool *pgxpool.Pool, service string) *pgxPoolCollector {
    labels := prometheus.Labels{"service": service}
    return &pgxPoolCollector{
        pool:       pool,
        acquired:   prometheus.NewDesc("pgx_pool_acquired_conns", "Acquired connections", nil, labels),
        idle:       prometheus.NewDesc("pgx_pool_idle_conns", "Idle connections", nil, labels),
        totalConns: prometheus.NewDesc("pgx_pool_total_conns", "Total connections", nil, labels),
    }
}

func (c *pgxPoolCollector) Describe(ch chan<- *prometheus.Desc) {
    ch <- c.acquired
    ch <- c.idle
    ch <- c.totalConns
}

func (c *pgxPoolCollector) Collect(ch chan<- prometheus.Metric) {
    stat := c.pool.Stat()
    ch <- prometheus.MustNewConstMetric(c.acquired, prometheus.GaugeValue, float64(stat.AcquiredConns()))
    ch <- prometheus.MustNewConstMetric(c.idle, prometheus.GaugeValue, float64(stat.IdleConns()))
    ch <- prometheus.MustNewConstMetric(c.totalConns, prometheus.GaugeValue, float64(stat.TotalConns()))
}

Key USE metrics from the collectors:

Metric	What it shows
`go_goroutines`	goroutine saturation
`go_gc_duration_seconds`	GC pauses
`go_memstats_heap_inuse_bytes`	heap utilization
`process_open_fds`	open file descriptors
`pgx_pool_acquired_conns`	active DB connections
`pgx_pool_total_conns`	pool size (saturation)

Business metrics

R-OBS-MTR-5: each bounded context declares its own metrics next to its UseCase handlers:

// internal/order/metrics.go
package order

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    ordersCreatedTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "orders_created_total",
        Help: "Orders successfully created",
    }, []string{"payment_method"})

    orderAmountRub = promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "order_amount_rub",
        Help:    "Order amount in rubles",
        Buckets: []float64{100, 500, 1000, 5000, 10000, 50000},
    })

    orderConfirmFailedTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "order_confirm_failed_total",
        Help: "Order confirmation failures by reason",
    }, []string{"reason"})
)

Usage in a UseCase handler:

// internal/order/usecase/create_order.go
func (h *CreateOrderHandler) Handle(ctx context.Context, cmd CreateOrderCommand) (*Order, error) {
    ctx, span := otel.Tracer("order").Start(ctx, "CreateOrder")
    defer span.End()

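    // NewOrder is a hypothetical domain constructor; Order.Create(cmd) is not a valid type-level factory call in Go.
    order, err := NewOrder(cmd)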
    if err != nil {
        return nil, fmt.Errorf("create order: %w", err)
    }

    if err := h.orders.Save(ctx, order); err != nil {
        span.RecordError(err)
        return nil, fmt.Errorf("save order: %w", err)
    }

    ordersCreatedTotal.WithLabelValues(string(cmd.PaymentMethod)).Inc()
    orderAmountRub.Observe(float64(order.AmountMinor) / 100)
    return order, nil
}

Metrics for product and customer follow the same pattern:

// internal/product/metrics.go
var (
    productCacheHitsTotal = promauto.NewCounter(prometheus.CounterOpts{
        Name: "product_cache_hits_total",
        Help: "Product lookups served from cache",
    })
    productCacheMissesTotal = promauto.NewCounter(prometheus.CounterOpts{
        Name: "product_cache_misses_total",
        Help: "Product lookups that required DB fetch",
    })
)

// internal/customer/metrics.go
var (
    customerRegistrationsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "customer_registrations_total",
        Help: "Customer registrations by channel",
    }, []string{"channel"})
)

Collector types:

Counter: monotonically increasing, e.g. orders_created_total, payment_failed_total.
Gauge: a current value, e.g. queue size or active sessions. promauto.NewGauge.
Histogram: a distribution, e.g. order amounts or processing time. promauto.NewHistogram.
CounterVec / HistogramVec: the same types with label dimensions, e.g. by payment method or by channel.
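To illustrate what a Histogram exposes, the sketch below models cumulative buckets with the same bounds as order_amount_rub. It is a stripped-down model for explanation only, not the client library's implementation; the histogram type, observe, rubles, and the sample amounts are assumptions.

```go
package main

import "fmt"

// histogram is a simplified model of what a Prometheus histogram exposes:
// counts[i] is the number of observations <= bounds[i], and count plays the role of the +Inf bucket.
type histogram struct {
	bounds []float64
	counts []uint64
	sum    float64
	count  uint64
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe increments every bucket whose upper bound is >= v, so buckets are cumulative.
func (h *histogram) observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

// rubles converts an amount in minor units to rubles, as the CreateOrder handler does.
func rubles(amountMinor int64) float64 { return float64(amountMinor) / 100 }

// demo observes three made-up orders: 100, 750 and 75000 rubles.
func demo() *histogram {
	h := newHistogram(100, 500, 1000, 5000, 10000, 50000)
	for _, minor := range []int64{10000, 75000, 7500000} {
		h.observe(rubles(minor))
	}
	return h
}

func main() {
	h := demo()
	for i, b := range h.bounds {
		fmt.Printf("le=%g: %d\n", b, h.counts[i])
	}
	fmt.Printf("le=+Inf: %d, sum=%g\n", h.count, h.sum)
}
```

The 75000-ruble order lands only in +Inf, which shows why the top finite bound should sit above the largest amounts worth distinguishing.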

Metric names: snake_case with a unit

R-OBS-MTR-6: the Prometheus convention is snake_case with the unit in the suffix:

orders_created_total             - Counter (the _total suffix is mandatory)
payment_duration_seconds         - Histogram (time in seconds)
order_amount_rub                 - Histogram (currency unit)
product_cache_hits_total         - Counter
pgx_pool_acquired_conns          - Gauge (no _total: not a Counter)

orderCreatedCount                - violation: camelCase, no _total
paymentTime                      - violation: no unit
orderAmount                      - violation: no unit
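The naming rule can be checked mechanically, for example in a unit test over the declared metric names. The sketch below is one possible check under stated assumptions: checkName and the kind strings are hypothetical, and the list of unit suffixes is illustrative rather than exhaustive.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// snakeCase accepts lowercase words and digits separated by single underscores.
var snakeCase = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)

// unitSuffixes is an illustrative list of accepted units, not an exhaustive one.
var unitSuffixes = []string{"_seconds", "_bytes", "_rub"}

// checkName returns nil when name follows the convention for the given kind
// ("counter", "gauge" or "histogram"), or an error describing the first violation.
func checkName(kind, name string) error {
	if !snakeCase.MatchString(name) {
		return fmt.Errorf("%s: not snake_case", name)
	}
	isTotal := strings.HasSuffix(name, "_total")
	switch kind {
	case "counter":
		if !isTotal {
			return fmt.Errorf("%s: a counter must end in _total", name)
		}
	case "gauge":
		if isTotal {
			return fmt.Errorf("%s: _total is reserved for counters", name)
		}
	case "histogram":
		if isTotal {
			return fmt.Errorf("%s: _total is reserved for counters", name)
		}
		for _, suffix := range unitSuffixes {
			if strings.HasSuffix(name, suffix) {
				return nil
			}
		}
		return fmt.Errorf("%s: a histogram needs a unit suffix", name)
	default:
		return fmt.Errorf("unknown metric kind %q", kind)
	}
	return nil
}

func main() {
	cases := [][2]string{
		{"counter", "orders_created_total"},
		{"counter", "orderCreatedCount"},
		{"histogram", "payment_duration_seconds"},
		{"histogram", "paymentTime"},
		{"gauge", "pgx_pool_acquired_conns"},
	}
	for _, c := range cases {
		fmt.Printf("%-10s %-26s %v\n", c[0], c[1], checkName(c[0], c[1]))
	}
}
```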

Low cardinality in labels

R-OBS-MTR-7: a label value is a category, not a unique identifier. Prometheus stores a separate time series for every combination of label values. Millions of values mean millions of time series and, eventually, OOM.

Acceptable values:

// GOOD: a few dozen unique values at most
ordersCreatedTotal.WithLabelValues("CARD").Inc()     // payment_method: CARD/SBP/CRYPTO
httpRequestsTotal.WithLabelValues("GET", "/orders", "success").Inc() // chi route pattern, not /orders/abc123
orderConfirmFailedTotal.WithLabelValues("insufficient_stock").Inc()  // reason: a fixed set

Unacceptable values:

// BAD: millions of time series, OOM
ordersCreatedTotal.WithLabelValues(cmd.OrderID).Inc()       // unique UUID
productCacheHitsTotal.WithLabelValues(r.RequestURI).Inc()  // raw URL: /products/SKU-123456

To follow individual objects (order_id, customer_id), use distributed tracing with OTel spans, not metrics. A span is stored once in the tracing backend, not as a time series in the TSDB.
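The effect of a high-cardinality label is easy to reproduce without Prometheus. The following sketch counts distinct label-value combinations for the same traffic labelled in two ways, by raw URL and by route pattern; seriesCount, requests, rawPath, and routePattern are hypothetical helpers written for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// seriesCount returns the number of distinct label-value combinations,
// which is the number of time series a metric with these labels would create.
func seriesCount(observations [][]string) int {
	seen := make(map[string]struct{})
	for _, labels := range observations {
		seen[strings.Join(labels, "\x00")] = struct{}{}
	}
	return len(seen)
}

// requests simulates n GET requests for distinct orders;
// pathOf decides what goes into the path label.
func requests(n int, pathOf func(id int) string) [][]string {
	obs := make([][]string, 0, n)
	for id := 0; id < n; id++ {
		obs = append(obs, []string{"GET", pathOf(id), "success"})
	}
	return obs
}

// rawPath mimics using the request URL as the label value.
func rawPath(id int) string { return fmt.Sprintf("/orders/%d", id) }

// routePattern mimics using the chi route pattern as the label value.
func routePattern(int) string { return "/orders/{id}" }

func main() {
	const n = 10000
	fmt.Println("series with raw URL:      ", seriesCount(requests(n, rawPath)))
	fmt.Println("series with route pattern:", seriesCount(requests(n, routePattern)))
}
```

With the raw URL the series count grows with the number of orders, while the route pattern keeps it constant regardless of traffic.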

What is forbidden

Anti-pattern	Rule	What to do instead
`user_id` / `order_id` / `request_id` as a label value	`R-OBS-MTR-X1`	business categories with low cardinality (channel, payment_method)
`app=foo` instead of `service=foo`; non-standard label names	`R-OBS-MTR-X2`	`prometheus.Labels{"service": ..., "env": ..., "version": ...}` defined once
`prometheus.NewCounterVec` without registering it (`prometheus.MustRegister` or `Registerer.Register`)	`R-OBS-MTR-X3`	`promauto.NewCounterVec`, which registers automatically
`/metrics` on the business port without network protection	`R-OBS-MTR-X4`	a separate management server behind a network policy / VPN
raw URL (`/orders/abc123`) as the path label value	`R-OBS-MTR-7`	chi route pattern `/orders/{id}` via `chi.RouteContext`
`orderCreatedCount` (camelCase, no _total)	`R-OBS-MTR-6`	`orders_created_total`
Histogram without a unit (`paymentTime`)	`R-OBS-MTR-6`	`payment_duration_seconds`

Where to go next

Configuration: management port, slog level at runtime, APP_ENV.
Context propagation: request_id and user_id in context.Context, goroutines without breaking ctx.
Health checks: liveness/readiness on the management port, TTL cache for probes.
Logging: slog JSON/Text, structured fields, PII rules.
SLO and alerts: multi-window burn-rate alerts, error budget based on RED metrics.
Tracing: OTel spans for high-cardinality observability instead of label values.

Normative wording of all rules: section 2. Metrics.