SLOs & observability
The SLO catalogue, metrics, and dashboards.
By the end you’ll be able to
- Read the three default SLOs shipped by `@databridge/metrics-rollup`.
- Know how an SLO is evaluated against a rolled-up metric.
- Find the SLOs admin surface and the metrics endpoint.
DataBridge emits a steady stream of operational metric samples — `rule_run_ms` (how long a rule run took), `assistant_tokens_total` (how many tokens the assistant spent), `adapter_rows_total` (how many rows an adapter moved), and so on. `@databridge/metrics-rollup` turns that raw stream into something a dashboard can read: it buckets timestamped samples into fixed 5-minute / 1-hour / 24-hour windows and computes count / sum / min / max / avg / last per bucket.
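The bucketing described above can be sketched roughly as follows. This is a minimal illustration, not the actual `@databridge/metrics-rollup` code: the `Sample` and `Bucket` shapes, the `rollupSketch` function name and the window keys `5m` / `1h` / `24h` are all assumptions made for the example.

```typescript
// Hypothetical shapes and names; the real package may differ.
type Sample = { metric: string; timestampMs: number; value: number };
type WindowName = "5m" | "1h" | "24h";

interface Bucket {
  startMs: number; // inclusive start of the fixed window
  count: number;
  sum: number;
  min: number;
  max: number;
  avg: number;
  last: number; // value of the most recent sample in the bucket
}

const WINDOW_MS: Record<WindowName, number> = {
  "5m": 5 * 60 * 1000,
  "1h": 60 * 60 * 1000,
  "24h": 24 * 60 * 60 * 1000,
};

function rollupSketch(samples: Sample[], metric: string, window: WindowName): Bucket[] {
  const size = WINDOW_MS[window];
  const buckets = new Map<number, Bucket>();
  const lastSeenMs = new Map<number, number>(); // timestamp behind each bucket's `last`

  for (const s of samples) {
    if (s.metric !== metric) continue;
    // Align the sample to the start of its fixed window.
    const startMs = Math.floor(s.timestampMs / size) * size;
    const b = buckets.get(startMs);
    if (!b) {
      buckets.set(startMs, {
        startMs, count: 1, sum: s.value, min: s.value,
        max: s.value, avg: s.value, last: s.value,
      });
      lastSeenMs.set(startMs, s.timestampMs);
      continue;
    }
    b.count += 1;
    b.sum += s.value;
    b.min = Math.min(b.min, s.value);
    b.max = Math.max(b.max, s.value);
    b.avg = b.sum / b.count;
    // Samples may arrive out of order, so `last` follows the newest timestamp.
    if (s.timestampMs >= (lastSeenMs.get(startMs) ?? -Infinity)) {
      b.last = s.value;
      lastSeenMs.set(startMs, s.timestampMs);
    }
  }
  return Array.from(buckets.values()).sort((a, b) => a.startMs - b.startMs);
}
```

Aligning each timestamp with `Math.floor(t / size) * size` keeps the windows fixed rather than sliding, which matches the "fixed 5-minute / 1-hour / 24-hour windows" wording above.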
An SLO is defined over one metric and one window. It names an `aggregate` (`avg`, `max`, `min`, `last`, `sum`), a `comparator` (`lt`, `lte`, `gt`, `gte`) and a `threshold`. `evaluateSlo` rolls the metric up over the SLO's window, reduces the buckets with the aggregate, and compares the result against the threshold. If the window contains no samples at all, the SLO is treated as vacuously `healthy`.
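To make the reduce-then-compare step concrete, here is a hedged sketch that works on already rolled-up buckets. It is not the real `evaluateSlo`: the function name `evaluateSloSketch`, the `breached` status label and the way each aggregate is combined across buckets (for example, `avg` as total sum over total count) are assumptions for illustration only.

```typescript
type Aggregate = "avg" | "max" | "min" | "last" | "sum";
type Comparator = "lt" | "lte" | "gt" | "gte";

// Hypothetical shapes; only `aggregate`, `comparator` and `threshold` are named in the text above.
interface SloSketch { aggregate: Aggregate; comparator: Comparator; threshold: number }
interface BucketSketch { startMs: number; count: number; sum: number; min: number; max: number; last: number }

// Assumed reduction rules across buckets.
function reduceBuckets(buckets: BucketSketch[], aggregate: Aggregate): number {
  const ordered = buckets.slice().sort((a, b) => a.startMs - b.startMs);
  switch (aggregate) {
    case "sum":
      return ordered.reduce((acc, b) => acc + b.sum, 0);
    case "avg": {
      const total = ordered.reduce((acc, b) => acc + b.sum, 0);
      const count = ordered.reduce((acc, b) => acc + b.count, 0);
      return total / count;
    }
    case "max":
      return Math.max(...ordered.map((b) => b.max));
    case "min":
      return Math.min(...ordered.map((b) => b.min));
    case "last":
      return ordered[ordered.length - 1].last;
  }
}

function compare(observed: number, comparator: Comparator, threshold: number): boolean {
  switch (comparator) {
    case "lt": return observed < threshold;
    case "lte": return observed <= threshold;
    case "gt": return observed > threshold;
    case "gte": return observed >= threshold;
  }
}

function evaluateSloSketch(
  slo: SloSketch,
  buckets: BucketSketch[],
): { status: "healthy" | "breached"; observed: number | null } {
  // No samples at all: vacuously healthy, as described above.
  if (buckets.length === 0) return { status: "healthy", observed: null };
  const observed = reduceBuckets(buckets, slo.aggregate);
  const ok = compare(observed, slo.comparator, slo.threshold);
  return { status: ok ? "healthy" : "breached", observed };
}
```

Returning the observed value alongside the status is a design choice for the sketch: it lets a dashboard show how close the metric is to the threshold, not only whether it passed.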
Three default SLOs ship in `DEFAULT_SLOS` (`packages/metrics-rollup/src/index.ts`):
- `rule-run-latency-p-ok`: `avg(rule_run_ms)` over `1h` `< 2000` ("average rule-run under 2s").
- `assistant-spend-cap`: `sum(assistant_tokens_total)` over `24h` `< 5_000_000` ("daily assistant tokens under cap").
- `adapter-throughput`: `sum(adapter_rows_total)` over `1h` `>= 1` ("at least some adapter throughput hourly").
Customers can layer their own SLOs on top.
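Written out as data, the three defaults might look like the sketch below. The ids, metrics, windows, comparators and thresholds come from the list above; the field names other than `aggregate`, `comparator` and `threshold`, the `DEFAULT_SLOS_SKETCH` constant and the custom SLO at the end are assumptions, not the package's actual exports.

```typescript
// Assumed definition shape; check packages/metrics-rollup/src/index.ts for the real one.
interface SloDefinition {
  id: string;
  metric: string;
  window: "5m" | "1h" | "24h";
  aggregate: "avg" | "max" | "min" | "last" | "sum";
  comparator: "lt" | "lte" | "gt" | "gte";
  threshold: number;
  description: string;
}

const DEFAULT_SLOS_SKETCH: SloDefinition[] = [
  {
    id: "rule-run-latency-p-ok",
    metric: "rule_run_ms",
    window: "1h",
    aggregate: "avg",
    comparator: "lt",
    threshold: 2000,
    description: "average rule-run under 2s",
  },
  {
    id: "assistant-spend-cap",
    metric: "assistant_tokens_total",
    window: "24h",
    aggregate: "sum",
    comparator: "lt",
    threshold: 5_000_000,
    description: "daily assistant tokens under cap",
  },
  {
    id: "adapter-throughput",
    metric: "adapter_rows_total",
    window: "1h",
    aggregate: "sum",
    comparator: "gte",
    threshold: 1,
    description: "at least some adapter throughput hourly",
  },
];

// Hypothetical customer SLO layered on top of the defaults.
const customSlos: SloDefinition[] = [
  {
    id: "rule-run-latency-worst-case",
    metric: "rule_run_ms",
    window: "5m",
    aggregate: "max",
    comparator: "lte",
    threshold: 10_000,
    description: "no single rule run over 10s (illustrative)",
  },
];

const allSlos: SloDefinition[] = DEFAULT_SLOS_SKETCH.concat(customSlos);
```

Plain concatenation is the simplest reading of "layer on top"; whether a custom SLO can override a default with the same id is not stated in the source.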
The metrics surface is documented in `docs/OPERATOR_GUIDE.md` §4.3: `GET /metrics` returns Prometheus exposition text (v0.0.4), and the key counters are `hesa_submissions_total`, `hesa_violations_total`, `hesa_signoffs_total` and `hesa_submissions_rejected_total`, plus the `hesa_last_submission_submittable` gauge. The wider observability story includes the `@databridge/observability-core` exporters (`observability-exporter-otlp-json`, `observability-exporter-prometheus`).
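For a quick look at those counters outside a Prometheus server, the exposition text can be parsed by hand. The sketch below is a simplified reader, not a full implementation of the format: it skips comment lines, ignores optional timestamps, assumes label values contain no `}` character, and sums a metric across its label sets. The function name `sumByMetricName` is hypothetical.

```typescript
// Parse one value token; the text format spells infinities as +Inf / -Inf.
function parseValue(token: string): number {
  if (token === "+Inf") return Infinity;
  if (token === "-Inf") return -Infinity;
  return Number(token); // "NaN" becomes NaN
}

// Returns metric name -> total across all label sets.
// Summing makes sense for counters; for a gauge with several label sets it may not.
function sumByMetricName(expositionText: string): Map<string, number> {
  const totals = new Map<string, number>();
  // name, optional {labels}, value, optional integer timestamp
  const sampleLine = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)(?:\s+-?\d+)?$/;

  for (const raw of expositionText.split("\n")) {
    const text = raw.trim();
    if (text === "" || text.charAt(0) === "#") continue; // blank, # HELP, # TYPE
    const m = sampleLine.exec(text);
    if (!m) continue; // skip anything this simplified reader cannot parse
    const name = m[1];
    totals.set(name, (totals.get(name) ?? 0) + parseValue(m[2]));
  }
  return totals;
}

// Usage, assuming a runtime with a global fetch and a reachable base URL:
//   const text = await (await fetch(`${baseUrl}/metrics`)).text();
//   const totals = sumByMetricName(text);
//   console.log(totals.get("hesa_submissions_total"));
```

In production the Prometheus server does this parsing itself; a hand-rolled reader like this is mainly useful in a smoke test that checks the expected metric names are present.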
Dashboards are validated by `pnpm dashboards:check`, which is part of the freshness gate. If a dashboard references a metric name that no longer exists in code, the gate fails — that is what keeps observability honest as the surface evolves.
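The idea behind that gate can be illustrated with a small check. This is only a sketch of the principle, not what `pnpm dashboards:check` actually runs: the `DashboardSketch` shape, the `findStaleReferences` name and the `legacy_rows_total` metric are invented for the example.

```typescript
// Hypothetical dashboard shape: a title plus the metric names its panels query.
interface DashboardSketch {
  title: string;
  metricRefs: string[];
}

interface StaleReference {
  dashboard: string;
  metric: string;
}

// Returns every dashboard reference to a metric name that the code no longer emits.
function findStaleReferences(
  dashboards: DashboardSketch[],
  knownMetrics: string[],
): StaleReference[] {
  const known = new Set(knownMetrics);
  const stale: StaleReference[] = [];
  for (const d of dashboards) {
    for (const metric of d.metricRefs) {
      if (!known.has(metric)) stale.push({ dashboard: d.title, metric });
    }
  }
  return stale;
}

// Example: one dashboard still points at a metric that was removed.
const stale = findStaleReferences(
  [{ title: "Rule runs", metricRefs: ["rule_run_ms", "legacy_rows_total"] }],
  ["rule_run_ms", "assistant_tokens_total", "adapter_rows_total"],
);
// A gate built on this would fail whenever `stale` is non-empty.
```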
Walkthrough
- Open SLOs
1. Open the SLOs surface
The admin SLO browser lists the configured SLOs and their current evaluation. Walk through it once to see the shape.
- Open admin console
2. Tour the admin home
From the admin home you can hop across to webhooks, marketplace and waivers — every operator surface lives behind one nav.
- Open audit log
3. Check the audit log
The audit log is the other half of observability: a tamper-evident record of every meaningful action.
Your turn
Open the admin SLOs surface and confirm you can see the configured SLOs evaluating.
Hint: Use the 'Open the SLOs surface' step above.