Expand description
Prometheus counters surface - minimum viable set for alerting.
Three counters (auto-emit on event, monotonic) + one info gauge for
cp_build_info. No state-as-label gauges, no master/detail panel
drivers - those proved unworkable in Grafana’s table model and were
stripped (see fleet’s nixfleet-events.json which now reads from
Loki instead). What stays is the alerting surface:
nixfleet_compliance_failure_events_total{control_id, host}- per-control, per-host. Cardinality bounded by the closed compliance set (~16 controls) × hosts.nixfleet_runtime_gate_error_events_total- unlabeled. One global counter for the “agent couldn’t measure compliance” class.nixfleet_gate_block_total{gate}- one increment pergates::evaluate_for_hostblock.gatediscriminator is one of the kebab-case gate kinds (channel-edges / wave-promotion / host-edge / disruption-budget / compliance-wave). Drivesrate(...{gate="compliance-wave"}[5m]) > 0style alerts.nixfleet_cp_build_info{version, git_commit}=1- one series. Standard pattern (cf.kube_pod_info) for tracking the deployed CP version across scrapes. Re-emitted every render since the values are compile-time constants.
When the metrics feature is disabled, all functions in this module
are no-ops and neither dep is compiled in.
The exporter recorder is process-global and idempotent - first
install_recorder() wins. Tests can spin multiple test servers
without colliding.
idle_timeout deliberately NOT set: counters are cumulative and
must NEVER reset; the previous version applied idle eviction to
gauges, but with no gauges in this slim surface, idle eviction is
moot. cp_build_info is the only gauge and it’s re-emitted every
scrape via record_build_info().
Statics§
Functions§
- install_
recorder - Install the process-global Prometheus recorder. Idempotent - safe to call from each test’s server-spawn helper.
- record_
build_ info cp_build_info{version, git_commit}=1- the deployed CP version. Constants resolve at compile time; re-emit each scrape so it always renders.- record_
compliance_ event - Increment on
ComplianceFailureevent arrival in/v1/agent/report. Bounded labels: hosts × controls. No-op whenmetricsfeature off. - record_
gate_ block - Increment when
gates::evaluate_for_hostreturnsSome(GateBlock)at the dispatch endpoint.gate_kindis the kebab-case discriminator (channel-edges / wave-promotion / host-edge / disruption-budget / compliance-wave). - record_
runtime_ gate_ error - Increment on
RuntimeGateErrorevent arrival in/v1/agent/report.