Per-kind probe runners (RFC-0007 §3.1). Each runner consumes a
ProbeDecl and returns a RunnerOutcome. Strict-mode semantics are
uniform: any runtime error maps to ProbeStatus::Fail with a
failure_reason string. Per RFC-0007 §6 there is no Unknown or
“swallowed error” class.
Runners are pure (modulo I/O and the system clock): they don’t
emit events; the probe worker handles event emission and state
tracking. Each runner is Send + 'static so it can be tokio::spawn’d.
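The strict-mode rule above can be sketched as a single mapping from a runner's internal result to an outcome. This is a minimal illustration, not the crate's code: `Outcome` is a hypothetical, simplified stand-in for `RunnerOutcome`, the two-variant `ProbeStatus` is assumed, and `strict_outcome` is a made-up helper name.

```rust
use std::fmt::Display;

/// Hypothetical stand-in for the crate's status enum. Note there is no
/// `Unknown` variant: a probe either passes or fails.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProbeStatus {
    Pass,
    Fail,
}

/// Hypothetical, simplified outcome shape; the real `RunnerOutcome`
/// fields are not shown on this page.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Outcome {
    pub status: ProbeStatus,
    pub failure_reason: Option<String>,
}

/// Collapse any runtime error into `Fail` plus a reason string, so no
/// error is ever swallowed or reported as an indeterminate state.
pub fn strict_outcome<E: Display>(result: Result<(), E>) -> Outcome {
    match result {
        Ok(()) => Outcome {
            status: ProbeStatus::Pass,
            failure_reason: None,
        },
        Err(e) => Outcome {
            status: ProbeStatus::Fail,
            failure_reason: Some(e.to_string()),
        },
    }
}
```

Because the helper only needs `Display` on the error, any runner-specific error type (I/O, timeout, parse) could funnel through the same path.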
Modules§
- evidence
  Evidence probe runner (RFC-0007 §3.1 + §7). READ-ONLY consumer of the local collector unit’s signed evidence file.
- exec
  Exec probe runner (RFC-0007 §3.1). Pass iff exit code 0 within timeoutSecs wallclock. Argv runs as the agent’s user; declare absolute paths to avoid PATH surprises.
- http
  HTTP probe runner (RFC-0007 §3.1). GET <url> with timeoutSecs wallclock budget; Pass iff response status matches expectStatus. Error classes that count as Fail (RFC-0007 §6 uniform strict mode):
- tcp
  TCP probe runner (RFC-0007 §3.1). Pass iff a TCP connect against host:port succeeds within connect_timeout_secs. host defaults to 127.0.0.1 if absent.
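As an illustration of the TCP rule described in the list above, here is a minimal blocking sketch using only the standard library. The function name `tcp_probe`, its signature, and the wording of the reason strings are assumptions for the example; the actual runner presumably runs under tokio and takes its parameters from a ProbeDecl.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Hypothetical helper: Ok(()) if a TCP connect succeeds within the
/// timeout, Err(reason) for every other case (fail closed).
pub fn tcp_probe(
    host: Option<&str>,
    port: u16,
    connect_timeout_secs: u64,
) -> Result<(), String> {
    // An absent host falls back to loopback, as the module docs state.
    let host = host.unwrap_or("127.0.0.1");
    let addrs = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("resolve {host}:{port}: {e}"))?;
    let timeout = Duration::from_secs(connect_timeout_secs);

    // Try each resolved address; remember the last error for the reason.
    let mut last_err = format!("resolve {host}:{port}: no addresses");
    for addr in addrs {
        match TcpStream::connect_timeout(&addr, timeout) {
            Ok(_) => return Ok(()),
            Err(e) => last_err = format!("connect {addr}: {e}"),
        }
    }
    Err(last_err)
}
```

Note that `TcpStream::connect_timeout` rejects a zero duration with an error, so a zero timeout fails rather than blocking indefinitely, which is consistent with strict mode.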
Structs§
- ControlOverrideDecl
  Single entry in controlOverrides/controls (RFC-0007 §3.4 per-control granularity). mode is the effective mode for the control; reason is operator-facing audit rationale, surfaced in event_log + dashboards.
- ProbeDecl
  On-disk probe declaration. Loaded from /etc/nixfleet/agent/health-checks.json (rendered from lib/mk-fleet.nix:effectiveHealthChecks by _agent.nix).
- RunnerOutcome
  Output of one runner invocation.
Constants§
- FAILURE_REASON_MAX_LEN
  LOADBEARING: per-failure cap on failure_reason string length keeps the wire body bounded. Without truncation, runners can emit arbitrarily long stderr / response bodies that inflate the outbound queue’s JSON payloads and event-log row sizes. Runners pass their failure-reason strings through truncate_reason before constructing a RunnerOutcome::Fail.
- MIN_INTERVAL_SECS
  LOADBEARING: floor on probe interval guards against a misconfigured 0/1-second probe DOSing the host. Operator-declared intervalSeconds values below this are rounded up at the worker layer (crate::runtime::workers::probe::spawn clamps via interval_seconds.max(MIN_INTERVAL_SECS)). A weaker .max(1) floor would still let a 1-second HTTP probe issue 60 reqs/min against an operator-unintended backend.
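The interval floor amounts to a one-line clamp. In the sketch below the value `5` is an assumption chosen for illustration (this page does not state the real value of the constant), and `effective_interval_secs` is a hypothetical helper name; only the `.max(MIN_INTERVAL_SECS)` expression comes from the description above.

```rust
/// Assumed value for illustration only; the real constant's value is
/// not given on this page.
pub const MIN_INTERVAL_SECS: u64 = 5;

/// Round operator-declared intervals up to the floor. Values at or
/// above the floor pass through unchanged.
pub fn effective_interval_secs(interval_seconds: u64) -> u64 {
    interval_seconds.max(MIN_INTERVAL_SECS)
}
```

With this floor, a declared interval of 0 or 1 second behaves exactly like a declared interval of `MIN_INTERVAL_SECS`, which is the property the weaker `.max(1)` floor would not provide.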
Functions§
- default_connect_timeout_secs 🔒
- default_evidence_path 🔒
- default_expect_status 🔒
- default_interval_seconds 🔒
- default_timeout_secs 🔒
- run
  Dispatch on decl.kind. Unknown kinds fail closed.
- truncate_reason
  Truncate to FAILURE_REASON_MAX_LEN chars; appends "...[truncated]" when truncation fires. UTF-8 safe: bumps end back to the prior char boundary if a multibyte sequence would be split.
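A minimal sketch of the boundary-safe truncation that truncate_reason describes might look like the following. Two things here are assumptions: the cap value `512` is made up for the example, and the cap is treated as a byte length (the "prior char boundary" wording suggests byte indexing, but the page does not confirm it).

```rust
/// Assumed cap for illustration; the real value is not given here.
pub const FAILURE_REASON_MAX_LEN: usize = 512;
const TRUNCATION_SUFFIX: &str = "...[truncated]";

/// Cut `reason` to at most FAILURE_REASON_MAX_LEN bytes without
/// splitting a UTF-8 sequence, then append a marker.
pub fn truncate_reason(reason: &str) -> String {
    if reason.len() <= FAILURE_REASON_MAX_LEN {
        return reason.to_string();
    }
    // Walk `end` back until it sits on a char boundary. Index 0 is
    // always a boundary, so the loop terminates.
    let mut end = FAILURE_REASON_MAX_LEN;
    while !reason.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}{}", &reason[..end], TRUNCATION_SUFFIX)
}
```

Slicing a `&str` at a non-boundary index panics in Rust, which is why the walk-back step matters for stderr or response bodies containing multibyte characters.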