POST /v1/agent/heartbeat — agent liveness + drift detection
(RFC-0005 §4.3). Replaces the v0.1 POST /v1/agent/checkin flow.
The agent posts a minimal envelope (hostname, optional rollout_id, optional current_closure). CP:
- Authenticates via mTLS (existing require_cn_layer).
- Checks cert CN’s machine_id against body hostname (FORBIDDEN on mismatch, same shape as /v1/agent/events).
- Forwards a HeartbeatReceived input to the reducer with a oneshot reply.
- The reducer updates last_heartbeat_at (in-memory) and compares the agent’s current_closure against the CP-mirror’s current_closure field on the matching host_rollout_records row. Mismatch → reply contains last_event_seq for Replay-From.
- Response is 200 with optional X-Nixfleet-Replay-From: <seq> header.
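The drift comparison and the optional response header described in the list above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: HeartbeatReply, detect_drift, and response_headers are hypothetical names, and the sketch assumes that a heartbeat without current_closure (or a mirror row without one) is treated as "no drift".

```rust
/// Hypothetical reducer reply; the real type in the crate may differ.
#[derive(Debug, PartialEq)]
struct HeartbeatReply {
    /// Set to last_event_seq when the agent's closure disagrees with the mirror.
    replay_from: Option<u64>,
}

/// Compare the agent-reported closure with the CP-mirror's closure.
/// Assumption: if either side is absent, no mismatch is reported.
fn detect_drift(
    agent_closure: Option<&str>,
    mirror_closure: Option<&str>,
    last_event_seq: u64,
) -> HeartbeatReply {
    match (agent_closure, mirror_closure) {
        (Some(agent), Some(mirror)) if agent != mirror => HeartbeatReply {
            replay_from: Some(last_event_seq),
        },
        _ => HeartbeatReply { replay_from: None },
    }
}

/// Build the extra headers for the 200 response: the Replay-From header
/// is present only when the reducer reported a mismatch.
fn response_headers(reply: &HeartbeatReply) -> Vec<(String, String)> {
    reply
        .replay_from
        .map(|seq| ("X-Nixfleet-Replay-From".to_string(), seq.to_string()))
        .into_iter()
        .collect()
}
```

Keeping the comparison a pure function of the two closures and the sequence number makes the mismatch rule easy to test apart from the HTTP and mTLS layers.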
The 5-second reducer-reply timeout is generous: a healthy reducer processes a heartbeat input in microseconds. A timeout here signals reducer wedge (stuck applier? deadlock?) — log at error, return 503, agent retries.
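A rough sketch of the timeout behaviour, using std::sync::mpsc with recv_timeout as a stand-in for the async oneshot channel the handler actually uses. The function name await_reducer_reply and the (status, replay_from) return shape are hypothetical, and mapping a dropped sender to 503 is an assumption rather than something the source states.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Wait for the reducer's reply and map the outcome to an HTTP status.
/// Returns (status, replay_from). Stand-in for a oneshot + async timeout.
fn await_reducer_reply(
    rx: &mpsc::Receiver<Option<u64>>,
    timeout: Duration,
) -> (u16, Option<u64>) {
    match rx.recv_timeout(timeout) {
        // Reducer answered in time: 200, with an optional Replay-From seq.
        Ok(replay_from) => (200, replay_from),
        // No reply within the window: likely a wedged reducer.
        Err(RecvTimeoutError::Timeout) => {
            eprintln!("error: no reducer reply within {:?}", timeout);
            (503, None)
        }
        // Assumption: a dropped reply sender is also surfaced as 503.
        Err(RecvTimeoutError::Disconnected) => {
            eprintln!("error: reducer dropped the reply channel");
            (503, None)
        }
    }
}
```

In the handler described above the timeout would be Duration::from_secs(5); the agent treats a 503 as retryable.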