async fn apply_bootstrap_snapshot(
host_states: &mut HashMap<RolloutId, HostRolloutState>,
ctx: &ApplierCtx<'_>,
snapshot: HostRolloutSnapshot,
)
LIFT #3 + LIFT #4: apply a CP-supplied HostRolloutSnapshot to the agent’s in-memory reducer cache, then emit the worker re-priming effects the rehydrated state demands.
Snapshot-shaped, not event-replay: the canonical state lives on CP, and
the agent’s HostRolloutState is a reconstructable cache. Called from
two entry points: the boot-recovery handshake before workers spawn
(recovery.rs), and the steady-state heartbeat worker after CP
signals a fresh snapshot (workers/heartbeat.rs). Both paths share
this function so worker re-priming is consistent.
LOADBEARING: the merge is asymmetric. Canonical fields (state, target_closure, dispatch/activation timestamps, last_event_seq) always come from the snapshot. Agent-local-only fields that the wire snapshot does NOT carry (probes, probe_observed_first_at, probe_failure_first_at, failed_at, converged_at, etc.) are preserved from the existing entry when one is present, defaulted when not.
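The asymmetric merge can be illustrated with a minimal sketch. The types below (`Snapshot`, `LocalFields`, `Entry`) and the helper `merge_snapshot` are hypothetical, heavily reduced stand-ins for `HostRolloutSnapshot` and `HostRolloutState`; the real field set and the real merge code are not shown in this documentation and may differ.

```rust
use std::collections::HashMap;

// Hypothetical, simplified rollout state; the real enum has its own variants.
#[derive(Clone, Debug, PartialEq)]
enum State {
    Pending,
    Soaking,
}

// Hypothetical stand-in for the wire snapshot: canonical fields only.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    rollout_id: String,
    state: State,
    last_event_seq: u64,
}

// Hypothetical stand-in for the agent-local-only fields the wire does not carry.
#[derive(Clone, Debug, Default, PartialEq)]
struct LocalFields {
    probe_failure_first_at: Option<u64>,
    converged_at: Option<u64>,
}

// Hypothetical stand-in for the cached per-rollout entry.
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    state: State,
    last_event_seq: u64,
    local: LocalFields,
}

// Canonical fields always come from the snapshot; agent-local fields are
// carried over from the existing entry, or defaulted when there is none.
fn merge_snapshot(cache: &mut HashMap<String, Entry>, snap: Snapshot) {
    let local = cache
        .get(&snap.rollout_id)
        .map(|existing| existing.local.clone())
        .unwrap_or_default();
    cache.insert(
        snap.rollout_id,
        Entry {
            state: snap.state,
            last_event_seq: snap.last_event_seq,
            local,
        },
    );
}
```

In this sketch a warm rehydration of an existing entry updates `last_event_seq` from the snapshot but leaves `probe_failure_first_at` untouched, while a first-time insert starts from `LocalFields::default()`.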
probe_failure_first_at in particular MUST survive a warm
heartbeat rehydration: LIFT #5 makes CP return bootstrap_rollouts
on every steady-state heartbeat (~60s cadence), and clobbering the
sustained-failure timer on each tick prevents Soaking → Failed
from ever firing (HEALTH_FAILURE_THRESHOLD_SECS = 120s, so a
60s clobber starves the timer indefinitely).
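The starvation arithmetic can be checked with a small simulation. This is an illustrative model only: `HEARTBEAT_INTERVAL_SECS` and `first_failure_at` are names invented for the sketch, the 60s cadence is taken from the approximate figure above, and the probe is assumed to fail on every one-second tick.

```rust
const HEALTH_FAILURE_THRESHOLD_SECS: u64 = 120;
// Assumption for the sketch: heartbeats land exactly every 60 seconds.
const HEARTBEAT_INTERVAL_SECS: u64 = 60;

/// Simulates a continuously failing probe and returns the first second at
/// which the sustained-failure threshold is reached, or `None` if it is not
/// reached within `horizon_secs`.
fn first_failure_at(horizon_secs: u64, clobber_on_heartbeat: bool) -> Option<u64> {
    let mut probe_failure_first_at: Option<u64> = None;
    for now in 0..=horizon_secs {
        // Heartbeat rehydration: the buggy variant resets the agent-local timer.
        if clobber_on_heartbeat && now > 0 && now % HEARTBEAT_INTERVAL_SECS == 0 {
            probe_failure_first_at = None;
        }
        // The probe fails on every tick; stamp the first failure if unset.
        let first = *probe_failure_first_at.get_or_insert(now);
        if now - first >= HEALTH_FAILURE_THRESHOLD_SECS {
            return Some(now);
        }
    }
    None
}
```

With the timer preserved, `first_failure_at(3600, false)` reaches the threshold at second 120. With the timer clobbered on each heartbeat, the elapsed failure time never exceeds 59 seconds, so `first_failure_at(3600, true)` returns `None` and Soaking → Failed never fires.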
LOADBEARING: every non-Pending rehydration emits effects via
nixfleet_state_machine::rehydration_effects and routes them through
apply_effect — the same channel workers consume during ordinary
transitions. Without this, probe runners (and any future worker that
caches per-rollout state) keep tickers tagged with stale rollout_ids
from a prior process incarnation, and the reducer rejects the resulting
events with "LocalProbeResult not legal from state Converged".
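The re-priming flow can be sketched as follows. Everything here is hypothetical: the `Effect` variants, `ProbeRunner`, and the `sketch_`-prefixed functions are invented for illustration and are not the actual signatures of `nixfleet_state_machine::rehydration_effects` or `apply_effect`, which this page does not show.

```rust
// Hypothetical, reduced rollout state.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Pending,
    Soaking,
    Converged,
}

// Hypothetical effect set; the real effect enum is not documented here.
#[derive(Debug, PartialEq)]
enum Effect {
    StartProbes { rollout_id: String },
    StopProbes { rollout_id: String },
}

// Hypothetical worker that caches which rollout its ticker is tagged with.
#[derive(Debug, Default)]
struct ProbeRunner {
    ticker_rollout: Option<String>,
}

// Stand-in for the effect derivation: Pending emits nothing, every other
// state emits whatever the worker needs to match the rehydrated state.
fn sketch_rehydration_effects(rollout_id: &str, state: State) -> Vec<Effect> {
    match state {
        State::Pending => Vec::new(),
        State::Soaking => vec![Effect::StartProbes { rollout_id: rollout_id.to_string() }],
        State::Converged => vec![Effect::StopProbes { rollout_id: rollout_id.to_string() }],
    }
}

// Stand-in for the shared effect channel used by ordinary transitions.
fn sketch_apply_effect(runner: &mut ProbeRunner, effect: Effect) {
    match effect {
        Effect::StartProbes { rollout_id } => runner.ticker_rollout = Some(rollout_id),
        Effect::StopProbes { .. } => runner.ticker_rollout = None,
    }
}

// Rehydration routes every derived effect through the same channel.
fn rehydrate(runner: &mut ProbeRunner, rollout_id: &str, state: State) {
    for effect in sketch_rehydration_effects(rollout_id, state) {
        sketch_apply_effect(runner, effect);
    }
}
```

In this model a runner whose ticker is still tagged with a stale rollout id is retagged when a Soaking rollout is rehydrated, and stopped when the rehydrated rollout is already Converged, so it no longer produces probe events for a state that cannot accept them.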