NixFleet
Declarative NixOS fleet management. Three layers, one wire protocol, no daemons on the agent path.
What this book is
Curated guides: architecture, contracts, the operator cookbook, troubleshooting, the RFC set. Hand-written Markdown composed from the canonical sources under docs/{design,reference,operations,rfcs}/; this mdbook/ tree contains only thin includes so there is one source of truth per topic.
If you are new to the codebase, read Architecture first, then RFC-0003 for the wire protocol, then the relevant section of Contracts for whatever subsystem you are touching.
Building locally
nix run .#docs # build docs/mdbook/book/
nix run .#docs-serve # serve + open in a browser
Editing
Source files live under docs/{design,reference,operations,rfcs}/; the mdbook/src/ directory contains only {{#include}} wrappers. Edit the canonical file in its docs/<area>/ location and rerun nix run .#docs; the book picks up the change.
nixfleet architecture: declarative, signed, sovereign
Design principle. The control plane is a caching router for signed declarative intent. It holds no secrets, forges no trust, and can be rebuilt from empty state without data loss.
Every structural decision below serves that inversion of trust. In today’s nixfleet, the control plane is the source of truth - compromise it, and the fleet follows wherever it points. In this design, truth lives in git and in signing keys; the control plane only moves already-signed artifacts around. Destroying the control plane is an outage, not a breach. Rebuilding it from the flake and the signed artifacts in storage gives you back the same fleet.
This document consolidates the v0.2 design: the spine, the RFCs, the Rust/Nix boundary, the content-addressing generalization, and the supporting homelab infrastructure into a single architecture with a single build order.
1. Components
Each component below has a defined role, a defined owner, and a defined trust property. Components only interact through versioned, typed boundaries.
1.1 The flake (source of truth)
Git-tracked, hosted on a self-run Forgejo instance on the coordinator. Contains:
- nixosConfigurations.<host> - per-host NixOS modules.
- fleet flake output - produced by mkFleet { ... } per RFC-0001; describes hosts, tags, channels, rollout policies, edges, disruption budgets.
- age.secrets.<name> - secrets encrypted per-recipient at rest, declared alongside the fleet.
- nixfleet.compliance.controls.<name> - typed controls with static evaluate and runtime probe projections.
Trust role: primary trust root for intent. A commit that passes review IS the desired state. No other place in the system can claim “the fleet should be X” without a corresponding commit.
Framework Nix surface (mkFleet, mkHost, hostSpec, scopes)
The framework exposes two builders: nixfleet.lib.mkFleet (declarative fleet topology — hosts + channels + rollout policies; the typical operator path) and nixfleet.lib.mkHost (the underlying per-host primitive, also usable directly for one-off setups). The mkFleet wrapper iterates hosts and calls mkHost per host with hostName, platform, and fleetResolved pre-bound; built configurations surface as fleet.nixosConfigurations. Operators using mkHost directly pass fleetResolved themselves (or omit it on one-off rigs where probe topology isn’t load-bearing).
mkHost takes a typed hostSpec identity record plus a list of consumer modules and returns a nixosSystem or darwinSystem; it does not impose a fleet/org/role DSL above hostSpec. An auto-discovered set of service modules under modules/scopes/ self-activate via services.<name>.enable options gated by lib.mkIf. Adding a new scope requires no mkHost change; inactive scopes cost zero at evaluation. Roles, when used, are scope bundles defined in consuming fleets that set enable defaults with lib.mkDefault; the framework itself has no “role” concept.
hostSpec carries identity and locale data only — hostname, primary user, home directory, timezone, locale, platform marker, root access keys. Behaviour belongs to scopes.
The agent and control plane are themselves NixOS service modules (services.nixfleet-agent, services.nixfleet-control-plane), not opinionated profiles. Host operators stay in charge of firewall, persistence, and TLS posture; framework concerns stay in the services.* namespace, with secrets wired through the consumer’s chosen backend (agenix, sops, vault). Fleet repos extend hostSpec with their own opinionated capability flags (isGraphical, isDev, theme) by declaring additional options in plain NixOS modules passed via nixosArgs.modules on a host — the NixOS module system merges option declarations, so consumer extensions compose with framework-defined options without modifying the framework.
1.2 Continuous integration (the intent-signing oracle)
Runs on the coordinator (Hercules CI agent, or Forgejo Actions with a self-hosted runner). On every commit to a watched branch:
- Evaluates the flake; builds every host’s closure.
- Runs static compliance gates (type = static controls evaluated against each config). Failure aborts the pipeline; no release is produced.
- Pushes closures to attic, which signs them with its ed25519 private key.
- Produces fleet.resolved.json (RFC-0001 §4.1 projection) and signs it with the CI release key.
- Updates channel pointers (stable, edge-slow, …) to the new git ref, committing the signed artifact set.
Trust role: converts reviewed-and-merged commits into signed releases. CI key lives in an HSM, ideally on the coordinator with a TPM-backed keyslot. Rotation is a documented procedure, not an incident response.
1.3 Attic binary cache
Runs on the coordinator. Stores every closure CI produces, content-addressed by sha256, signed with its own ed25519 key. Clients verify signatures against a pinned public key embedded in their NixOS config.
Trust role: self-verifying content store. A compromised attic host cannot forge closures: the signing key is the trust root, not the host. An attacker who steals attic’s disk learns what closures have been built; they cannot inject malicious ones into any host.
1.4 Control plane (the router)
Rust/Axum service, SQLite for operational state, mTLS for all incoming connections. What it does:
- Polls the git forge for channel-ref updates (or receives webhooks).
- Fetches the signed fleet.resolved.json for each channel rev; verifies the CI signature; if it doesn’t verify, refuses to reconcile.
- Runs the reconciler (RFC-0002 §4 decision procedure) on each tick.
- Serves agent check-ins (RFC-0003): tells each host its current target closure hash, current rollout membership, expected probes.
- Records observed state (last check-in, current generation, probe results) as a cache of what agents have reported.
What it does not do:
- Hold any secret material (all secrets are agenix-encrypted in the flake).
- Sign anything that a host is asked to trust (closures -> attic; intent -> CI; probe outputs -> hosts).
- Store anything that cannot be recomputed from git + attic + agent check-ins.
Trust role: router. Compromise yields at worst a denial of service (refuse to propagate updates) or a replay attack (point hosts at stale-but-valid closures). Cannot inject code, cannot read secrets, cannot forge compliance evidence.
Destroying the control plane and rebuilding from scratch: re-pull fleet.resolved from git, re-fetch channel refs, let agents check in on their next poll cycle. Operational state reconstructs within one reconcile tick per channel.
Scaling envelope
The CP’s SQLite handle is wrapped in tokio::sync::Mutex<rusqlite::Connection>. WAL mode is enabled, so reads proceed while a write is in flight at the file level, but every operation that goes through the mutex serializes on the mutex itself. The current factoring is sized for fleets of O(100) hosts checking in at the configured polling cadence (default 60s with jitter); past ~150 hosts, dispatch bursts and report ingestion start to contend on the mutex and p99 dispatch latency can rise above one polling cycle. The bound is conservative, not load-tested, and intentionally invisible to operators today beyond the host-count log emitted on snapshot prime.
The path past the bound is a connection pool (deadpool-sqlite - same rusqlite::Connection surface, tokio-native async fn get()), scoped to when measurable contention appears: fleet size > 150, p99 dispatch_for_host exceeding the polling cycle in steady state, or operator-visible queueing in the journal. Migration is a wrapper swap plus an await per use site - same SQL, same schema, same behaviour, multi-connection on the inside. The mutex is the v0.2 commitment; the pool is the v0.3 trigger.
1.5 Agent (the actuator)
Rust daemon running on every managed host. Single-binary, minimal dependencies. What it does:
- Polls the control plane over mTLS at the channel’s declared cadence.
- On a new target: fetches the closure from attic (not from the control plane), verifies attic’s signature, verifies the hash.
- Decrypts host-scoped secrets from the flake using the host’s private ed25519 (SSH host key).
- Runs nixos-rebuild switch. Opens the magic-rollback confirm window.
- On post-activation boot: phones home with bootId + probe results. On silence past the window: auto-rollback.
- Reports current generation + probe outcomes at next check-in.
Self-switch resilience. When the new generation changes the agent itself, switch-to-configuration switch must complete after systemd stops the agent’s own cgroup. The agent’s apply path is fire-and-forget: the switch is queued in a detached transient systemd unit (systemd-run --unit=nixfleet-switch) before activation begins, so systemd stopping the agent does not kill the in-flight activation. The agent does not wait on the child; it polls /run/current-system until the symlink matches the desired generation, with a bounded timeout. If the agent is killed mid-poll, the new agent re-runs at startup and reconciles state by reading the active generation. The same mechanism handles rollback. The carve-out: switch inhibitors (dbus, systemd, kernel, init swaps) trip an inline pre-check that downgrades to nix-env --set only and posts ActivationDeferred, leaving the new generation to activate on next reboot - see ./contracts.md §I.7.
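The bounded poll described above can be sketched as follows. This is a minimal illustration, not the agent's actual code: the names `wait_for_generation` and `SwitchOutcome`, the injected reader closure, and the store path in `main` are all hypothetical.

```rust
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Outcome of waiting for the detached switch unit to finish (hypothetical type).
#[derive(Debug, PartialEq)]
enum SwitchOutcome {
    Converged,
    TimedOut,
}

/// Polls `read_current` until it returns `desired` or `timeout` elapses.
/// The reader is injected so the loop can be exercised without touching /run.
fn wait_for_generation<F>(
    mut read_current: F,
    desired: &Path,
    timeout: Duration,
    interval: Duration,
) -> SwitchOutcome
where
    F: FnMut() -> Option<PathBuf>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if read_current().as_deref() == Some(desired) {
            return SwitchOutcome::Converged;
        }
        if Instant::now() >= deadline {
            return SwitchOutcome::TimedOut;
        }
        std::thread::sleep(interval);
    }
}

fn main() {
    // Assumption: the real reader resolves the /run/current-system symlink.
    let reader = || std::fs::read_link("/run/current-system").ok();
    let desired = Path::new("/nix/store/hypothetical-system");
    let outcome = wait_for_generation(
        reader,
        desired,
        Duration::from_millis(50),
        Duration::from_millis(10),
    );
    println!("{outcome:?}");
}
```

Because the loop only reads state, a freshly restarted agent can call the same function at startup and reach the same answer as the agent that was killed mid-poll.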
What it does not do:
- Accept arbitrary commands from the control plane. The vocabulary is only “your target is closure sha256-X”. Not “run this shell snippet”, ever.
- Trust the control plane’s closure recommendation without signature verification against attic’s pinned key.
- Hold long-lived credentials beyond its mTLS client cert (short-lived, auto-rotating) and its SSH host key (machine-lifetime).
Trust role: local decision-maker. The agent is the last line of defense against a compromised control plane. If signatures don’t verify, it refuses. If the magic-rollback window closes silently, it reverts. Every decision is made with information the agent can independently verify.
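A minimal sketch of that local decision, assuming the signature check and the content hash have already been computed by real cryptographic code elsewhere. The types `Target`, `FetchedClosure`, and `Decision` are hypothetical and exist only to show that the control plane contributes a hash and nothing more.

```rust
/// The only thing the control plane may tell an agent: which closure to run.
#[derive(Debug, Clone, PartialEq)]
struct Target {
    closure_hash: String,
}

/// What the agent learned after fetching from the cache (hypothetical shape).
struct FetchedClosure {
    /// Hash computed locally over the fetched content.
    content_hash: String,
    /// Result of checking the cache signature against the pinned public key.
    signature_valid: bool,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Activate,
    Refuse(&'static str),
}

/// Every branch depends only on facts the agent verified itself.
fn decide(target: &Target, fetched: &FetchedClosure) -> Decision {
    if !fetched.signature_valid {
        return Decision::Refuse("cache signature does not verify against the pinned key");
    }
    if fetched.content_hash != target.closure_hash {
        return Decision::Refuse("fetched content does not match the target hash");
    }
    Decision::Activate
}

fn main() {
    let target = Target { closure_hash: "sha256-X".into() };
    let fetched = FetchedClosure { content_hash: "sha256-X".into(), signature_valid: true };
    println!("{:?}", decide(&target, &fetched));
}
```

There is deliberately no variant that carries a command: the message type cannot express "run this shell snippet".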
1.6 Compliance framework (enforceable evidence)
nixfleet-compliance repo. Controls declared as typed units with two projections:
- evaluate :: config -> { passed, evidence } - pure, runs at CI time. Violations fail the static gate; no release is produced.
- probe :: { command, expectedShape, schemaVersion } - descriptor consumed by the agent post-activation. Output is canonicalized and signed by the host’s key, producing non-repudiable evidence.
Every control belongs to one or more frameworks (ANSSI-BP-028, NIS2, DORA, ISO 27001). A channel’s compliance.frameworks list enforces the union of controls.
Trust role: turns NixOS configuration into auditable, content-addressed evidence. The chain: host key signs probe output -> closure hash pins what was running -> git commit pins what was intended. An auditor verifies the whole chain without trusting the control plane, the CI runner, or the operator.
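To make the static projection concrete, here is a sketch of `evaluate` as a pure function feeding a gate. In nixfleet the controls are declared in Nix and run at CI time; the Rust below only mirrors the shape `config -> { passed, evidence }`. The `HostConfig` fields and the two control names are hypothetical.

```rust
/// Hypothetical slice of a host configuration relevant to two controls.
struct HostConfig {
    ssh_password_auth: bool,
    firewall_enabled: bool,
}

/// Mirrors the `{ passed, evidence }` record returned by `evaluate`.
struct Evaluation {
    passed: bool,
    evidence: String,
}

/// A control's static projection: pure, no I/O, no clock.
type Evaluate = fn(&HostConfig) -> Evaluation;

fn no_ssh_password_auth(cfg: &HostConfig) -> Evaluation {
    Evaluation {
        passed: !cfg.ssh_password_auth,
        evidence: format!("ssh password auth = {}", cfg.ssh_password_auth),
    }
}

fn firewall_on(cfg: &HostConfig) -> Evaluation {
    Evaluation {
        passed: cfg.firewall_enabled,
        evidence: format!("firewall enabled = {}", cfg.firewall_enabled),
    }
}

/// Static gate: any failing control blocks the release and reports its evidence.
fn static_gate(cfg: &HostConfig, controls: &[(&str, Evaluate)]) -> Result<(), Vec<String>> {
    let failures: Vec<String> = controls
        .iter()
        .filter_map(|(name, eval)| {
            let result = eval(cfg);
            if result.passed {
                None
            } else {
                Some(format!("{name}: {}", result.evidence))
            }
        })
        .collect();
    if failures.is_empty() { Ok(()) } else { Err(failures) }
}

fn main() {
    let controls = [
        ("no-ssh-password-auth", no_ssh_password_auth as Evaluate),
        ("firewall-on", firewall_on as Evaluate),
    ];
    let cfg = HostConfig { ssh_password_auth: true, firewall_enabled: true };
    println!("{:?}", static_gate(&cfg, &controls));
}
```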
1.7 Secrets (zero-knowledge ferrying)
agenix-style: secrets encrypted per-recipient in git. Recipients are host SSH pubkeys, declared in fleet.nix under secrets.<name>.recipients. Ciphertext ships as part of the closure or as separate content-addressed blobs. Decryption happens on the target host, using its private SSH host key, into tmpfs only.
Trust role: eliminates the control plane from the secret path entirely. A fully-public flake repo combined with good host key hygiene gives you the same secrecy guarantees as a locked-down vault. Rotation = re-encrypt + commit + redeploy.
1.8 Test fabric (microvm.nix)
In-flake fixture. Each scenario declares N microvms (cloud-hypervisor, shared Nix store via virtiofs), a stub control plane, and an expected action plan. Exercises: clean rollout, canary rollback on probe failure, agent offline during rollout, host key rotation, cert revocation, compromised-control-plane simulation (swap signing key, verify hosts refuse).
Runs in nix flake check on PR for small scenarios (10 hosts); nightly for larger (50).
Trust role: the only honest way to know the protocol is correct. Every state machine in RFC-0002 must have fixture coverage. No transition lands without a test that exercises it. The reconciler is a pure function (§2 below); there’s no excuse for not testing it exhaustively.
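What "pure function" buys in practice can be shown with a deliberately tiny sketch. It is not the RFC-0002 decision procedure: waves, soak, and budgets are left out, and the `Action` variants are hypothetical. The point is only that the plan depends on nothing but its two inputs, so a fixture can assert on it directly.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified action vocabulary.
#[derive(Debug, PartialEq)]
enum Action {
    Dispatch { host: String, closure: String },
    NoOp { host: String },
}

/// Pure: the same (desired, observed) pair always yields the same plan.
/// Maps are host name -> closure hash; BTreeMap keeps the plan order deterministic.
fn reconcile(
    desired: &BTreeMap<String, String>,
    observed: &BTreeMap<String, String>,
) -> Vec<Action> {
    desired
        .iter()
        .map(|(host, want)| match observed.get(host) {
            Some(have) if have == want => Action::NoOp { host: host.clone() },
            _ => Action::Dispatch { host: host.clone(), closure: want.clone() },
        })
        .collect()
}

fn main() {
    let desired = BTreeMap::from([
        ("a".to_string(), "sha256-X".to_string()),
        ("b".to_string(), "sha256-X".to_string()),
    ]);
    let observed = BTreeMap::from([("a".to_string(), "sha256-X".to_string())]);
    for action in reconcile(&desired, &observed) {
        println!("{action:?}");
    }
}
```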
2. The Nix / Rust boundary
Nix owns evaluation. mkFleet, selector algebra, compliance control declarations, secret recipient lists. Produces signed artifacts at CI time. Never called at runtime.
Rust owns execution. Reconciler, state machines, agent protocol, activation, probe running, CLI. Takes signed artifacts as input; never evaluates Nix.
Boundaries. Three typed, versioned contracts:
- fleet.resolved.json - Nix -> Rust, via CI, signed.
- Compliance probe descriptors - Nix -> Rust, embedded in closures, schema-versioned.
- Agent/control-plane wire protocol - Rust ↔ Rust, versioned in header.
Crossing a boundary always means a version check and a signature verification (where applicable). Nothing is trusted by proximity.
3. The main flow
The happy path, one commit from push to all hosts converged:
1. operator ─── git push ──────────────▶ Forgejo
        │
2. Forgejo ─── webhook ────────────────▶ CI
        │
3. CI evaluates flake -> builds closures per host
   CI runs static compliance gate
   CI pushes closures -> attic (signs)
   CI produces fleet.resolved.json (signs)
   CI updates channel pointer, commits
        │
4. control plane polls/receives ◀───── git ref change
   verifies fleet.resolved signature
   reconciler emits action plan for new rollout
        │
5. agent (workstation, canary wave) long-polls ─▶ control plane
   control plane responds on /v1/agent/dispatch:
   target = sha256-X, rollout R, wave 0
        │
6. agent fetches sha256-X from attic
   verifies attic signature, verifies hash
   decrypts host-scoped secrets locally
   activates -> confirm window opens
        │
7. agent boots new generation
   runs runtime probes, signs outputs with host key
   posts ActivationCompleted to /v1/agent/events
   control plane reduces the event into rollout state
        │
8. soak elapses -> wave 0 promoted -> wave 1 begins
   attic-01 receives dispatch; same sequence
        │
9. wave 1 converges -> rollout Converged
   channel's lastRolledRef updated to new rev
Nothing in this flow requires trusting the control plane with anything it shouldn’t have. The control plane knows: which hosts exist, which closure hash each should run, which rollouts are in flight, what check-ins have happened. It does not know: what’s in the closures, what’s in the secrets, whether the probe outputs were forged (it can verify via host keys, but it could not fabricate them).
4. The trust flow
Independent of the operational flow, trace where trust originates and where it’s verified. This is the diagram that should stay true forever:
trust origins (signing keys, offline, rotatable):
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ CI release key  │  │ attic cache key │  │ org root key    │
│ (signs fleet.   │  │ (signs closures)│  │ (signs bootstrap│
│  resolved)      │  │                 │  │  tokens)        │
└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
         │                    │                    │
         │                    │                    │
trust per-host (derived, short-lived):
         │                    │                    │
         │           ┌────────┴────────┐           │
         │           │ host SSH key    │           │
         │           │ (signs probe    │           │
         │           │  outputs,       │           │
         │           │  decrypts       │           │
         │           │  secrets)       │           │
         │           └────────┬────────┘           │
         │                    │                    │
         │           ┌────────┴────────┐           │
         │           │ agent mTLS cert │           │
         │           │ (short-lived,   │           │
         │           │  derived from   │           │
         │           │  host key at    │           │
         │           │  enrollment)    │◀──────────┘
         │           └─────────────────┘
         │
verification happens everywhere (runtime, cheap):
  agents verify attic signatures on every closure fetch.
  agents verify CI signatures on every fleet.resolved fetch (if fetched directly).
  control plane verifies CI signatures before reconciling new revisions.
  control plane verifies agent mTLS certs on every check-in.
  auditors verify host-key signatures on probe outputs post-hoc.
Four keys. Everything else is derived. Compromise of any derived credential has a bounded blast radius because the roots are separate.
5. The failure cases
The design earns its keep when things go wrong. Walking through the scenarios:
Control plane host is compromised (attacker has root on the VM hosting Axum/SQLite). Attacker cannot: read secrets, forge closures, inject malicious code. Can: refuse to serve updates (DoS), serve stale-but-valid targets (replay). Mitigation: agents refuse to accept targets older than a configurable freshness window signed by CI.
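The replay mitigation reduces to a small age check on the agent side. The sketch below assumes the signed artifact carries a `signedAt` timestamp expressed as Unix seconds and that the agent knows its channel's freshness window; the function name and the treatment of future-dated artifacts are assumptions of this example, not documented behaviour.

```rust
use std::time::Duration;

/// Seconds since the Unix epoch; stand-in for the artifact's signed timestamp.
type UnixSeconds = u64;

#[derive(Debug, PartialEq)]
enum Freshness {
    Fresh,
    Stale { age: Duration },
    /// Assumption of this sketch: an artifact dated after `now` is also refused.
    FromTheFuture,
}

/// Compares the artifact's age against the channel's freshness window.
fn check_freshness(signed_at: UnixSeconds, now: UnixSeconds, window: Duration) -> Freshness {
    match now.checked_sub(signed_at) {
        None => Freshness::FromTheFuture,
        Some(age_secs) => {
            let age = Duration::from_secs(age_secs);
            if age <= window {
                Freshness::Fresh
            } else {
                Freshness::Stale { age }
            }
        }
    }
}

fn main() {
    let window = Duration::from_secs(2 * 60 * 60);
    // A target signed three hours ago falls outside a two-hour window.
    println!("{:?}", check_freshness(1_000, 1_000 + 3 * 60 * 60, window));
}
```

Since the timestamp sits inside the CI-signed payload, a compromised control plane can replay an old artifact but cannot make it look recent.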
Attic cache host is compromised. Attacker cannot forge closures (signing key is the trust root). Can: delete closures (hosts fall back to building locally if builders are present, else stall). Can: learn what closures exist (metadata leak). Disk loss is recoverable from CI artifacts.
CI runner is compromised. Serious - attacker can sign releases. Mitigation: CI key in HSM, CI runner in restricted environment, signing operation requires hardware confirmation. Detection: anomalous release signatures (signed outside normal CI run time) trip alerts. Recovery: revoke CI key, re-sign from clean environment, all agents refuse old-key artifacts.
Host is compromised (root on the target machine). Attacker can: read secrets decrypted for that host, forge probe outputs signed with that host’s key. Cannot: affect other hosts, modify the control plane’s view of the fleet. Detection: probe outputs from a compromised host might show inconsistencies that trigger runtime gates. Mitigation: TPM-backed host keys make key extraction hard; short-lived agent mTLS certs limit persistence.
Operator is compromised / malicious. If they have git commit access: can push any config. Mitigation: protected branches, mandatory review, CI static compliance gate catches obviously-bad configs (SSH password auth, disabled firewall, etc.) before release. Post-hoc: git history is the audit log.
Network partition mid-rollout. Agents cache last known desired state, continue operating. Magic rollback handles post-activation failures locally. Rollout pauses until partition heals; disruption budgets prevent cascade.
6. CP-resident state by recovery profile
Every SQLite table the CP keeps falls into one of two recovery classes. The classification is load-bearing for done-criterion #1 of §7: rebuilding the CP from empty state must restore the fleet’s desired-state guarantees within one reconcile cycle, not just “approximately reach steady state eventually”.
- Soft state - recoverable from agent inputs on the next checkin cycle, or acceptable as a one-window operational regression:
  - token_replay - bootstrap nonces with 24h TTL. Loss extends the replay window by up to one TTL. Bounded; no breach.
  - pending_confirms - in-flight activation deadlines. Loss could force the agent into an unnecessary local rollback when its confirm POST hits a 410. Mitigated by orphan-confirm recovery: when the agent’s reported closure_hash matches the verified target, the handler synthesises a confirmed row and returns 204 instead of 410.
  - host_rollout_state - per-host soak markers. Loss restarts soak windows from zero. Mitigated by agent-attested last_confirmed_at: the agent persists the moment of its most recent successful confirm and echoes it on every checkin; the CP repopulates last_healthy_since from the attestation, clamped to min(now, attested).
  - host_reports - SQLite-backed. Hydrated at boot; outstanding ComplianceFailure / RuntimeGateError events survive CP restarts so the wave-promotion gate stays armed across the unlock window. Soft only because individual late-arriving reports retry on the next checkin.
- Hard state - must come from signed artifacts pre-existing in git or from operator-declared trust roots:
  - cert_revocations - agent-cert revocation list. Loss is a security regression - previously-revoked certs become valid again. Mitigated by the signed revocations.json sidecar: operator commits revocations to the fleet repo, CI signs the artifact with the same ciReleaseKey that signs fleet.resolved.json, the CP fetches + verifies + replays on every reconcile tick. Recovery from empty is “one tick later, table populated from the signed artifact.”
  - trust.json - the trust roots themselves. Sourced from the flake at build time; rebuildable as long as the flake survives. A deferred TPM-bound issuance CA is tracked as future work.
The principle: the CP holds nothing whose loss creates a security regression on rebuild, and nothing whose loss creates more than a one-window operational regression. Orphan-confirm recovery, the last_confirmed_at attestation, and the signed revocations sidecar are what make it true.
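Two of those recovery mechanisms are small enough to sketch. The code below is illustrative only: the function names and the `ConfirmResponse` enum are hypothetical, timestamps are plain Unix seconds, and only the lost-row (orphan) path of the confirm handler is modelled.

```rust
/// Repopulates `last_healthy_since` from the agent's attested `last_confirmed_at`,
/// clamped to min(now, attested) so a skewed agent clock cannot push it into the future.
fn recover_last_healthy_since(now: u64, attested_last_confirmed_at: u64) -> u64 {
    now.min(attested_last_confirmed_at)
}

#[derive(Debug, PartialEq)]
enum ConfirmResponse {
    /// 204: a confirmed row is synthesised and the agent keeps its generation.
    NoContent,
    /// 410: nothing to confirm; the agent falls back to its local rollback.
    Gone,
}

/// Orphan-confirm recovery: the pending row was lost with the database, so the
/// handler compares the agent's reported closure against the verified target.
fn handle_orphan_confirm(reported_closure_hash: &str, verified_target: &str) -> ConfirmResponse {
    if reported_closure_hash == verified_target {
        ConfirmResponse::NoContent
    } else {
        ConfirmResponse::Gone
    }
}

fn main() {
    println!("{}", recover_last_healthy_since(1_700_000_000, 1_699_999_000));
    println!("{:?}", handle_orphan_confirm("sha256-X", "sha256-X"));
}
```

The clamp only ever moves the marker earlier than or equal to the current time, which is the direction that keeps soak from starting in the future.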
7. When is it actually done
Four falsifiable statements. If any is false, the design hasn’t landed:
- Destroying the control plane’s database and rebuilding from empty state results in full fleet visibility within one reconcile cycle, with zero operator intervention beyond restarting the service. Strict reading: every CP-resident table either repopulates from agent inputs (soft state - token_replay, pending_confirms, host_rollout_state) or from a signed artifact in git (hard state - cert_revocations via the signed revocations.json sidecar, trust.json via the flake). See §6 for the per-table classification.
- An auditor can be handed a host’s hostname + a date range, and - without access to the control plane - produce a cryptographically-verifiable statement of “on this date, this host ran closure sha256-X, which was built from commit Y, and passed compliance controls Z₁..Zₙ with signed probe outputs matching the declared schemas”.
- The control plane’s disk contents, stolen in their entirety, yield zero plaintext secret material.
- A deliberately-corrupted closure pushed to attic (bypassing CI) is rejected by every agent; a deliberately-modified fleet.resolved served by the control plane is rejected by the control plane’s own signature verification.
If all four hold, the slogan is true. If not, find the gap and close it before calling the framework done.
8. Source tree map
nixfleet/
├── flake.nix                           <- entry point, inputs, flake-parts wiring
├── Cargo.toml                          <- Rust workspace root
├── crane-workspace.nix                 <- Nix wrapper around crane for Rust builds
│
├── README.md, CHANGELOG.md, etc.       <- consumer-facing docs (root meta-files)
├── SECURITY.md, CONTRIBUTING.md, CODE_OF_CONDUCT.md, LICENSE-*
│
├── contracts/                          <- schemas. Top-level so import-tree skips
│   ├── host-spec.nix                   │  them. They declare options; impls
│   ├── persistence.nix                 │  satisfy them. NO mechanism here.
│   └── trust.nix                       ↓
│
├── impls/                              <- pluggable contract implementations,
│   ├── persistence/impermanence.nix
│   ├── keyslots/tpm/
│   ├── gitops/forgejo.nix
│   └── secrets/default.nix             ↑  exposed as flake.scopes.<family>.<impl>
│
├── lib/                                <- public API (mkHost, mkFleet, ...)
│   ├── default.nix                     │  wired entry: imports flake inputs
│   ├── mk-fleet.nix                    │  pure entry: just nixpkgs lib
│   ├── mk-host.nix                     │
│   └── mk-vm-apps.nix                  ↓
│
├── modules/                            <- flake-parts modules (auto-imported by
│   ├── flake-module.nix                │  import-tree, except _-prefixed files)
│   ├── apps.nix                        │  These declare flake outputs:
│   ├── formatter.nix                   │  flake.lib, .scopes, .nixosModules
│   ├── options-doc.nix                 │  perSystem.apps, .packages, .checks
│   ├── rust-packages.nix               │  .devShells, .formatter
│   │
│   ├── core/                           <- minimal NixOS/Darwin glue
│   │   ├── _nixos.nix                  │  hostSpec -> standard options,
│   │   └── _darwin.nix                 ↓  flake-mode nix prereqs.
│   │
│   ├── scopes/nixfleet/                <- framework runtime services
│   │   ├── _agent.nix                  │  systemd unit for the agent
│   │   ├── _agent-darwin.nix           │  launchd unit for the agent (macOS)
│   │   ├── _control-plane.nix          │  systemd unit for the CP
│   │   ├── _cache.nix                  │  binary-cache client wiring
│   │   ├── _microvm-host.nix           │  microvm host (bridge, NAT, dnsmasq)
│   │   ├── _operator.nix               │  workstation tools (mint-token, etc.)
│   │   └── _trust-json.nix             ↓  shared helper: build trust.json
│   │
│   └── tests/                          <- flake-parts entries that register
│       ├── eval.nix                    │  the checks that the test fabric runs
│       ├── harness.nix                 │
│       ├── _agent-v2-trust.nix         │
│       ├── _cp-v2-trust.nix            │
│       └── _trust-options.nix          ↓
│
├── crates/                             <- the Rust workspace
│   ├── nixfleet-proto/                 <- shared types (boundary contracts)
│   ├── nixfleet-canonicalize/          <- JCS canonicalizer (lib + bin)
│   ├── nixfleet-reconciler/            <- pure decision engine (lib only)
│   ├── nixfleet-agent/                 <- per-host actuator daemon
│   ├── nixfleet-control-plane/         <- Axum HTTP server + reconcile loop
│   ├── nixfleet-cli/                   <- operator workstation tools
│   ├── nixfleet-release/               <- CI release pipeline orchestrator
│   └── nixfleet-verify-artifact/       <- offline verifier for auditors
│
├── tests/                              <- test code, fixtures, harness
│   ├── fixtures/                       │  Static QEMU references
│   ├── harness/                        │  microvm.nix scenarios
│   └── lib/mk-fleet/                   ↓  positive + negative eval fixtures
│
└── docs/                               <- human-readable docs
    ├── README.md                       │  navigation index
    ├── design/                         │  this file + contracts.md + source-layout.md
    ├── reference/                      │  harness.md + per-crate overviews
    ├── operations/                     │  disaster-recovery + operator-cookbook + troubleshooting
    ├── rfcs/                           │  RFC-0001 ... RFC-0012
    └── mdbook/                         ↓  composed book; `{{#include}}` wrappers
Convention: _*.nix is skipped by import-tree. Files like _agent.nix are imported explicitly by lib/mk-host.nix. This is why agent/CP modules end up in every host’s module list while test modules under modules/tests/ only register via their non-prefixed siblings.
9. The Nix layer
9.1 Flake wiring
flake.nix is the entry point. Three jobs:
- Declares inputs - nixpkgs, darwin, home-manager, flake-parts, import-tree, disko, microvm, crane, lanzaboote, treefmt-nix, nixos-anywhere, nixos-hardware, impermanence.
- Picks the system matrix - x86_64-linux, aarch64-linux, aarch64-darwin, x86_64-darwin.
- Calls flake-parts.lib.mkFlake with ./modules/ auto-imported by import-tree.
outputs = inputs:
inputs.flake-parts.lib.mkFlake { inherit inputs; } (
(inputs.import-tree ./modules)
// { systems = [ "x86_64-linux" "aarch64-linux" "aarch64-darwin" "x86_64-darwin" ]; }
);
import-tree walks modules/, skips _*.nix, returns an attrset of flake-parts modules; mkFlake merges them. This decomposition is why outputs (apps, packages, checks, devShells, lib, scopes) live in five small files (flake-module.nix, apps.nix, formatter.nix, options-doc.nix, rust-packages.nix) rather than one monolith.
nixpkgs is pinned to nixos-unstable; the framework re-pins consumers via follows, so a fleet’s effective nixpkgs = the framework’s. impermanence is required only by fleets that import flake.scopes.persistence.impermanence; inert otherwise.
9.2 Public API (lib/)
Five exports: the builders mkHost, mkFleet, and mkVmApps, plus the helpers mergeFleets and withSignature. Wiring in lib/default.nix:
{ inputs, lib }: let
mkFleetImpl = import ./mk-fleet.nix { inherit lib; };
in {
mkHost = import ./mk-host.nix { inherit inputs lib; };
mkVmApps = import ./mk-vm-apps.nix { inherit inputs; };
inherit (mkFleetImpl) mkFleet mergeFleets withSignature;
}
mkFleet is pure (just needs lib), so the canonicalize binary and eval-only tests can import lib/mk-fleet.nix directly without dragging in flake inputs. mkHost and mkVmApps need inputs because they build actual systems / spawn QEMU.
mkHost - the primary API (lib/mk-host.nix)
One function. Returns a NixOS or Darwin system, ready for nixos-rebuild / darwin-rebuild.
mkHost {
hostName = "my-server"; # required
platform = "x86_64-linux"; # selects nixosSystem vs darwinSystem
stateVersion = "24.11"; # NixOS only
hostSpec = { userName = "deploy"; rootSshKeys = [ "ssh-ed25519 ..." ]; };
modules = [ ... ]; # consumer modules
isVm = false; # if true, inject test fixtures
extraInputs = {}; # consumer inputs to make visible
}
Internally:
- Picks nixpkgs.lib.nixosSystem or darwin.lib.darwinSystem based on platform.
- Auto-injects framework modules: contracts/host-spec.nix, contracts/persistence.nix, modules/core/_nixos.nix or _darwin.nix, all six modules/scopes/nixfleet/_*.nix. (Darwin gets only the agent-darwin and core-darwin modules.)
- Sets hostSpec defaults (mkDefault-wrapped so consumer overrides win).
- Forces hostSpec.hostName = hostName exactly (never overridable).
- Merges consumer’s modules last.
Every framework service module is auto-injected but disabled by default. Zero cost unless the host opts in (services.nixfleet-agent.enable = true; etc.). The framework deliberately exposes a single per-host builder, mkHost, and imposes no fleet/org/role taxonomy above it.
mkFleet - the fleet topology (lib/mk-fleet.nix)
Consumes a fleet description and produces fleet.resolved - the canonical projection that CI signs and the control plane consumes. Five major parts:
- hosts - atomic units. Each declares system, configuration, tags, channel.
- tags - flat, non-hierarchical groupings.
- channels - release trains. Each pins rolloutPolicy, freshnessWindow, signingIntervalMinutes, reconcileIntervalMinutes, compliance.frameworks.
- rolloutPolicies - named strategies. Each declares waves (selector + soakMinutes), a healthGate, an onHealthFailure action.
- edges + disruptionBudgets - DAG ordering and concurrent-change limits.
Selector algebra: tags, tagsAny, hosts, channel, all, not, and. No wildcards; resolves at eval time.
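The selector algebra is small enough to model as a recursive enum. In nixfleet it is resolved by Nix at eval time; the Rust below is only an illustration of the semantics, and it assumes that `tags` means "host carries every listed tag" while `tagsAny` means "at least one". The `Host` struct and the function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical host record: just the fields selectors can see.
struct Host {
    name: &'static str,
    channel: &'static str,
    tags: BTreeSet<&'static str>,
}

enum Selector {
    Tags(Vec<&'static str>),    // assumed: all listed tags must be present
    TagsAny(Vec<&'static str>), // assumed: at least one listed tag is present
    Hosts(Vec<&'static str>),
    Channel(&'static str),
    All,
    Not(Box<Selector>),
    And(Vec<Selector>),
}

/// Returns true when the selector picks the host.
fn selects(sel: &Selector, h: &Host) -> bool {
    match sel {
        Selector::Tags(ts) => ts.iter().all(|t| h.tags.contains(t)),
        Selector::TagsAny(ts) => ts.iter().any(|t| h.tags.contains(t)),
        Selector::Hosts(names) => names.contains(&h.name),
        Selector::Channel(c) => h.channel == *c,
        Selector::All => true,
        Selector::Not(inner) => !selects(inner, h),
        Selector::And(parts) => parts.iter().all(|p| selects(p, h)),
    }
}

/// Resolves a selector to host names, in fleet order.
fn resolve(sel: &Selector, fleet: &[Host]) -> Vec<&'static str> {
    fleet.iter().filter(|h| selects(sel, h)).map(|h| h.name).collect()
}

fn main() {
    let fleet = [
        Host { name: "workstation", channel: "stable", tags: BTreeSet::from(["canary", "desktop"]) },
        Host { name: "attic-01", channel: "stable", tags: BTreeSet::from(["server"]) },
    ];
    let wave1 = Selector::And(vec![
        Selector::Channel("stable"),
        Selector::Not(Box::new(Selector::Tags(vec!["canary"]))),
    ]);
    println!("{:?}", resolve(&wave1, &fleet));
}
```

With no wildcards in the vocabulary, every selector resolves to a finite, explicit host list that can be checked before signing.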
mkFleet runs invariant checks - every host’s channel exists, every channel’s policy exists, edges form a DAG, freshnessWindow ≥ 2 × signingIntervalMinutes, every selector resolves to ≥1 host. Compliance failures in enforce mode block the build before signing. Output is fleet.resolved with null placeholders for signedAt, ciCommit, closureHash - filled by nixfleet-release at CI time.
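Two of these invariants, the freshness inequality and the DAG requirement on edges, are sketched below. The real checks live in lib/mk-fleet.nix and run during Nix evaluation; this Rust version only illustrates the logic. It assumes both durations are in minutes and that an edge is a (before, after) pair of host names. The cycle check is a standard Kahn-style topological pass, chosen for the example rather than taken from the source.

```rust
use std::collections::BTreeMap;

/// freshnessWindow >= 2 x signingIntervalMinutes (both assumed to be minutes).
fn freshness_ok(freshness_window_min: u32, signing_interval_min: u32) -> bool {
    freshness_window_min >= 2 * signing_interval_min
}

/// Returns true when the (before, after) edges contain no cycle.
fn edges_form_dag(edges: &[(&str, &str)]) -> bool {
    let mut indegree: BTreeMap<&str, usize> = BTreeMap::new();
    let mut successors: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(before, after) in edges {
        indegree.entry(before).or_insert(0);
        *indegree.entry(after).or_insert(0) += 1;
        successors.entry(before).or_default().push(after);
    }
    // Start from nodes nothing depends on, then peel the graph layer by layer.
    let mut ready: Vec<&str> = indegree
        .iter()
        .filter(|(_, degree)| **degree == 0)
        .map(|(node, _)| *node)
        .collect();
    let mut visited = 0;
    while let Some(node) = ready.pop() {
        visited += 1;
        if let Some(next) = successors.get(node) {
            for &succ in next {
                let degree = indegree
                    .get_mut(succ)
                    .expect("every edge endpoint was registered above");
                *degree -= 1;
                if *degree == 0 {
                    ready.push(succ);
                }
            }
        }
    }
    // Nodes on a cycle never reach in-degree zero, so they are never visited.
    visited == indegree.len()
}

fn main() {
    println!("{}", freshness_ok(120, 60));
    println!("{}", edges_form_dag(&[("db", "app"), ("app", "proxy")]));
    println!("{}", edges_form_dag(&[("a", "b"), ("b", "a")]));
}
```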
mergeFleets strict-merges multiple fleet inputs (collisions throw); withSignature stamps meta after CI builds.
mkVmApps - local VM lifecycle (lib/mk-vm-apps.nix)
Returns five flake apps: build-vm, start-vm, stop-vm, clean-vm, test-vm. Linux-only. The 37-line composer is thin; platform abstraction lives in lib/vm-platform.nix, shared bash in lib/vm-helpers.sh, per-app scripts in lib/vm-scripts/. State under ~/.local/share/nixfleet/vms/.
Flake-output modules (modules/*.nix)
- modules/flake-module.nix - exports flake.lib, flake.nixosModules.nixfleet-core, flake.scopes.<family>.<impl>.
- modules/apps.nix - declares perSystem apps. Most importantly, validate - the single test-suite entry (nix run .#validate -- --all runs format, eval, host builds, Rust tests, VM scenarios). Also exposes the agent / CP / cli / canonicalize / verify-artifact / release binaries.
- modules/formatter.nix - nix fmt via treefmt-nix (Alejandra + shfmt + deadnix).
- modules/options-doc.nix - generates the Markdown options reference.
- modules/rust-packages.nix - wires crane to build the workspace, exports docs-site, declares devShells.default.
9.3 Contracts
Pure schemas under contracts/. They declare options; they implement nothing. Kept top-level (not under modules/) so import-tree doesn’t treat them as flake-parts modules and leak assertions into flake-level scope. The cross-reference for every boundary-crossing artifact is ./contracts.md.
hostSpec - universal identity (contracts/host-spec.nix)
Every host has one. Identity (hostname, primary user, home dir), locale (timezone, locale, keyboard layout), access (root password file, root SSH keys), networking hints, secrets-backend hints, platform marker (isDarwin). The agent reads hostSpec.userName; persistence reads it for ownership; core reads hostSpec.hostName and stamps it into networking.hostName.
hostSpec carries identity only; behaviour is via scope enable options. Fleets extend hostSpec with their own options via plain NixOS modules.
persistence - what survives reboots (contracts/persistence.nix)
options.nixfleet.persistence = {
enable = lib.mkEnableOption "system-level persistence";
persistRoot = lib.mkOption { type = str; default = "/persist"; };
directories = lib.mkOption { type = listOf (either str (attrsOf anything)); default = []; };
files = lib.mkOption { type = listOf (either str (attrsOf anything)); default = []; };
};
Baseline contributions (/etc/nixos, /etc/NetworkManager/system-connections, /var/lib/systemd, /var/lib/nixos, /var/log, /etc/machine-id) are added regardless of impl. Other modules contribute their own paths (agent -> /var/lib/nixfleet, CP -> /var/lib/nixfleet-cp, secrets -> /etc/ssh/ssh_host_ed25519_key). The active impl reads the merged list.
trust - the four roots (contracts/trust.nix)
The most security-critical contract:
options.nixfleet.trust = {
ciReleaseKey = mkOption { type = ciReleaseKeySlotType; ... }; # typed (algorithm + public)
cacheKeys = mkOption { type = listOf str; ... }; # opaque, for nix's trusted-public-keys
orgRootKey = mkOption { type = keySlotType; ... }; # bare-string ed25519 (pinned)
};
Three roots declared in the flake; the fourth root - the per-host SSH key - is intrinsic to each host (generated by stock OpenSSH on first boot). Each KeySlot has current, previous, rejectBefore. The ciReleaseKey slot is typed to support both ed25519 and ecdsa-p256 (TPMs commonly support P-256 but not ed25519). The orgRootKey is pinned to ed25519 - bootstrap-token signing only, never reaches the CP. cacheKeys is forwarded verbatim to nix.settings.trusted-public-keys. Serialised to JSON at build time (see _trust-json.nix below) and read at runtime.
10.4 Pluggable impls (flake.scopes.*)
The kernel/opinion split: framework declares contracts and ships one impl per family. Sibling impls are alternatives. Registered in modules/flake-module.nix:
flake.scopes = {
persistence.impermanence = ../impls/persistence/impermanence.nix;
keyslots.tpm = ../impls/keyslots/tpm;
gitops.forgejo = import ../impls/gitops/forgejo.nix;
gitops.gitea = import ../impls/gitops/forgejo.nix; # API identical
secrets = ../impls/secrets;
};
- persistence.impermanence (impls/persistence/impermanence.nix) - btrfs-root wipe-on-boot. initrd moves @root to old_roots/<timestamp>, creates fresh empty @root; upstream impermanence then bind-mounts paths from /persist/... back. Old snapshots pruned at default 30-day retention. Two impl-specific options: rootDevice, oldRootsRetentionDays.
- keyslots.tpm (impls/keyslots/tpm/) - first-boot TPM key generation, idempotent re-export after impermanence wipe. tpm2_createprimary + tpm2_evictcontrol to a persistent handle (default 0x81010001); exports public key to /var/lib/nixfleet-tpm-keyslot/; installs a tpm-sign shell wrapper. Configurable: handle, algorithm (default ecdsa-p256), exportPubkeyDir, signWrapperName. Does not handle disk encryption.
- gitops.forgejo / .gitea (impls/gitops/forgejo.nix) - pure data, a URL builder. Returns { artifactUrl; signatureUrl } for a Forgejo or Gitea host. Wire into services.nixfleet-control-plane.channelRefsSource.
- secrets (impls/secrets/default.nix) - backend-agnostic identity-path manager. Declares where decryption identities live (identityPaths.{hostKey, userKey, extra}); ensures the SSH host key exists at first boot; adds those paths to the persistence contract; computes resolvedIdentityPaths (read-only introspection hook). Does NOT wrap agenix / sops / vault - your fleet wires those itself.
Consumer pattern (mkFleet wraps the per-host mkHost call):
# fleet-repo/flake.nix
let fleet = nixfleet.lib.mkFleet {
hosts.web-01 = {
system = "x86_64-linux";
channel = "stable";
tags = [];
nixosArgs = {
hostSpec = { userName = "deploy"; rootSshKeys = [ "ssh-ed25519 ..." ]; };
modules = [
nixfleet.scopes.persistence.impermanence
nixfleet.scopes.secrets
nixfleet.scopes.keyslots.tpm
./hardware/web-01.nix
({ ... }: {
services.nixfleet-agent = { enable = true; controlPlane.url = "https://cp.example.com:8080"; };
})
];
};
};
channels.stable = {
rolloutPolicy = "all-at-once";
signingIntervalMinutes = 60;
freshnessWindow = 1440;
};
rolloutPolicies.all-at-once = {
strategy = "all-at-once";
waves = [{ selector.all = true; soakMinutes = 0; }];
};
};
in { nixosConfigurations = fleet.nixosConfigurations; }
10.5 Runtime service modules (modules/scopes/nixfleet/)
All underscore-prefixed (skipped by import-tree) and explicitly imported by lib/mk-host.nix. Each defaults to enable = false.
_agent.nix - Linux agent service
Key options: enable, controlPlaneUrl, machineId, pollInterval (60s default), trustFile (materialised from nixfleet.trust), tls.{caCert, clientCert, clientKey}, bootstrapTokenFile, stateDir (/var/lib/nixfleet-agent), complianceGate.mode, package (escape hatch for harness/vendor). Activation: materialises trust.json via environment.etc; installs Type=simple, Restart=always, RestartSec=30, NoNewPrivileges=true; contributes /var/lib/nixfleet to nixfleet.persistence.directories.
_agent-darwin.nix - macOS agent
Same schema plus sshHostKeyFile (default /etc/ssh/ssh_host_ed25519_key) and tags (passed via NIXFLEET_TAGS env). Differences: launchd instead of systemd (KeepAlive, RunAtLoad, ThrottleInterval=10); 15-second sleep in ExecStart to defend two boot races (NTP not synced -> rustls cert “not yet valid”; agenix not yet decrypted -> cert files missing); launchctl kickstart -k in postActivation forces clean restart even on unchanged plist; environment.etc.<...>.text instead of .source because Darwin’s flake-source symlinks are unreliable.
_control-plane.nix - CP service
Richest module. Key options:
| Option | Default | Purpose |
|---|---|---|
| listen | 0.0.0.0:8080 | TLS bind |
| tls.{cert, key, clientCa} | required | mTLS server material |
| artifactPath / signaturePath | /var/lib/nixfleet-cp/fleet/releases/fleet.resolved.json{,.sig} | local signed artifact |
| trustFile | /etc/nixfleet/cp/trust.json | materialised from nixfleet.trust |
| freshnessWindowMinutes | 1440 (24h) | max accepted age of meta.signedAt |
| confirmDeadlineSecs | 360 | magic-rollback deadline |
| fleetCaCert, fleetCaKey | required for issuance | for /v1/enroll and /v1/agent/renew |
| auditLogPath | /var/lib/nixfleet-cp/issuance.log | append-only cert-issuance log |
| dbPath | /var/lib/nixfleet-cp/state.db | SQLite |
| closureUpstream | null | optional binary cache for /v1/agent/closure/<hash> |
| rolloutsDir | null | pre-signed rollout manifests on disk (primary) |
| rolloutsSource.{artifactUrlTemplate, signatureUrlTemplate, tokenFile} | null | on-demand HTTP fallback when rolloutsDir misses |
| channelRefsSource.{artifactUrl, signatureUrl, tokenFile} | null | upstream poll for fleet.resolved |
| revocationsSource.{artifactUrl, signatureUrl, tokenFile} | null | upstream poll for revocations.json sidecar |
| strict | false | refuse to start if tls.clientCa or revocationsSource is unset |
| package | self | escape hatch |
Long-running systemd service (Type=simple) with ProtectSystem=strict, PrivateTmp=true, etc. The CP does not use a systemd timer - it has its own internal 30-second reconcile loop. systemd.tmpfiles.rules auto-bootstraps observed.json to an empty skeleton on first deploy.
_cache.nix - binary-cache client
Trivial: declares services.nixfleet-cache.{cacheUrl, publicKey}; appends to nix.settings.substituters and nix.settings.trusted-public-keys. Format-agnostic.
_microvm-host.nix - microVM host wiring
Bridges, NAT, dnsmasq DHCP. Default bridge nixfleet-br0, 10.42.0.1/24. The microVMs themselves are defined by your fleet via upstream microvm.vms.
_operator.nix - workstation tools
Adds nixfleet-cli (nixfleet, with subcommands mint-token, derive-pubkey, mint-operator-cert) to environment.systemPackages. Optional orgRootKeyFile exposed via NIXFLEET_OPERATOR_ORG_ROOT_KEY. Crucially: the org root private key is encrypted to the operator user only; the CP never decrypts it (it only verifies token signatures with the public half declared in config.nixfleet.trust.orgRootKey.current).
_trust-json.nix - shared trust serialiser
Helper imported by _agent.nix, _control-plane.nix, _agent-darwin.nix. Builds the JSON payload for /etc/nixfleet/{agent,cp}/trust.json. schemaVersion = 1 is required per RFC-0010 §1.5 - binaries refuse to start on unknown versions.
Core glue (modules/core/)
_nixos.nix: flake-only nix.nixPath, experimental-features, hostName/timeZone/locale/keyMap/xkb from hostSpec, root SSH keys + hashed password file, imports contracts/trust.nix. _darwin.nix is even smaller - system.stateVersion, system.primaryUser, disables verifyNixPath, marks hostSpec.isDarwin = true. Core was deliberately trimmed to mechanism-only; everything else lives in scopes.
10. The Rust layer
11.1 Crate map
Eight crates. Three boundary (types, canonicalisation, decision engine); five binaries. Dependency direction: proto -> canonicalize -> reconciler -> consumers. No cross-deps among consumers.
┌─────────────────────────────────────────────┐
│ nixfleet-proto │
│ (boundary types: FleetResolved, wire, │
│ trust, revocations, rollout manifest) │
└────────────────────┬────────────────────────┘
│
┌──────────────────┼─────────────────┐
▼ ▼ ▼
┌────────────────────┐ ┌────────────┐ ┌──────────────────┐
│ nixfleet- │ │ used by │ │ used by │
│ canonicalize │ │ everyone │ │ everyone │
│ (JCS, RFC 8785) │ └────────────┘ └──────────────────┘
└─────────┬──────────┘
│
▼
┌────────────────────┐
│ nixfleet- │
│ reconciler │
│ (verify_artifact, │
│ reconcile fn, │
│ evidence verify) │
└─┬──────────────────┘
│
┌─────┴──────┬──────────────┬──────────────┬──────────────┐
▼ ▼ ▼ ▼ ▼
┌──────┐ ┌────────┐ ┌──────────┐ ┌──────────┐ ┌────────────────┐
│agent │ │ control│ │ release │ │ cli │ │verify-artifact │
└──────┘ └────────┘ └──────────┘ └──────────┘ └────────────────┘
per-host Axum + CI build operator offline auditor
actuator SQLite pipeline tools tool
11.2 Boundary crates
nixfleet-proto - shared types
Canonical definitions for every artifact and message. Modules:
- fleet_resolved.rs - FleetResolved, Host, Channel, RolloutPolicy, Wave, DisruptionBudget, Edge, Meta, Compliance, HealthGate, OnHealthFailure enum.
- agent_wire.rs - CheckinRequest/Response, EvaluatedTarget, ConfirmRequest, ReportRequest, ReportEvent. Constant PROTOCOL_MAJOR_VERSION = 1 (header X-Nixfleet-Protocol).
- enroll_wire.rs - BootstrapToken, TokenClaims, EnrollRequest/Response, RenewRequest/Response.
- revocations.rs - Revocations, RevocationEntry.
- rollout_manifest.rs - RolloutManifest, HostWave, fleetResolvedHash (anchor against mix-and-match).
- trust.rs - TrustConfig, KeySlot, TrustedPubkey.
- compliance.rs + evidence_signing.rs - typed signed payloads for every evidence event.
Conventions: optional fields use Option<T> with #[serde(default)] but no skip_serializing_if - null is present, important for JCS byte stability across Nix -> Rust round-trips. No #[serde(deny_unknown_fields)] - contracts evolve additively. Object key sorting + deterministic number formatting is the canonicalize crate’s job, not serde’s.
nixfleet-canonicalize - JCS
Library + tiny binary. The library is one function:
pub fn canonicalize(input: &str) -> Result<String> {
    let value: serde_json::Value = serde_json::from_str(input)?;
    serde_jcs::to_string(&value)
}
Every signer and every verifier feeds artifacts through this. Pinned serde_jcs 0.2, single source of truth. The binary is cat-style for use in CI sign hooks and tests.
nixfleet-state-machine - pure per-host reducer
Per-host rollout reducer with no I/O, no clock reads, and no side effects performed inside the reducer (effects are returned as data). Single entry point: step(state, event, now, policy) -> Result<(state, Vec<Effect>), TransitionError>. The same crate runs on both sides: the agent’s runtime drives local state from worker output (Local* events); the CP-side mirror synthesises the same state from inbound wire AgentEvents (Remote* events). The dependency list is part of the safety contract (no tokio / reqwest / rusqlite); CI verifies it via cargo tree. Spec lives in RFC-0005 §3 (the 6-state machine) and RFC-0006 §3 (functional-core / imperative-shell pattern); RFC-0008 adds a parallel rollout-level reducer in src/rollout/.
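To make the functional-core shape concrete, here is a minimal sketch of a reducer with the same signature pattern. The states, events, effects, and `Policy` field below are illustrative placeholders, not the six states of RFC-0005; timestamps are plain seconds passed in by the caller, so the function never reads a clock.

```rust
/// Illustrative states only; the real machine is defined in RFC-0005 §3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostState { Idle, Activating, Healthy, Soaked, Failed }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event { Dispatched, ActivationConfirmed, ActivationFailed, Tick }

/// Effects are returned as data; the caller (the imperative shell) performs them.
#[derive(Debug, PartialEq, Eq)]
pub enum Effect { StartActivation, ReportFailure, MarkSoaked }

#[derive(Debug, PartialEq, Eq)]
pub struct TransitionError { pub from: HostState, pub event: Event }

/// Hypothetical policy: how long a host must stay Healthy before it is Soaked.
pub struct Policy { pub soak_secs: u64 }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct State { pub host: HostState, pub healthy_since: Option<u64> }

pub fn step(
    state: State,
    event: Event,
    now: u64,
    policy: &Policy,
) -> Result<(State, Vec<Effect>), TransitionError> {
    use HostState::*;
    match (state.host, event) {
        (Idle, Event::Dispatched) => Ok((
            State { host: Activating, healthy_since: None },
            vec![Effect::StartActivation],
        )),
        (Activating, Event::ActivationConfirmed) => Ok((
            State { host: Healthy, healthy_since: Some(now) },
            vec![],
        )),
        (Activating, Event::ActivationFailed) => Ok((
            State { host: Failed, healthy_since: None },
            vec![Effect::ReportFailure],
        )),
        (Healthy, Event::Tick) => {
            let since = state.healthy_since.unwrap_or(now);
            if now.saturating_sub(since) >= policy.soak_secs {
                Ok((State { host: Soaked, healthy_since: Some(since) }, vec![Effect::MarkSoaked]))
            } else {
                Ok((state, vec![]))
            }
        }
        // A tick in any other state changes nothing.
        (_, Event::Tick) => Ok((state, vec![])),
        // Everything else is an illegal transition, reported as data.
        (from, event) => Err(TransitionError { from, event }),
    }
}
```

Because `step` takes `now` as an argument and returns effects instead of executing them, the same function can be replayed deterministically on the agent and on the CP-side mirror.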
nixfleet-reconciler - pure decision engine
The brain of the control plane, but as a pure library. No I/O, no state, no side effects. Two main exports:
pub fn verify_artifact(
    artifact_bytes: &[u8],
    signature_bytes: &[u8],
    trusted_keys: &[&TrustedPubkey],
    now: DateTime<Utc>,
    freshness_window: Duration,
    reject_before: Option<DateTime<Utc>>,
) -> Result<FleetResolved, VerifyError>
Steps: parse -> re-canonicalise (assert byte-for-byte match) -> verify signature against each trusted key (ed25519 or ecdsa-p256, algorithm tag from meta.signatureAlgorithm) -> freshness check (now - meta.signedAt < freshness_window) -> reject_before check (compromise switch) -> schemaVersion == 1. Returns parsed FleetResolved or detailed VerifyError (10 variants). Same path is used for Revocations and RolloutManifest via the SignedSidecar trait. Rollout manifests get an extra step: recompute SHA-256(canonical(manifest)) and assert it equals the advertised rolloutId (content addressing).
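The two time-based steps can be sketched in isolation. This is a simplified illustration using only std::time (the real function takes chrono types); the `FreshnessError` variants are hypothetical names, and the sketch applies no clock-skew slack to future-dated artifacts.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical error names; the real VerifyError has 10 variants not listed here.
#[derive(Debug, PartialEq, Eq)]
pub enum FreshnessError {
    FutureDated,
    Stale,
    RejectedBefore,
}

pub fn check_freshness(
    signed_at: SystemTime,
    now: SystemTime,
    freshness_window: Duration,
    reject_before: Option<SystemTime>,
) -> Result<(), FreshnessError> {
    // duration_since fails when signed_at is later than now.
    let age = now
        .duration_since(signed_at)
        .map_err(|_| FreshnessError::FutureDated)?;
    // Freshness: accept only while now - signedAt < freshness_window.
    if age >= freshness_window {
        return Err(FreshnessError::Stale);
    }
    // Compromise switch: refuse anything signed before the cutoff,
    // regardless of which key produced the signature.
    if let Some(cutoff) = reject_before {
        if signed_at < cutoff {
            return Err(FreshnessError::RejectedBefore);
        }
    }
    Ok(())
}
```

Passing `now` in explicitly keeps the check pure, which matches the reconciler's no-clock-reads constraint.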
pub fn reconcile(
    fleet: &FleetResolved,
    observed: &Observed,
    now: DateTime<Utc>,
) -> Vec<Action>
Inputs: verified fleet, Observed snapshot (channel refs, host states, active rollouts, compliance failures), current time. Output: a list of Actions (OpenRollout, DispatchHost, PromoteWave, ConvergeRollout, HaltRollout, SoakHost, ChannelUnknown, Skip, WaveBlocked).
Internal modules: host_state.rs (per-host shape; HostRolloutState itself now lives in nixfleet-state-machine per RFC-0005/0009 — reconciler + CP-side runtime consume that crate’s reducer), rollout_state.rs (RolloutState + advance_rollout()), budgets.rs (disruption budget enforcement - currently scaffolded), edges.rs (DAG ordering - reserved for future), verify.rs (verify_artifact, verify_rollout_manifest, verify_revocations, SignedSidecar trait, compute_canonical_hash), evidence.rs (verify_canonical_payload for host-signed compliance evidence using OpenSSH ed25519 pubkeys), manifest.rs (project_manifest, compute_rollout_id_for_channel).
11.3 Runtime binaries
nixfleet-agent - per-host actuator
Long-running daemon. Flags set by the NixOS module: --control-plane-url, --machine-id, --poll-interval, --trust-file, --ca-cert, --client-cert, --client-key, --bootstrap-token-file, --state-dir, --compliance-mode.
Main loop: load trust -> enrol if no cert + bootstrap token present -> build mTLS client -> run_boot_recovery() (handles fire-and-forget self-switch convergence) -> loop every poll_interval: POST /v1/agent/checkin; if response.target set, fetch + verify rollout manifest, pre-realise (nix-store --realise <closure> with cache_keys signature verify), activate (systemd-run --unit=nixfleet-switch -- switch-to-configuration switch on Linux, setsid -c on Darwin - both detached so they survive agent self-restart during NixOS reload), poll /run/current-system every 2s up to 300s, post-verify basename == expected, run compliance gate if enabled, POST /v1/agent/confirm, clear last_dispatched. On failure: POST /v1/agent/report with signed evidence. If cert TTL <50%: POST /v1/agent/renew.
Key modules: comms.rs (mTLS reqwest, 10s connect, 30s per-request), activation.rs (three-stage validation, fire-and-forget launch, lock coordination via /run/nixos/switch-to-configuration.lock, ActivationOutcome enum), enrollment.rs (CSR generation + enrol + 50% TTL renew), checkin_state.rs (last_confirmed_at + last_dispatched), compliance.rs (Pass / Failures / Skipped / GateError; auto mode -> Permissive if collector present, Disabled if absent), evidence_signer.rs (loads /etc/ssh/ssh_host_ed25519_key, JCS-canonicalises, ed25519-signs, base64), freshness.rs, manifest_cache.rs (content-address verification), recovery.rs (run_boot_recovery()), host_facts/ (Linux reads boot_id from /proc/sys/kernel/random/boot_id; Darwin uses hardware UUID).
What it never does: accept arbitrary commands (vocabulary is target = sha256-X); trust a CP-recommended closure without cache-key verification; hold long-lived credentials beyond 30-day mTLS cert + machine-lifetime SSH host key.
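The renewal trigger from the main loop (cert TTL below 50%) reduces to a small comparison. A minimal sketch, assuming the certificate's validity bounds are already parsed into `SystemTime` values; the function name is hypothetical and X.509 parsing is out of scope.

```rust
use std::time::{Duration, SystemTime};

/// Returns true once less than half of the certificate lifetime remains.
pub fn needs_renewal(not_before: SystemTime, not_after: SystemTime, now: SystemTime) -> bool {
    // A certificate whose bounds are inverted is treated as needing renewal.
    let lifetime = match not_after.duration_since(not_before) {
        Ok(d) => d,
        Err(_) => return true,
    };
    // An already-expired certificate has zero time remaining.
    let remaining = not_after.duration_since(now).unwrap_or(Duration::ZERO);
    remaining < lifetime / 2
}
```

For the 30-day certificates described above, this would start renewing on day 15.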
nixfleet-control-plane - Axum + SQLite + reconcile loop
Long-running HTTPS server. Two subcommands: serve and tick (one-shot, for tests).
Routes (under /v1/ with protocol-version middleware):
GET /healthz -> { ok, version, last_tick_at }
GET /v1/whoami -> { cn, issuedAt }
POST /v1/enroll -> 30-day cert from bootstrap token
POST /v1/agent/renew -> re-issue cert from existing mTLS identity
POST /v1/agent/checkin -> { target?, revocations? }
POST /v1/agent/confirm -> marks host_dispatch_state row confirmed
POST /v1/agent/report -> ingests telemetry events
GET /v1/agent/closure/{hash} -> proxies to binary cache (optional)
GET /v1/channels/{name} -> channel metadata
GET /v1/hosts -> { hostname: { online, current_generation } }
GET /v1/rollouts/{rolloutId} -> manifest JSON (mTLS-gated)
GET /v1/rollouts/{rolloutId}/sig -> manifest signature bytes
mTLS enforced at TLS handshake when --client-ca set. Agent routes authenticate solely via verified client cert (CN matches request hostname). No admin routes in the open kernel - fine-grained operator RBAC is intentionally out of scope and belongs in a sibling commercial-extensions repository.
State:
- In-memory (RwLock): host_checkins: HashMap<hostname, HostCheckinRecord>, channel_refs: HashMap<channel, git_ref>, rollout manifest cache, last_tick_at.
- SQLite (/var/lib/nixfleet-cp/state.db, refinery-managed migrations):
  - token_replay (24h TTL) - soft state.
  - cert_revocations - hard state, replayed from signed revocations.json sidecar every reconcile tick.
  - host_dispatch_state (hostname PK, rollout_id, channel, wave, target_closure_hash, target_channel_ref, dispatched_at, confirm_deadline, confirmed_at, state ∈ {pending,confirmed,rolled-back,cancelled}) - operational, one row per host.
  - dispatch_history (id PK, hostname, rollout_id, channel, wave, target_closure_hash, target_channel_ref, dispatched_at, terminal_state ∈ {converged,rolled-back,cancelled}, terminal_at) - audit log; one row per dispatch event.
  - host_rollout_state (rollout_id, hostname, host_state, last_healthy_since, updated_at) - soak-window tracking, repopulated from agent-attested last_confirmed_at on rebuild.
  - host_reports (event_id, hostname, received_at, event_kind, rollout, signature_status, report_json) - telemetry.
- Filesystem: artifact_path, signature_path, observed_path.
Reconcile loop (every 30s) reads inputs, calls verify_artifact(), projects Observed from in-memory checkins + SQLite, calls reconcile(), processes the resulting Vec<Action> against SQLite (UPSERT host_dispatch_state + INSERT dispatch_history on dispatch, update host_rollout_state, etc.).
Background tasks: reconcile_loop (30s), channel_refs_poll (60s - full verify_artifact on fetched bytes, update in-memory map), revocations_poll (60s - same trust pipeline; replay into cert_revocations table on every tick), rollback_check_loop (10s - scan state='pending' AND confirm_deadline < now, mark rolled-back, stamp dispatch_history), prune_timer (delete old token_replay, archive old host_reports). All share a tokio::sync::CancellationToken plumbed from main; signal::ctrl_c() triggers axum_server::Handle::graceful_shutdown (25s drain) followed by cancellation fan-out; drain_background_tasks gathers JoinHandles with a 30s deadline.
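The rollback_check_loop predicate (state = 'pending' AND confirm_deadline < now) can be illustrated over an in-memory slice. This is a sketch only: the real loop runs against SQLite and also stamps dispatch_history, and the struct below keeps just the columns the predicate needs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchState { Pending, Confirmed, RolledBack, Cancelled }

/// Trimmed-down stand-in for a host_dispatch_state row (timestamps in seconds).
#[derive(Debug, Clone)]
pub struct DispatchRow {
    pub hostname: String,
    pub state: DispatchState,
    pub confirm_deadline: u64,
}

/// Marks every pending row whose confirm deadline has passed as rolled back
/// and returns the affected hostnames in row order.
pub fn expire_pending(rows: &mut [DispatchRow], now: u64) -> Vec<String> {
    let mut expired = Vec::new();
    for row in rows.iter_mut() {
        if row.state == DispatchState::Pending && row.confirm_deadline < now {
            row.state = DispatchState::RolledBack;
            expired.push(row.hostname.clone());
        }
    }
    expired
}
```

Rows that are already confirmed are never touched, so a late scan cannot undo a successful confirm.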
On-demand HTTP source - rollouts_source: fetches a rollout manifest lazily when GET /v1/rollouts/<rolloutId> misses --rollouts-dir. URL templates with literal {rolloutId} token. Trust posture: the CP only checks RolloutId::new(manifest.channel, manifest.channel_ref) == rolloutId (the RFC-0008 §6.3 canonical-id discriminator). It does not verify the signature. The agent verifies the signature against ciReleaseKey on receipt. Even when forwarding a signed manifest, the CP never pretends to attest to it.
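The CP-side check described here is a string comparison, not a signature check. A minimal sketch, assuming the composite form "{channel}@{channel_ref}" from RFC-0008 §6.3; the function names and the example URL are hypothetical, and the real fetcher may escape the id before substituting it.

```rust
/// Builds the composite id "{channel}@{channel_ref}".
pub fn canonical_rollout_id(channel: &str, channel_ref: &str) -> String {
    format!("{}@{}", channel, channel_ref)
}

/// The only check the CP performs on a fetched manifest: does the id the
/// agent asked for match the id derived from the manifest's own fields?
pub fn id_matches(requested: &str, manifest_channel: &str, manifest_channel_ref: &str) -> bool {
    canonical_rollout_id(manifest_channel, manifest_channel_ref) == requested
}

/// Expands the literal {rolloutId} token in a URL template.
pub fn expand_template(template: &str, rollout_id: &str) -> String {
    template.replace("{rolloutId}", rollout_id)
}
```

For example, `expand_template("https://git.example.invalid/rollouts/{rolloutId}.json", "stable@abc123")` would yield a URL ending in `stable@abc123.json`. Signature verification stays with the agent.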
nixfleet-cli - operator workstation tools
An umbrella binary with operator subcommands. nixfleet mint-token reads the org root private key (32 raw bytes / hex / PEM PKCS#8), generates a nonce, builds TokenClaims, JCS-canonicalises, ed25519-signs, outputs the bootstrap-token JSON. nixfleet derive-pubkey reads a private key file and emits the base64 ed25519 pubkey - used once when bootstrapping the org root key. nixfleet mint-operator-cert mints a clientAuth-EKU X.509 cert signed by the offline fleet root for operator mTLS access.
There is no big “fleet management” CLI in the open kernel - operations happen through git commits and CI, not CLI commands.
nixfleet-release - CI release pipeline orchestrator
Most complex binary. Orchestrates build -> inject closureHash -> stamp meta -> canonicalise -> sign -> release:
- Enumerate hosts (auto = all; auto:exclude=foo,bar; or explicit list).
- Build closures per host.
- Per-closure push (optional --push-cmd hook; env: NIXFLEET_HOST, NIXFLEET_PATH, NIXFLEET_CLOSURE_HASH).
- Evaluate .#fleet.resolved.
- Inject closureHash per built host.
- Stamp meta (signedAt = now, ciCommit, signatureAlgorithm).
- Canonicalise via nixfleet-canonicalize.
- Sign via --sign-cmd hook (env: NIXFLEET_INPUT, NIXFLEET_OUTPUT).
- Smoke verify (re-parse, canonical round-trip, structural check).
- Project per-channel rollout manifests (rolloutId = SHA-256(canonical(manifest))); sign each.
- Atomic write of releases/fleet.resolved.json{,.sig}, revocations.json{,.sig}, rollouts/<rolloutId>.json{,.sig}.
- Optional git ops (stage, commit, push).
The hook contract is what makes signing pluggable: framework doesn’t care how you sign (TPM, HSM, YubiKey, KMS, software ed25519); it cares only that the hook reads canonical bytes from $NIXFLEET_INPUT and writes raw signature to $NIXFLEET_OUTPUT.
nixfleet-verify-artifact - offline auditor
Three subcommands (pure verification, no network): artifact (verify a fleet.resolved), rollout-manifest (verify a rollout manifest, asserts rolloutId equals the canonical {channel}@{channel_ref} per RFC-0008 §6.3), probe (verify a host-signed probe payload against an OpenSSH host pubkey). Given just signed artifacts plus trust roots, an auditor can verify the chain without ever touching the control plane.
11. Testing fabric
Three tiers, fastest-first.
Tier C - eval-only (~5-15s, every PR)
- nix fmt -- --ci - Alejandra + shfmt + deadnix.
- nix flake check --no-build - eval every output across the system matrix.
- mkFleet-eval-tests - 14 fixtures (7 positive + 7 negative) under tests/lib/mk-fleet/. Positive fixtures must produce expected .resolved.json golden files; negative fixtures must throw expected eval errors.
- _agent-v2-trust.nix, _cp-v2-trust.nix, _trust-options.nix - eval-only assertions on agent/CP module wire shape (ExecStart flags, trust.json schemaVersion = 1, etc.).
Tier B - Rust unit/integration (~15-30s, pre-push subset, full in CI)
- cargo nextest workspace-wide (currently ~560 tests). Concentration: nixfleet-control-plane (Axum endpoint integration with in-process mTLS, SQLite transactions, mTLS CN matching, V001-V006 migration tests, graceful-shutdown drain), nixfleet-reconciler (state-machine transitions, signature round-trips, cycle detection), nixfleet-proto (round-trip serialisation, trust config), nixfleet-canonicalize (JCS golden vectors, RFC 8785 Appendix E), nixfleet-release (sign-smoke roundtrip + adversarial verify), nixfleet-verify-artifact, nixfleet-agent (boot-recovery convergence + per-variant DispatchHandler unit tests).
- cargo clippy with -D warnings.
Tier A - microvm scenarios (minutes, nightly / on-demand)
Full integration via runNixOSTest hosting microvm.nix guests under one host VM (much faster than per-node QEMU). Linux x86_64 only (microvm.nix needs nested KVM). Scenarios under tests/harness/scenarios/, registered in modules/tests/harness.nix. Memory budget max(4096, 3072 + N×256); fits fleet-50 in 16 GB.
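The memory budget formula is quick to check by hand. A small sketch, assuming the figures are in MiB (the text does not state the unit) and that "16 GB" means 16384 MiB; the function name is hypothetical.

```rust
/// Host-VM memory budget for a harness with `agents` microVM guests:
/// max(4096, 3072 + N x 256), assumed to be in MiB.
pub fn harness_memory_mib(agents: u64) -> u64 {
    std::cmp::max(4096, 3072 + agents * 256)
}
```

Under those assumptions, fleet-2 sits on the 4096 floor, fleet-10 needs 5632, and fleet-50 needs 3072 + 50 × 256 = 15872, which is just under 16384.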
| Scenario | Purpose |
|---|---|
| fleet-harness-smoke | 1 stub CP + 2 stub agents fetch fixture over mTLS within 60s |
| fleet-harness-fleet-{2,5,10} | Parameterised smoke for N agents |
| fleet-harness-signed-roundtrip | Real signed fixture -> mTLS serve -> agent verify-artifact accept |
| fleet-harness-auditor-chain | Offline runCommand: verify-artifact rejects bit-flips |
| fleet-harness-corruption-rejection | Bit-flip artifact + sig; assert typed VerifyError |
| fleet-harness-manifest-tamper-rejection | Same for rollout manifests; content-address mismatch |
| fleet-harness-teardown | Real CP + real agents. Wipe CP DB mid-run; assert state recovery within one reconcile cycle. The validation of done-criterion #1. |
| fleet-harness-deadline-expiry | Confirm-deadline timeout -> 410 |
| fleet-harness-stale-target | Year-old fixture; agent’s freshness gate rejects + posts StaleTarget |
| fleet-harness-boot-recovery | Fire-and-forget: pre-staged stale last_dispatched; assert check_boot_recovery clears before poll loop |
| fleet-harness-secret-hygiene | Agent decrypts age secret; testScript greps CP disk + journal + audit; assert plaintext absent |
| fleet-harness-rollback-policy | Real CP + agent under onHealthFailure = "rollback-and-halt"; inject Failed via host-side sqlite3; assert RollbackSignal, agent rollback, Reverted, idempotency holds |
| fleet-harness-concurrent-checkin | Two agents in same tick window; assert no duplicate dispatch and ordered confirms |
| fleet-harness-enroll-replay | Bootstrap-token nonce replay rejected with 409 |
| fleet-harness-future-dated-rejection | Artifact with meta.signedAt past clock-skew slack rejected |
| fleet-harness-module-rollouts-wire | End-to-end manifest -> checkin -> confirm wiring under signed dispatch |
Real-binary harness nodes (tests/harness/nodes/cp-real.nix + agent-real.nix) consume services.nixfleet-control-plane.enable = true / services.nixfleet-agent.enable = true directly - the scenario surface is the operator surface. Stub nodes (cp.nix, agent.nix, cp-signed.nix, agent-verify.nix) keep their curl+jq scaffolding because they exercise routes the real CP doesn’t expose (e.g. GET / for fleet-N substrate scaling, GET /canonical.json{,.sig} for the offline-auditor contract).
CI workflows: .github/workflows/ci.yml - format job + validate job (nix run .#validate, default fast mode: format + flake eval + mkFleet-eval-tests + host builds for every nixosConfiguration). Pre-commit hook: format + real-SSH-key detector. Pre-push hook: format + mkFleet-eval-tests + cargo nextest run --workspace.
12. Glossary
| Term | Meaning |
|---|---|
| Closure | Nix’s term for a store path plus all its transitive dependencies. The unit of deployment. Identified by hash. |
| Closure hash | sha256 over the contents of a closure. Two identical closures share a hash. |
| fleet.resolved.json | Signed canonical projection of the fleet - hosts, channels, rolloutPolicies, waves, edges, budgets. CI-signed. |
| Channel | A release train (stable, edge). Each has its own rollout policy, freshness window, signing interval, compliance frameworks. |
| Channel ref | The git ref a channel is currently rolled out to. CI updates this when it produces a release. |
| Rollout | An in-flight transition of a channel from one ref to another. Has a state machine and per-host states. |
| Wave | A subset of a rollout’s hosts dispatched together, with a shared soak window before the next wave proceeds. |
| Rollout manifest | Signed per-channel artifact freezing the rollout plan. Identified by the canonical RFC-0008 §6.3 composite rolloutId = "{channel}@{channel_ref}". |
| Soak window | Time a host must remain Healthy before being marked Soaked. Wave promotes only when all members are Soaked. |
| Magic rollback | If the agent doesn’t post /confirm within confirmDeadlineSecs, the CP marks the dispatch rolled-back; the next checkin tells the agent to revert. |
| Freshness window | Per-channel max age of meta.signedAt accepted by verify_artifact. Defends against stale-target replay by a compromised CP. |
| rejectBefore | Compromise switch: any artifact with meta.signedAt < this timestamp is refused regardless of which key signed it. |
| Trust roots | The four signing keys: CI release key, cache keys, org root key, host SSH keys (see §4). |
| mTLS | Mutual TLS - both server and client present certificates. Agent identity is the cert’s CN. |
| Bootstrap token | Org-root-signed claims (hostname, expectedPubkeyFingerprint, nonce, expiry) the agent uses once to enrol. |
| JCS | JSON Canonical Serialization (RFC 8785). Deterministic byte layout for signing. |
| Persistence contract | Schema declaring directories/files that survive reboots. Impls (e.g. impermanence) read this and apply their mechanism. |
| hostSpec | Universal identity carrier - hostname, primary user, locale, root SSH keys, etc. |
| Scope | A self-activating NixOS module (agent, CP, cache, microvm-host). Auto-included by mkHost but disabled by default. |
| Contract impl | A module that satisfies a contract. Lives under impls/, exposed as flake.scopes.<family>.<impl>. |
| Stranger fleet test | The discipline: a fleet you’ve never seen, with different operators and services, must be able to use the framework without any organisation-specific assumption. |
| import-tree | The flake input that auto-discovers and imports .nix files under modules/. Skips _*.nix. |
| Underscore prefix | _*.nix files are skipped by import-tree’s auto-import. Imported explicitly by mk-host.nix. |
13. How to read this codebase
- Start with flake.nix - five lines of meaningful logic. Open lib/default.nix next, then lib/mk-host.nix. That’s the API surface.
- Open contracts/host-spec.nix, contracts/persistence.nix, contracts/trust.nix - read each fully. Maybe 80 lines combined. They define the entire vocabulary.
- Pick one runtime module (modules/scopes/nixfleet/_agent.nix is a good one) and read it with the corresponding crate’s src/main.rs open in the other window. See how the NixOS module’s ExecStart flags map to the crate’s CLI.
- Read crates/nixfleet-proto/src/agent_wire.rs and crates/nixfleet-reconciler/src/verify.rs. The boundary contracts and the verification logic. Most of the design pressure sits here.
- RFCs come last: RFC-0001 / 0002 / 0003 in order.
Verification is cheap:
nix flake check --no-build # full eval, ~5s
nix run .#validate # default fast mode
nix run .#validate -- --rust # add cargo nextest + clippy
nix run .#validate -- --vm # add microvm scenarios (Linux only)
nix build .#nixosConfigurations.<host>.config.system.build.toplevel # one host's closure
One-sentence summary
Git is truth; CI is the notary; attic is the content store; the control plane is a router; agents are the last line of defense; and every boundary artifact carries its own proof. Everything else is implementation.
Boundary contracts
The single authoritative reference for every artifact, key, and format that crosses a layer boundary during v0.2. If it is not listed here, it is not a contract - it is implementation detail that can change without coordination.
Every entry declares:
- Producer - the layer/component that emits the artifact.
- Consumer(s) - layers/components that read it.
- Schema/version - current version and the discipline for evolving it.
- Verification - what a consumer must check before trusting the content.
Boundaries cross between three layers:
- CI / infra - coordinator host, out of tree; lives in fleet repo.
- Nix declarative - this repo’s lib/, modules/, + nixfleet-compliance.
- Rust runtime - this repo’s crates/ (agent + control plane).
I. Data contracts
1. fleet.resolved.json
| Producer | CI (operator CI invokes the Nix layer’s eval) |
| Consumer | Control plane, agents (fallback direct fetch) |
| Schema | v1 - shape defined in RFC-0001 §4.1 |
| Canonicalization | JCS (RFC 8785), see §IV |
| Signature | CI release key (see §II #1) |
| Metadata | meta.signedAt (RFC 3339), meta.ciCommit, meta.schemaVersion, meta.signatureAlgorithm ("ed25519" | "ecdsa-p256"; optional, defaults to "ed25519") |
Evolution discipline. Within v1, fields may be added; consumers MUST ignore unknown fields. Removing or changing the meaning of a field requires schemaVersion: 2 and a migration window. meta.signatureAlgorithm was added after the initial schemaVersion: 1 draft - artifacts without the field MUST be interpreted as "ed25519" for backward compatibility.
Consumer MUST verify before use:
- JCS bytes match the canonicalized payload.
- meta.signatureAlgorithm (default "ed25519") matches the algorithm of the pinned nixfleet.trust.ciReleaseKey.
- Signature verifies against the pinned nixfleet.trust.ciReleaseKey using the declared algorithm.
- (now − meta.signedAt) ≤ channel.freshnessWindow (units: minutes; see RFC-0001 §4.1).
- meta.schemaVersion is within the consumer’s accepted range.
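The cheap, non-cryptographic parts of this checklist can be sketched as pure functions. This is a minimal illustration, not the framework's verifier: the names SigAlg, resolve_algorithm, and is_fresh are hypothetical, timestamps are assumed to be already parsed from RFC 3339 into Unix seconds, and signature verification itself is out of scope here.

```rust
/// Signature algorithms named by the contract.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SigAlg {
    Ed25519,
    EcdsaP256,
}

/// Resolve `meta.signatureAlgorithm`. An absent field is read as "ed25519",
/// matching the backward-compatibility rule for schemaVersion 1 artifacts.
pub fn resolve_algorithm(declared: Option<&str>) -> Result<SigAlg, String> {
    match declared {
        None | Some("ed25519") => Ok(SigAlg::Ed25519),
        Some("ecdsa-p256") => Ok(SigAlg::EcdsaP256),
        Some(other) => Err(format!("unknown signatureAlgorithm: {}", other)),
    }
}

/// Freshness gate: (now - signedAt) <= freshnessWindow, window in minutes.
/// Inputs are Unix seconds (an assumption of this sketch). A `signed_at`
/// in the future saturates to an age of zero and therefore passes.
pub fn is_fresh(now_secs: u64, signed_at_secs: u64, window_minutes: u64) -> bool {
    let age_secs = now_secs.saturating_sub(signed_at_secs);
    age_secs <= window_minutes.saturating_mul(60)
}
```

A consumer would compare the resolved algorithm against the one declared on the pinned key and refuse the artifact on mismatch before attempting any signature check.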
Producer pipeline (nixfleet-release). The framework ships one orchestrator binary that produces this artifact: eval fleet.resolved -> filter expired pins -> build host closures (per-host pin-aware, see below) -> inject closureHash = basename(toplevel) -> stamp meta.{signedAt, ciCommit, signatureAlgorithm} -> canonicalize via nixfleet_canonicalize -> invoke a sign hook -> write releases/fleet.resolved.json{,.sig}. The orchestration is a contract; the cache-push and signing tools it shells out to are not.
Per-host commit pins. Each host entry MAY carry an optional pin: { commit; reason; expiresAt? } field declaring that the host’s closure must be built from a specific source-control commit rather than the current release commit. mkFleet resolves pins from a most-specific-wins precedence chain (host > tag > channel) and emits the result on each affected host; nixfleet-release honors the pin by invoking nix build "<pin_source_url>?rev=<commit>#nixosConfigurations.<host>.config.system.build.toplevel" when the pin’s commit differs from the release commit, and by filtering pins past expiresAt before the build dance starts. Operators MUST pass --pin-source-url to nixfleet-release whenever any active pin specifies a non-current commit (validated post-eval; missing flag aborts release with a list of offending hosts). Pin metadata reaches consumers via hosts.<name>.pin - the dashboard and CLI surface it for visibility.
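The pin rules above (most-specific-wins, then expiry filtering, then "build from the pin only when it differs from the release commit") can be sketched as follows. The Pin struct and the three function names are hypothetical stand-ins; the real resolution happens in mkFleet on the Nix side, and this sketch assumes the tag level has already been collapsed to a single candidate and that expiresAt is a Unix timestamp that is still valid at the exact expiry second.

```rust
/// Illustrative mirror of the `pin: { commit; reason; expiresAt? }` field.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Pin {
    pub commit: String,
    pub reason: String,
    /// Optional expiry as Unix seconds (assumed representation).
    pub expires_at: Option<u64>,
}

/// Most-specific-wins precedence: host > tag > channel.
pub fn resolve_pin<'a>(
    host: Option<&'a Pin>,
    tag: Option<&'a Pin>,
    channel: Option<&'a Pin>,
) -> Option<&'a Pin> {
    host.or(tag).or(channel)
}

/// Drop a resolved pin that is past its expiry; pins without expiry never lapse.
pub fn filter_expired(pin: Option<&Pin>, now: u64) -> Option<&Pin> {
    pin.filter(|p| p.expires_at.map_or(true, |exp| now <= exp))
}

/// A pinned build is only needed when the pin names a non-current commit.
pub fn needs_pinned_build(pin: Option<&Pin>, release_commit: &str) -> bool {
    pin.map_or(false, |p| p.commit != release_commit)
}
```

Any host for which needs_pinned_build returns true is one that would require --pin-source-url to be set.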
Producer hook contract (binding):
- --push-cmd (optional) is invoked once per built closure with cwd = invocation cwd and these env vars set: NIXFLEET_HOST (host name), NIXFLEET_PATH (absolute store path), NIXFLEET_CLOSURE_HASH (basename of the path). Non-zero exit aborts the run.
- --sign-cmd (required) is invoked once with NIXFLEET_INPUT (path to a tempfile containing the canonical bytes) and NIXFLEET_OUTPUT (path the hook MUST write the raw signature bytes to). Non-zero exit, missing output file, or 0-byte output aborts the run.
These env-var names are part of the contract - renaming them is a §VIII amendment. The shell command strings themselves and any tools they shell out to (attic, nix copy, tpm-sign, cosign, GPG, ssh-keygen -Y, …) are operator-supplied and not framework concerns.
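As an illustration of the sign-hook side of this contract, the sketch below invokes a command string with the two contract env vars set and applies the three abort conditions. Only the env-var names and abort conditions come from the contract; the function names are hypothetical, and running the string through sh -c is an assumption about how the orchestrator spawns hooks.

```rust
use std::fs;
use std::path::Path;
use std::process::Command;

/// Apply the abort rules: non-zero exit, missing output file, or 0-byte output.
pub fn accept_sign_output(exit_code: i32, output: &Path) -> Result<Vec<u8>, String> {
    if exit_code != 0 {
        return Err(format!("sign hook exited with status {}", exit_code));
    }
    let bytes = fs::read(output)
        .map_err(|e| format!("sign hook output unreadable: {}", e))?;
    if bytes.is_empty() {
        return Err("sign hook wrote a 0-byte signature".to_string());
    }
    Ok(bytes)
}

/// Run the operator-supplied sign command once with NIXFLEET_INPUT and
/// NIXFLEET_OUTPUT set (spawning via `sh -c` is an assumption of this sketch).
pub fn run_sign_hook(sign_cmd: &str, input: &Path, output: &Path) -> Result<Vec<u8>, String> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(sign_cmd)
        .env("NIXFLEET_INPUT", input)
        .env("NIXFLEET_OUTPUT", output)
        .status()
        .map_err(|e| format!("failed to spawn sign hook: {}", e))?;
    // A hook killed by a signal has no exit code; treat that as a failure.
    accept_sign_output(status.code().unwrap_or(-1), output)
}
```

Keeping the validation separate from the spawn makes the abort rules testable without a real signer.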
2. Wire protocol (agent ↔ control plane)
| Producer/Consumer | Both agent and CP (Rust runtime) |
| Schema | v1 - RFC-0003 §4 |
| Transport | HTTP/2 over TLS 1.3, mTLS mandatory |
| Version header | X-Nixfleet-Protocol: 1 |
Evolution discipline. Major version in header; mismatched major = HTTP 400. Additive fields within a major; MUST-ignore-unknown-fields on both sides. Removing a field requires a major bump and dual-version CP support during migration.
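A minimal sketch of the major-version gate follows. The function name and the SUPPORTED_MAJOR constant are hypothetical. The "mismatched major = HTTP 400" rule is from the contract; answering 400 for an unparseable value, and for a missing header when strict mode is on, are assumptions of this sketch.

```rust
/// The single major version this sketch accepts.
pub const SUPPORTED_MAJOR: u32 = 1;

/// Check the `X-Nixfleet-Protocol` header value.
/// Returns Ok(()) to proceed, or Err(status) with the HTTP status to reply with.
pub fn check_protocol_header(value: Option<&str>, strict: bool) -> Result<(), u16> {
    match value {
        // Missing header: rejected only in strict mode (status code assumed).
        None if strict => Err(400),
        None => Ok(()),
        Some(v) => match v.trim().parse::<u32>() {
            Ok(major) if major == SUPPORTED_MAJOR => Ok(()),
            // Mismatched major, or a value that is not an integer.
            _ => Err(400),
        },
    }
}
```

During a migration window, a CP with dual-version support would accept a small set of majors here instead of a single constant.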
3. Probe descriptor
| Producer | nixfleet-compliance (Nix layer) |
| Consumer | Agent (Rust runtime) at runtime |
| Schema | Per-control schema = "<framework>/<version>" field (e.g. "anssi-bp028/v1") |
| Payload | { command, args, timeoutSecs, expect, schema } |
Evolution discipline. Each framework+version pair is immutable once shipped. New version = new schema string (anssi-bp028/v2); agent ships a handler registry keyed on (control, schema). Controls MAY support multiple schema versions during migration.
4. Probe output
| Producer | Agent (executing the probe command) |
| Consumer | CP (aggregation), auditor (verification) |
| Schema | Declared by the control (§I.3 above) |
| Canonicalization | JCS |
| Signature | Host SSH ed25519 (see §II #4) |
Evolution discipline. Output shape is part of the control declaration - changes go through the control schema version. Signature covers the canonicalized bytes plus { control, schema, hostname, bootId, generationHash, ts }.
5. Secret recipient list
| Producer | fleet.nix (Nix layer) |
| Consumer | agenix encryption tooling at commit time; agent at activation |
| Schema | agenix-native, pinned by flake.lock |
Evolution discipline. Pinned to the agenix version in flake.lock. Upgrading agenix is a coordinated commit that re-encrypts all secrets; treat as a spine-level change, not a routine dependency bump.
6. Log / event schema
| Producer | CP (reconciler), agent |
| Consumer | Operator queries, auditors reading historical state |
| Schema | RFC-0002 §7 - structured event with logSchemaVersion field |
Evolution discipline. Same as wire protocol - additive within a major, bump on breaking changes. Historical events MUST remain parseable for the declared audit retention window.
7. Activation timing invariant (fire-and-forget)
Operators tuning the magic-rollback confirm window MUST preserve the coupling:
confirm_deadline_secs ≥ POLL_BUDGET + CLOCK_SKEW_SLACK
Where:
- confirm_deadline_secs is per-channel activate.confirmWindowSecs in fleet.resolved.json (or the CP’s --confirm-deadline-secs flag default - currently 360).
- POLL_BUDGET is the agent-side fire-and-forget poll duration (currently 300s, defined as crates/nixfleet-agent/src/activation.rs::POLL_BUDGET).
- CLOCK_SKEW_SLACK is the symmetric tolerance baked into both the freshness gate and the rollback timer (currently 60s, defined as crates/nixfleet-reconciler/src/verify.rs::CLOCK_SKEW_SLACK_SECS).
Why. Agents activate via fire-and-forget: systemd-run --unit=nixfleet-switch queues a detached transient unit, and the agent then polls /run/current-system for up to POLL_BUDGET. If the deadline expires before the poll succeeds, the CP’s rollback timer marks the pending_confirms row rolled-back and any subsequent confirm POST returns 410 Gone, triggering the agent’s local rollback path - even though the activation itself was succeeding. The slack absorbs benign clock drift between the CP’s deadline computation and the agent’s poll completion.
How to tune. Slow-link channels (large closures over residential uplinks, long activation scripts): raise confirmWindowSecs AND POLL_BUDGET together, keeping the inequality. Tight rollout windows (canary channels with short freshness): lower both, but never set confirmWindowSecs < POLL_BUDGET + CLOCK_SKEW_SLACK.
The CP enforces nothing here - operators who violate the invariant get the spurious-rollback cascade described above. Future versions may add a runtime warning at CP startup when --confirm-deadline-secs is below the documented minimum.
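Since nothing enforces the inequality today, an operator-side lint is easy to sketch. The constants below copy the documented current values (300s and 60s); the function names are hypothetical and this is not the startup warning mentioned above, only an illustration of the arithmetic.

```rust
/// Documented current values; the real constants live in the agent and reconciler crates.
pub const POLL_BUDGET_SECS: u64 = 300;
pub const CLOCK_SKEW_SLACK_SECS: u64 = 60;

/// Smallest confirm window that preserves the invariant.
pub fn min_confirm_deadline_secs(poll_budget: u64, slack: u64) -> u64 {
    poll_budget + slack
}

/// Check confirm_deadline_secs >= POLL_BUDGET + CLOCK_SKEW_SLACK.
pub fn check_confirm_window(
    confirm_deadline_secs: u64,
    poll_budget: u64,
    slack: u64,
) -> Result<(), String> {
    let min = min_confirm_deadline_secs(poll_budget, slack);
    if confirm_deadline_secs >= min {
        Ok(())
    } else {
        Err(format!(
            "confirm window {}s is below the minimum {}s ({}s poll budget + {}s slack)",
            confirm_deadline_secs, min, poll_budget, slack
        ))
    }
}
```

With the current defaults, 300 + 60 = 360, so the default --confirm-deadline-secs of 360 sits exactly on the boundary and any lower value fails the check.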
Switch-inhibitor carve-out. The agent skips switch-to-configuration when a critical component (dbus implementation, systemd, kernel, init) differs between /run/current-system and the new closure - nixos-rebuild switch would refuse the same. The new generation is still bound to the system profile (nix-env --set runs unconditionally before fire), so the next reboot completes activation. The agent posts ReportEvent::ActivationDeferred { component } instead of running the live switch; this is NOT a SwitchFailed outcome and triggers no rollback.
The deferred lifecycle is human-paced, not agent-paced, so it explicitly opts out of the 360s confirm-deadline rollback timer documented above. CP receipt of ActivationDeferred parks the host_dispatch_state row in DeferredPendingReboot; the rollback timer’s partial index WHERE state = 'pending' naturally excludes it. The confirm endpoint accepts post-reboot confirms against deferred rows without the deadline gate ((Pending AND deadline > now) OR DeferredPendingReboot). Wave promotion + channel-edge gates see deferred hosts as ConfirmWindow (in-flight, not terminal-for-ordering), so successor waves and channel crossings correctly wait for the operator’s reboot.
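The confirm-endpoint gate and the rollback timer's scope can be written down as two predicates. The state names are taken from the text; the enum itself, the function names, and the use of Unix seconds are assumptions of this sketch, and refusing confirms for rows already Confirmed or RolledBack is a simplification.

```rust
/// Subset of `host_dispatch_state` values named in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchState {
    Pending,
    DeferredPendingReboot,
    Confirmed,
    RolledBack,
}

/// Confirm gate: (Pending AND deadline > now) OR DeferredPendingReboot.
pub fn confirm_accepted(state: DispatchState, deadline: u64, now: u64) -> bool {
    match state {
        DispatchState::Pending => deadline > now,
        // Deferred rows are human-paced: no deadline gate.
        DispatchState::DeferredPendingReboot => true,
        // Terminal rows: refused in this sketch (assumption).
        DispatchState::Confirmed | DispatchState::RolledBack => false,
    }
}

/// The rollback timer only considers pending rows whose deadline has passed,
/// which is why deferred rows are never rolled back by it.
pub fn rollback_timer_applies(state: DispatchState, deadline: u64, now: u64) -> bool {
    state == DispatchState::Pending && deadline <= now
}
```

The two predicates are disjoint for Pending rows, so a row is either still confirmable or eligible for rollback, never both.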
Operator surfaces:
- /v1/hosts exposes pendingReboot: true for hosts whose host_dispatch_state row is DeferredPendingReboot. DB-backed, so the signal survives CP restart. Cleared when the row transitions to Confirmed (post-reboot retroactive confirm).
- nixfleet status shows ⟳ pending reboot ahead of the ✓ converged label so operators see deferred hosts at a glance.
Detection is canonicalize-equality on four store-relative paths: etc/systemd/system/dbus.service, sw/lib/systemd/systemd, kernel, init. Any mismatch defers; either side missing a path is out-of-scope and does not defer (see crates/nixfleet-agent/src/activation/linux.rs::detect_switch_inhibitors). The agent persists a last_deferred sentinel in its state-dir to suppress redundant activate-and-defer cycles for the same closure_hash; the suppression is cleared on record_confirm_success (post-reboot). Out of scope for this carve-out: glibc major-version swaps, boot.loader.systemd-boot ↔ grub swaps.
Closure-hash quarantine carve-out. A second per-closure suppression sits alongside the deferred sentinel: when activation produces SwitchFailed or VerifyMismatch the agent records last_failed_closure { closure_hash, last_failure_at, failure_count } in its state-dir. On the next dispatch within QUARANTINE_WINDOW_SECS (24h) for the SAME closure_hash, the agent skips activate() and posts ReportEvent::ClosureQuarantined (rate-limited to one post per QUARANTINE_REPOST_THROTTLE_SECS = 1h). Auto-clears when the channel-ref advances to a fresher closure_hash (the suppression check stops matching). No CP-side state machine entry - the existing SwitchFailed -> rollback flow already drives host_dispatch_state to RolledBack; quarantine is purely the operator-visible “agent has stopped retrying this closure” signal, surfaced as quarantinedClosure: <hash> on /v1/hosts and ✗ quarantined in nixfleet status. The dispatch suppression order is: deferred first, then quarantine; both checks are O(1) state-dir reads with closure_hash equality, so dispatch overhead during steady-state suppression is negligible.
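The suppression order (deferred first, then quarantine) and the repost throttle can be sketched as one pure decision function. The two constants carry the documented values; the Dispatch enum, the function name, and the choice to treat the window and throttle boundaries as exclusive and inclusive respectively are assumptions of this sketch.

```rust
pub const QUARANTINE_WINDOW_SECS: u64 = 24 * 60 * 60;
pub const QUARANTINE_REPOST_THROTTLE_SECS: u64 = 60 * 60;

/// Mirrors the `last_failed_closure` record kept in the agent's state-dir.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LastFailedClosure {
    pub closure_hash: String,
    pub last_failure_at: u64, // Unix seconds (assumed representation)
    pub failure_count: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dispatch {
    Activate,
    SkipDeferred,
    /// `post_event` is false while the repost throttle is still running.
    SkipQuarantined { post_event: bool },
}

pub fn decide_dispatch(
    target_hash: &str,
    last_deferred: Option<&str>,
    last_failed: Option<&LastFailedClosure>,
    last_quarantine_post_at: Option<u64>,
    now: u64,
) -> Dispatch {
    // 1. The deferred sentinel is checked first.
    if last_deferred == Some(target_hash) {
        return Dispatch::SkipDeferred;
    }
    // 2. Quarantine: same closure hash, failure still inside the window.
    if let Some(f) = last_failed {
        let within_window = now.saturating_sub(f.last_failure_at) < QUARANTINE_WINDOW_SECS;
        if f.closure_hash == target_hash && within_window {
            let post_event = last_quarantine_post_at
                .map_or(true, |t| now.saturating_sub(t) >= QUARANTINE_REPOST_THROTTLE_SECS);
            return Dispatch::SkipQuarantined { post_event };
        }
    }
    // A fresher closure hash, or an aged-out failure, falls through to activation.
    Dispatch::Activate
}
```

The auto-clear behaviour needs no extra state: once the channel ref advances, target_hash simply stops matching the recorded hash.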
8. Rollout manifest
| Producer | CI (one manifest per channel, per fleet.resolved commit) |
| Consumer | Control plane (adoption + serve), agents (verify before consuming dispatch), auditors |
| Schema | v1 - shape defined in nixfleet-proto::rollout_manifest, semantics in RFC-0002 §4.4 |
| Canonicalization | JCS (RFC 8785), see §III |
| Signature | CI release key (see §II #1) - same trust root as fleet.resolved.json and revocations.json |
| Identifier | rolloutId = "{channel}@{channel_ref}" per RFC-0008 §6.3 (supersedes the v0.1 content-addressed shape in RFC-0002 §4.4). Constructed only via RolloutId::new(channel, channel_ref). |
| Anchor | fleetResolvedHash - sha256 of the canonical bytes of the projecting fleet.resolved.json. Closes mix-and-match across snapshots at the same channel ref. |
| Storage | releases/rollouts/<rolloutId>.{json,sig} |
Evolution discipline. Within v1, fields may be added; consumers MUST ignore unknown fields. Adding a field changes every existing manifest’s content hash by definition (the new field is part of the canonical surface), so schemaVersion bumps in lockstep with field additions that reach production CI - there is no “rolling additive” window for this artifact the way there is for fleet.resolved.json. Removing or changing the meaning of a field requires schemaVersion: 2 and a migration window.
Consumer MUST verify before use:
1. JCS bytes match the canonicalized payload.
2. Signature verifies against the pinned nixfleet.trust.ciReleaseKey.
3. (now − meta.signedAt) ≤ channel.freshnessWindow (units: minutes; same gate as fleet.resolved.json).
4. Recipient recomputes RolloutId::new(manifest.channel, manifest.channel_ref) from the parsed manifest and asserts it equals the rolloutId the recipient was told to fetch (RFC-0008 §6.3). Signature verification (step 2) over the canonical bytes already binds the parsed channel and channel_ref fields to the producer’s intent, so the identifier check rejects mix-and-match attempts (e.g. a manifest from channel B served under channel A’s URL).
5. meta.schemaVersion is within the consumer’s accepted range.
6. (Agent only) (hostname, wave_index) ∈ manifest.host_set.
7. (CP only, on adoption) manifest.fleetResolvedHash matches the hash of the fleet.resolved.json the CP currently holds verified - refuses adoption otherwise. Same rule: hash the received bytes, not the parsed struct.
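The identifier recomputation and the agent-only membership check lend themselves to a short sketch. The RolloutId type below is a stand-in written for this example, not the one in nixfleet-proto; only the "{channel}@{channel_ref}" shape is taken from the contract, and the helper names and the example channel and ref values are hypothetical.

```rust
/// Stand-in for the real `RolloutId`; only the string shape is from the contract.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RolloutId(String);

impl RolloutId {
    /// rolloutId = "{channel}@{channel_ref}"
    pub fn new(channel: &str, channel_ref: &str) -> Self {
        RolloutId(format!("{}@{}", channel, channel_ref))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Recompute the id from the parsed (already signature-verified) manifest
/// fields and compare it with the id the recipient was told to fetch.
pub fn check_identifier(
    requested: &str,
    manifest_channel: &str,
    manifest_channel_ref: &str,
) -> Result<(), String> {
    let recomputed = RolloutId::new(manifest_channel, manifest_channel_ref);
    if recomputed.as_str() == requested {
        Ok(())
    } else {
        Err(format!(
            "rolloutId mismatch: asked for {}, manifest says {}",
            requested,
            recomputed.as_str()
        ))
    }
}

/// Agent-only check: (hostname, wave_index) must be a member of host_set.
pub fn agent_in_host_set(host_set: &[(String, u32)], hostname: &str, wave_index: u32) -> bool {
    host_set
        .iter()
        .any(|(h, w)| h.as_str() == hostname && *w == wave_index)
}
```

For example, a manifest for channel "edge" served under the URL for "stable@abc123" fails check_identifier even though its signature is valid.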
Producer pipeline (nixfleet-release). Same orchestrator as fleet.resolved.json - after the resolved snapshot is signed, iterate fleet.channels, project each channel into a RolloutManifest (sorted host_set, target closure, wave layout, health gate, compliance frameworks, fleetResolvedHash), canonicalize, sign via the same --sign-cmd hook, write releases/rollouts/<rolloutId>.{json,sig}. The producer hook contract from §I #1 (NIXFLEET_INPUT / NIXFLEET_OUTPUT env vars) applies unchanged - one signing seam, three artifact types.
Trust topology. The CP holds NO signing key for rollouts. It is a verified stateless distributor: it adopts pre-signed manifests it can verify, refuses those it cannot, and serves the verified bytes byte-for-byte at GET /v1/rollouts/<rolloutId>. This preserves the “CP forges no trust” property: every byte an agent acts on traces back to a CI-held key.
Architectural invariant - rollout topology is immutable for the rollout’s life. The manifest carries the resolved topology snapshot computed from fleet.resolved at projection time: wave membership (host_set), per-host target closure, AND per-budget host membership (disruption_budgets[], the operator’s selectors resolved against fleet.hosts.tags at that instant). Once the manifest is signed, none of these reshape until the rollout terminates and is replaced. Consequence: mid-rollout retags affect future rollouts only - they cannot reshape the budget enforcement an in-flight rollout is running under. This mirrors how waves already work and unifies the model: fleet.resolved declares intent (selectors); the rollout manifest declares topology (resolved hosts). Cross-rollout fleet-wide enforcement (e.g. “no more than one etcd node disrupted at a time, ever, across all channels”) survives by matching budgets across active rollouts via selector equality.
II. Trust roots
Four keys. Everything else is derived. For each: who holds the private key, where the public key is declared, and who verifies.
1. CI release key
| Private | HSM / TPM-backed keyslot on coordinator (operator infra) |
| Public (declared) | nixfleet.trust.ciReleaseKey in fleet.nix (Nix layer) |
| Verified by | CP (on fleet.resolved load), optionally agents |
| Algorithm | ed25519 or ecdsa-p256 - declared alongside the public key; the signature’s algorithm (§I #1 meta.signatureAlgorithm) must match |
| Rotation grace | nixfleet.trust.ciReleaseKey.previous valid for 30 days after rotation |
Algorithm rationale. ed25519 is the preferred default for HSMs, YubiKeys, cloud KMS, and software-held keys. ECDSA P-256 exists as a second-class citizen because commodity TPM2 hardware (Intel PTT, AMD fTPM, most discrete TPMs) exposes RSA + NIST P-256 but not the ed25519 curve (TPM2_ECC_CURVE_ED25519 = 0x0040 is rare). Both algorithms produce 64-byte signatures and have comparable security margins (~128-bit). Producers (operator CI) pick one at install time based on hardware; the trust-root declaration tells consumers which verifier to use.
Public-key encoding.
- ed25519 - raw 32-byte public key, base64-encoded in fleet.nix (matches the format used by ssh-keygen, agenix, minisign).
- ecdsa-p256 - uncompressed point, 64 bytes (X ‖ Y, no 0x04 prefix), base64-encoded. Consumers convert to SEC1 / DER SPKI at verify time.
The declaration shape:
nixfleet.trust.ciReleaseKey = {
algorithm = "ecdsa-p256"; # or "ed25519"
public = "<base64 of raw bytes>";
};
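For the ecdsa-p256 encoding described above, the SEC1 conversion is a one-byte prefix. The sketch below operates on already base64-decoded bytes, because the standard library has no base64 decoder; the function names are hypothetical, and wrapping the result into DER SPKI is left out.

```rust
/// Convert a raw 64-byte P-256 public key (X ‖ Y) into the 65-byte
/// SEC1 uncompressed point by prepending the 0x04 tag.
pub fn p256_raw_to_sec1(raw: &[u8]) -> Result<[u8; 65], String> {
    if raw.len() != 64 {
        return Err(format!("expected 64 raw bytes (X ‖ Y), got {}", raw.len()));
    }
    let mut out = [0u8; 65];
    out[0] = 0x04; // SEC1 tag for an uncompressed point
    out[1..].copy_from_slice(raw);
    Ok(out)
}

/// Length check for a raw ed25519 public key (32 bytes).
pub fn ed25519_raw_ok(raw: &[u8]) -> bool {
    raw.len() == 32
}
```

The first 32 bytes after the tag are the X coordinate and the last 32 are Y, so the declared value round-trips by dropping the leading byte again.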
Signature encoding. Raw 64 bytes for both algorithms - R ‖ S for ECDSA, standard R ‖ S for ed25519. No DER wrapping, no PGP armour. Stored next to the canonical payload as fleet.resolved.json.sig.
Rotation procedure.
- Generate new keypair (operator infra) - may differ in algorithm from the outgoing one.
- Commit: set ciReleaseKey = <new>, ciReleaseKey.previous = <old> in fleet.nix. Consumers that pin both must accept signatures under either algorithm during the overlap.
- CI starts signing with new key on next build.
- After 30 days, remove previous from fleet.nix; old-key-signed artifacts rejected.
Compromise response. Immediate: remove compromised key from fleet.nix, set rejectBefore = <timestamp> (all artifacts signed before that are refused regardless of key). Rebuild CI environment. Sign a fresh fleet.resolved from known-clean CI. Document in SECURITY.md.
2. Cache trust keys
| Private | Each cache implementation’s own keystore (harmonia signing key file, attic signing key, cachix authtoken-derived, etc.) |
| Public (declared) | nixfleet.trust.cacheKeys (Nix layer) - flat list of opaque strings |
| Verified by | nix’s substituter (via nix.settings.trusted-public-keys) before every closure activation |
| Format | Implementation-defined string. Stock <name>:<base64> (harmonia, nix-serve, cachix) and attic’s attic:<host>:<base64> are both accepted by nix and may be mixed in one list. |
| Rotation grace | Add the new key alongside the old in the list; remove the old once all hosts have switched. |
Framework agnosticism. The framework forwards these strings opaquely - it does not parse, dispatch on, or otherwise discriminate between cache implementations. Choosing harmonia, attic, cachix, plain nix-serve, or a custom HTTP cache is a fleet-side decision; the framework’s only requirement is that the chosen impl serves the standard nix-cache HTTP protocol so that services.nixfleet-cache.cacheUrl works.
3. Org root key
| Private | Offline hardware (YubiKey) held by operator |
| Public (declared) | nixfleet.trust.orgRootKey (Nix layer) |
| Verified by | CP, when validating enrollment tokens |
| Algorithm | ed25519 |
| Rotation grace | 90 days; effectively never under normal operation |
Rotation procedure. Rare. If it rotates, every bootstrap token generated from the old key becomes invalid - every host re-enrollment requires a new token signed by the new key. Not a routine event.
Compromise response. Catastrophic: every enrollment token is potentially forgeable. Revoke old key, issue all hosts new bootstrap tokens, re-enroll fleet. Consider this the equivalent of an “infrastructure rebuild” event.
4. Host SSH key
| Private | Per-host /etc/ssh/ssh_host_ed25519_key (generated at provision) |
| Public (declared) | fleet.nix host entry (hosts.<n>.pubkey) (Nix layer) |
| Verified by | Auditor (probe output signatures), CP (mTLS cert binding at enrollment) |
| Algorithm | ed25519 (OpenSSH-compatible) |
| Rotation grace | Host key change = re-enrollment; no grace |
Rotation procedure. If a host’s key changes, the old host is considered gone and a new one is being enrolled. Secrets must be re-encrypted for the new recipient; probe-output signatures chain through the boot/generation record.
Operational note: enforce the trust posture with strict mode
The four roots above describe what the framework verifies. Whether that verification fires depends on the CP being configured to use it: --client-ca enables mTLS, --revocations-{artifact,signature}-url keeps revocations live across rebuild, and the X-Nixfleet-Protocol header guards the wire shape. By default each fallback degrades silently (warn-and-continue) so dev/test isn’t blocked.
Production fleets should set services.nixfleet-control-plane.strict = true (or pass --strict / NIXFLEET_CP_STRICT=1). In strict mode the CP refuses to start when any of these flags is unset, and rejects requests missing the protocol header. The NixOS module emits a warning when the listener is exposed beyond loopback while strict = false.
III. Canonicalization
JCS (RFC 8785) with a single Rust implementation, byte-identical across all signers and verifiers.
Producer-side (the Nix layer’s lib/mk-fleet.nix) MUST emit values that round-trip through JCS losslessly: ints only (no floats), deterministic attr order, no JSON-incompatible types. Consumer-side (the Rust runtime’s bin/nixfleet-canonicalize) pins the library.
- Library choice. Pinned to serde_jcs 0.2, hosted by crates/nixfleet-canonicalize. Rationale: direct RFC 8785 implementation over serde_json::Value; handles UTF-16 key sorting and ECMAScript number formatting per spec. Any change to this pin is a contract change (§VII) requiring signoff from every layer that signs or verifies artifacts (CI/infra, Nix, Rust).
- Golden-file test. crates/nixfleet-canonicalize/tests/fixtures/jcs-golden.{json,canonical} with byte-exact equality asserted in tests/jcs_golden.rs. Runs on every push via pre-push cargo nextest run --workspace; fails loudly on any drift. The ed25519-signed-bytes extension of this fixture lands alongside the CI release key.
- Usage. Every signed artifact (fleet.resolved, probe output) is canonicalized via this single library before signing and before verification. No ad-hoc serializers in Nix, shell, or other crates.
When the Nix layer needs to produce a JCS-canonical artifact (e.g. CI signing fleet.resolved), it invokes the same Rust canonicalizer via a small shell tool (nixfleet-canonicalize). Do not reimplement in Nix or shell.
IV. Control-plane storage purity rule
The control plane’s SQLite database exists to cache operational state. Every column MUST satisfy one of:
- Derivable from git + agent check-ins. Documented in a line comment on the column:
    CREATE TABLE hosts (
      hostname TEXT PRIMARY KEY, -- derivable from: fleet.resolved
      current_gen TEXT, -- derivable from: agent check-in
      last_seen_at DATETIME, -- derivable from: agent check-in
      ...
    );
- Explicitly listed in “accepted data loss.” See below.
Accepted data loss list - state that is intentionally not preserved through a control-plane teardown:
| State | Reason | Recovery |
|---|---|---|
| Certificate revocation history | Revocations are operational decisions, not automated. | Operator re-declares revocations after teardown. |
| Per-rollout event log (> 30 days old) | Historical trace, not operational. | Available via log aggregation (§I.6), not CP-internal. |
Rule. A new column that is neither derivable nor on the accepted-loss list is a contract violation. It fails the teardown test and must be either removed or moved into the declarative state.
V. Versioning patterns
Current state at a glance
| Contract | Current version | Evolution |
|---|---|---|
| fleet.resolved.json | schemaVersion: 1 | Additive within v1; bump for breaking changes. meta.signatureAlgorithm added in v1 - optional, defaults to "ed25519" when absent. |
| RolloutManifest | schemaVersion: 1 | Additive within v1; every shape change rebumps in lockstep (§I #8). |
| revocations.json | schemaVersion: 1 | Same shape as fleet.resolved.json. |
| Wire protocol | v1 (header) | Additive within major; dual-support during migration |
| Probe descriptor per framework | <framework>/v1 per framework | New string for new shape; old shape kept during migration |
| Probe output | Tracked with the control | Same as descriptor |
| Log/event | logSchemaVersion: 1 | Same pattern as wire protocol |
| Agenix format | Pinned by flake.lock | Treat upgrade as spine change |
The three patterns and why they are NOT unified
Three boundary contracts in this framework version themselves three different ways. Each is right in its scope; trying to unify them would lose information.
| Pattern | Used by | Identifier | Scope of a bump |
|---|---|---|---|
| A - meta.schemaVersion: u32 | Signed artifacts (fleet.resolved.json, RolloutManifest, revocations.json) | JSON field inside the artifact | The data shape of the artifact bytes |
| B - HTTP header | Agent ↔ CP wire (§I #2), log/event schema (§I #6) | X-Nixfleet-<Capability>: <major> per request | Per-request interaction capability |
| C - Embedded schema string | Compliance controls (§I #3, #4) | <vocabulary>/<version> per item | One vocabulary item’s contract |
Why not pick one across the board:
- Wire version inside the artifact (Pattern A everywhere) would couple wire bumps to artifact bumps - every new wire field would force re-signing every artifact in flight, even artifacts whose data shape did not change.
- Artifact schema in an HTTP header (Pattern B everywhere) would destroy self-description: an auditor reading canonical bytes off disk, out of a cache, or shipped by email has no HTTP envelope to read the version from.
- Global vocabulary version (Pattern C everywhere) would force every contract to re-version on any single change, breaking the per-framework / per-artifact cadence assumption that lets compliance frameworks evolve independently.
The patterns are right in their respective scopes; the inconsistency is real but load-bearing.
Decision tree - picking a versioning pattern for a new contract
Q1. Is the contract a self-describing chunk of data that may be read out-of-context (off disk, out of a cache, by an auditor, by a third-party tool, mailed to someone)?
- Yes -> Pattern A (meta.schemaVersion: u32). The bytes carry their own version label.
Q2. Is the contract a per-request interaction capability between two endpoints sharing live session state?
- Yes -> Pattern B (HTTP header). The version applies to the request, not to a persisted blob.
Q3. Is the contract an independent vocabulary item that evolves on its own cadence, distinct from peer items in the same family?
- Yes -> Pattern C (embedded schema string). Each item carries its own version; the family does not aggregate.
If none fit, the contract is probably small enough not to need a versioning mechanism at all - pin by flake.lock (e.g., agenix format §I #5) or by review.
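The decision tree reads naturally as a function that asks Q1, Q2, Q3 in order and falls back to pinning. This is only an illustration of the ordering; the type and field names are hypothetical and the booleans stand in for a reviewer's judgment.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VersioningPattern {
    /// Pattern A: meta.schemaVersion field inside the artifact.
    SchemaVersionField,
    /// Pattern B: HTTP header per request.
    HttpHeader,
    /// Pattern C: embedded "<vocabulary>/<version>" string per item.
    EmbeddedSchemaString,
    /// None of the questions apply: pin by flake.lock or by review.
    PinOrReview,
}

/// Answers to Q1..Q3 for a candidate contract.
#[derive(Debug, Clone, Copy)]
pub struct ContractTraits {
    pub read_out_of_context: bool,         // Q1
    pub per_request_capability: bool,      // Q2
    pub independent_vocabulary_item: bool, // Q3
}

/// The questions are asked in order; the first "yes" wins.
pub fn pick_pattern(t: ContractTraits) -> VersioningPattern {
    if t.read_out_of_context {
        VersioningPattern::SchemaVersionField
    } else if t.per_request_capability {
        VersioningPattern::HttpHeader
    } else if t.independent_vocabulary_item {
        VersioningPattern::EmbeddedSchemaString
    } else {
        VersioningPattern::PinOrReview
    }
}
```

A signed artifact that also travels over the wire answers yes to Q1 first, so it lands on Pattern A regardless of its transport.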
Naming conventions per pattern
| Pattern | Field/key naming | Version literal | Example |
|---|---|---|---|
| A | camelCase JSON keys; envelope under meta.* | unsigned integer | "meta": {"schemaVersion": 1} |
| B | X-Nixfleet-<Capability> HTTP header | bare integer for major | X-Nixfleet-Protocol: 1 |
| C | <vocabulary>/<version> string | v<N> suffix | "anssi-bp028/v1" |
Bump procedure per pattern
Pattern A - meta.schemaVersion.
- Additive fields land within the current schemaVersion; consumers MUST ignore unknown fields.
- Removing or changing the meaning of a field requires bumping schemaVersion and shipping a migration window where consumers accept both versions.
- Compatibility window default: 30 days from the first signed artifact under the new version. After the window, old-version artifacts are refused.
- Sunset notice: announce the bump in CHANGELOG.md at least one release before the cutover; flag old-version artifacts in CP logs during the window.
- Producers stop emitting the old version at least one full freshnessWindow before consumers stop accepting it, so no in-flight artifact ages out mid-rotation.
Pattern B - HTTP header.
- Additive fields within the same major (consumers MUST ignore unknown fields, same posture as Pattern A).
- Removing a field or changing semantics requires a major bump (X-Nixfleet-Protocol: 2) and dual-version CP support during migration.
- Compatibility window default: one full agent renewal cycle (30 days) so a rolling cert renewal naturally drags every agent onto the new major.
- Sunset notice: CP logs a deprecation warning when it admits a request under the old major; flips to HTTP 400 after the window.
- Wire endpoints are versioned, not rotated - there is no “old key” analogue; the deprecation is the rotation.
Pattern C - embedded schema string.
- Each <vocabulary>/<version> pair is immutable once shipped.
- New version = new schema string (e.g. anssi-bp028/v2); the agent ships a handler registry keyed on (control_id, schema).
- Compatibility window: controls MAY support multiple schema versions during migration; the framework imposes no global cutover.
- Sunset notice: per-control, declared by the framework’s release notes when a new schema version supersedes an old one.
- Rotation/deprecation is per-control; the framework does not aggregate across the vocabulary family.
Concrete lifecycle examples
Pattern A - adding meta.signatureAlgorithm to fleet.resolved.json. The field was added as optional within schemaVersion: 1. Consumers that read an artifact without the field interpret it as "ed25519" for backward compatibility (§I #1). No bump. This is the prototypical compatible additive change - under stricter discipline it could have been a schemaVersion: 2 bump, but the “default when absent” rule preserved single-version compatibility for unmodified consumers.
Pattern B - widening EvaluatedTarget (RFC-0003 §4.1). Three new optional fields (rollout_id, wave_index, activate) added to CheckinResponse.target. Per RFC-0003 §6 additive rule, no X-Nixfleet-Protocol bump required: old agents that don’t deserialize the new fields keep working; new agents reading them from an old CP receive None.
Pattern C - adding a new compliance framework version. A new anssi-bp028/v2 lands as a new probe descriptor. The agent registry adds a v2 handler keyed on (control_id, "anssi-bp028/v2"). Hosts on channels still emitting v1 probes keep the v1 handler; hosts on channels migrated to v2 receive v2 probes. No global cutover; the v1 handler is removed from the registry only after every channel’s compliance config is migrated.
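A registry keyed on (control_id, schema) that lets v1 and v2 handlers coexist can be sketched with a plain HashMap. Everything here is illustrative: the HandlerRegistry type, the Handler signature, the two example handlers, and the control id used in the usage note are hypothetical and say nothing about how the agent actually parses probe output.

```rust
use std::collections::HashMap;

/// A handler judges one probe output; the signature is an assumption.
pub type Handler = fn(&str) -> bool;

/// Registry keyed on (control_id, schema), e.g. ("...", "anssi-bp028/v1").
#[derive(Default)]
pub struct HandlerRegistry {
    handlers: HashMap<(String, String), Handler>,
}

impl HandlerRegistry {
    pub fn register(&mut self, control_id: &str, schema: &str, handler: Handler) {
        self.handlers
            .insert((control_id.to_string(), schema.to_string()), handler);
    }

    /// Exact-match lookup: an unknown schema string yields None, never a fallback.
    pub fn lookup(&self, control_id: &str, schema: &str) -> Option<Handler> {
        self.handlers
            .get(&(control_id.to_string(), schema.to_string()))
            .copied()
    }

    /// Retire a handler once every channel has migrated off its schema.
    pub fn unregister(&mut self, control_id: &str, schema: &str) -> bool {
        self.handlers
            .remove(&(control_id.to_string(), schema.to_string()))
            .is_some()
    }
}

/// Hypothetical v1 handler: plain-text verdict.
pub fn handle_v1(output: &str) -> bool {
    output.trim() == "pass"
}

/// Hypothetical v2 handler: key=value verdict.
pub fn handle_v2(output: &str) -> bool {
    output.trim() == "status=pass"
}
```

Registering both handle_v1 and handle_v2 under the same control id with different schema strings models the migration window; calling unregister for the v1 key models the final step of the lifecycle.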
VI. Implementation agnosticism
The framework promises mechanism, not implementation. The following are explicit non-commitments - the framework runtime contains no code that depends on these choices, and a fleet may freely substitute any conforming alternative without forking nixfleet.
| Concern | Framework requires | Fleet picks |
|---|---|---|
| GitOps source for the channel-refs poll | An HTTPS URL pair (artifact + signature) that returns the raw signed bytes when GET’d, optionally with Authorization: Bearer <token>. Configured via services.nixfleet-control-plane.channelRefsSource.{artifactUrl, signatureUrl, tokenFile}. | Forgejo / Gitea / GitHub / GitLab / sourcehut / plain HTTPS / S3 with presigned URLs / anything HTTP-shaped. URL templates for common forges live in flake.scopes.gitops.* (this repo’s impls/gitops/) as pure data - adding a new forge is one .nix file, no Rust changes. |
| Binary cache server | Nothing - the framework does not ship a cache-server module. Hosts that should serve a cache wire one in fleet-side. | services.harmonia, services.atticd, services.nix-serve, cachix as a service, or a hand-rolled wrapper. The consuming fleet picks. |
| Binary cache client | An HTTPS URL + a public key string. Configured via services.nixfleet-cache.{cacheUrl, publicKey}. | Any cache speaking the standard nix-cache HTTP protocol (narinfo + nar). Identical client config regardless of which server impl is upstream. |
| Cache trust keys | A flat list of opaque strings forwarded to nix.settings.trusted-public-keys. Configured via nixfleet.trust.cacheKeys. | Stock <name>:<base64>, attic attic:<host>:<base64>, or both at once - see §II #2. |
| PKI / mTLS issuer | Cert + key file paths on disk. The framework reads them; their provenance is not a contract. | Caddy’s internal CA (current fleet choice), Smallstep, vault-pki, hand-rolled scripts, or a public CA - anything that produces RSA / ECDSA / Ed25519 cert files compatible with rustls. |
| Secrets backend | Cert / key / token paths in option fields. The framework reads files; how they got there is not a contract. | agenix (current fleet choice), sops-nix, plain nixops, manual secret-staging scripts, or systemd-creds. |
| Disk layout | A disko.devices attrset on the host. | Hand-rolled disko config in the consuming fleet, or none if filesystems are pre-provisioned. |
| Impermanence | An environment.persistence option must exist (the framework’s own service modules contribute to it). The framework imports the upstream impermanence flake to satisfy this. | Activate via nixfleet.impermanence.enable = true, or leave disabled - the schema is always declared. |
What this means for fleets. Every framework binary or NixOS module touches only the contract surface above. A fleet that wants GitHub instead of Forgejo, harmonia instead of attic, sops-nix instead of agenix, or vault-pki instead of Caddy CA changes its scope imports and its option values - the framework code is rebuilt without modification.
What this means for nixfleet maintainers. New tech-specific impls land under impls/<family>/ and get exposed at flake.scopes.<family>.<impl> - they remain opt-in for fleets. If something tech-specific must enter the framework’s runtime path - e.g. a new wire-protocol participant - it’s a contract change governed by the amendment procedure in §VIII below.
Irreducible technology assumptions
A small set of technology choices are load-bearing for the framework - they’re not implementation choices a fleet can swap. Replacing one of these means building a different framework.
| Assumption | Why load-bearing | Replacing means |
|---|---|---|
| Nix + flakes | The whole declarative side (mkHost, mkFleet, the option system, hostSpec contract, fleet.resolved evaluation) is built on Nix evaluator semantics; the framework has no non-Nix front-end. | Re-implementing the declarative layer in another DSL - different framework. |
| NixOS (system layer) | The Linux agent’s activation pipeline assumes NixOS’ generation model: /run/current-system resolves to the active toplevel, nixos-rebuild switch --system <path> is the activation primitive, post-switch verification reads basename(realpath /run/current-system). The §I #1 contract refers to “closure hash”; that concept is meaningful in NixOS terms. | A separate activation backend abstraction - see roadmap. Until that lands, non-NixOS Linux is out of scope. |
| systemd | Every framework NixOS module declares systemd.services.nixfleet-*. Hardening, restart policy, credential plumbing, dependency ordering all use systemd primitives. | Rewriting the system-service layer for runit/s6/launchd - same scope as a non-NixOS port. |
| mTLS over HTTP/1.1 | Agent ↔ control-plane authentication identity is the client cert CN; authorisation is per-route. The CP’s rustls config is the trust boundary the agent verifies; replacing TLS means a different wire protocol. | A different wire protocol (Noise, Tailscale ACL, mutual auth over WireGuard). Different framework. |
TPM is not on this list. TPM hardware is a fleet’s choice of signing keyslot, not a framework requirement. The keyslots/tpm impl ships at flake.scopes.keyslots.tpm as one option among many; the framework runtime never links a TPM library. A fleet using a YubiKey, software key, HSM, or KMS for the CI release key is fully framework-supported - see §I #1’s hook contract. The current reference fleet happens to use TPM-backed ECDSA P-256; that’s deployment opinion.
Why call these out. The agnosticism work made it easy to add new tech-specific impls as scopes. The four assumptions above cannot be captured by the same pattern - there is no scope a fleet can import to replace systemd. Documenting them here prevents the framework from drifting into pretending they’re substitutable, and gives future maintainers a clear test: if it’s listed in the agnosticism table, it is scope-side; if it’s listed in this irreducible-assumptions table, it is framework-side and out of scope to abstract.
VII. Non-contracts (explicit)
The following are NOT contracts - they may change without coordination:
- Internal CP SQLite layout (as long as §IV rule holds).
- Internal agent process structure (threads, tokio tasks).
- Internal reconciler intermediate data structures.
- Nix module option defaults (overridable per-host).
- Formatter choices, lint rules.
- Directory layout inside crates/ beyond crate names.
If something that should be a contract is drifting, propose it as an addition to this document via PR - do not unilaterally stabilize it in code.
Implementation status disclosure. Some contracts in §I - notably parts of CheckinResponse.target (RFC-0003 §4.1) and the rollback-and-halt semantics in the reconciler (RFC-0002 §5.1) - are schema-honored but behavior-partial. The framework declares the wire shape and the option surface, but specific code paths are deferred. This disclosure is not a contract weakening - the listed contracts remain authoritative and additive - but it makes explicit that “passes verification” does not yet mean “exercises every documented field.”
Operator config file
~/.config/nixfleet/config.toml is operator-side state with the following schema:
cp_url = "https://cp.example.com:8080"
ca_cert = "/etc/nixfleet/ca.pem"
client_cert = "/home/operator/.config/nixfleet/operator.pem"
client_key = "/home/operator/.config/nixfleet/operator.key"
All fields are optional in the file - a field missing from the file can be supplied by a NIXFLEET_* environment variable or an explicit flag, with flag > env > file precedence. The CLI fails closed: any field still unfilled after all three layers triggers ConfigError::Missing with a hint to run nixfleet config init.
The file path can be overridden per-invocation with --config <path> or NIXFLEET_CONFIG.
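The layered lookup can be sketched in a few lines of Rust. This is an illustration only, not the CLI's loader: the `Layer` alias, the `resolve` helper, and the staging URL are hypothetical, and the only names taken from this document are the field names and `ConfigError::Missing`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `ConfigError::Missing` variant named above.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing(&'static str),
}

/// One layer of optional values: flags, environment, or the config file.
type Layer = HashMap<&'static str, String>;

/// Resolve a single field with flag > env > file precedence.
/// Fails closed: if no layer supplies the field, return `Missing`.
fn resolve(
    field: &'static str,
    flags: &Layer,
    env: &Layer,
    file: &Layer,
) -> Result<String, ConfigError> {
    flags
        .get(field)
        .or_else(|| env.get(field))
        .or_else(|| file.get(field))
        .cloned()
        .ok_or(ConfigError::Missing(field))
}

fn main() {
    let mut file = Layer::new();
    file.insert("cp_url", "https://cp.example.com:8080".to_string());
    file.insert("ca_cert", "/etc/nixfleet/ca.pem".to_string());

    // Hypothetical environment override (would come from NIXFLEET_* variables).
    let mut env = Layer::new();
    env.insert("cp_url", "https://cp-staging.example.com:8080".to_string());

    let flags = Layer::new();

    // The environment layer wins over the file for cp_url.
    println!("{:?}", resolve("cp_url", &flags, &env, &file));
    // Only the file supplies ca_cert.
    println!("{:?}", resolve("ca_cert", &flags, &env, &file));
    // No layer supplies client_key, so resolution fails closed.
    println!("{:?}", resolve("client_key", &flags, &env, &file));
}
```

Resolving each field independently means a partial config file plus one or two flags is a valid setup, as long as every field ends up filled.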
VIII. Amendment procedure
- Open a PR that modifies this document.
- Label it contract-change.
- Review requires a signoff from each layer whose code implements the contract.
- Merge only after the code change that implements the new contract is ready in the same PR (or a linked follow-up that must land within the same spine milestone).
Nix source layout
The Nix half of the codebase is split into four layers. The split is structural, not stylistic - each directory plays a distinct role in the API surface and import graph. When adding code, this doc tells you which layer it belongs in.
If you’re looking for the runtime architecture (components, trust flow, build order), read ./architecture.md instead. This doc is about source organization for contributors.
The four layers
| Directory | Role | Imported by |
|---|---|---|
| lib/ | Public flake API. Function-style helpers consumers call from their own flake.nix. | Consumer fleets via nixfleet.lib.* |
| modules/scopes/ | Auto-included service modules. Contributed to every host mkHost builds; gated by enable flags. | Implicitly, through mkHost |
| contracts/ | Typed schemas with no implementation. Declare options that other code reads and writes; carry no runtime behavior. | mkHost (auto-imports all) |
| impls/ | Opt-in implementations of contract schemas. A fleet picks at most one impl per family. | Consumer fleet, explicitly |
The split is visible in modules/flake-module.nix: flake.lib exposes the lib/ layer; flake.scopes.* exposes the impls/ layer as named alternatives; modules/scopes/* are wired into mkHost’s default module list; contracts/* are auto-imported via mkHost’s prelude.
When does code go where?
lib/ - public API
Goes here if it’s a function consumers call: mkHost, mkFleet, mkVmApps, mergeFleets, withSignature. Pure Nix functions, no NixOS module evaluation. Exposed at nixfleet.lib.<name> via lib/default.nix.
Rule of thumb: if the body is fleetConfig: { ... } or args: { ... } and never declares options.* or config.*, it belongs here.
Example - the recently split mkVmApps is a function returning a flake-apps attrset; its implementation lives in lib/mk-vm-apps.nix plus the lib/vm-platform.nix / lib/vm-helpers.sh / lib/vm-scripts/ siblings.
modules/scopes/<scope>/ - auto-included service modules
Goes here if it’s a NixOS module the framework wants every relevant host to evaluate, gated by an enable flag. The agent, the control plane, the operator user, the cache pinning, the microvm host - all auto-included by mkHost so consumers don’t have to remember to imports = [ ... ] every relevant module on every host.
Each file is a complete NixOS module: declares its services.<name>.* options, plus the config = lib.mkIf cfg.enable { ... } block that lights up when the consumer flips the flag.
Naming convention: file starts with _ (e.g. _agent.nix, _control-plane.nix) to keep the import-tree merge predictable.
contracts/ - typed schemas, no implementation
Goes here if it’s an option schema other code depends on, but the schema itself has no behavior. The schema declares what fields exist and what types they take; downstream impls or service modules read those fields and do the actual work.
Today: hostSpec (host identity), nixfleet.persistence.* (persisted-paths schema), nixfleet.trust.* (trust-root keys + algorithms).
Rule of thumb: if removing this file’s config = ... block would change no observed runtime behavior, it’s a contract.
Putting a contract here (rather than inline in a service module) decouples readers from writers. Multiple service modules can contribute nixfleet.persistence.directories without knowing whether the consumer fleet uses impermanence, ZFS rollback, or no impermanence at all. The impl reads those contributions and translates.
impls/<family>/<impl>.nix - opt-in implementations
Goes here if it’s a concrete implementation of a contract schema, where multiple alternatives could exist and a consumer fleet picks one explicitly.
Today’s families:
| Family | Contract | Impls |
|---|---|---|
| persistence | contracts/persistence.nix | impermanence |
| keyslots | (none yet - contract is implicit in TPM module) | tpm |
| gitops | source-URL builders for services.nixfleet-control-plane.channelRefsSource | forgejo (also aliased as gitea) |
| secrets | identity-path resolution for agenix/sops backends | secrets (single canonical impl) |
Each impl is exposed as flake.scopes.<family>.<impl> (see modules/flake-module.nix). Consumer fleets opt in by importing exactly one per family:
imports = [ inputs.nixfleet.scopes.persistence.impermanence ];
Sibling entries are mutually exclusive. Adding a third impl to an existing family is when the family-vs-impl boundary earns its keep - write a new file alongside the existing one, no schema change required.
What does not go in any of these
- Rust code. Lives in crates/, independent build graph.
- Per-host NixOS configuration. Lives in the consumer fleet’s flake (e.g. cache.nix, workstation.nix). The framework doesn’t ship host configs; it ships the machinery to build them.
- Test fixtures and scenarios. Live in tests/harness/, not in any framework layer.
- Internal flake plumbing that isn’t part of the public API or a NixOS module: modules/apps.nix (the validate flake-app), modules/formatter.nix (treefmt config), modules/rust-packages.nix (crane wiring). These live at the modules/ root, not in a layer subdirectory.
Cross-references
- modules/flake-module.nix - the wire-up that turns these directories into flake outputs.
- lib/mk-host.nix - the function that auto-includes modules/scopes/* and contracts/* for each host.
- ./contracts.md - the cross-stream artifact contracts (different sense of “contract” - wire formats and signed artifacts, not Nix option schemas).
Dependency pinning policy
Fleet repos that consume nixfleet inherit nixpkgs, home-manager, disko, and the other shared inputs via inputs.<name>.follows = "nixfleet/<name>" rather than pinning their own copies. The framework modules are evaluated against the exact revisions declared in this repo’s flake.lock; an independent consumer pin can drift on option renames, type changes, or removed modules between the revisions the framework was tested against and the ones the consumer evaluates with.
The practical contract: nix flake update nixfleet in a fleet repo updates every shared dependency in one step. The framework commits to staying current against nixos-unstable so consumers are not pinned to a stale tree. Fleet-specific inputs (themes, editor plugins, things the framework does not know about) are pinned independently by the consumer - the follows chain only covers what the framework guarantees to test against.
Crates
NixFleet’s Rust workspace consists of 9 crates with clear separation of concerns. Each file in this directory is a one-screen mental model for one crate; for type signatures and method-level detail, follow the rustdoc link.
| Crate | One-line summary |
|---|---|
| nixfleet-proto | Wire types: Serde-derived schema for every artefact and HTTP body. |
| nixfleet-canonicalize | JCS canonical JSON for signing - lean deps, no async runtime. |
| nixfleet-verify-artifact | Offline auditor: verifies signed artefacts against trust roots. |
| nixfleet-state-machine | Pure per-host reducer (RFC-0005 §3); same code on agent + CP. |
| nixfleet-reconciler | Pure decision procedure: reconcile, verify_artifact, planner gates. |
| nixfleet-release | CI release tool: signs fleet.resolved.json + revocations sidecar. |
| nixfleet-cli | Operator umbrella binary (nixfleet subcommands). |
| nixfleet-agent | Host daemon: polls CP, fetches/applies closures, reports back. |
| nixfleet-control-plane | Axum HTTP service + SQLite; routes signed intent to agents. |
nixfleet-agent
Role. Host daemon. Runs as nixfleet-agent.service (systemd on Linux) / system.nixfleet.agent launchd label (Darwin). Polls the control plane over mTLS, fetches dispatched closures from the binary cache, activates them via nixos-rebuild switch (or the Darwin equivalent), confirms convergence or reports failure, and self-signs evidence payloads with the host SSH key. Carries the actuator side of the contract: the only thing in the architecture that mutates a host’s running system, and the last line of defence (rollback on failed activation, freshness-window enforcement on stale intent).
Key types. comms::ReqwestReporter (mTLS-bearing HTTP client to CP), the event-driven runtime (runtime/: reducer task, worker fabric, applier, outbound queue, manifest-poll / heartbeat / longpoll / probe workers), evidence_signer (ed25519 over JCS canonical payload bytes using the host’s /etc/ssh/ssh_host_ed25519_key), manifest_cache (persisted last-good RolloutManifest for offline-tolerant verification), freshness (clock-skew check against signed-at vs the channel’s freshness window), recovery (boot-time check-for-rollback path). Per-host lifecycle runs through nixfleet_state_machine::step() (same reducer as the CP-mirror), with auto-rollback on activation failure and CP-driven rollback signal handling.
Surface. Binary nixfleet-agent with CLI flags mirrored by environment variables (NIXFLEET_AGENT_*): --control-plane-url, --machine-id (must match the client-cert CN), --poll-interval (default 60s), --trust-file, --ca-cert, --client-cert, --client-key, --bootstrap-token-file (enrol via /v1/enroll when client-cert is absent), --state-dir (default /var/lib/nixfleet-agent), --compliance-gate-mode, --ssh-host-key-file, --health-checks-config. NixOS module services.nixfleet-agent.{enable, controlPlaneUrl, machineId, trustFile, tls.{caCert, clientCert}, healthChecks, ...} materialises the systemd unit and renders the health-checks JSON.
Links.
- Generated rustdoc: api/nixfleet_agent/
- Relevant RFCs: RFC-0003, RFC-0005, RFC-0007, RFC-0008, RFC-0011, RFC-0012
- Architecture component: §1.5 Agent, §3 The main flow
nixfleet-canonicalize
Role. Pure deterministic JSON canonicalisation for the signing path. JCS (RFC 8785): sort object keys lexically, normalise number representation, force UTF-8. Load-bearing: every signer (CI release, agent evidence, CP rebuild) and every verifier routes through this crate, so drift here invalidates signatures fleet-wide. Lives in the offline-auditor closure: no reqwest, no tokio, no async runtime, so a third-party regulator can build the verifier standalone without pulling the operator’s network stack.
Key types and state machines. None public beyond two free functions. Operates on serde_json::Value (parsed input) and emits canonical bytes ready for ed25519 / ECDSA signing. No mutable state; no statefulness across calls.
Surface. Two surfaces: the library functions canonicalize(&str) -> Result<String> (JSON string to JCS-canonical bytes) and sha256_jcs_hex<T: Serialize>(&T) -> Result<String> (hex-lowercase SHA-256 of the canonical bytes - the function nixfleet-reconciler::compute_canonical_hash and rolloutId derivation route through here); plus the nixfleet-canonicalize binary that reads JSON on stdin and writes canonical bytes on stdout. Binary exit codes: 0 ok, 1 parse / canonicalise error, 2 I/O error. The binary is the auditor’s tool, paired with nixfleet-verify-artifact for full signature checking.
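Why canonicalisation matters for signing is easiest to see with a toy example. The sketch below is not the crate's implementation and is not a complete RFC 8785 encoder: the `Json` enum and both functions are hypothetical, it handles integers only, and it sorts keys by UTF-8 byte order (which matches the JCS ordering for ASCII keys). It shows only that two objects built in different key order serialise to identical bytes.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value (integers only, no floats).
enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Arr(Vec<Json>),
    /// BTreeMap iterates in sorted key order, which gives the key sorting.
    Obj(BTreeMap<String, Json>),
}

/// Write a JSON string literal with minimal escaping.
fn write_str(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Emit the value with sorted keys and no insignificant whitespace.
fn write_canonical(v: &Json, out: &mut String) {
    match v {
        Json::Null => out.push_str("null"),
        Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Json::Int(n) => out.push_str(&n.to_string()),
        Json::Str(s) => write_str(s, out),
        Json::Arr(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_canonical(item, out);
            }
            out.push(']');
        }
        Json::Obj(map) => {
            out.push('{');
            for (i, (k, val)) in map.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_str(k, out);
                out.push(':');
                write_canonical(val, out);
            }
            out.push('}');
        }
    }
}

fn to_canonical(v: &Json) -> String {
    let mut out = String::new();
    write_canonical(v, &mut out);
    out
}

fn main() {
    // Same logical object, keys inserted in two different orders.
    let mut a = BTreeMap::new();
    a.insert("pin".to_string(), Json::Null);
    a.insert("channel".to_string(), Json::Str("stable".to_string()));
    let mut b = BTreeMap::new();
    b.insert("channel".to_string(), Json::Str("stable".to_string()));
    b.insert("pin".to_string(), Json::Null);

    let (ca, cb) = (to_canonical(&Json::Obj(a)), to_canonical(&Json::Obj(b)));
    println!("{}", ca);
    println!("identical bytes: {}", ca == cb);
}
```

A signature computed over these bytes stays valid no matter which producer serialised the object, which is why every signer and verifier has to share one canonicaliser.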
Links.
- Generated rustdoc: api/nixfleet_canonicalize/
- Relevant RFCs: RFC-0001, RFC-0003
- Architecture component: §1.2 CI, §4 The trust flow
nixfleet-cli
Role. Operator umbrella binary (nixfleet). Talks to the control plane over mTLS for status and rollout views, and ships offline helpers for bootstrap and key derivation. Library form so binaries compose against it and unit tests exercise table rendering and status classification without spinning up a real CP. Carries the operator’s day-to-day surface; everything an operator types lives behind one of these subcommands.
Key types and state machines. ResolvedClientConfig (cp_url + CA / client cert / client key paths after the flag > env > file layered loader runs), FileConfig / Overrides (config-file shape and per-layer overrides), StatusInputs (now, hosts, channel_freshness) feeding the deterministic table renderer. Status-label priority is explicit (Failed > Quarantined > PendingReboot > Converged > Stale > InFlight > Queued) with tests locking each transition; pin metadata appends as a 🔒<short> suffix so health stays the primary signal.
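One way to encode an explicit label priority like this is an enum whose declaration order is the priority order. The sketch below is an assumption about shape, not the CLI's code: `StatusLabel`, `classify`, and `render` are hypothetical names, the idea that a host's label is the highest-priority condition that applies is assumed, and the space before the pin marker is a guess.

```rust
/// Labels declared from lowest to highest priority; the derived `Ord`
/// follows declaration order, so `Failed` compares greatest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum StatusLabel {
    Queued,
    InFlight,
    Stale,
    Converged,
    PendingReboot,
    Quarantined,
    Failed,
}

/// Pick the label to show, given every condition that currently applies.
/// Returns `None` when nothing applies.
fn classify(applicable: &[StatusLabel]) -> Option<StatusLabel> {
    applicable.iter().copied().max()
}

/// Append pin metadata as a suffix so the health label stays first.
fn render(label: StatusLabel, pin_short: Option<&str>) -> String {
    match pin_short {
        Some(short) => format!("{:?} 🔒{}", label, short),
        None => format!("{:?}", label),
    }
}

fn main() {
    // A host that converged but whose channel is stale shows as Converged.
    let label = classify(&[StatusLabel::Stale, StatusLabel::Converged]).unwrap();
    println!("{}", render(label, None));

    // Failure outranks everything else; the pin shows as a suffix.
    let label = classify(&[StatusLabel::InFlight, StatusLabel::Failed]).unwrap();
    println!("{}", render(label, Some("a1b2c3d")));
}
```

Keeping the ordering in the type means a test per adjacent pair is enough to lock every transition.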
Surface. Subcommands of the nixfleet binary: status [--json] [--no-color] (fleet table: convergence, freshness, outstanding compliance / runtime-gate / health failures, pin markers), rollout hosts <rollout-id> [--json] (per-host summary, one row per host), rollout events <rollout-id> [--json] (chronological event-log stream with <open> markers for unresolved dispatches), config init --cp-url --ca-cert --client-cert --client-key [--path] [--force] (write ~/.config/nixfleet/config.toml), derive-pubkey (base64 ed25519 pubkey from raw private key file), mint-operator-cert (mTLS client cert from the offline fleet root CA), mint-token (bootstrap token for first-boot enrolment). Network calls hit /v1/hosts, /v1/channels/{name}, /v1/rollouts/{id}/hosts, /v1/rollouts/{id}/events.
Links.
- Generated rustdoc: api/nixfleet_cli/
- Relevant RFCs: RFC-0003, RFC-0010
- Architecture component: §1.4 Control plane, §3 The main flow
nixfleet-control-plane
Role. TLS server, event-driven runtime, and persistence layer. Wraps nixfleet-reconciler’s pure decision procedure and nixfleet-state-machine’s reducer in an Axum HTTP service backed by SQLite operational state, polls the signed-artefact directory for fresh fleet.resolved.json, projects an Observed view from the database, dispatches Actions by emitting agent-facing wire bodies, and persists everything per RFC-0006 (single-MPSC runtime, applier-co-write derived views). Carries the routing-and-state-storage half of the agent / CP protocol; the agent never talks to anything else.
Key types. TickInputs / TickOutput / VerifyOutcome / VerifyOk (one reconciler iteration’s input bundle and result), AppState (server-wide handle wiring Mutex<rusqlite::Connection> sized for O(100) hosts, the verified-fleet snapshot, the signed-artefact poller, the freshness window, the optional Prometheus registry), the polling timer (re-verifies the on-disk artefact at signing_interval_minutes cadence), the rollouts-source (resolves a manifest into the per-channel dispatch plan). The CP-mirror of per-host and per-rollout state runs through nixfleet_state_machine::step(); persistence is the host_rollout_records reducer cache plus the dispatch_history log used by /v1/rollouts/{id}/trace.
Surface. Library: tick(&TickInputs) -> Result<TickOutput> (one reconciliation iteration) and render_plan(&TickOutput) -> String (JSONL summary - one tick line plus one line per action, with offline Skips coalesced into a skip_summary). Binary: HTTP routes under /v1/agent/* (events, heartbeat, dispatch, renew), /v1/enroll, /v1/whoami, /v1/hosts, /v1/channels/{name}, /v1/rollouts, /v1/rollouts/{id}, /v1/rollouts/{id}/trace, /v1/deferrals, /metrics, /healthz. NixOS module services.nixfleet-control-plane.{enable, listen, tls.{caCert, certFile, keyFile}, trustFile, ...} materialises the nixfleet-control-plane.service systemd unit.
Links.
- Generated rustdoc: api/nixfleet_control_plane/
- Relevant RFCs: RFC-0002, RFC-0003, RFC-0005, RFC-0006, RFC-0008, RFC-0011
- Architecture component: §1.4 Control plane, §3 The main flow
nixfleet-proto
Role. Single canonical source of every wire-format type crossing the agent / CP / CI boundary. Sits at the bottom of the workspace dependency graph; every other crate depends on it. Each pub module mirrors a JSON artefact on disk (fleet.resolved, revocations.json, trust.json, rollout manifest, host rollout-state marker) or an HTTP request/response body. Optional fields serialise as null (not omitted) so JCS bytes round-trip identically with the Nix evaluator output.
Key types and state machines. Per module: fleet_resolved::FleetResolved (the signed CI artefact projected from mkFleet, with Channel, Host, RolloutPolicy, Wave, Edge, DisruptionBudget, Pin, and Meta sub-types), trust::TrustConfig / TrustedPubkey / KeySlot (typed trust attrset deserialised from trust.json with time-aware key rotation), revocations::Revocations + RevocationEntry (signed cert-revocation sidecar), host_rollout_state::HostRolloutState (the per-host soak / promotion / failure state machine), rollout_manifest::RolloutManifest (per-rollout snapshot with HostWave and RolloutBudget), agent_wire and enroll_wire (HTTP bodies for /v1/agent/* and /v1/enroll), compliance::ComplianceControl (typed control with evaluate / probe projections), fleet_view::HostsResponse / RolloutTrace (operator-facing read views).
Surface. All types are Serialize + Deserialize + Clone + PartialEq + Debug. Tests round-trip every wire shape via crates/nixfleet-proto/src/testing.rs (gated behind the testing feature). Schema versioning lives on the wire types themselves (schema_version field where applicable); RFC-0003 owns the protocol-version policy. The crate exposes no functions and no async surface - it is purely the types.
Links.
- Generated rustdoc: api/nixfleet_proto/
- Relevant RFCs: RFC-0001, RFC-0003, RFC-0010
- Architecture component: §1.2 CI, §1.4 Control plane, §1.5 Agent
nixfleet-reconciler
Role. Pure-function rollout reconciler and sidecar verification layer. Stateless: takes (FleetResolved, Observed, now) and returns a deterministic list of Actions. The control plane wraps it in an Axum server and a SQLite-backed Observed; the agent uses the verify half independently. Carries the “given fleet intent X and observed state Y, exactly what must happen next?” decision contract. No I/O, no clock except the now argument, no randomness - which means every action is auditable from inputs alone.
Key types. Action (the enum of decisions: OpenRollout, Dispatch, Skip, Promote, RotateTrustRoot, …), Observed (snapshot from the CP: channel refs, host state, active rollouts, deferrals, host-probes), Rollout (one open rollout’s view), the planner gates registry (channel_edges, wave_promotion, host_edges, disruption_budget, compliance_wave, quarantine). verify::SignedSidecar and verify::VerifyError cover signature failure modes that the auditor binary reports verbatim. The per-host and per-rollout state-machine reducers live in nixfleet-state-machine; this crate consumes their state via Observed but does not own them.
Surface. Library only, no binary. Public entry points: reconcile(&FleetResolved, &Observed, now) -> Vec<Action> (the canonical decision procedure - RFC-0002), topological_channel_order(...), verify_artifact / verify_rollout_manifest / verify_revocations / verify_signed_sidecar (signature verification with freshness-window enforcement - RFC-0011), compute_canonical_hash, compute_rollout_id_for_channel (canonical {channel}@{channel_ref} identifier per RFC-0008 §6.3), check_trust_rotations (emits RotateTrustRoot actions when a slot’s retire_at has passed and a successor exists), project_manifest (FleetResolved -> per-channel RolloutManifest projection).
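The shape of a pure decision procedure can be illustrated with a heavily simplified model. Everything below is hypothetical except the names `FleetResolved`, `Observed`, `Action`, `Dispatch`, `Skip`, and `reconcile`: the real types carry far more (channels, waves, gates, budgets), and the field layout and the 300-second offline threshold are invented for the example.

```rust
use std::collections::BTreeMap;

/// Toy intent: host name -> desired closure hash.
struct FleetResolved {
    targets: BTreeMap<String, String>,
}

/// Toy observation: what each host runs and when it last checked in.
struct Observed {
    current: BTreeMap<String, String>,
    last_seen: BTreeMap<String, u64>,
}

#[derive(Debug, PartialEq)]
enum Action {
    Dispatch { host: String, closure: String },
    Skip { host: String, reason: &'static str },
}

/// Hypothetical threshold after which a host counts as offline.
const OFFLINE_AFTER_SECS: u64 = 300;

/// Pure function: no I/O, no clock read, no randomness.
/// `now` is an argument, so the same inputs always give the same actions.
fn reconcile(fleet: &FleetResolved, observed: &Observed, now: u64) -> Vec<Action> {
    let mut actions = Vec::new();
    for (host, want) in &fleet.targets {
        if observed.current.get(host) == Some(want) {
            continue; // already converged, nothing to do
        }
        match observed.last_seen.get(host) {
            Some(&seen) if now.saturating_sub(seen) <= OFFLINE_AFTER_SECS => {
                actions.push(Action::Dispatch { host: host.clone(), closure: want.clone() })
            }
            _ => actions.push(Action::Skip { host: host.clone(), reason: "offline" }),
        }
    }
    actions
}

fn s(x: &str) -> String {
    x.to_string()
}

fn main() {
    let fleet = FleetResolved {
        targets: BTreeMap::from([(s("db-01"), s("abc123")), (s("web-01"), s("abc123")), (s("web-02"), s("abc123"))]),
    };
    let observed = Observed {
        current: BTreeMap::from([(s("db-01"), s("old000")), (s("web-01"), s("abc123")), (s("web-02"), s("old000"))]),
        last_seen: BTreeMap::from([(s("db-01"), 100), (s("web-01"), 990), (s("web-02"), 990)]),
    };
    for action in reconcile(&fleet, &observed, 1000) {
        println!("{:?}", action);
    }
}
```

Because the function reads nothing but its arguments, a recorded `(fleet, observed, now)` triple is enough to replay and audit any decision after the fact.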
Links.
- Generated rustdoc: api/nixfleet_reconciler/
- Relevant RFCs: RFC-0002, RFC-0010, RFC-0011
- Architecture component: §1.4 Control plane, §3 The main flow, §4 The trust flow
nixfleet-release
Role. Producer for releases/fleet.resolved.json and its signed sidecars. Runs in CI; enumerates hosts from a consumer flake, builds each, pushes closures to a cache, evaluates the resolved fleet, injects closure hashes, stamps meta, canonicalises, calls an external sign hook, optionally writes signed revocations.json and per-channel rollout manifests, then atomically writes the artefacts and optionally git-commits / pushes. Carries the producer side of the signed-intent contract: anything an agent will execute starts as bytes that come out of this crate.
Key types and state machines. ReleaseConfig (assembled by the binary CLI; declares flake dir, sign hook, signature algorithm, release dir, git target, optional revocations attr, optional pin source URL), HostsSpec (Auto / AutoExclude / Explicit), HostKind (Nixos / Darwin, drives the flake-attr prefix and build path), RunOutcome (Released { commit_sha, hosts } or NoChange). No long-lived state; the pipeline is a top-level run(&ReleaseConfig) -> Result<RunOutcome> and the binary thin-wraps it.
Surface. The library exports run, inject_closure_hashes, stamp_meta, canonicalize_resolved, filter_expired_pins, and render_commit_message; the binary is nixfleet-release with flags mapping 1:1 onto ReleaseConfig. Hook contract is the external --sign-cmd: receives canonical bytes on $NIXFLEET_INPUT, writes a raw signature to $NIXFLEET_OUTPUT; signature algorithm must be ed25519 or ecdsa-p256. Per-host pinning supports pin.commit / pin.expires_at; non-current-commit pins require --pin-source-url so the pipeline can build via flake-ref. reuse_unchanged_signature produces byte-stable releases on no-op CI runs.
Links.
- Generated rustdoc: api/nixfleet_release/
- Relevant RFCs: RFC-0001, RFC-0010
- Architecture component: §1.2 CI, §3 The main flow, §4 The trust flow
nixfleet-state-machine
Role. Pure per-host rollout state-machine reducer (RFC-0005 §3 + RFC-0006 §3). A single step(state, event, now, policy) -> Result<(state, Vec<Effect>), TransitionError> function. No I/O, no clock reads, deterministic. The same crate runs in the agent (drives the host’s local state from worker output) and the CP (mirrors that state from inbound events) — both sides share the reducer by construction.
Key types. HostRolloutState (the 6-state machine: Pending, Activating, Soaking, Soaked / Failed / Reverted / Deferred / Converged variants per RFC-0005 §3), Event (the input vocabulary — Local* variants emitted by the agent, Remote* mirrors synthesized CP-side from wire AgentEvents), Effect (side-effect descriptors the runtime applies — LocalEmitEvent, RemoteAppendEventLog, RunActivation, RunRollback, …), ProbeSubResult (per-control accounting carried on evidence-probe results — RFC-0007 §3.4), RolloutId newtype (canonical {channel}@{channel_ref} per RFC-0008 §6.3).
Surface. Library only. Public entry points: step (the canonical reducer), wire_conversions (bidirectional AgentEvent ↔ Event / OutboundAgentEvent ↔ AgentEvent maps that keep nixfleet-proto free of state-machine awareness per the d013 lift, RFC-0004 §2). Cargo.toml’s dependency list is part of the safety contract — tokio / reqwest / rusqlite are forbidden; CI verifies via cargo tree.
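A reducer with the step(state, event, now, policy) signature can be sketched with a reduced state set. This is a toy under stated assumptions, not the crate's transition table: the `Event` variants, the `Policy` field, and the shape of `TransitionError` are hypothetical, and only a subset of the states and effects named above appears.

```rust
#[derive(Debug, Clone, PartialEq)]
enum State {
    Pending,
    Activating,
    Soaking { since: u64 },
    Converged,
    Failed,
}

/// Hypothetical event vocabulary for the sketch.
#[derive(Debug)]
enum Event {
    Dispatched,
    ActivationOk,
    ActivationFailed,
    Tick,
}

/// Side-effect descriptors; the runtime, not the reducer, performs them.
#[derive(Debug, PartialEq)]
enum Effect {
    RunActivation,
    RunRollback,
}

#[derive(Debug, PartialEq)]
struct TransitionError {
    from: State,
    event: String,
}

/// Hypothetical policy: how long a host must soak before converging.
struct Policy {
    soak_secs: u64,
}

/// Pure reducer: no I/O and no clock read; `now` is passed in.
fn step(
    state: State,
    event: Event,
    now: u64,
    policy: &Policy,
) -> Result<(State, Vec<Effect>), TransitionError> {
    match (state, event) {
        (State::Pending, Event::Dispatched) => Ok((State::Activating, vec![Effect::RunActivation])),
        (State::Activating, Event::ActivationOk) => Ok((State::Soaking { since: now }, vec![])),
        (State::Activating, Event::ActivationFailed) => Ok((State::Failed, vec![Effect::RunRollback])),
        (State::Soaking { since }, Event::Tick) => {
            if now.saturating_sub(since) >= policy.soak_secs {
                Ok((State::Converged, vec![]))
            } else {
                Ok((State::Soaking { since }, vec![]))
            }
        }
        // Any other pairing is rejected rather than silently ignored.
        (state, event) => Err(TransitionError { from: state, event: format!("{:?}", event) }),
    }
}

fn main() {
    let policy = Policy { soak_secs: 60 };
    let events = [(Event::Dispatched, 0), (Event::ActivationOk, 5), (Event::Tick, 30), (Event::Tick, 70)];
    let mut state = State::Pending;
    for (event, now) in events {
        let (next, effects) = step(state, event, now, &policy).expect("valid transition");
        println!("t={} -> {:?}, effects {:?}", now, next, effects);
        state = next;
    }
}
```

Returning effects as data is what lets two different runtimes share one reducer: each side decides how, or whether, to carry out an effect such as `RunActivation`.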
Links.
- Generated rustdoc: api/nixfleet_state_machine/
- Relevant RFCs: RFC-0005, RFC-0006, RFC-0007, RFC-0004, RFC-0008
- Architecture component: §1.4 Control plane, §1.5 Agent
nixfleet-verify-artifact
Role. Offline verifier CLI. Given a signed artefact, its raw signature, and a trust.json, it answers a single binary question: does this artefact carry a valid signature from a currently-trusted key, signed inside the freshness window? Designed to be the regulator / auditor’s standalone tool: depends only on nixfleet-proto, nixfleet-canonicalize, and nixfleet-reconciler::verify - no network, no SQLite, no Tokio. Carries the verification half of the trust contract that CI’s signer asserts.
Key types and state machines. No public Rust API; the crate is binary-only. Internally routes through nixfleet_reconciler::verify::{verify_artifact, verify_rollout_manifest, verify_canonical_payload} and nixfleet_proto::TrustConfig::active_keys_at(now) so the same time-aware key-rotation policy that the CP uses applies offline.
Surface. Three clap subcommands:
- artifact --artifact <path> --signature <path> --trust-file <path> --now <RFC3339> --freshness-window-secs <secs> - verify a signed fleet.resolved.json.
- rollout-manifest --manifest <path> --signature <path> --trust-file <path> --now <RFC3339> --freshness-window-secs <secs> --rollout-id <id> - verify a signed rollout manifest AND check that the on-disk bytes recompute to --rollout-id (catches mix-and-match / rename attacks).
- probe --payload <path> --signature <path> --pubkey <path> - verify a signed probe-output payload against a host’s OpenSSH-format ssh-ed25519 pubkey.
Exit codes are stable: 0 verified, 1 verify error, 2 argument / I/O / parse error. schemaVersion mismatches on trust.json produce a typed argument error rather than a silent fallback.
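The three-way exit-code split can be modelled as an outcome enum, shown here together with a freshness check. This is a sketch, not the binary's code: `Outcome`, `check_freshness`, and `parse_window` are hypothetical, timestamps are plain Unix seconds rather than RFC 3339, and two behaviours are assumptions (an artefact exactly at the window edge counts as fresh; one signed in the future is rejected).

```rust
/// Outcome classes matching the three documented exit codes.
#[derive(Debug, PartialEq)]
enum Outcome {
    Verified,
    VerifyError(String),
    UsageError(String),
}

fn exit_code(outcome: &Outcome) -> i32 {
    match outcome {
        Outcome::Verified => 0,
        Outcome::VerifyError(_) => 1,
        Outcome::UsageError(_) => 2,
    }
}

/// Hypothetical freshness check: the artefact must have been signed
/// no more than `window_secs` before `now`, and not after `now`.
fn check_freshness(signed_at: u64, now: u64, window_secs: u64) -> Outcome {
    match now.checked_sub(signed_at) {
        None => Outcome::VerifyError("signed in the future".to_string()),
        Some(age) if age > window_secs => {
            Outcome::VerifyError(format!("stale by {}s", age - window_secs))
        }
        Some(_) => Outcome::Verified,
    }
}

/// Hypothetical parser for the window argument; a bad value is a usage
/// error (exit 2), never a verification failure (exit 1).
fn parse_window(arg: &str) -> Result<u64, Outcome> {
    arg.parse::<u64>()
        .map_err(|e| Outcome::UsageError(format!("bad freshness window: {}", e)))
}

fn main() {
    for (window_arg, signed_at, now) in [
        ("86400", 1_700_000_000u64, 1_700_003_600u64), // one hour old, one day window
        ("1800", 1_700_000_000, 1_700_003_600),        // one hour old, 30 minute window
        ("one-day", 1_700_000_000, 1_700_003_600),     // unparseable argument
    ] {
        let outcome = match parse_window(window_arg) {
            Ok(window) => check_freshness(signed_at, now, window),
            Err(usage) => usage,
        };
        // A real binary would hand this value to std::process::exit.
        println!("{:?} -> exit {}", outcome, exit_code(&outcome));
    }
}
```

Separating "could not run the check" from "the check failed" lets an auditor's script treat exit 1 as a finding and exit 2 as a tooling problem.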
Links.
- Generated rustdoc: api/nixfleet_verify_artifact/
- Relevant RFCs: RFC-0001, RFC-0010, RFC-0011
- Architecture component: §1.5 Agent, §4 The trust flow
Rust API reference
The full Rust API reference for every crate in this workspace is generated by cargo doc --document-private-items and published alongside this book - entry point at ./api/nixfleet_proto/ (the wire-types crate; use rustdoc’s sidebar + search to reach the other workspace crates). cargo doc --no-deps doesn’t emit a workspace-root index, so any per-crate page works as a starting point.
Why a separate site
Rust’s official tool (rustdoc) is what every developer expects to read for a Rust crate. It includes everything we’d otherwise have to recreate by hand:
- Type signatures (functions, structs, enums, traits)
- Per-field and per-variant docs
- Resolved cross-references ([SomeType] -> working link)
- Source links per item
- Search index
- IDE integration via rust-analyzer (the same content you see hovering an identifier in your editor)
The output is HTML, not Markdown - so it lives next to the mdbook book rather than inside a chapter.
Where to start
Each crate has an index.html rooted at <crate-name>/index.html. Useful starting points:
- nixfleet_proto - wire types (CheckinRequest, EvaluatedTarget, ReportEvent, FleetResolved, TrustConfig). The boundary contract between agent and control plane.
- nixfleet_control_plane::server - the long-running TLS server: handlers, middleware, reconcile loop, state.
- nixfleet_control_plane::dispatch - pure dispatch decision (the Decision enum + decide_target).
- nixfleet_agent::activation - agent’s realise + switch + post-switch verify pipeline.
- nixfleet_reconciler - the pure decision engine (verify_artifact, reconcile, action emission).
Regenerating
nix run .#docs runs cargo doc --workspace --document-private-items --no-deps, then mdbook build, then copies target/doc/ into the published site at book/api/. Idempotent.
The --document-private-items flag means internal modules and private functions show up too - important for this codebase since the line between “framework public” and “internal” doesn’t always match the pub/private boundary.
Quickstart
The minimum path from a fresh repo to a single managed host. For multi-host fleets, see RFC-0001 (fleet.nix) and the operator cookbook. For the unattended signed-rollout machinery, see architecture and RFC-0002 (reconciler).
A minimal host
mkFleet is the typical path even for a single host: declare the host, channel, and rollout policy; the framework wires the per-host nixosSystem for you. Direct nixfleet.lib.mkHost is also supported for one-off setups (the rest of this section uses mkFleet; see lib/mk-host.nix for the direct primitive).
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    nixfleet.url = "github:arcanesys/nixfleet";
  };
  outputs = { nixpkgs, nixfleet, ... }: let
    fleet = nixfleet.lib.mkFleet {
      hosts.my-server = {
        system = "x86_64-linux";
        channel = "stable";
        tags = [];
        nixosArgs = {
          hostSpec.userName = "deploy";
          modules = [
            nixfleet.scopes.persistence.impermanence
            nixfleet.scopes.secrets
            ./hardware-configuration.nix
            ({ ... }: {
              users.users.deploy = {
                isNormalUser = true;
                extraGroups = [ "wheel" ];
                openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
              };
              services.nixfleet-agent = {
                enable = true;
                controlPlane.url = "https://cp.example.com:8080";
              };
            })
          ];
        };
      };
      channels.stable = {
        rolloutPolicy = "all-at-once";
        signingIntervalMinutes = 60;
        freshnessWindow = 1440;
      };
      rolloutPolicies.all-at-once = {
        strategy = "all-at-once";
        waves = [{ selector.all = true; soakMinutes = 0; }];
      };
    };
  in {
    nixosConfigurations = fleet.nixosConfigurations;
  };
}
fleet.nixosConfigurations.<host> is a standard nixosSystem (or darwinSystem for Darwin platforms). Nothing in the result is NixFleet-specific - if you remove the agent module, the host is a vanilla NixOS configuration deployable with stock tooling.
Deploy
Standard NixOS / Darwin tooling, no NixFleet-specific glue:
nixos-anywhere --flake .#my-server root@192.168.1.50 # fresh install
sudo nixos-rebuild switch --flake .#my-server # local rebuild
darwin-rebuild switch --flake .#my-mac # macOS
Fleet rollouts are git-driven from this point: commit -> CI signs -> CP polls fleet.resolved.json -> agents pull their per-host target on next checkin. There is no operator CLI verb between commit and host activation. See operator cookbook -> Deploy a fleet change.
Build and install the operator CLI
cargo build --release -p nixfleet-cli
install -m 0755 target/release/nixfleet ~/.local/bin/
Alternatively, run without installing: nix run github:arcanesys/nixfleet#nixfleet-cli -- <subcommand>.
Initialise operator config
nixfleet config init \
--cp-url https://cp.example.com:8080 \
--ca-cert /etc/nixfleet/ca.pem \
--client-cert ~/.config/nixfleet/operator.pem \
--client-key ~/.config/nixfleet/operator.key
Writes ~/.config/nixfleet/config.toml (mode 0600). Override values per-invocation via flags or NIXFLEET_* environment variables. The flag > env > file precedence is locked in tests.
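The flag > env > file precedence amounts to a chain of fallbacks per setting. The sketch below is illustrative only: `resolve` is a hypothetical helper, not the CLI's actual code, and it assumes each source has already been parsed into an optional string.

```rust
/// Pick the effective value for one setting: a flag beats an environment
/// variable, which beats the config file. `resolve` is a hypothetical
/// helper for illustration, not part of the nixfleet CLI.
fn resolve(flag: Option<&str>, env: Option<&str>, file: Option<&str>) -> Option<String> {
    // `Option::or` keeps the first `Some` it sees, left to right.
    flag.or(env).or(file).map(str::to_owned)
}
```

With all three sources set, the flag value wins; with only the file set, the file value is used; with none set, the result is `None`.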
Verify
nixfleet status # rendered fleet table
nixfleet status --json # raw HostsResponse for piping
nixfleet rollout hosts <id> # per-host summary for a rollout
nixfleet rollout events <id> # chronological event-log stream
For the full CLI surface (subcommands, flags, status-label precedence, pin markers), see reference/crates/nixfleet-cli.
Next steps
- Enrol additional hosts: operator cookbook -> Add a host to the fleet
- Mint a bootstrap token: bootstrap-token-lifecycle
- Test the loop locally on VMs first: vm-lifecycle
- Verify your fleet config before pushing: testing
- Recovery runbook if cp goes down: disaster-recovery
Operator cookbook
Tasks the operator does, with concrete commands. Add new recipes when something becomes routine.
The recipes below use these placeholders:
- <fleet> - your fleet repo (the one with flake.nix + mkFleet).
- <secrets> - your agenix secrets repo, if separate from <fleet>.
- cp - the host running services.nixfleet-control-plane.
- workstation - any host with nixfleet.operator enabled (where you run the nixfleet CLI).
- newhost, stuckhost - example host names per recipe.
Substitute your own values throughout.
Deploy a fleet change
# 1. Edit fleet config locally
$EDITOR <fleet>/...
# 2. Commit + push to origin
git -C <fleet> commit -am "feat: ..."
git -C <fleet> push origin main
# 3. CI runs and, a few minutes later, pushes a [skip ci] release commit
# 4. CP's channel-refs poll picks up the new artifact within 60s
# 5. Agent's next checkin: dispatch fires, agent activates, confirms
# To verify the deploy reached cp:
ssh root@cp "journalctl -u nixfleet-control-plane.service --since '5 minutes ago' \
--no-pager | grep -E 'snapshot refreshed|dispatch|confirm received'"
If cp gets stuck (rare since the prime + freshness-gate fixes), redeploy directly:
nh os switch .#cp --target-host root@cp --use-remote-sudo
Mint a bootstrap token for a new host
# On an operator workstation (any host with nixfleet.operator enabled)
nixfleet mint-token \
--hostname newhost \
--csr-pubkey-fingerprint <SHA-256-base64-of-newhost-pubkey> \
--org-root-key /run/agenix/org-root-key \
> newhost-token.json
# Encrypt to newhost via agenix; commit to <secrets>/agents/
agenix -e <secrets>/agents/newhost-bootstrap-token.age < newhost-token.json
git -C <secrets> commit -am "agents/newhost-bootstrap-token"
git -C <secrets> push origin main
# Bump fleet's lock; deploy newhost
nix flake update secrets --flake <fleet>
git -C <fleet> commit -am "chore(flake): bump secrets for newhost"
git -C <fleet> push origin main
Revoke a host’s cert
# Open the SQLite DB on cp and insert a cert_revocations row.
ssh root@cp "nix-shell -p sqlite --run \
\"sqlite3 /var/lib/nixfleet-cp/state.db <<SQL
INSERT INTO cert_revocations (hostname, not_before, reason, revoked_by)
VALUES ('newhost', datetime('now'), 'compromised', '<your-name>');
SQL\""
# Newhost's existing cert is now rejected on every /v1/* call.
# To re-enroll: mint a fresh bootstrap token + redeploy newhost.
Rotate the org root key
The org root key is the trust anchor for bootstrap tokens. Rotating it means:
- Operator generates a new ed25519 keypair on an operator workstation.
- Encrypt the private half to the operator user(s) only via agenix -> <secrets>/org-root-key.age. cp MUST NOT be a recipient.
- Update <fleet>/.../trust.nix:
  - Move the current nixfleet.trust.orgRootKey.current to .previous (rotation grace window).
  - Set .current to the new public half.
- Commit + push fleet -> CI re-signs -> cp picks up the new trust.json on next deploy.
- Old tokens minted under the previous key keep working for the rotation window (until the next config change moves .previous to null).
Diagnose a stuck agent
ssh root@stuckhost "
echo '=== agent status ==='
systemctl is-active nixfleet-agent.service
echo '=== last 50 agent log lines ==='
journalctl -u nixfleet-agent.service -n 50 --no-pager
echo '=== current-system ==='
readlink /run/current-system | xargs basename
"
Then check what the CP saw last from this host:
ssh root@cp "nix-shell -p sqlite --run \
\"sqlite3 /var/lib/nixfleet-cp/state.db \\
'SELECT id, rollout_id, state, datetime(dispatched_at), datetime(confirmed_at) \
FROM pending_confirms WHERE hostname = \\\"stuckhost\\\" ORDER BY id DESC LIMIT 5;'\""
Look for: rows in pending long after deadline (rollback timer broken), repeated dispatches for the same target (closure_hash format drift), rolled-back rows (deadline expired before agent activated).
Add a host to the fleet
- Add the host’s mkHost { ... } call in <fleet>/flake.nix.
- Mint a bootstrap token (recipe above).
- Add the host to <secrets>/secrets.nix recipient lists for the secrets it should have access to.
- nixos-anywhere --flake .#newhost root@<bootstrap-ip>.
- New host enrolls on first boot (uses the bootstrap token to get an mTLS cert), checks in, gets dispatched its declared closure.
Tag a release
# Tag a stable point - useful before major refactors so we have a known-good restore.
git -C <fleet> tag -m "v0.2.0-rc1: dispatch chain on hardware" v0.2.0-rc1
git -C <fleet> push origin v0.2.0-rc1
Bootstrap-token lifecycle
Operator runbook for minting, declaring, deploying, and consuming bootstrap tokens under the signed-nonce allowlist regime.
Minting + declaring
$ nixfleet mint-token \
--hostname newhost \
--org-root-key /run/agenix/org-root-key \
--fleet-resolved /tmp/fleet.resolved.json \
> /tmp/bootstrap-token-newhost.json
nonce: 1ed727e1f9c24e6ab87eb9693ba35e26
expiresAt: 2026-05-13T22:57:45Z
Add to fleet.nix `bootstrapNonces`, commit, and push:
{
nonce = "1ed727e1f9c24e6ab87eb9693ba35e26";
hostname = "newhost";
expiresAt = "2026-05-13T22:57:45Z";
mintedAt = "2026-05-12T22:57:45Z";
mintedBy = "ci-runner";
}
Paste the snippet into your fleet repo’s fleet.nix under
bootstrapNonces = [ ... ];, commit, and push.
CI signing
Forgejo CI runs nixfleet-release --bootstrap-nonces-attr 'fleet.bootstrapNonces' ..., which:
- Reads the operator-declared list from fleet.nix.
- Prunes entries with expiresAt < signedAt (auto-audit pruning).
- Builds BootstrapNonces payload + canonicalises.
- Signs via tpm-sign (same trust class as fleet.resolved.json).
- Writes releases/bootstrap-nonces.json + .sig.
- Commits + pushes alongside fleet.resolved.json.
Typical CI cycle: ~2 min.
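The pruning step in the list above can be sketched as a filter over the declared entries. This is a minimal illustration, not the nixfleet-release implementation: `NonceEntry` and `prune` are hypothetical names, and timestamps are modelled as Unix seconds rather than the RFC 3339 strings used in fleet.nix.

```rust
/// Illustrative shape of one declared nonce (hypothetical, not the wire type).
struct NonceEntry {
    nonce: String,
    hostname: String,
    expires_at: u64, // Unix seconds; the real entries carry RFC 3339 strings
}

/// Drop every entry with expires_at < signed_at, keeping the rest in order.
fn prune(entries: Vec<NonceEntry>, signed_at: u64) -> Vec<NonceEntry> {
    entries
        .into_iter()
        .filter(|e| e.expires_at >= signed_at)
        .collect()
}
```

An entry whose expiry equals the signing time is kept under this reading of `expiresAt < signedAt`.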
CP applies the allowlist
CP polls bootstrap-nonces.json every 60 s. On a successful
verify, it replaces the in-memory AllowedNoncesView wholesale.
The CP refuses to serve /v1/* requests until at least one
verified allowlist has been applied
(bootstrap_nonces_primed = true).
Deploying the token to the host
# scp to host's /tmp
$ scp /tmp/bootstrap-token-newhost.json newhost:/tmp/
# install root-only on the host
$ ssh root@newhost '
install -m 400 -o root -g root \
/tmp/bootstrap-token-newhost.json \
/var/lib/nixfleet/bootstrap-token-newhost.json
shred -u /tmp/bootstrap-token-newhost.json
'
The agent must have --bootstrap-token-file /var/lib/nixfleet/bootstrap-token-newhost.json in its unit cmdline.
Set this via the NixOS option
services.nixfleet-agent.bootstrapTokenFile in your fleet config
and let the next rebuild propagate.
Triggering enrolment
$ ssh root@newhost '
rm /var/lib/nixfleet/agent-cert.pem
systemctl restart nixfleet-agent
'
The agent enters first-boot enrolment, reads the bootstrap token,
posts to /v1/enroll. The CP verifies the token signature, looks
up the nonce in the allowlist, and issues a 10-min cert (or
whatever agentCertValiditySecs is set to).
Post-enrolment
The nonce is consumed:
- In the signed allowlist: it stays until the operator removes it OR until expiresAt passes and the next CI sign cycle prunes it.
- In CP state.db (enroll_token_nonces): replays within the current CP DB lifecycle return 409.
If the CP DB is wiped: the signed allowlist still has the entry until pruned by expiry, so a replay would return either:
- 401 nonce_allowlist_expired if the allowlist’s expiresAt has passed (the operator’s lever)
- 200 OK with JSON EnrollResponse body (new cert issued) if the allowlist’s expiresAt is still in the future AND the operator hasn’t removed the entry - this is the small replay window that exists by design until the operator manages the allowlist (or until expiresAt passes naturally).
To narrow this window: keep token validity short (default 24h),
or remove the entry from fleet.nix as soon as enrolment is confirmed,
then commit and push so the next CI sign cycle publishes the pruned allowlist.
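The replay cases described in this section can be summarised as one decision function. This is a sketch under stated assumptions: `ReplayOutcome` and `replay_outcome` are hypothetical names, the order of the checks is assumed, timestamps are Unix seconds, and the text above does not state which status a replay gets once the operator has removed the entry, so that case is modelled without a status code.

```rust
#[derive(Debug, PartialEq)]
enum ReplayOutcome {
    Conflict409,    // nonce already recorded in enroll_token_nonces
    Expired401,     // nonce_allowlist_expired
    Issued200,      // new cert issued: the by-design replay window
    NotAllowlisted, // entry removed by the operator; status not stated above
}

/// `allowlist_expires_at` is None when the signed allowlist has no entry
/// for the nonce. The check order here is an assumption for illustration.
fn replay_outcome(seen_in_cp_db: bool, allowlist_expires_at: Option<u64>, now: u64) -> ReplayOutcome {
    if seen_in_cp_db {
        return ReplayOutcome::Conflict409;
    }
    match allowlist_expires_at {
        None => ReplayOutcome::NotAllowlisted,
        Some(exp) if exp <= now => ReplayOutcome::Expired401,
        Some(_) => ReplayOutcome::Issued200,
    }
}
```

Only the `Issued200` branch is the exposure: it requires a wiped CP DB, a still-valid allowlist entry, and a token still on disk.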
Disaster recovery
If state.db is wiped (Refinery checksum mismatch, disk loss,
intentional rebuild):
- CP starts up clean.
- Pollers run; bootstrap-nonces.json applied to memory.
- CP can re-issue certs to hosts whose nonces are still in the allowlist (and whose tokens are still on disk).
- For hosts whose nonces have been removed/expired: operator re-mints + re-declares.
No host is “permanently dead” from a CP rebuild - full re-enrolment is always available given operator action.
VM lifecycle
NixFleet ships mkVmApps, a helper that exposes a per-host VM lifecycle on the consuming fleet’s nix run interface. Use it to exercise fleet configurations locally before deploying to real hardware.
Wire mkVmApps into the consuming fleet
outputs = { nixpkgs, nixfleet, ... }: {
# ... mkHost calls ...
apps = nixfleet.lib.mkVmApps { inherit pkgs; };
};
Once wired, the following nix run subcommands are available on the consuming fleet.
Subcommands
| Subcommand | What it does |
|---|---|
| nix run .#build-vm -- -h <name> | Boot the NixOS installer under QEMU, run nixos-anywhere to install the host’s declared config to a fresh qcow2 under ~/.local/share/nixfleet/vms/, power off. Subsequent start-vm invocations boot the installed disk directly. |
| nix run .#build-vm -- --all | Build every VM declared in the fleet. |
| nix run .#build-vm -- -h <name> --rebuild | Wipe and reinstall (useful after clean-vm or after rotating a baked trust pin). |
| nix run .#start-vm -- -h <name> [--vlan N] | Boot a previously-built VM. --vlan N puts every VM on a shared QEMU multicast L2 so they resolve each other by hostname. |
| nix run .#stop-vm -- -h <name> | Power off a running VM. |
| nix run .#stop-vm -- --all | Power off every running VM. |
| nix run .#clean-vm -- -h <name> | Remove the VM’s qcow2 + per-host state under ~/.local/share/nixfleet/vms/. Must build-vm again before next start-vm. |
| nix run .#test-vm -- -h <name> | Run integration test scenarios against the VM (host-specific test set). |
Per-host configuration
- RAM is declared per host in hostSpec.vmRam (default 1 GiB). Pass --ram N at runtime to override.
- Port forwards live in hostSpec.vmPortForwards. The host’s SSH port is auto-assigned alphabetically (2201 + index).
- VLAN must match across every VM in a fleet that needs to resolve peers by hostname. Pass the same --vlan N to every start-vm invocation.
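The alphabetical SSH port rule can be pictured as follows. This is a sketch that assumes the index is zero-based over the sorted host names; `ssh_ports` is a hypothetical helper, not a function exported by mkVmApps.

```rust
/// Assign 2201 + index to each host, with hosts sorted by name.
/// Assumes a zero-based index; hypothetical helper for illustration.
fn ssh_ports(hosts: &[&str]) -> Vec<(String, u16)> {
    let mut sorted: Vec<&str> = hosts.to_vec();
    sorted.sort_unstable();
    sorted
        .iter()
        .enumerate()
        .map(|(i, h)| (h.to_string(), 2201 + i as u16))
        .collect()
}
```

Under that assumption, a fleet of forge, cp, web-01 and web-02 would map cp to 2201, forge to 2202, web-01 to 2203 and web-02 to 2204. Adding a host that sorts earlier shifts the ports of every host after it.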
Reference fleet
nixfleet-demo exercises every subcommand end-to-end on a 4-VM reference fleet (forge, cp, web-01, web-02). The repo’s README is a 10-step walkthrough from build-vm to a converged signed-GitOps loop - clone it as the fastest way to internalise the lifecycle.
Common workflows
- Iterate on a new module locally: edit fleet config -> nix run .#build-vm -- -h <name> --rebuild -> nix run .#start-vm -- -h <name>.
- Wipe and reinstall a single VM: nix run .#clean-vm -- -h <name> then nix run .#build-vm -- -h <name>.
- Spin up the full fleet locally: nix run .#build-vm -- --all then nix run .#start-vm -- -h <each> (each with the same --vlan).
- Wipe everything and restart: nix run .#stop-vm -- --all && nix run .#clean-vm -- --all && nix run .#build-vm -- --all.
Footguns
- Darwin returns empty. mkVmApps is a no-op on Darwin platforms; aarch64-darwin pkgs.OVMF is broken upstream. Build VMs on Linux hosts.
- clean-vm wipes guest state. Any keys generated on first boot inside the VM (release-signing keypair, agenix identity, host SSH keys) are gone. If you clean-vm a forge VM in the demo pattern, every downstream VM that baked the previous release-trust pin must be rebuilt too - they otherwise reject signatures with BadSignature.
- VLAN mismatch is silent. A typo in --vlan across VMs produces unresolvable hostnames with no obvious error message. All VMs in a fleet must use the same VLAN port number.
- First CI run is slow. When running the demo pattern (forge VM hosts a CI runner), the first push compiles the workspace from source - 20-45 minutes typical. Subsequent pushes are 2-5 minutes once the store is primed.
Testing your fleet config
NixFleet ships a validate test runner that gates every level of verification - format, flake check, host eval, system builds, Rust unit + integration tests, and VM-harness scenarios. Run it before every push to the fleet repo.
nix run .#validate # fast: format + flake check + eval + host builds
nix run .#validate -- --rust # + cargo nextest + clippy + nix-sandbox builds
nix run .#validate -- --vm # + every fleet-harness-* scenario
nix run .#validate -- --all # everything
Modes
| Mode | What it runs | Typical time |
|---|---|---|
| (default) | nix fmt --check, nix flake check, every nixosConfigurations.* host eval, every host’s system.build.toplevel build | seconds to a few minutes |
| --rust | adds cargo nextest run, cargo clippy --all-targets --all-features, all crates/*/tests/ integration tests, nix-sandbox builds | several minutes |
| --vm | adds every fleet-harness-* scenario (smoke, signed-roundtrip, auditor-chain, deadline-expiry, stale-target, boot-recovery, rollback-policy, concurrent-checkin, enroll-replay, …) | tens of minutes |
| --all | everything above | longest |
When to use which mode
- Before every git push: default mode. It’s what CI runs anyway; running it locally first surfaces issues before they hit a runner.
- Before opening a PR that touches Rust crates: --rust. Clippy + nextest catch the regressions humans never spot in review.
- When reproducing a fleet-harness regression: --vm. The harness scenarios are deterministic; running locally lets you journalctl the VM during the failing flow.
- Before tagging a release: --all. Cold-cache cost; once per release is reasonable.
Scenario catalogue
For the list of fleet-harness-* scenarios and what each one exercises, see reference/harness. The scenarios are intentionally narrow - each one isolates a single property of the signed-GitOps loop, the rollback machinery, or the reconciler.
Interaction with CI
CI runs the default mode on every push and --rust + --vm on every release tag. Local validate and CI validate execute the same code path, so a green local run is a strong predictor of a green CI run. Drift between local and CI results is treated as a bug in the runner, not in your fleet config.
Disaster recovery - destroying the control plane
Background: see ../design/architecture.md §6 (CP-resident state by recovery profile) + §8.
Operator runbook for wiping the CP and rebuilding from signed artifacts.
Validation: fleet-harness-teardown in CI.
Pre-flight
Before destroying state, confirm:
- Signed artifacts reachable (fleet.resolved.json + .sig, and revocations.json + .sig if configured) from the URLs in --channel-refs-artifact-url / --revocations-artifact-url.
- Build-time fallback intact (--artifact / --signature / --trust-file).
- Fleet CA available (--fleet-ca-cert / --fleet-ca-key). Required only for /v1/enroll + /v1/agent/renew; existing agents keep working without it.
- At least one agent currently online.
If any check fails, fix the prerequisite first - do not proceed.
Procedure
# 1. Stop the service.
systemctl stop nixfleet-control-plane.service
# 2. Wipe the SQLite database (leave audit.log if present).
rm -rf /var/lib/nixfleet-cp/state.db \
/var/lib/nixfleet-cp/state.db-wal \
/var/lib/nixfleet-cp/state.db-shm
# 3. Restart.
systemctl start nixfleet-control-plane.service
Do not delete /etc/nixfleet/cp/trust.json, /etc/nixfleet/cp/fleet-ca-*.pem, or anything under /etc/nixfleet/cp/. Those are flake-provided trust roots; deleting them turns recovery from “outage” into “breach”.
CP restart reopens a fresh DB, reads trust.json, primes
verified_fleet (upstream poll first, build-time fallback), replays the
signed revocations sidecar if configured, and resumes accepting checkins.
With production 60s polling, expect full agent repopulation in 70-120s.
Verify
# CP healthy, snapshot primed (within ~30s of restart).
curl -sk https://localhost:8443/healthz | jq '.last_tick_at != null'
# Verified-fleet snapshot is fresh (mTLS - substitute your operator pair).
curl -sk \
--cacert /etc/nixfleet/cp/ca.pem \
--cert <CLIENT_CERT_PEM> --key <CLIENT_KEY_PEM> \
https://localhost:8443/v1/channels/stable | jq '.signed_at'
# Revocations sidecar replayed (when configured).
journalctl -u nixfleet-control-plane.service --since='5 min ago' \
| grep -E 'revocations poll|cert_revocations'
# Every expected agent has checked in.
journalctl -u nixfleet-control-plane.service --since='5 min ago' \
| grep 'checkin received' | awk '{print $NF}' | sort -u
All four pass -> recovery is complete.
When this fails
- CP refuses to start. Check the verify-fleet error: --trust-file permissions, corrupted build-time artifact (roll back the flake commit), or unexpected schema state on a non-empty DB (file a bug if you wiped per Step 2).
- Agents don’t reconnect. journalctl -u nixfleet-agent.service on the host - usually cert expiry or revocation. Re-enroll via the bootstrap-token flow.
- Recovery > 10× target. Upstream-fetch issue: Forgejo down, expired auth token, network partition. journalctl -u nixfleet-control-plane.service | grep channel-refs.
Troubleshooting
Known failure modes from real-hardware testing. Each entry: symptom -> cause -> fix.
CP service flaps after deploy
Symptom: systemctl status nixfleet-control-plane.service shows failed with code=killed, signal=TERM. PID changes every 10s.
Cause: agenix entry references a file that doesn’t exist in the secrets flake input - usually because the agenix recipient list doesn’t include cp for a secret CP needs.
Fix: Run journalctl -u nixfleet-control-plane.service | grep agenix. Look for failed to open input file or no identity matched any of the recipients. Add cp to the recipient list in your secrets repo, re-encrypt, push, bump fleet’s secrets lock, redeploy.
SQLite migration error on CP boot
Symptom: CP fails to start with applied migration V1__initial_schema is different than filesystem one V1__initial.
Cause: The DB was initialised by a previous (v0.1) version of the CP that used a different migration filename. Refinery refuses to apply when names diverge.
Fix: Wipe the DB; migrations re-apply from scratch. Safe because pending_confirms/token_replay/cert_revocations are not load-bearing across upgrades.
ssh root@cp "systemctl stop nixfleet-control-plane.service && \
rm -f /var/lib/nixfleet-cp/state.db /var/lib/nixfleet-cp/state.db-wal \
/var/lib/nixfleet-cp/state.db-shm && \
systemctl start nixfleet-control-plane.service"
Forgejo poll fails with TLS handshake error
Symptom: journalctl -u nixfleet-control-plane.service | grep forgejo shows forgejo poll failed; retaining previous cache.
Cause: Old behaviour - CP’s reqwest client used webpki-roots only, which doesn’t include the Caddy local CA. The reqwest build now uses the rustls-tls-native-roots feature.
Fix: Bump fleet’s nixfleet input past the fix, redeploy cp.
Verify fails with BadSignature even though the trust public key matches the TPM
Symptom: verify_ok: false reason: BadSignature on every reconcile tick. Manual signature check confirms the key matches.
Cause: Old behaviour - verifier rejected high-s ECDSA signatures (Bitcoin-style strict-low-s). TPM2_Sign emits high-s ~50% of the time. The verifier now normalises high-s before verifying.
Fix: Bump fleet’s nixfleet input past the fix.
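High-s normalisation relies on the fact that (r, s) and (r, n - s) are both valid ECDSA signatures for the same message, where n is the group order. The sketch below shows only that folding step, using toy u64 arithmetic; real ECDSA scalars are much wider (256-bit for P-256), and `normalise_s` is a hypothetical name rather than the verifier's actual function.

```rust
/// Map a signature scalar `s` (with 0 < s < n) to its low-s form.
/// Toy u64 arithmetic for illustration only; a real verifier would use
/// a big-integer or curve library for the same comparison.
fn normalise_s(s: u64, n: u64) -> u64 {
    if s > n / 2 {
        n - s // high-s: fold down to the equivalent low-s value
    } else {
        s // already low-s: unchanged
    }
}
```

Applying the function twice gives the same result as applying it once, so normalising a signature that is already low-s is harmless.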
Agent fails activation with unrecognized arguments: --system
Symptom: Agent log shows nixos-rebuild: error: unrecognized arguments: --system /nix/store/.... Activation halts; no rollback fires.
Cause: Old behaviour - agent shelled out to nixos-rebuild switch --system <path>. NixOS 26.05’s nixos-rebuild-ng (Python rewrite) renamed the flag to --store-path. The agent now calls nix-env --profile ... --set + <path>/bin/switch-to-configuration switch directly, bypassing nixos-rebuild’s CLI surface entirely.
Fix: Bump fleet’s nixfleet input past the fix, redeploy each host.
Rollback timer never marks expired rows
Symptom: pending_confirms rows stay pending indefinitely past their confirm_deadline.
Cause: Old behaviour - query did WHERE confirm_deadline < datetime('now'). Stored values are RFC3339 (T-separator) but datetime('now') returns space-separated. Lex compare put T (0x54) above space (0x20), so deadlines always looked greater than now. The fix wraps the column in datetime(...) to normalise.
Fix: Bump fleet’s nixfleet input past the fix, redeploy cp.
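The comparison bug can be reproduced with plain byte-wise string ordering, which is how SQLite compares TEXT values under its default collation. The sketch below uses hypothetical helper names and assumes UTC timestamps with whole seconds; the string normalisation stands in for wrapping the column in datetime(...).

```rust
/// Byte-wise comparison of the stored RFC 3339 value against a
/// space-separated "now", as the old query effectively did.
fn naive_expired(deadline: &str, now: &str) -> bool {
    deadline < now
}

/// Rewrite the deadline into the "YYYY-MM-DD HH:MM:SS" shape before
/// comparing. Assumes UTC with a trailing 'Z' and no fractional seconds.
fn normalised_expired(deadline: &str, now: &str) -> bool {
    let norm = deadline.replacen('T', " ", 1);
    let norm = norm.trim_end_matches('Z');
    norm < now
}
```

For a deadline of 2026-05-12T10:00:00Z and a now of 2026-05-12 11:00:00, the naive check reports "not expired" because the comparison is decided at the `T` versus space byte, while the normalised check reports "expired".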
CP stair-steps backwards through deploy history
Symptom: After deploying a new fleet rev, cp dispatches itself to OLDER closures on every CP restart. Each restart steps backwards.
Cause: Old behaviour - CP primed verified_fleet from the compile-time --artifact path, which is always the previous CI release (the [skip ci] release commit lands AFTER the closure is built). Each closure restart re-primed from a one-step-older artifact. The CP now does a synchronous Forgejo prime BEFORE opening the listener; per-tick re-verify is gated on signed_at freshness.
Fix: Bump fleet’s nixfleet input past the fix, redeploy cp once. After that the cascade is permanently broken.
Agent re-dispatches the same target every checkin (ghost loop)
Symptom: DB shows the same (hostname, rollout_id, target_closure_hash) confirmed every 60s. Activation appears successful but never settles.
Cause: Old behaviour - agent’s closure_hash_from_path stripped after the first -, returning just the 32-char hash. CP declares the FULL basename. String comparison never equal -> Decision::Dispatch every checkin. The fix returns the full basename.
Fix: Bump fleet’s nixfleet input past the fix, redeploy each host.
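The mismatch is easy to see with a store path in hand. The sketch below uses hypothetical helper names (not the agent's `closure_hash_from_path`) and an invented store path to contrast the two behaviours described above.

```rust
use std::path::Path;

/// Fixed behaviour as described above: the full basename of the store path.
fn basename(store_path: &str) -> Option<&str> {
    Path::new(store_path).file_name()?.to_str()
}

/// Old behaviour as described above: only the text before the first '-',
/// i.e. the 32-character hash.
fn hash_prefix_only(store_path: &str) -> Option<&str> {
    basename(store_path)?.split('-').next()
}
```

Because the CP declares the full basename, the value from `hash_prefix_only` can never compare equal to it, which is what kept the dispatch decision firing on every checkin.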
CP’s current closure ≠ artifact’s declared, even when cp is on the latest deploy
Symptom: CP’s current is XXXXXXX-nixos-system-cp-...0810_5176864f_turbo-otter, artifact says YYYYYYY-..._5176864f_turbo-otter. Same nixfleet rev suffix but different store hashes.
Cause: The fleet flake references inputs.self/releases/fleet.resolved.json for the CP’s artifact path. When CI runs, it builds the closure BEFORE committing the new release artifact. An operator workstation may build AFTER, with the new artifact in the source tree. Different inputs.self -> different closure hash.
Fix: One activation cycle naturally converges (cp activates to the artifact-declared closure, which then matches on the next checkin). Not a bug; an artifact of the self-referential design. Tracked but not actively fixed - decoupling the artifact path from inputs.self is a possible future change.
RFCs
Authoritative design documents for the v0.2+ contract. Each RFC owns one boundary; together they define what is load-bearing across releases.
| RFC | Topic | Status |
|---|---|---|
| RFC-0001 | Declarative fleet topology (mkFleet, selectors, rollouts) | Accepted |
| RFC-0002 | Reconciler decision procedure | Accepted |
| RFC-0003 | Agent / control-plane wire protocol | Accepted |
| RFC-0004 | Architectural-pattern checklist (lift discipline) | Descriptive |
| RFC-0005 | Event-driven host-rollout state machine | Accepted |
| RFC-0006 | Control-plane functional core / imperative shell | Accepted |
| RFC-0007 | Multi-scope health probes + compliance shorthand | Accepted |
| RFC-0008 | Rollout-level state machine + derived-view discipline | Accepted |
| RFC-0009 | Hardware-rooted trust (TPM, attestation) | v0.3 target |
| RFC-0010 | Trust lifecycle (operator roles, rotation) | v0.3 target |
| RFC-0011 | Freshness-window policy | v0.3 target |
| RFC-0012 | Air-gapped operation (signed bundles) | v0.3 target |
The RFC pages above are mdbook wrappers that include the canonical sources from the repo’s docs/rfcs/ tree.
RFC-0001: Declarative fleet topology (fleet.nix)
Status. Accepted.
Scope. Schema and evaluation contract for the fleet flake output. Does not cover reconciliation semantics (RFC-0002) or activation (RFC-0003).
1. Motivation
Every seam in nixfleet today routes around a missing object: “the fleet as declared”. The control plane has desired state in SQLite; the CLI has flags; the operator has intent in their head. None of these are git-tracked, reviewable, or composable. Before any of the downstream spine work can land, we need one thing: a pure, evaluable Nix value representing the fleet. Everything downstream consumes it.
Design goals, in order:
- Pure. nix eval .#fleet returns the full value with no IO, no network, no control-plane call.
- Self-contained. No cross-referencing outside the flake - hosts, tags, policies all resolved at eval time.
- Typed. Module system with option types; misuse fails at nix flake check.
- Composable. A fleet is a value; multiple flakes can merge fleets (for org-wide super-fleets).
- Minimal. Schema covers what’s needed for RFC-0002 / RFC-0003 / RFC-0004; resists feature creep.
2. Schema
# flake.nix
outputs = { self, nixpkgs, nixfleet, ... }: {
fleet = nixfleet.lib.mkFleet {
# ------------------------------------------------------------
# 2.1 Hosts - the atomic unit.
# ------------------------------------------------------------
hosts.attic-01 = {
system = "x86_64-linux";
configuration = self.nixosConfigurations.attic-01;
tags = [ "homelab" "always-on" "eu-fr" "server" ];
channel = "stable";
};
hosts.rpi-sensor-01 = {
system = "aarch64-linux";
configuration = self.nixosConfigurations.rpi-sensor-01;
tags = [ "edge" "eu-fr" ];
channel = "edge-slow";
};
# ------------------------------------------------------------
# 2.2 Tags - logical groupings, purely descriptive.
# Tags have no hierarchy; use as many as needed per host.
# ------------------------------------------------------------
tags = {
homelab.description = "Manuel's personal fleet.";
"always-on".description = "Expected to be reachable 24/7.";
"eu-fr".description = "Hosted in France; ANSSI policies apply.";
};
# ------------------------------------------------------------
# 2.3 Channels - release trains.
# Pinned to a git ref at reconcile time (see RFC-0003).
# ------------------------------------------------------------
channels.stable = {
description = "Main production channel.";
rolloutPolicy = "canary-conservative";
signingIntervalMinutes = 60; # default; listed for clarity
freshnessWindow = 1440; # 24h in minutes; REQUIRED, no default
# - invariant: ≥ 2 × signingIntervalMinutes
compliance = {
mode = "enforce"; # per-channel default for evidence probes;
# per-probe mode (RFC-0007 §3.3) overrides
frameworks = [ "anssi-bp028" ];
};
};
channels.edge-slow = {
description = "Battery-powered edge nodes; weekly reconcile.";
rolloutPolicy = "all-at-once";
reconcileIntervalMinutes = 10080; # 7 days in minutes
signingIntervalMinutes = 60;
freshnessWindow = 20160; # 2 weeks in minutes
};
# ------------------------------------------------------------
# 2.4 Rollout policies - named, reusable.
# ------------------------------------------------------------
rolloutPolicies.canary-conservative = {
strategy = "canary";
waves = [
{ selector = { tags = [ "canary" ]; }; soakMinutes = 30; }
{ selector = { tagsAny = [ "non-critical" ]; }; soakMinutes = 60; }
{ selector = { all = true; }; soakMinutes = 0; }
];
healthGate = {
systemdFailedUnits.max = 0;
complianceProbes.required = true;
};
onHealthFailure = "rollback-and-halt";
};
rolloutPolicies.all-at-once = {
strategy = "all-at-once";
healthGate.systemdFailedUnits.max = 0;
};
# ------------------------------------------------------------
# 2.5 Edges - ordering constraints across hosts (within a rollout).
# ------------------------------------------------------------
edges = [
{ after = "db-primary"; before = "app-*"; reason = "schema migrations"; }
];
# ------------------------------------------------------------
# 2.6 Channel edges - ordering across channels (across rollouts).
# `before` channel must converge before any new rollout opens on
# `after`. Edge predecessors with no rollout history are open
# (proceed); halted predecessors block until the operator
# resolves them or removes the edge.
# ------------------------------------------------------------
channelEdges = [
{ before = "edge"; after = "stable"; reason = "coordinator canaries first"; }
];
# ------------------------------------------------------------
# 2.7 Disruption budgets - max in-flight per selector. Tag-driven
# at the wire level: each budget carries its `selector` (operator
# intent) and is resolved into a concrete host list at OpenRollout
# time, snapshotted into the rollout manifest. Mid-rollout retags
# affect future rollouts only - a rollout's topology is immutable
# for its life. Cross-rollout fleet-wide enforcement survives the
# snapshot model: in-flight summing matches by selector identity
# across all active rollouts' snapshots.
# ------------------------------------------------------------
disruptionBudgets = [
{ selector = { tags = [ "etcd" ]; }; maxInFlight = 1; }
{ selector = { tags = [ "always-on" ]; }; maxInFlightPct = 50; }
];
};
};
The following additional top-level keys exist; they’re spec’d in the RFCs that own them rather than duplicated here:
- healthChecks / tags.<t>.healthChecks / hosts.<h>.healthChecks - multi-scope probe declarations (RFC-0007).
- compliance / tags.<t>.compliance / hosts.<h>.compliance - multi-scope compliance refinement (RFC-0007 §3.7).
- revocations - signed agent-cert revocation list (RFC-0003 §4.5 + RFC-0010).
- bootstrapNonces - durable replay-invariant allowlist for /v1/enroll (RFC-0003 §4.5).
3. Selector algebra
Used by waves, edges, and budgets. Keep it minimal - resist reinventing Kubernetes label selectors.
selector :=
| { tags = [ "a" "b" ]; } # host has ALL listed tags
| { tagsAny = [ "a" "b" ]; } # host has ANY listed tag
| { hosts = [ "attic-01" ]; } # explicit host list
| { channel = "stable"; } # all hosts on this channel
| { all = true; } # every host in the fleet
| { not = <selector>; } # negation
| { and = [ <sel> <sel> ]; } # intersection
No wildcards in host names (resolve to explicit list). No regex. Evaluates to a concrete set of hosts at flake-eval time - fully static.
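The grammar above maps onto a small recursive evaluator. The Python sketch below is illustrative only: the `resolve` function and the host-record shape (`tags` and `channel` keyed by hostname, mirroring the resolved JSON in §4.1) are assumptions, and rejecting unknown names in an explicit `hosts` list is one possible reading of "resolve to explicit list". In nixfleet the real resolution happens during flake evaluation, not in Python.

```python
from typing import Any

# Hypothetical host table: hostname -> {"tags": [...], "channel": "..."}.
Hosts = dict[str, dict[str, Any]]


def resolve(selector: dict[str, Any], hosts: Hosts) -> set[str]:
    """Resolve one selector to a concrete set of host names."""
    if "tags" in selector:        # host has ALL listed tags
        want = set(selector["tags"])
        return {h for h, v in hosts.items() if want <= set(v["tags"])}
    if "tagsAny" in selector:     # host has ANY listed tag
        want = set(selector["tagsAny"])
        return {h for h, v in hosts.items() if want & set(v["tags"])}
    if "hosts" in selector:       # explicit host list
        unknown = set(selector["hosts"]) - set(hosts)
        if unknown:               # assumption: unknown names are an error
            raise ValueError(f"unknown hosts: {sorted(unknown)}")
        return set(selector["hosts"])
    if "channel" in selector:     # all hosts on this channel
        return {h for h, v in hosts.items() if v["channel"] == selector["channel"]}
    if selector.get("all") is True:
        return set(hosts)
    if "not" in selector:         # negation relative to the whole fleet
        return set(hosts) - resolve(selector["not"], hosts)
    if "and" in selector:         # intersection of all sub-selectors
        result = set(hosts)
        for sub in selector["and"]:
            result &= resolve(sub, hosts)
        return result
    raise ValueError(f"unrecognized selector: {selector!r}")
```

Because every branch returns a plain set of names, the result is fully static once the host table is fixed, which matches the "concrete set of hosts at flake-eval time" property.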
4. Evaluation contract
4.1 What the control plane consumes
The control plane never evaluates Nix. It reads the resolved fleet from a single JSON artifact produced by CI:
nix eval --json .#fleet.resolved > fleet.json
fleet.resolved is a derived attribute. Two resolution policies coexist:
- Waves are pre-resolved to host lists at fleet-eval time (CI). Wave membership is signed into the artifact.
- Disruption budgets carry their selector through unchanged - resolution to host lists happens at OpenRollout time and is snapshotted into the per-rollout manifest. The fleet.resolved artifact records intent; the rollout manifest records the frozen topology that intent produced for that specific rollout. Mid-rollout retags affect future rollouts only.
Shape:
{
"schemaVersion": 1,
"hosts": {
"attic-01": {
"system": "x86_64-linux",
"closureHash": "sha256-...",
"tags": ["homelab", "always-on", "eu-fr", "server"],
"channel": "stable"
}
},
"channels": { "stable": { "rolloutPolicy": {...}, "compliance": {...} } },
"waves": {
"stable": [
{ "hosts": ["canary-box"], "soakMinutes": 30 },
{ "hosts": ["rpi-01", "rpi-02"], "soakMinutes": 60 },
{ "hosts": ["attic-01"], "soakMinutes": 0 }
]
},
"channelEdges": [
{ "before": "edge", "after": "stable", "reason": "coordinator canaries first" }
],
"disruptionBudgets": [
{ "selector": { "tags": ["etcd"] }, "maxInFlight": 1 }
]
}
The rollout manifest (releases/rollouts/<rolloutId>.json, signed) carries the resolved snapshot:
{
"channel": "stable",
"hostSet": [ ... ],
"disruptionBudgets": [
{
"selector": { "tags": ["etcd"] },
"hosts": ["etcd-1", "etcd-2", "etcd-3"],
"maxInFlight": 1
}
],
...
}
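The relationship between the two artifacts (intent in fleet.resolved, frozen topology in the manifest) can be sketched as a projection. This is a minimal illustration, assuming a hypothetical `snapshot_budgets` helper and handling only the `tags` selector form; the real projection is performed by CI.

```python
import copy


def snapshot_budgets(budgets, hosts):
    """Freeze each budget's selector into a sorted host list.

    Assumption: only the `tags` selector form is handled, to keep this short.
    """
    frozen = []
    for budget in budgets:
        want = set(budget["selector"]["tags"])
        members = sorted(h for h, v in hosts.items() if want <= set(v["tags"]))
        entry = copy.deepcopy(budget)  # keeps selector and maxInFlight as declared
        entry["hosts"] = members       # frozen topology for this rollout
        frozen.append(entry)
    return frozen


hosts = {
    "etcd-2": {"tags": ["etcd"]},
    "etcd-1": {"tags": ["etcd"]},
    "web-1": {"tags": ["web"]},
}
budgets = [{"selector": {"tags": ["etcd"]}, "maxInFlight": 1}]
snapshot = snapshot_budgets(budgets, hosts)

# A retag after the snapshot leaves the already-frozen list untouched;
# only a later snapshot (a future rollout) sees the new tag.
hosts["web-1"]["tags"].append("etcd")
```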
4.2 Invariants checked at nix flake check
- Every host’s configuration is a valid nixosConfiguration.
- Every host’s channel exists in channels.
- Every channel’s rolloutPolicy exists in rolloutPolicies.
- Every selector resolves to at least one host (warn, not fail - empty selectors are sometimes intentional).
- compliance.frameworks reference known frameworks from nixfleet-compliance.
- Edges form a DAG (no cycles).
- Disruption budgets are satisfiable given fleet size (warn when a budget is impractically tight, e.g. maxInFlight = 1 on a 100-host budget will take forever).
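The "edges form a DAG" invariant is a standard cycle check. The sketch below uses Kahn's algorithm, which is a choice made here for illustration and not necessarily what the flake check does; the `before` / `after` field names follow the channelEdges shape shown in §4.1.

```python
from collections import defaultdict, deque


def edges_form_dag(edges: list[dict[str, str]]) -> bool:
    """Return True when the before -> after edges contain no cycle."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for edge in edges:
        successors[edge["before"]].append(edge["after"])
        indegree[edge["after"]] += 1
        nodes.update((edge["before"], edge["after"]))

    # Repeatedly remove nodes with no remaining predecessors.
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any node left over sits on (or behind) a cycle.
    return removed == len(nodes)
```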
4.3 Signed artifact contract
fleet.resolved.json is a trust-boundary artifact (see ../design/architecture.md §4). CI produces and signs it with the CI release key; every consumer verifies before use.
- Signing. CI writes fleet.resolved.json + fleet.resolved.sig to the channel’s storage. The signature covers the full canonicalized JSON plus a signedAt RFC 3339 timestamp (embedded as meta.signedAt in the artifact).
- Verification - control plane. On every fetch, verifies the signature against the pinned CI release public key. Signature mismatch or unknown key -> refuse to reconcile the channel; emit an alert.
- Verification - agents (optional path). An agent that fetches fleet.resolved directly (rather than receiving targets from the control plane) performs the same verification. Enables the trust-minimized bootstrap in RFC-0003 §4.
- Key pinning. The CI release public key is committed to the flake (nixfleet.trust.ciReleaseKey) and embedded in every built closure. Key rotation is a new commit + a grace window during which both keys verify.
- Freshness. Downstream consumers (RFC-0003 §7) enforce now − meta.signedAt ≤ channel.freshnessWindow to defend against stale-closure replay by a compromised control plane. freshnessWindow is declared per-channel in minutes (see §2.3); there is no implicit default and the value is part of the signed payload so a compromised control plane cannot widen it.
Canonicalization uses a stable, spec-defined encoding (JCS or deterministic CBOR - final choice tracked as an open question below) so that signatures produced by Nix evaluation are byte-identical to what verifiers reconstruct.
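The freshness rule from the list above reduces to one comparison. A minimal sketch, assuming a hypothetical `is_fresh` helper that receives `meta.signedAt` as an RFC 3339 UTC string and the channel's window in minutes; signature verification is assumed to have already succeeded.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(signed_at: str, freshness_window_minutes: int, now: datetime) -> bool:
    """Check now - meta.signedAt <= channel.freshnessWindow."""
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    signed = datetime.fromisoformat(signed_at.replace("Z", "+00:00"))
    # Note: a signedAt in the future passes this check; how to treat that
    # case is not specified in the text above.
    return now - signed <= timedelta(minutes=freshness_window_minutes)
```

For example, an artifact signed at 10:00Z checked at 10:30Z is accepted under a 60-minute window and rejected under a 20-minute one.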
5. Composition
Two flakes can merge fleets:
fleet = nixfleet.lib.mergeFleets [
(import ./fleet-paris.nix)
(import ./fleet-lyon.nix)
];
Conflicts (same host name, same channel definition with different values) fail eval. Merge is associative but, when policies define overrides, not commutative: precedence is later-wins, so the order of the list matters.
6. What’s deliberately out of scope
- Secrets. Declared alongside, not inside, the fleet schema.
- Enrollment / host identity. A host exists in the fleet schema regardless of whether it’s enrolled. Enrollment is an orthogonal state.
- Runtime state. fleet.resolved is purely declarative. Observed state (which host is online, what gen is running) lives in the control plane only.
- Dynamic host sets. No “autoscaling” - every host is named in the flake. If you need dynamic, generate the flake from a higher-level tool.
RFC-0002: Rollout execution engine
Status. Accepted.
Depends on. RFC-0001 (fleet.nix schema), magic rollback, compliance gates.
Scope. The decision procedure that turns fleet.resolved + observed fleet state into wave-by-wave reconciliation actions. Does not cover how actions reach hosts — that’s RFC-0003. The per-host state machine is RFC-0005; the rollout-level state machine is RFC-0008.
1. Motivation
Once a fleet is declaratively resolved (RFC-0001), something has to decide: “given this desired state and what I see on the ground right now, what do I do next?” That’s the reconciler. It must be deterministic, idempotent, observable, and provably safe under partial-visibility - hosts go offline, agents crash mid-activation, compliance probes fail, network partitions happen.
This RFC specifies its state machines and decision procedure. Implementation language is incidental (today: Rust on the control plane).
2. Inputs & outputs
Inputs, read each reconcile tick:
- fleet.resolved - the desired state JSON from RFC-0001. Signature-verified against the pinned CI release key (RFC-0001 §4.3) before any field is read. A failed verification aborts the tick and raises an alert; the previously-verified fleet.resolved stays authoritative. Artifacts whose signature verifies but whose meta.signedAt is older than channel.freshnessWindow (minutes, per-channel; RFC-0001 §2.3) are likewise rejected, preventing a compromised control plane from replaying old intent.
- channel refs - current git ref per channel (per RFC-0003).
- observed state - per-host {current generation hash, last check-in timestamp, last reported health, last compliance probe result, current rollout membership}.
- rollout history - active and recently completed rollouts with their state.
Outputs, emitted per reconcile tick:
- Zero or more intent updates per host: “host X, target generation Y, within rollout R, wave W”.
- Zero or more rollout state transitions: “rollout R wave W -> Soaking”, “rollout R -> Halted”.
- Zero or more events for observability: decisions, skips, waits, with structured reasoning.
The reconciler itself is stateless: all state lives in the database. A cold-started reconciler picking up an in-progress rollout converges to the same actions as the one that started it. This is essential for restarts and for future HA.
3. State machines
3.1 Rollout lifecycle
The rollout-level state machine is defined by RFC-0008 §3 (8 states: Opening → Active → Converging → Terminal, with Reverted / Failed / Superseded / Pruned for failure and end-of-life). The v0.1 outline this section used to carry (Pending → Planning → Executing → WaveActive → WaveSoaking → WavePromoted → Converged) was superseded — see RFC-0008 for the canonical shape and the per-rollout event_log projection that records it.
Transitions are only taken during reconcile ticks. There is no async callback from an agent that directly mutates rollout state — agents update observed state only; the reconciler reads observed state and decides.
3.2 Per-host rollout participation
The per-host state machine inside a rollout is defined by RFC-0005 §3 (7 states: Pending → Activating → Soaking → Converged, with Failed → Reverted on sustained probe failure and Deferred for activations that staged the profile but skipped the live switch because dbus/systemd/kernel/init can’t be hot-swapped). The v0.1 outline (Queued/Dispatched/ConfirmWindow/Healthy/Soaked) was superseded — see RFC-0005 for the canonical shape.
4. Decision procedure
On each reconcile tick (periodic: default 30s; event-triggered: on agent check-in, on git ref change, on manual nudge):
0. Fetch fleet.resolved + signature; verify signature against pinned CI
   release key; reject if signature invalid OR
   (now − meta.signedAt) > channel.freshnessWindow (minutes; per-channel,
   no default - RFC-0001 §2.3). On rejection: abort tick, keep
   last-verified snapshot, emit alert.
1. Load verified fleet.resolved, observed state, active rollouts.
2. For each channel c:
   a. If channels[c].ref differs from lastRolledRef[c]:
      -> open a new rollout R for channel c at ref r.
      -> static compliance gate:
         evaluate all type ∈ {static, both} controls against
         fleet.resolved[c].hosts configurations.
         If any required control fails -> R ends in Failed (blocked).
         Else -> R.state = Planning.
3. For each rollout R in Planning:
   a. Compute waves from policy.waves + selectors against current hosts.
   b. R.state = Executing; first wave -> WaveActive.
4. For each rollout R in Executing:
   a. For each wave W in R.currentWave:
      - If W is WaveActive:
        * For each host h in W with state ∈ {Queued, Dispatched} and
          (h is online) and (no edge predecessor is incomplete) and
          (disruption budgets permit):
          -> advance h to Dispatched, emit intent for h.
        * For hosts h ∈ W in ConfirmWindow:
          -> if deadline passed with no phone-home -> h -> Reverted.
        * For hosts h ∈ W in Healthy:
          -> evaluate health gate; if fail -> h -> Failed.
        * If all hosts in W are Soaked -> W -> WaveSoaking.
        * If failed-host count in W exceeds policy.healthGate.maxFailures:
          -> trigger policy.onHealthFailure.
      - If W is WaveSoaking:
        * If soak elapsed and runtime compliance probes pass for all
          hosts in W -> W -> WavePromoted, advance R.currentWave.
5. Emit events for every state transition with reasoning.
6. Persist new state; commit atomically.
4.1 Edge ordering
Edges (RFC-0001 §2.5) are consulted within the current wave: a host cannot advance from Queued to Dispatched while any of its declared predecessors in the same rollout is not yet Converged. Edges across channels or across rollouts are ignored (edges are rollout-local; cross-rollout coordination is an explicit non-goal of v1).
4.2 Disruption budgets
Budgets (RFC-0001 §2.7) apply across all active rollouts simultaneously. A host counts against its budget for as long as it is in flight (defined below). If advancing the next host would exceed maxInFlight or maxInFlightPct for any matching budget, the reconciler defers - the host stays queued until a slot opens.
In-flight definition. A host is “in flight” for disruption-budget accounting if and only if its state is Activating or Soaking per RFC-0005 §3. Pending is NOT in flight: the host has not yet been dispatched and continues serving on the prior closure. Exit states (Converged, Failed, Reverted) are not in flight. The planner additionally counts its own QueueDispatch emissions within a single plan_next() invocation against the budget, so within-tick batches respect maxInFlight even though the underlying fleet_state snapshot does not refresh between gate evaluations.
Budget snapshots are per-rollout; identity is by selector. Each rollout’s manifest carries a frozen disruption_budgets[] snapshot - the operator’s selector resolved against fleet.hosts at OpenRollout time. The reconciler reads the snapshot, never the live fleet.disruptionBudgets[].selector. Mid-rollout retags therefore cannot reshape an in-flight rollout’s budget membership; they take effect on the next rollout. Cross-rollout fleet-wide enforcement survives the snapshot model: in-flight summing matches budgets across active rollouts by selector equality, so two rollouts that share a tags = ["etcd"] budget cap concurrent etcd disruption to the global maxInFlight.
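The cross-rollout accounting described above can be sketched as a gate function. This is an illustration under stated assumptions: `budget_permits` and `selector_key` are hypothetical names, selector equality is approximated by comparing key-sorted JSON, and maxInFlightPct budgets are skipped because the rounding rule is not given here.

```python
import json

# In-flight states per the definition above (RFC-0005 state names).
IN_FLIGHT = {"Activating", "Soaking"}


def selector_key(selector: dict) -> str:
    """Assumption: two selectors are 'equal' when their sorted JSON matches."""
    return json.dumps(selector, sort_keys=True)


def budget_permits(candidate, manifest, active_manifests, host_state) -> bool:
    """Decide whether dispatching `candidate` keeps every matching budget in bounds."""
    for budget in manifest["disruptionBudgets"]:
        if candidate not in budget["hosts"]:
            continue
        limit = budget.get("maxInFlight")
        if limit is None:
            continue  # maxInFlightPct budgets are omitted from this sketch
        key = selector_key(budget["selector"])
        # Sum in-flight hosts over every active rollout's frozen snapshot
        # that carries a budget with the same selector.
        members = set()
        for other in active_manifests:
            for b in other["disruptionBudgets"]:
                if selector_key(b["selector"]) == key:
                    members.update(b["hosts"])
        in_flight = sum(1 for h in members if host_state.get(h) in IN_FLIGHT)
        if in_flight + 1 > limit:
            return False
    return True
```

Reading only the snapshots (never the live fleet tags) is what keeps the gate stable under mid-rollout retags.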
4.3 Concurrency across channels
Two ordering primitives, in increasing strictness:
- Disruption budgets - fleet-wide caps on in-flight count. Always active. Two channels rolling out concurrently respect the same tags = ["etcd"] cap.
- channelEdges - DAG ordering between channels. A { before; after } edge holds OpenRollout for after until before has no non-terminal rollout. This closes the v0.3 punt: cross-channel coordination is no longer “punt to disruption budgets only”, it has its own primitive. Edge predecessors with no rollout history are open (proceed); a Halted predecessor blocks after until the operator resolves it. The reconciler emits Action::RolloutDeferred { channel, target_ref, blocked_by, reason } when the edge holds; emission is debounced via Observed.last_deferrals so a still-blocked channel doesn’t pollute the journal across reconcile ticks.
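The channelEdges hold can be illustrated with a small predicate. The sketch assumes a hypothetical `blocked_by` helper and models each rollout as a record with a `terminal` flag, since which RFC-0008 states count as terminal is defined there and not restated here.

```python
def blocked_by(channel, channel_edges, rollouts):
    """Return the predecessor channels that currently hold OpenRollout for `channel`.

    rollouts: list of {"channel": str, "terminal": bool} (hypothetical shape).
    """
    blockers = []
    for edge in channel_edges:
        if edge["after"] != channel:
            continue
        # A predecessor with no rollout history is open: any() over an empty
        # sequence is False, so nothing is appended.
        if any(r["channel"] == edge["before"] and not r["terminal"] for r in rollouts):
            blockers.append(edge["before"])
    return blockers
```

An empty result means the rollout may open; a non-empty one corresponds to the `blocked_by` field of the deferral action described above.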
Per-channel: at most one active rollout. A new ref arriving while a rollout is in progress is queued; when the current rollout reaches Converged / Halted / Cancelled, the queued ref triggers a fresh rollout. Queue depth ≤ 1 - if two new refs arrive, only the latest is retained (intermediate commits are skipped).
4.4 Rollout manifests
A RolloutManifest is the per-rollout signed plan: the frozen view of which hosts are in which wave, signed by CI at the same time it signs fleet.resolved.json. The manifest is the artifact that lets agents verify their wave assignment without trusting the CP.
Why this exists. fleet.resolved.json is the desired-state snapshot - it rolls forward continuously as new CI commits land. A rollout has a different temporal scope: its plan freezes at rollout-open and stays frozen until the rollout terminates. Without a separately-named, frozen artifact, an attacker (or buggy CP) could serve host A “you’re in wave 1” and host B “you’re in wave 3” of the same logical rollout, and neither agent could detect the inconsistency. Content-addressing the manifest closes that gap.
Producer. CI produces N+1 signed artifacts per commit where N is the number of channels in fleet.resolved.channels: the resolved snapshot itself, plus one manifest per channel. Each manifest is a deterministic projection of fleet.resolved for one channel - every input (host membership, wave layout, target closure, health gate, compliance frameworks) is already inside the signed snapshot. CI signs both with the same ciReleaseKey. The CP holds no signing key for rollouts; it is a verified stateless distributor.
Identifier. rolloutId = "{channel}@{channel_ref}", constructed via RolloutId::new(channel, channel_ref) (RFC-0008 §6.3). Two channels can share a channel_ref (the architectural point of multi-channel cascading from a single git push), so the composite is the canonical key. Verification recomputes the identifier from the parsed manifest fields and compares against the advertised value; mismatch is a hard refuse-to-act.
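The recompute-and-compare step can be sketched as follows. The function names and the `channelRef` manifest field are assumptions made for illustration; only the `"{channel}@{channel_ref}"` composite comes from the text above.

```python
def rollout_id(channel: str, channel_ref: str) -> str:
    """Build the composite identifier "{channel}@{channel_ref}"."""
    return f"{channel}@{channel_ref}"


def advertised_id_matches(advertised: str, manifest: dict) -> bool:
    """Recompute the identifier from parsed manifest fields and compare.

    `channelRef` is a hypothetical field name for the channel ref.
    A False result corresponds to the hard refuse-to-act case.
    """
    return advertised == rollout_id(manifest["channel"], manifest["channelRef"])
```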
Anchor. The manifest carries fleetResolvedHash - sha256 of the canonical bytes of the fleet.resolved.json it was projected from. This closes a mix-and-match attack: during a key-rotation overlap window where both predecessor and successor sign valid fleet.resolved.json snapshots at the same channel ref, an attacker without the anchor could pair a manifest from snapshot X with the resolved.json from snapshot Y. The anchor makes that inconsistency provably detectable.
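A minimal sketch of the anchor check, under two loudly stated assumptions: key-sorted compact JSON stands in for the canonical encoding (it is only an approximation of JCS, which also fixes number and string serialization), and the hash is compared as a lowercase hex digest, a format the text above does not specify.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Approximate canonical form: sorted keys, no insignificant whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def fleet_resolved_hash(resolved: dict) -> str:
    """sha256 over the canonical bytes, hex-encoded (encoding is an assumption)."""
    return hashlib.sha256(canonical_bytes(resolved)).hexdigest()


def anchor_matches(manifest: dict, resolved: dict) -> bool:
    """True when the manifest was projected from exactly this snapshot."""
    return manifest["fleetResolvedHash"] == fleet_resolved_hash(resolved)
```

Pairing a manifest from snapshot X with the resolved JSON from snapshot Y makes `anchor_matches` return False, which is the detectable inconsistency the anchor exists for.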
Adoption. When the reconciler opens a new rollout for channel c at ref r (step 2a of the reconcile loop), it loads releases/rollouts/<rolloutId>.json, verifies the signature against ciReleaseKey, recomputes the content hash, and persists (rollout_id, manifest_hash, host_set) into host_rollout_state. If the manifest is missing or fails verification, the CP refuses to open the rollout. There is no fallback path to unsigned dispatch - the inversion-of-trust property does not bend.
Distribution. Agents fetch the manifest via GET /v1/rollouts/<rolloutId> (RFC-0003 §4.4) on first sight, verify it independently against the trust roots they already hold, recompute the hash, and assert that (hostname, wave_index) ∈ manifest.host_set. Mismatch is a hard refuse-to-act with ReportEvent::ManifestMismatch. The cached manifest is the source of truth for the rollout’s lifetime - subsequent checkins re-assert that the CP-advertised rolloutId matches the cached one. A second-call manifest with the same rolloutId but different content cannot exist (the hash would differ).
Schema. Defined in nixfleet-proto::rollout_manifest. The host_set array MUST be sorted by hostname ascending; the per-budget hosts arrays in disruption_budgets MUST be sorted alphabetically. JCS sorts object keys but not array elements, so the producer’s emission order is the canonical order.
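The ordering rule and the agent-side membership assertion can be checked together. This is a sketch only: the entry field names (`hostname`, `waveIndex`), the `hostSet` key spelling, and the choice to refuse an unsorted host set are assumptions, and Python's default string ordering (by code point) stands in for "ascending".

```python
def host_set_is_canonical(host_set: list[dict]) -> bool:
    """True when entries are sorted by hostname ascending."""
    names = [entry["hostname"] for entry in host_set]
    return names == sorted(names)


def agent_accepts(manifest: dict, hostname: str, wave_index: int) -> bool:
    """Assert (hostname, wave_index) is in the manifest's host set."""
    host_set = manifest["hostSet"]
    if not host_set_is_canonical(host_set):
        return False  # assumption: a non-canonical manifest is refused
    return any(
        entry["hostname"] == hostname and entry["waveIndex"] == wave_index
        for entry in host_set
    )
```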
Disruption-budget snapshot. Each manifest carries disruption_budgets[] - the operator’s selectors from fleet.disruptionBudgets resolved against fleet.hosts at projection time, frozen for the rollout’s life. The reconciler reads from this snapshot rather than re-resolving live fleet.hosts.tags per tick, which is what makes mid-rollout retag safe (§4.2). Cross-rollout in-flight counting matches budgets by selector equality.
Future work. With len(host_set) in the thousands, full-roster manifests grow into the hundreds of KB. Per-host scoping (one signed object per host) trades manifest count for message size; a Merkle-inclusion proof shape trades both at the cost of a more complex verifier. Single-tenant fleets at v0.2 scale do not need either; they belong in v0.3.
5. Failure handling
5.1 onHealthFailure semantics
- halt - freeze the rollout. Hosts already Converged stay on the new generation. In-flight hosts complete their current state transition naturally (no forced rollback). The operator must then run nixfleet rollout {resume, cancel, rollback}.
- rollback-and-halt - for every host in the rollout in state ∈ {Dispatched, Activating, ConfirmWindow, Healthy, Soaked, Converged}, emit intent to revert to the previous channel rev. Rollout ends in Reverted.
- rollback-all (future, out of scope for v1) - as above, and continue to revert hosts from prior converged rollouts on the same channel up to N generations back. Dangerous. Explicit opt-in.
5.2 Offline hosts
A host offline when its wave begins stays Queued indefinitely. Does not block wave progression - the wave advances once all online member hosts are Soaked, and the offline host is marked Skipped. When it returns, it is dispatched with the target of whatever the current channel ref is (not necessarily the one that was rolling out when it was offline).
Rationale: a laptop closed for two weeks should not block a fleet rollout, and should wake up to the current desired state, not replay history.
5.3 Probe failure taxonomy
Runtime compliance probes distinguish three outcomes (per the compliance RFC):
- passed - host advances.
- failed - host Failed; triggers onHealthFailure.
- probe-error - probe itself broken (nonzero exit, malformed output, timeout). Treated as failed uniformly (RFC-0007 §6). Operators who want “tolerate probe errors” use per-probe mode = "observe" (RFC-0007 §3.3).
6. Reconcile triggers
- Periodic. Default 30s. Tunable per-channel via reconcileIntervalMinutes (RFC-0001 §2.3) for slow channels like edge-slow.
- Event-driven.
  - Agent check-in with status delta -> reconcile tick within ≤1s.
  - Git ref change (webhook or poll) -> immediate tick.
  - Operator CLI command (deploy, rollout cancel, etc.) -> immediate tick.
Debouncing: multiple events arriving within a small window (configurable, default 500ms) collapse to a single tick. Avoids thrashing under high check-in rates.
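The collapsing behaviour can be illustrated offline on a list of event timestamps. The sketch assumes a fixed window anchored at the first event of each burst; whether the real reconciler uses a leading or trailing window is not stated above, and `collapse_events` is a hypothetical name.

```python
def collapse_events(event_times_ms: list[int], window_ms: int = 500) -> list[int]:
    """Return the event times that each start a new reconcile tick.

    Events arriving less than `window_ms` after the event that opened the
    current window are absorbed into that window's tick.
    """
    ticks = []
    window_start = None
    for t in sorted(event_times_ms):
        if window_start is None or t - window_start >= window_ms:
            window_start = t
            ticks.append(t)
    return ticks
```

With the default window, events at 0, 100, and 499 ms produce one tick, and an event at 500 ms opens the next window.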
7. Observability
Every decision writes a structured event:
{
"ts": "2026-04-24T10:17:03Z",
"rollout": "stable@abc123",
"wave": 2,
"host": "attic-01",
"transition": "Queued -> Dispatched",
"reason": "edge predecessor db-primary reached Converged",
"budgets": { "etcd": "not-applicable", "always-on": "3/10 in flight" }
}
Events are queryable via CLI (nixfleet rollout events <id>) and emitted as structured logs. Every skip, every wait, every failure carries its reasoning - “why didn’t this host upgrade yet?” must always be answerable from logs alone.
RFC-0003: Agent ↔ control-plane protocol
Status. Accepted.
Depends on. RFC-0001, RFC-0002, magic rollback.
Scope. Wire protocol between agent and control plane. Identity, endpoints, polling, versioning, security properties. Does not cover control-plane-internal APIs. The agent-state event flow is owned by RFC-0005 (§4.1 here points at it).
1. Design goals
- Pull-only for control flow. Agents initiate every connection. Control plane never needs to reach an agent - works behind CGNAT, hotel WiFi, intermittent links.
- Stateless on the wire. Each request is self-describing. No sessions, no long-lived connections, no WebSockets in v1.
- Declarative intent, not commands. The control plane answers “what should host X be running?”, never “run this command”. Scripted execution is outside the agent’s vocabulary on purpose.
- Zero-knowledge for secrets. Secrets do not transit the control plane in plaintext. The protocol carries closure hashes and references, not secret material.
- Explicitly versioned. Every request and response carries a protocol version. Mismatches fail loudly.
2. Identity model
- Host key = SSH host ed25519 key. Machine-lifetime key already present on every NixOS host (/etc/ssh/ssh_host_ed25519_key). Signs probe outputs (RFC-0002 §5.3), decrypts agenix secrets, anchors the agent’s cryptographic identity. Not transmitted to the control plane; only its public half is declared in fleet.nix.
- Agent identity = mTLS client certificate, derived from the host key. At enrollment, the agent generates the CSR using the SSH host key as the signing key; the public key in the cert is the host’s SSH public key. CN = hostname, SANs carry declared host attributes (channel, tags - redundant with fleet.resolved, used only for sanity checking). This binding means compromising the mTLS cert and compromising the host key are the same event; short-lived certs bound the exposure of that event.
- Cert issuance. Agent sends the CSR + a one-shot bootstrap token (signed by the org root key, scoped to expectedHostname + expectedPubkeyFingerprint). Control plane verifies both, issues cert with 30-day validity. A mismatch between the CSR’s public key and the token’s expectedPubkeyFingerprint aborts enrollment.
- Cert rotation. Agent requests renewal at 50% of remaining validity. Old cert valid until expiry; overlap prevents downtime.
- Cert revocation. Control plane maintains a small revocation set (hostname -> notBefore timestamp). Agents with certs issued before notBefore for their hostname are rejected. Simpler than CRLs; works because cert lifetime is short.
- No shared credentials. No API keys, no HMAC secrets, no bearer tokens. mTLS end to end.
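The rotation and revocation rules above are simple enough to sketch. Both helpers are hypothetical, and "50% of remaining validity" is interpreted here as the midpoint of the cert's validity period, which is one reading of the rule.

```python
from datetime import datetime, timedelta, timezone


def renew_at(issued_at: datetime, expires_at: datetime) -> datetime:
    """Midpoint of the validity period (assumed reading of the 50% rule)."""
    return issued_at + (expires_at - issued_at) / 2


def is_revoked(hostname: str, issued_at: datetime, revocations: dict) -> bool:
    """revocations maps hostname -> notBefore; older certs are rejected."""
    not_before = revocations.get(hostname)
    return not_before is not None and issued_at < not_before
```

For a 30-day cert this schedules renewal at day 15, leaving a 15-day overlap during which the old cert still works.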
3. Wire format
- Transport. HTTP/2 over TLS 1.3. mTLS mandatory.
- Body. JSON. Canonical field names, no nulls (absence means absence), timestamps RFC 3339 UTC.
- Headers.
  - X-Nixfleet-Protocol: 1 - major version. Mismatched = 400.
  - X-Nixfleet-Agent-Version: <semver> - informational.
  - Content-Type: application/json.
- Why not gRPC/protobuf? Stability, debuggability, homelab introspection. Revisit if wire size becomes a problem (it won’t at fleet sizes nixfleet targets).
4. Endpoints
All endpoints rooted at https://<control-plane>/v1/.
4.1 Agent-driven event flow
RFC-0005 supersedes the v0.1 POST /agent/checkin + POST /agent/confirm + POST /agent/report triple. The wire surface is now:
- POST /v1/agent/events - outbound event stream (DispatchAck, ActivationStarted, ActivationComplete/Failed/Deferred, ProbeTopologyDeclared, ProbeResult, ProbeFailureFirst, Failed, RollbackComplete, Converged). One event per POST; CP dedupes by (hostname, rollout_id, seq). See RFC-0005 §4.2 for the event vocabulary.
- POST /v1/agent/heartbeat - liveness + drift-detection. Minimal payload (current_closure, last_event_seq_by_rollout); never advances state. See RFC-0005 §4.3.
- GET /v1/agent/dispatch - agent long-polls for queued Dispatch payloads (the only CP→agent message; rollback is agent-decided). Preserves the pull-only contract per §1 design goal. See RFC-0005 §4.1.
The agent verifies every target_closure against the signed manifest fetched via §4.4 before acting on it; no CP-advertised value is trusted directly. See RFC-0002 §4.4 for the threat model that contract closes.
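Deduplication by `(hostname, rollout_id, seq)` is what makes retried POSTs to the events endpoint safe. A minimal in-memory sketch, assuming a hypothetical `EventLog` class and event dictionaries carrying those three fields; the real control plane persists this in its database.

```python
class EventLog:
    """Accept each (hostname, rollout_id, seq) at most once."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, int]] = set()
        self.rows: list[dict] = []

    def ingest(self, event: dict) -> bool:
        """Record the event; return False when it is a duplicate delivery."""
        key = (event["hostname"], event["rollout_id"], event["seq"])
        if key in self._seen:
            return False
        self._seen.add(key)
        self.rows.append(event)
        return True
```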
4.2 GET /agent/closure/<hash>
Optional. If the host cannot reach the binary cache directly (restricted network), the control plane can proxy closures. Preference remains: agents fetch from cache, not control plane - this endpoint exists as a fallback, not a default path.
4.3 Enrollment endpoints
Out of scope for this RFC in detail. Summary:
- POST /enroll - accepts bootstrap token + CSR, returns signed cert. Token is burned on use.
- POST /agent/renew - accepts current cert (mTLS) + CSR, returns refreshed cert.
- POST /agent/bootstrap-report - pre-cert reporting path for failures that prevent normal cert provisioning.
Bootstrap-nonce allowlist (durable replay invariant)
The CP refuses any /v1/enroll whose token nonce is not present in the
signed bootstrap-nonces.json artifact (declared in fleet.nix, signed
by ciReleaseKey, polled on the same cadence as revocations.json).
This closes the replay-after-DB-wipe vector: even if state.db is wiped
(rebuild, incident, disk loss), the durable replay invariant lives in
the signed fleet repo, not in CP-local state.
The allowlist entry’s expiresAt is authoritative - it may be tighter
than the token’s own claims.expires_at, but never extends past it
(the token’s own claim is checked separately). Operators can
declaratively narrow a still-unexpired token’s validity window by
reducing this value or removing the entry, without rotating the token
itself.
nixfleet-release prunes entries with expiresAt < signedAt at sign
time so the signed artifact contains only the operational set;
fleet.nix retains historical entries as a curated audit log.
See docs/operations/bootstrap-token-lifecycle.md for the operator
runbook.
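The allowlist rules above combine into two small checks. This sketch uses hypothetical helper names and treats the boundary instant (`now == expiresAt`) as expired, a detail the text does not pin down; the allowlist is modelled as nonce -> {"expiresAt": datetime}.

```python
from datetime import datetime, timezone


def enroll_allowed(nonce, token_expires_at, allowlist, now) -> bool:
    """A token is usable only if its nonce is allowlisted and both expiries hold."""
    entry = allowlist.get(nonce)
    if entry is None:
        return False  # nonce absent from the signed artifact: refuse
    # The allowlist entry can narrow the window; the token's own claim is
    # checked separately, so the entry can never extend it.
    return now < entry["expiresAt"] and now < token_expires_at


def prune_for_signing(allowlist, signed_at):
    """Drop entries with expiresAt < signedAt, as done at sign time."""
    return {n: e for n, e in allowlist.items() if not (e["expiresAt"] < signed_at)}
```

Removing an entry or lowering its `expiresAt` therefore revokes a still-unexpired token without touching the token itself.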
Bootstrap report
Agents that fail enrollment can’t reach the mTLS-gated event endpoints (no cert yet). POST /agent/bootstrap-report exists for this case alone.
Authentication. Bound to a hostname + agent-supplied pubkey via the same bootstrap token used by POST /enroll. The token is NOT consumed — multiple bootstrap reports may fire while the operator iterates on the underlying issue. The token’s lifetime gates the window.
Allowlisted events. Only TrustError and EnrollmentFailed events are accepted on this endpoint. Anything else is 400. The allowlist enforces the path’s narrow purpose: surfacing why enrollment is broken, not generic agent telemetry.
Response. 204 No Content on accept; the CP records the event in event_log so the operator dashboard sees pre-cert failures in the same place as post-cert ones. Subsequent successful /enroll does not retroactively rewrite the bootstrap-report rows.
4.4 GET /v1/rollouts/<rolloutId>
Distributes the signed RolloutManifest (RFC-0002 §4.4) to agents. mTLS-gated like every other endpoint. The CP serves the on-disk pre-signed pair byte-for-byte; it does not re-derive, re-sign, or otherwise transform the manifest.
Path parameter. rolloutId is the canonical RFC-0008 §6.3 composite "{channel}@{channel_ref}" exactly as the CP advertised it in /agent/checkin responses. The CP route validator enforces [a-z0-9_-]+@[0-9a-f]+ to block path-traversal smuggling.
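The route validator's pattern can be exercised directly. In this sketch the pattern is taken from the text above, while the helper name and the use of a full-string match (so that nothing may precede or follow the identifier) are assumptions about how the CP applies it.

```python
import re

# Pattern quoted from the route validator; fullmatch anchors both ends.
ROLLOUT_ID = re.compile(r"[a-z0-9_-]+@[0-9a-f]+")


def is_valid_rollout_id(value: str) -> bool:
    """Reject anything that is not exactly channel@hexref."""
    return ROLLOUT_ID.fullmatch(value) is not None
```

Because `.` and `/` are outside both character classes, inputs such as `../secrets@abc` never reach the filesystem lookup.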
Response. Two body shapes, served via the standard HTTP Accept content-negotiation pattern:
- Accept: application/json (default) returns the manifest JSON bytes.
- Accept: application/octet-stream returns the raw signature bytes (<rolloutId>.sig).
Agents fetch both. Implementations MAY also expose a single endpoint that returns both bundled (e.g. application/json with the signature in a sibling X-Nixfleet-Signature header); the wire-test harness asserts both shapes round-trip identically.
Status codes.
- 200 OK - manifest found, body served.
- 404 Not Found - rolloutId is unknown to the CP (never adopted, or evicted post-rollout-completion).
- 503 Service Unavailable - CP recently rebuilt and has not yet reloaded the rollouts directory; agent retries after nextCheckinSecs.
Idempotency + caching. Manifests are immutable by content-address: a given rolloutId always returns the same bytes, or 404 if it never existed. Agents that have already cached a manifest do NOT need to re-fetch on every checkin - string equality against the cached rolloutId is sufficient. Defensive re-fetches (e.g. on agent restart) are safe but wasteful.
No write side. There is no POST or PUT on this endpoint. Manifests are produced by CI alone; the CP holds no signing key for rollouts. Operator workflows that need to “edit a rollout plan” require a new commit (which produces a new rolloutId).
5. Polling cadence
- Default interval. 60s, controlled server-side via nextCheckinSecs in the checkin response.
- Backoff on error. Exponential with jitter, capped at the channel’s reconcileIntervalMinutes. Network errors do not drain the confirm window - /confirm retries aggressively (up to 5×) within the window to survive transient failures.
- Load shaping. Control plane can vary nextCheckinSecs per-host to smooth thundering herds after a push (e.g. assigning each host a slot within the polling window based on a hash of its hostname).
- Idle hosts. A host with no pending target polls at the channel’s idle cadence (can be much longer - weekly for edge-slow).
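Two of the rules above, hash-based slot assignment and capped exponential backoff with jitter, can be sketched as follows. The function names, the use of SHA-256, the 1-second base, and the "full jitter" variant are all assumptions chosen for illustration, not the agent's actual parameters.

```python
import hashlib
import random


def checkin_slot(hostname: str, window_secs: int = 60) -> int:
    """Deterministic offset within the polling window, derived from the hostname."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_secs


def backoff_secs(attempt: int, cap_secs: float, base_secs: float = 1.0, rng=random) -> float:
    """Exponential backoff with full jitter, never exceeding cap_secs."""
    ceiling = min(cap_secs, base_secs * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Deriving the slot from the hostname means a host keeps the same offset across control-plane restarts, so no per-host schedule needs to be stored.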
6. Versioning
- Protocol major version in header. v1 -> v2 is a breaking change; running mixed versions is disallowed and fails at check-in with a clear message. Upgrade path: control plane supports N and N+1 simultaneously; operators upgrade agents, then retire control plane’s N support.
- Schema evolution within a major. Fields may be added; agents and control plane MUST ignore unknown fields. Required fields never change meaning. Removing a field requires a major bump.
- Agent version (informational). Control plane refuses agents older than its declared minimum, emits events for newer agents (may indicate staged upgrade in progress).
7. Security model
Defended against:
- Passive network observer. TLS 1.3 - sees only traffic shape.
- Active on-path attacker without a cert. mTLS fails the handshake; no data exposed.
- Compromised non-target agent. Cert only authorizes its own hostname; cannot request targets for other hosts, cannot submit reports for other hosts. Control plane enforces cert.CN == request.hostname on every endpoint.
- Compromised control plane - closure forgery. Cannot learn secrets (zero-knowledge property). Can serve a different closure hash as target -> agent fetches from attic, verifies attic’s ed25519 signature against the pinned attic public key (docs/design/architecture.md §4), refuses unsigned or foreign-signed closures.
- Compromised control plane - stale-closure replay. A compromised CP cannot forge closures but could point hosts at an older-but-still-validly-signed closure to block security fixes. Mitigation: every check-in response references a CI-signed fleet.resolved revision; the agent fetches that artifact (directly from cache or via the CP) and refuses any target whose backing fleet.resolved.meta.signedAt is older than channel.freshnessWindow (per-channel declaration in minutes, required, no default - RFC-0001 §2.3). The freshness window is itself inside the signed artifact, so a compromised CP cannot widen it.
- Replay. Confirm requests include bootId; the control plane rejects a confirm whose bootId doesn’t match the expected new boot.
Not defended against (explicit):
- Compromised host (root). If the host’s TLS key is stolen, the attacker can act as that host until the cert is revoked. Mitigated by short cert lifetime + TPM-backed keys (future issue).
- Denial of service. Out of scope for this RFC. Rate limiting, fail2ban-style protections, and similar are operational concerns.
- Malicious control-plane operator. Is explicitly a trusted role (can push any generation to any host). The security boundary is between the fleet and outsiders, not between operators and hosts.
8. Offline behavior
- Agent caches the last check-in response on disk. If the control plane is unreachable, the agent continues to operate at its current generation. It does not auto-revert, does not auto-upgrade.
- Prolonged offline window. If check-in fails for longer than channel.offlineGraceSecs (default: 7 days), the agent emits a local systemd journal warning but takes no action. Action is an operator decision.
- Clock skew tolerance. All deadlines (confirm window, cert validity) carry ≥ 60s slack to absorb typical host↔CP clock drift.
RFC-0004: Architectural patterns
Status. Descriptive (not prescriptive of new code; documents the discipline RFC-0005, RFC-0006, RFC-0007, RFC-0008 converged on).
Depends on. RFC-0005, RFC-0006, RFC-0007.
Scope. Names the recurring pattern that feature work in nixfleet evaluates itself against: lift to the general pattern that already exists rather than build bespoke shapes for each new concern. Provides the checklist new feature plans run before drafting wire types or DB schemas.
Not normative. Introduces no wire types, DB schemas, or module options.
1. The observation
Several distinct decisions converged on the same shape:
| Before | After |
|---|---|
| Inference from successive checkin diffs | Explicit events through event_log (RFC-0005) |
| Scattered mutable state with RwLocks | One MPSC + one mutator per side (RFC-0006 §7) |
| Manifest schema expansion per new feature | Closure-hash chain transitively signs declarations (RFC-0007 §4) |
| Per-channel special-cased compliance flags | Per-probe mode uniform across all probe kinds (RFC-0007 §3.3) |
| Independent table populated by applier writes | Derived view with event_log_seq FK-back to canonical state (RFC-0007 §7.2, RFC-0008 §6) |
In each case the framework already paid the cost of supporting the general pattern; the bespoke alternative would have compounded the maintenance surface for no shared infrastructure benefit.
2. The pattern, named
When config-or-state is expressed in a narrower or more specific shape than a general pattern that already exists in the framework, prefer the general pattern.
This is not “always abstract to the maximum.” The principle is: the framework already pays the cost of supporting general patterns (the event-log writer task, the closure-hash signing chain, the multi-scope mkFleet resolver). A new feature that reaches for a bespoke shape also pays a cost, but doesn’t share infrastructure with anything else. The bespoke shape compounds.
Concretely, four levers exist where the general pattern is already cheap to apply:
2.1 State-mutating logic → pure reducer + applier effect
If a piece of code mutates state (per-host, per-rollout, per-channel), and the mutation has explicit transitions, model it as a pure step(state, event, now) → (state, Vec<Effect>) reducer in nixfleet-state-machine. The applier handles effects. The framework already pays for one MPSC + one mutator per side (RFC-0006 §7); the new state machine plugs in.
Counter-indication: the mutation is essentially “write this value, no transition semantics.” Then it’s a setter, not a state machine.
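The reducer shape described in §2.1 can be sketched as follows. The State, Event, and Effect types here are invented for illustration and are not the types in nixfleet-state-machine; only the step(state, event, now) → (state, Vec<Effect>) signature comes from the text.

```rust
#[derive(Debug, Clone, PartialEq)]
enum State {
    Idle,
    Running { started_at: u64 },
}

#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start,
    Finish,
}

/// Effects are plain data; an applier (not shown) would execute them.
#[derive(Debug, Clone, PartialEq)]
enum Effect {
    Log(String),
    PersistDuration(u64),
}

/// Pure transition function: no I/O, no clock reads. `now` is passed in
/// so tests can drive time explicitly.
fn step(state: State, event: Event, now: u64) -> (State, Vec<Effect>) {
    match (state, event) {
        (State::Idle, Event::Start) => (
            State::Running { started_at: now },
            vec![Effect::Log("started".to_string())],
        ),
        (State::Running { started_at }, Event::Finish) => (
            State::Idle,
            vec![Effect::PersistDuration(now.saturating_sub(started_at))],
        ),
        // Any other pairing is not a legal transition: keep the state and
        // surface the rejection as an effect instead of panicking.
        (state, event) => {
            let msg = format!("ignored {:?} in {:?}", event, state);
            (state, vec![Effect::Log(msg)])
        }
    }
}
```

Since step touches nothing outside its arguments, a test can assert on the returned state and effect list without mocking any I/O surface.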
2.2 Per-(host|channel|rollout) config → fleet/tag/host multi-scope merge
If operators declare it and might change it more than once per quarter, declare options at nixfleet.<thing> (fleet) / nixfleet.tags.<tag>.<thing> (tag) / nixfleet.hosts.<host>.<thing> (host). mkFleet resolves with host > tag > fleet precedence. The framework already pays for this resolver (RFC-0007 §4).
Counter-indication: the config is set once at infrastructure-bootstrap (trust roots, signing keys). Then it’s per-fleet only.
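The host > tag > fleet precedence can be illustrated in a few lines. This is only a sketch of the precedence rule: the real resolver is mkFleet on the Nix side, and the single tag parameter is a simplification, since how several tags on one host combine is not covered here.

```rust
/// Resolve one option value with host > tag > fleet precedence.
/// Hypothetical helper; each argument is the value declared at that
/// scope, or None when the scope does not set the option.
fn resolve<T>(fleet: Option<T>, tag: Option<T>, host: Option<T>) -> Option<T> {
    // The most specific scope that declares a value wins.
    host.or(tag).or(fleet)
}
```

For example, resolve(Some(30), Some(10), None) yields Some(10): the tag-level value overrides the fleet default because no host-level value is set.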
2.3 Per-host signed declaration → closure-hash chain, not signed manifest
If declarations are per-host and rendered into /etc/nixfleet/agent/*.json from the host’s NixOS module, the closure hash transitively signs them. Adding a top-level signed manifest field for the same content denormalizes and grows the signing surface for no security gain (RFC-0007 §5).
Counter-indication: the content is fleet-wide or cross-host (e.g., the host_set or the channel ref). Then it belongs in the manifest payload, not in any single host’s closure.
2.4 Applier-written DB table → derived view with event_log_seq FK-back
If a table is written exclusively by the applier in response to events, structure it as a derived view: write event_log row AND derived-view row in the same transaction; carry event_log_seq as a primary-key foreign-key back to the canonical store; ensure the table is provably re-derivable from event_log if lost (RFC-0007 §7.2, RFC-0008 §6).
Counter-indication: the table is short-lookup security-critical state (token_replay, cert_revocations) with a TTL lifecycle distinct from event_log’s append-only audit. Then it’s a separate concern.
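The derived-view discipline in §2.4 can be modelled in memory to show the re-derivability property. This is a sketch under loud assumptions: the row types, field names, and the Store struct are invented, and a plain method stands in for a real DB transaction.

```rust
#[derive(Debug, Clone, PartialEq)]
struct LogRow {
    seq: u64,
    host: String,
    probe: String,
    passed: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct FailureRow {
    event_log_seq: u64, // FK back to the canonical log row
    host: String,
    probe: String,
}

#[derive(Default)]
struct Store {
    event_log: Vec<LogRow>,
    failures: Vec<FailureRow>, // derived view
}

/// Pure projection from one canonical row to zero or one derived rows.
fn project(row: &LogRow) -> Option<FailureRow> {
    if row.passed {
        None
    } else {
        Some(FailureRow {
            event_log_seq: row.seq,
            host: row.host.clone(),
            probe: row.probe.clone(),
        })
    }
}

impl Store {
    /// Stand-in for one transaction: the log row and its derived row are
    /// written together.
    fn append(&mut self, host: &str, probe: &str, passed: bool) -> u64 {
        let seq = self.event_log.len() as u64 + 1;
        let row = LogRow { seq, host: host.to_string(), probe: probe.to_string(), passed };
        if let Some(derived) = project(&row) {
            self.failures.push(derived);
        }
        self.event_log.push(row);
        seq
    }

    /// Rebuild the derived view from the canonical log alone.
    fn rederive(&self) -> Vec<FailureRow> {
        self.event_log.iter().filter_map(project).collect()
    }
}
```

The useful property is that rederive() always equals the live view, which is what makes the table safe to lose.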
3. Evaluation checklist for new features
When writing a plan for a new feature, run these questions before drafting wire types or DB schemas:
1. Does this mutate state with explicit transitions? → reducer in nixfleet-state-machine (per §2.1)
2. Is this per-host declarative config operators will change? → multi-scope nixfleet.{*,tags.*,hosts.*} options (per §2.2)
3. Does this need to be cryptographically signed? → check whether the closure-hash chain already covers it (per §2.3) before adding a manifest field
4. Is this a table the applier writes? → derived view with event_log_seq FK-back (per §2.4)
5. Does the wire need a new event variant? → fit into existing event taxonomy first; only add a new variant if the semantics don’t fold into an existing kind.
If the answer to any of 1-4 is yes and you find yourself reaching for the bespoke alternative, you’re deferring the right shape.
RFC-0005: Event-driven host-rollout state
Status. Accepted.
Depends on. RFC-0001 (fleet topology), RFC-0002 (reconciler), RFC-0003 (agent/CP protocol).
Supersedes. Sections 4 + 5 of RFC-0003 (polling-based checkin contract): replaced wholesale, not extended.
Scope. Per-host per-rollout state machine and the wire vocabulary that drives it. Defines the explicit dispatch → ack → multi-stage report flow that replaces the inference-from-checkin model. Does not cover control-plane-internal reconciler logic, channel-level rollout opening, or signing; those stay as RFC-0002 / RFC-0003 specify them.
1. Problem statement
The pre-v0.2 protocol (RFC-0003 §4.1) is inference-driven: the agent sends a periodic checkin every ~60 s carrying its current state snapshot (currentClosureHash, pendingClosureHash, outstandingHealthFailures, probe results). The control plane reconstructs state transitions by diffing successive snapshots and stamping its own timestamps from wallclock observations.
This produces a class of bugs that share one root cause — CP guesses transitions it should be told about. Six concrete instances:
- current_closure_hash lags ~60 s after rollback. Agent fires switch-to-configuration on the prior closure, but CP doesn’t learn the host is on the prior closure until the next regular checkin. During the gap, status shows state = Reverted, current == declared == bad SHA.
- Probe gate satisfied by stale Pass results. Agent’s ProbeStateCache is process-lifetime; activating a new closure does not reset it. CP sees host_probes_observed = true && host_probes_passing = true from the previous closure’s probes and lets Healthy → Soaked fire before any probe has run against the new closure.
- Sweep threshold effectively 60 s + first_checkin_lag. CP’s first_seen is wallclock when it noticed outstandingHealthFailures > 0 (i.e., when a checkin reporting failures arrived), not when the failure actually started. Observed: 89 s end-to-end on a HEALTH_FAILURE_THRESHOLD_SECS = 60 constant.
- Soak fires too eagerly with soakMinutes = 0. Healthy → Soaked is reconcile-tick driven; with a zero soak window the transition can happen in the same tick as confirm-ack, before any probe has actually run.
- Channel-edge gate over-holds. Predecessor’s no-op rollout (closure unchanged) leaves host_states empty in RolloutDbSnapshot. is_active_for_ordering() returned true on empty until terminal_at was honored (c3ab9d75, v0.2 polish). The fix worked but is symptomatic: CP shouldn’t have to infer “predecessor done” from an absence.
- State shapes that the schema permits but the operator cannot interpret. rolloutState = Soaked, current != declared is a real combination CP can produce, but it’s nonsensical operationally. The CLI papers over it with conditional labels (✗ failed vs → reverting): five lines of view-layer logic to mask one model defect.
All six are the same disease: transitions happen, but CP is the last to know.
2. Design goals
- Every state transition is event-driven. CP changes state on receipt of an explicit agent event, never on the diff of two checkins.
- The agent owns the timestamp of every transition. CP stores the agent’s reported at field, not wallclock-on-receipt. Sweep windows, soak windows, gate eligibility: all from agent-supplied timestamps.
- Probe state is per-rollout, not per-process. Each ActivationComplete event resets the probe cache for the new rollout. Stale results from the prior closure cannot satisfy the new rollout’s gates.
- Polling becomes a fallback, not the primary channel. Long-poll for the inbound Dispatch direction (the only queued message; rollback is agent-decided per §2.1); explicit event reports for outbound state. The 60 s heartbeat remains as a liveness signal and a missed-event drift detector, not as the source-of-truth.
- No CLI conditionals for impossible states. The new state machine forbids the shapes (“Soaked, current != declared” and “Failed, current != declared”) that exist today only because the model is loose.
- No legacy code paths. v0.2 is a fresh wire revision; the pre-v0.2 checkin-as-state-source path is deleted, not preserved. Both agent and CP ship event-driven from day one.
2.1 Trust model alignment (RFCs 0001–0007 invariants this RFC preserves)
The event-driven model does not alter the trust contract established by the prior RFCs. Specifically:
- CP holds no signing key for state events (RFC-0002 §3). Every agent event is signed by the agent’s mTLS client cert (RFC-0003 §2). CP signs nothing it emits as state. CP does hold a CA-issuance signing key for /v1/enroll and /v1/agent/renew-cert; production deployments bind it to the TPM so CP holds only a pubkey + sign-wrapper handle. The file-backed fallback violates the spirit of the claim. See RFC-0010 §1.5.1 for the precise contract.
- CP holds no trust private keys (RFC-0010 §1.5). Verification of inbound events uses the same TrustConfig deserialized at CP boot: no new trust roots, no new secrets. (The CA-issuance key is not a TrustConfig entry; see RFC-0010 §1.5.1 for the signer-vs-verifier distinction.)
- CP is reconstructible from git + agent state (RFC-0001 §10, RFC-0010 §1.5). The HostRolloutRecord table (§5) is a cache, not a source of truth. On CP rebuild (loss of /var/lib/nixfleet-cp/state.db), the heartbeat drift-detection in §4.3 prompts agents to replay their event log; CP rebuilds its view from agent reports. Historical timestamps for converged rollouts are lost on rebuild, the same property as today’s DB.
- Inversion of trust is preserved (RFC-0002 §4, RFC-0003 §4.6). The Dispatch event’s target_closure field is advisory: a convenience pointer to the canonical value in the signed manifest the agent already holds. Agents MUST verify the field against manifest.host_set[hostname].target and refuse-to-act on mismatch. CP cannot redirect an agent to an unsigned closure by tampering with the Dispatch payload; mTLS protects the wire, and the manifest signature catches any substitution. Rollback decisions are made by the agent directly from the signed manifest’s onHealthFailure policy; CP issues no RollbackSignal. Net effect: there is exactly one signed source of truth for every action an agent takes, and CP cannot direct the agent off that source.
- Pull-only control flow (RFC-0003 §1). CP never reaches an agent. The only queued message is Dispatch (a wave-timing signal); the agent fetches it on its next long-poll to /v1/agent/dispatch. Rollback is agent-decided, so no rollback message is ever queued. The wording “CP issues” anywhere this RFC uses it is shorthand for “CP queues for agent retrieval”; no socket is opened in the CP→agent direction.
- CP blast radius unchanged (RFC-0010 §1.5). CP holds no new secrets, no new trust authority. SSH access to the CP host remains equivalent to SSH access to any production NixOS box.
- Air-gap operation unaffected (RFC-0012). Events are signed payloads on a request/response wire; they ride sovereign caches the same way checkins do today. The CP-side event handler is identical online and air-gapped.
3. State machine
     ┌─────────────────────────────────────────────────────┐
     ▼                                                     │
┌─────────┐     ┌────────────┐    ┌───────────┐    ┌────────────┐
│ Pending │────▶│ Activating │───▶│  Soaking  │───▶│ Converged  │
└─────────┘     └──┬──────┬──┘    └─────┬─────┘    └────────────┘
                   │      │             │
                   │      ▼             │ sustained probe fail
                   │ ┌──────────┐       ▼
                   │ │ Deferred │ ┌────────────┐
                   │ └──────────┘ │   Failed   │───▶┌────────────┐
                   │   (post-     └────────────┘    │  Reverted  │
                   │    reboot                      └─────┬──────┘
                   │    →Soaking)                         │
                   │                                      │ channel halt-lift on
                   ▼                                      │ new declared SHA
             ┌───────────┐                                ▼
             │  Failed   │                     (new rollout → Pending)
             └───────────┘
Seven states, no aliases:
| State | Meaning | Entered by | Exited by |
|---|---|---|---|
| Pending | CP has issued a Dispatch; agent has not yet ACKed (or rollout was just opened) | Dispatch issued | DispatchAck received |
| Activating | Agent acked; switch-to-configuration is firing or has fired pending confirmation | DispatchAck | ActivationComplete (→ Soaking) / ActivationFailed (→ Failed) / ActivationDeferred (→ Deferred) |
| Deferred | Activation pipeline staged the profile but skipped the live switch because dbus/systemd/kernel/init cannot be hot-swapped on the running system. The host is “soft-staged”: the new generation activates on next reboot. CP’s heartbeat handler synthesises an ActivationComplete once the agent reports current_closure == target_closure post-reboot. | ActivationDeferred event | Synthesised RemoteActivationCompleted after operator reboot |
| Soaking | Agent reports activation succeeded; probes have started running against the new closure; soak window has not yet elapsed | ActivationComplete (live or synthesised) | Converged event or Failed (via sweep) |
| Converged | Soak elapsed, probes passing, current == declared. Terminal for ordering. | Converged event from agent | New rollout opens for this channel |
| Failed | Sustained probe failure observed by the agent and reported to CP. Agent has read onHealthFailure from the signed manifest and decided autonomously what comes next. | Failed event (agent reports sustained failure) | RollbackComplete (if policy was rollback-and-halt) or operator action (if halt-only) |
| Reverted | Agent has completed rollback to prior closure. Channel-level quarantine holds the bad SHA. | RollbackComplete | Channel publishes a new SHA (declared moves past quarantine) |
States explicitly removed vs. RFC-0003 / today’s enum:
- Queued: collapsed into Pending.
- Dispatched: was a CP-side bookkeeping flag, not a host state; replaced by Pending with a dispatched_at timestamp.
- ConfirmWindow: Activating covers it; ActivationComplete ends the phase.
- Healthy: collapsed into Soaking; the “Healthy means probes report but soak window not elapsed” distinction was internal bookkeeping.
- Soaked (separate from Converged): terminal state was bifurcated by rollout policy (Soaked for canary, Converged for all-at-once). Now both end at Converged. Soak duration just affects when you reach it; the destination is the same.
State invariants that the schema now enforces:
- Converged ⇒ current == declared && all_enforce_mode_probes == Pass (per RFC-0007 §3.3; observe and disabled probes do not gate). CP refuses to write this state otherwise.
- Reverted ⇒ current != declared && current == reverted_to. Same enforcement.
- Failed does NOT imply anything about current vs declared: it means “we observed sustained failure on the dispatched target”; the agent may not have started rollback yet.
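The invariants above can be written as a guard that runs before any state write. A minimal sketch, assuming closures are compared as plain strings; the Snapshot struct and may_write function are hypothetical names, not the CP's actual code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum HostState {
    Pending,
    Activating,
    Deferred,
    Soaking,
    Converged,
    Failed,
    Reverted,
}

/// The recorded facts the guard consults (illustrative field set).
struct Snapshot<'a> {
    current: &'a str,
    declared: &'a str,
    reverted_to: Option<&'a str>,
    all_enforce_mode_probes_pass: bool,
}

/// Returns true when writing `state` would keep the invariants intact.
fn may_write(state: HostState, s: &Snapshot<'_>) -> bool {
    match state {
        HostState::Converged => s.current == s.declared && s.all_enforce_mode_probes_pass,
        HostState::Reverted => s.current != s.declared && s.reverted_to == Some(s.current),
        // Failed and the in-flight states carry no current-vs-declared
        // constraint.
        _ => true,
    }
}
```

A host that rolled back to its prior closure passes the Reverted check and fails the Converged check, so the two terminal shapes cannot be confused.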
4. Event vocabulary
All events are signed by the agent’s mTLS client cert (already RFC-0003 §2). Event payloads are JSON, canonicalised per RFC-0003 §3. Every event carries rollout_id, hostname, and seq (monotonic per (hostname, rollout_id) pair; gaps signal lost events; out-of-order events are dropped with a warning).
rollout_id’s canonical wire format is "{channel}@{channel_ref}" (e.g., "stable@a1b2c3d4") per RFC-0008 §6.3 — the JSON examples in §4.1/§4.2/§4.3 below use "<uuid>" as a generic placeholder. CP-side validation enforces the shape via the route filter and the reducer’s RolloutId-discriminated supersession check.
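Both the rollout_id shape and the seq rules lend themselves to small pure helpers. The sketch below is illustrative only: the function and enum names are invented, splitting on the first '@' is an assumption (the text does not say whether channel names may contain '@'), and what the CP does with a detected gap beyond flagging it is not modelled.

```rust
/// Split a rollout id of the form "{channel}@{channel_ref}".
/// Returns None when either half is missing.
fn parse_rollout_id(id: &str) -> Option<(&str, &str)> {
    let (channel, channel_ref) = id.split_once('@')?;
    if channel.is_empty() || channel_ref.is_empty() {
        return None;
    }
    Some((channel, channel_ref))
}

#[derive(Debug, PartialEq)]
enum SeqVerdict {
    /// Exactly the next expected sequence number.
    InOrder,
    /// Ahead of the expected number: `missing` events were lost.
    Gap { missing: u64 },
    /// At or behind the last seen number: duplicate or out-of-order.
    Stale,
}

/// Classify an incoming seq against the last one recorded for the same
/// (hostname, rollout_id) pair.
fn classify_seq(last_seen: u64, incoming: u64) -> SeqVerdict {
    if incoming <= last_seen {
        SeqVerdict::Stale
    } else if incoming == last_seen + 1 {
        SeqVerdict::InOrder
    } else {
        SeqVerdict::Gap { missing: incoming - last_seen - 1 }
    }
}
```

With the example id from the text, parse_rollout_id("stable@a1b2c3d4") returns the pair ("stable", "a1b2c3d4").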
4.1 Queued for agent retrieval (agent long-polls /v1/agent/dispatch)
Per RFC-0003 §1 (pull-only control flow), CP never opens a connection to an agent. The agent long-polls /v1/agent/dispatch; when CP has queued a Dispatch for that host (the only queued message — see §2.1 for why no rollback message is queued), the response carries it. Otherwise the request blocks up to the long-poll timeout (default 60 s) and returns empty.
Dispatch
CP queues this when a host is up for activation under the current rollout. The payload is advisory: target_closure is a convenience pointer to the canonical value in the signed manifest the agent already fetched per RFC-0003 §4.6. The agent MUST verify target_closure == manifest.host_set[hostname].target before acting; mismatch is a hard refuse-to-act (emit DispatchReject, do not consume any other field).
{
"kind": "Dispatch",
"rollout_id": "<uuid>",
"target_closure": "<store-path-hash>",
"channel": "stable",
"wave": 0,
"soak_due_at": "2026-05-16T01:30:00Z",
"confirm_deadline": "2026-05-16T01:33:00Z",
"issued_at": "2026-05-16T01:27:00Z",
"seq": 1
}
Agent MUST respond with DispatchAck (after manifest cross-check) before starting the switch. If target_closure is in the agent’s local quarantine (rare; CP also enforces), agent responds DispatchReject instead.
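The agent-side decision between DispatchAck and DispatchReject can be sketched as a pure check. Everything named below is an assumption for illustration: the HashMap stands in for manifest.host_set[hostname].target in the already-verified signed manifest, and rejecting a host that is absent from the manifest is this sketch's choice, not something the text specifies.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum DispatchReply {
    Ack,
    Reject(&'static str),
}

/// Cross-check an advisory Dispatch against the signed manifest and the
/// local quarantine before any switch is started.
fn check_dispatch(
    hostname: &str,
    target_closure: &str,
    manifest_targets: &HashMap<String, String>,
    local_quarantine: &HashSet<String>,
) -> DispatchReply {
    match manifest_targets.get(hostname) {
        Some(signed) if signed.as_str() == target_closure => {}
        Some(_) => return DispatchReply::Reject("target_closure differs from signed manifest"),
        None => return DispatchReply::Reject("host not present in signed manifest"),
    }
    if local_quarantine.contains(target_closure) {
        return DispatchReply::Reject("target_closure is quarantined locally");
    }
    DispatchReply::Ack
}
```

The manifest comparison runs first, so a tampered payload is refused before any other field of the Dispatch is consulted.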
There is no RollbackSignal queued by CP. The agent reads manifest.channels[<channel>].rollout_policy.onHealthFailure directly from the signed manifest it already holds (RFC-0003 §4.6); when it self-detects sustained failure (§4.2 Failed), it acts on that policy autonomously — no CP round-trip. CP’s role is to record that this happened, not to decide it. This collapses the rollback path into a single signed source of truth (the manifest) and removes any possibility of CP/agent disagreement on whether rollback should fire.
4.2 Outbound from agent (POST /v1/agent/events)
The agent sends one POST per event. CP returns 204 No Content on success. On 4xx, agent must NOT retry (event was rejected as invalid). On 5xx / network failure, agent retries with exponential backoff and the same seq; CP deduplicates by (hostname, rollout_id, seq).
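The retry rules above fit in one decision function. A hedged sketch: the names, the 1 s base, and the 60 s cap are illustrative assumptions, and treating every unexpected non-5xx status like a 4xx rejection is this sketch's choice. The 409 re-check path described for Converged is not modelled here.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Next {
    /// Event accepted; advance to the next seq.
    Done,
    /// Event rejected as invalid; do not resend.
    Drop,
    /// Resend the same event with the same seq after this delay.
    RetryAfter(Duration),
}

/// Exponential backoff: 1 s, 2 s, 4 s, ... capped at 60 s.
fn backoff(attempt: u32) -> Duration {
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX).min(60);
    Duration::from_secs(secs)
}

/// `status` is None when the request failed at the network layer.
fn after_post(status: Option<u16>, attempt: u32) -> Next {
    match status {
        Some(204) => Next::Done,
        Some(500..=599) | None => Next::RetryAfter(backoff(attempt)),
        Some(_) => Next::Drop,
    }
}
```

Resending with an unchanged seq is what lets the CP deduplicate on (hostname, rollout_id, seq) when a response was lost but the write succeeded.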
DispatchAck — Pending → Activating
{
"kind": "DispatchAck",
"rollout_id": "<uuid>",
"received_at": "2026-05-16T01:27:01Z",
"current_closure_at_dispatch": "<prior-closure-hash>",
"seq": 2
}
CP transition: Pending → Activating. CP stores current_closure_at_dispatch as the canonical rollback target (do not re-derive from /run/current-system later — agent might restart, lose state).
ActivationStarted — visibility, no transition
{
"kind": "ActivationStarted",
"rollout_id": "<uuid>",
"started_at": "2026-05-16T01:27:03Z",
"switch_method": "systemd-run-detached",
"seq": 3
}
CP records timestamp. No state change. Used for operator observability (status --rollout-history).
ActivationComplete — Activating → Soaking
{
"kind": "ActivationComplete",
"rollout_id": "<uuid>",
"completed_at": "2026-05-16T01:27:05Z",
"observed_current_closure": "<store-path-hash>",
"switch_exit_code": 0,
"seq": 4
}
CP transition: Activating → Soaking. CP also:
- Stamps activation_completed_at = completed_at.
- Sets current_closure = observed_current_closure.
- Resets the probe state for this (hostname, rollout_id): any prior ProbeResult for this pair is invalidated.
- Records soak_due_at (from the original Dispatch).
ActivationFailed — Activating → Failed
{
"kind": "ActivationFailed",
"rollout_id": "<uuid>",
"failed_at": "2026-05-16T01:27:05Z",
"switch_exit_code": 1,
"stderr_tail": "<...truncated stderr...>",
"seq": 4
}
CP transition: Activating → Failed. If the manifest’s onHealthFailure is rollback-and-halt, the agent immediately fires the rollback on its own — same single signed source of truth as the §4.2 Failed-via-sustained-probe-fail path — and the next event from this agent for this rollout will be RollbackComplete.
ActivationDeferred — Activating → Deferred
{
"kind": "ActivationDeferred",
"rollout_id": "<uuid>",
"component": "dbus",
"deferred_at": "2026-05-16T01:27:05Z",
"seq": 4
}
Emitted when switch-to-configuration set the profile + bootloader but refused the live switch because component (one of dbus, systemd, kernel, init) cannot be safely swapped on a running system. The host is soft-staged: the new generation activates on next reboot. CP transition: Activating → Deferred. After the operator reboots, the agent’s first heartbeat reports current_closure == target_closure and CP’s handle_heartbeat synthesises a RemoteActivationCompleted (Deferred → Soaking) so the rollout cascade resumes automatically.
ProbeTopologyDeclared — authoritative declared-probe set
{
"kind": "ProbeTopologyDeclared",
"rollout_id": "<uuid>",
"declared_at": "2026-05-16T01:27:15Z",
"probes": [
{ "name": "nginx-version", "kind": "http", "mode": "enforce" },
{ "name": "heartbeat", "kind": "http", "mode": "observe" },
{ "name": "evidence-nis2", "kind": "evidence", "mode": "enforce" }
],
"seq": 4
}
Per RFC-0007 §8.1. Emitted once per ActivationCompleted by the agent’s probe worker after re-reading /etc/nixfleet/agent/health-checks.json. CP treats the payload as the authoritative set of probes the agent has committed to running for this rollout. Required for the wave-promotion gate to distinguish “enforce probe hasn’t reported yet — hold” from “no enforce probes declared — advance.” Missing this event holds the wave with reason "awaiting probe topology".
ProbeObservedFirst — gates may now consult probes
{
"kind": "ProbeObservedFirst",
"rollout_id": "<uuid>",
"observed_at": "2026-05-16T01:27:20Z",
"probe_name": "nginx-version",
"mode": "enforce",
"seq": 5
}
CP records probe_observed_first_at for this rollout. The soak gate (was host_probes_observed) consults THIS field, not a snapshot. Agents emit one per declared probe on first run after activation. The mode field (RFC-0007 §8.2) makes the event self-describing for replay: readers don’t need to join against the topology declaration to interpret per-probe enforcement.
ProbeResult
{
"kind": "ProbeResult",
"rollout_id": "<uuid>",
"probe_name": "nginx-version",
"status": "Pass" | "Fail" | "Unknown",
"observed_at": "2026-05-16T01:27:20Z",
"failure_reason": "<optional>",
"mode": "enforce",
"sub_results": null,
"seq": 6
}
Streamed on each probe run. CP updates its per-rollout probe map. Note: Unknown is NOT reported — first result is always Pass or Fail. (Unknown is the bootstrap state before any run.)
Payload fields (RFC-0007 §7.1 + §8.2):
- mode (always present): one of enforce | observe | disabled. Self-describing for replay; redundant with the topology declaration but cheap (4 bytes) and removes a table join at gate-eval time.
- sub_results (Option<Vec<ProbeSubResult>>): populated for kind = "evidence" probes only; None for HTTP/TCP/exec. Each entry carries {control_id, status, framework, article} so operator dashboards preserve per-control failure visibility. Aggregate status is Pass iff every sub_result.status == Pass. The applier expands sub_results into per-control rows in the probe_failures derived view (RFC-0007 §7.2) within the same transaction that appends to event_log.
Probe-error semantics (uniform across all kinds per RFC-0007 §6): a probe that fails to execute (nonzero exit, malformed output, timeout) reports status = "Fail". There is no probe-error-tolerance flag; operators who want “tolerate probe errors” use per-probe mode = "observe".
ProbeFailureFirst — sweep starts ticking
{
"kind": "ProbeFailureFirst",
"rollout_id": "<uuid>",
"probe_name": "nginx-version",
"first_failed_at": "2026-05-16T01:27:35Z",
"seq": 7
}
Emitted by the agent on the first Pass → Fail transition (or first-ever Fail) for any declared probe. CP stamps probe_failure_first_at from first_failed_at (agent’s timestamp). Sweep window now measured from this exact time, not from CP wallclock.
Failed — Soaking → Failed (sustained-failure self-report)
{
"kind": "Failed",
"rollout_id": "<uuid>",
"failed_at": "2026-05-16T01:28:35Z",
"sustained_duration_secs": 60,
"failing_probes": ["nginx-version"],
"policy_applied": "rollback-and-halt",
"seq": 12
}
The agent detects sustained failure (its own probe-cache crosses HEALTH_FAILURE_THRESHOLD_SECS), reads onHealthFailure from the signed manifest, and reports Failed. The policy_applied field records which manifest policy branch the agent is about to follow:
- rollback-and-halt: agent immediately fires the rollback (next event will be RollbackComplete).
- halt-only: agent stops and stays Failed; operator action required.
CP transitions Soaking → Failed. Note: the CP-side sweep (the legacy health-sweep block in reconcile.rs) is removed; the agent is the source of truth on its own probe state AND on the rollback decision.
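The agent-side sustained-failure decision can be sketched as a pure function of the first-failure timestamp. The names below are hypothetical, timestamps are Unix seconds, and the sketch assumes first_failed_at is None whenever no declared probe is currently failing.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Policy {
    RollbackAndHalt,
    HaltOnly,
}

#[derive(Debug, PartialEq)]
enum Action {
    KeepSoaking,
    /// Emit a Failed event; roll back afterwards only if the policy says so.
    ReportFailed { sustained_duration_secs: u64, then_rollback: bool },
}

/// Decide whether the failure has lasted long enough to report Failed.
/// `policy` is the onHealthFailure value read from the signed manifest.
fn evaluate(first_failed_at: Option<u64>, now: u64, threshold_secs: u64, policy: Policy) -> Action {
    let elapsed = match first_failed_at {
        Some(t) => now.saturating_sub(t),
        None => return Action::KeepSoaking,
    };
    if elapsed >= threshold_secs {
        Action::ReportFailed {
            sustained_duration_secs: elapsed,
            then_rollback: policy == Policy::RollbackAndHalt,
        }
    } else {
        Action::KeepSoaking
    }
}
```

With a 60 s threshold and a first failure at t = 100, the function keeps soaking at t = 159 and reports Failed at t = 160, with no dependence on when a check-in happens to arrive.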
RollbackComplete — Failed → Reverted
Emitted by the agent after it has autonomously executed the rollback (no CP signal required). The agent reads the rollback target from its own current_closure_at_dispatch (recorded at DispatchAck time per §4.2) and fires switch-to-configuration on that closure directly.
{
"kind": "RollbackComplete",
"rollout_id": "<uuid>",
"completed_at": "2026-05-16T01:28:40Z",
"reverted_to_closure": "<prior-closure-hash>",
"switch_exit_code": 0,
"seq": 13
}
CP transition: Failed → Reverted. CP:
- Stamps reverted_at = completed_at.
- Sets current_closure = reverted_to_closure.
- Inserts the dispatched-but-bad target_closure into the channel’s quarantined_closures set (existing quarantine mechanism, now consuming the agent-reported event rather than CP-derived state).
Converged — Soaking → Converged
{
"kind": "Converged",
"rollout_id": "<uuid>",
"converged_at": "2026-05-16T01:30:05Z",
"current_closure": "<store-path-hash>",
"seq": 30
}
Agent emits this when:
- soak_due_at has elapsed,
- all declared probes are Pass,
- current_closure == target_closure from the Dispatch.
CP transitions Soaking → Converged after re-verifying the same three invariants on the server side from the recorded state. If any invariant fails, CP rejects the event (409 Conflict) and the agent retries after re-checking.
4.3 Heartbeat
POST /v1/agent/heartbeat (replaces RFC-0003’s /v1/agent/checkin). Minimal payload:
{
"hostname": "web-01",
"agent_version": "0.2.0",
"current_closure": "<store-path-hash>",
"uptime_secs": 3600,
"last_event_seq_by_rollout": {
"<rollout-id>": 14
},
"at": "2026-05-16T01:30:00Z"
}
Purpose:
- Liveness. CP marks agent reachable; missed heartbeats (3× interval) raise a HostUnreachable alert.
- Drift detection. If current_closure disagrees with CP’s HostRolloutRecord.current_closure for the host’s latest rollout, events were lost. CP responds 200 + X-Nixfleet-Replay-From: <seq> listing the last seq it has per rollout; agent re-sends events with seq > X (deduplicated server-side by (hostname, rollout_id, seq)).
- No state transitions. Heartbeats never advance the state machine. State only changes on receipt of a §4.2 event.
Default interval: 60 s. Adjustable per fleet.
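The drift-detection handshake has two halves, sketched below with invented names. Treating a missing CP record as drift is an assumption of this sketch (it matches the CP-rebuild case, where the record table starts empty); the Buffered struct is a stand-in for whatever the agent keeps in its local event log.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Buffered {
    seq: u64,
    kind: &'static str,
}

/// CP side: the heartbeat's closure disagrees with the recorded one,
/// or no record exists at all.
fn drifted(heartbeat_closure: &str, recorded_closure: Option<&str>) -> bool {
    recorded_closure != Some(heartbeat_closure)
}

/// Agent side: select every buffered event after the last seq the CP
/// reports having, in original order.
fn to_replay(log: &[Buffered], replay_from: u64) -> Vec<Buffered> {
    log.iter().filter(|e| e.seq > replay_from).cloned().collect()
}
```

Replayed events keep their original seq, so any the CP already holds are dropped by the same (hostname, rollout_id, seq) deduplication used for ordinary retries.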
5. HostRolloutRecord schema
Replaces the current host_dispatch_state row + host_rollout_state row + scattered timestamps. One row per (rollout_id, hostname):
use std::collections::HashMap;
use chrono::{DateTime, Utc};

// RolloutId, ClosureHash, HostState, RolloutFailurePolicy and ProbeStatus are
// crate types defined elsewhere.
pub struct HostRolloutRecord {
    pub rollout_id: RolloutId,
    pub hostname: String,
    pub channel: String,
    pub state: HostState, // 7-variant enum from §3
    // Closures
    pub target_closure: ClosureHash, // from Dispatch
    pub current_closure_at_dispatch: Option<ClosureHash>, // from DispatchAck
    pub current_closure: Option<ClosureHash>, // from ActivationComplete / RollbackComplete
    pub reverted_to: Option<ClosureHash>, // = current_closure_at_dispatch
    // Transition timestamps (agent-supplied, except dispatched_at; CP writes
    // no other wallclock value here)
    pub dispatched_at: DateTime<Utc>, // CP-issued; CP wallclock OK here
    pub dispatch_acked_at: Option<DateTime<Utc>>, // from DispatchAck
    pub activation_started_at: Option<DateTime<Utc>>, // from ActivationStarted
    pub activation_completed_at: Option<DateTime<Utc>>, // from ActivationComplete
    pub activation_failed_at: Option<DateTime<Utc>>, // from ActivationFailed
    pub probe_observed_first_at: Option<DateTime<Utc>>, // from ProbeObservedFirst
    pub probe_failure_first_at: Option<DateTime<Utc>>, // from ProbeFailureFirst
    pub soak_due_at: Option<DateTime<Utc>>, // computed at Dispatch issue
    pub converged_at: Option<DateTime<Utc>>,
    pub failed_at: Option<DateTime<Utc>>,
    pub policy_applied: Option<RolloutFailurePolicy>, // from Failed event (rollback-and-halt | halt-only)
    pub reverted_at: Option<DateTime<Utc>>,
    // Live probe state
    pub probes: HashMap<String, ProbeRecord>, // by probe_name
    // Event ordering
    pub last_event_seq: u64,
}

pub struct ProbeRecord {
    pub status: ProbeStatus, // Pass | Fail (never Unknown post-bootstrap)
    pub last_observed_at: DateTime<Utc>,
    pub last_pass_at: Option<DateTime<Utc>>,
    pub failure_reason: Option<String>,
}
6. Gate redesign (consult explicit fields)
Every gate in crates/nixfleet-reconciler/src/gates/ becomes a pure function of the explicit fields above. No inferences, no snapshot diffs:
| Gate | Today’s check | New check |
|---|---|---|
| Channel-edges | predecessor.is_active_for_ordering() (consults host_states.values().all(...) heuristic; needed terminal_at retrofit) | Predecessor channel’s rollout state == Converged for all its hosts. Direct read. |
| Soak gate (Healthy → Soaked) | host_probes_observed && host_probes_passing && soak_elapsed — observed/passing inferred from current checkin snapshot | probe_observed_first_at.is_some() && now > soak_due_at && all_enforce_mode_probes_pass (per RFC-0007 §3.3) |
| Sustained-failure sweep | CP wallclock first-noticed first_seen, threshold = 60 s | Removed from CP. Agent reports Failed event directly when its own threshold elapses (agent has true probe-failure-start timestamp). |
| Quarantine (dispatch refuses bad SHA) | Channel’s quarantined_closures table | Same. Populated by RollbackComplete handler. |
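The new soak-gate check from the table can be written directly as a pure function. A minimal sketch with hypothetical type names; timestamps are Unix seconds, and only enforce-mode probes take part in the pass check, as the table's all_enforce_mode_probes_pass term implies.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Enforce,
    Observe,
    Disabled,
}

struct Probe {
    mode: Mode,
    passing: bool,
}

/// probe_observed_first_at.is_some() && now > soak_due_at
///     && all_enforce_mode_probes_pass
fn soak_gate_open(
    probe_observed_first_at: Option<u64>,
    soak_due_at: u64,
    now: u64,
    probes: &[Probe],
) -> bool {
    // Observe and disabled probes are skipped; they never hold the gate.
    let all_enforce_pass = probes
        .iter()
        .filter(|p| p.mode == Mode::Enforce)
        .all(|p| p.passing);
    probe_observed_first_at.is_some() && now > soak_due_at && all_enforce_pass
}
```

A failing observe-mode probe leaves the gate open, while a single failing enforce-mode probe closes it regardless of how long the soak has run.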
7. Wire version
v0.2 is a fresh wire revision under X-Nixfleet-Protocol: 1 (the v0.1 wire is deleted in lockstep, not preserved alongside — there is no protocol-2 to migrate to). The pre-v0.2 contract (checkin-derived state, /v1/agent/checkin, /v1/agent/report, the HostRolloutState enum’s 9 variants) is not preserved — CP rejects requests with the legacy event shapes outright. The deleted surface:
- crates/nixfleet-control-plane/src/server/routes/checkin.rs (state-deriving endpoint) → deleted; replaced by routes/events.rs + routes/heartbeat.rs.
- crates/nixfleet-control-plane/src/server/reconcile.rs health_sweep block (HEALTH_FAILURE_THRESHOLD_SECS, first_seen tracker) → deleted; agent owns the sweep timer.
- host_dispatch_state + host_rollout_state DB tables → replaced by a single host_rollout_records table mapping (rollout_id, hostname) → HostRolloutRecord (§5).
- Healthy / Soaked / Queued / Dispatched / ConfirmWindow variants in HostRolloutState → removed; 7 variants remain (§3).
- View-layer label conditionals introduced in 2d5b92ef to mask the loose state machine → removed; one canonical label per state, no current/declared disambiguation needed.
Because v0.2 ships a new fleet image end-to-end (operator runs fleet-up, agents and CP roll out together as part of the same closure), there are no in-place upgrades and no mixed-version fleets to support. Pre-v0.2 deployments migrate by running v0.2’s fleet-up against fresh hosts.
8. Operator-visible improvements
- nixfleet status --rollout-history <rollout-id>: natural with explicit timestamps per transition. Renders the event log directly.
- No CLI label conditionals. Pre-v0.2 had to map Failed + current != declared = "→ reverting" in the view layer to mask schema-permitted nonsense. Under this RFC, that state shape can’t exist: Failed and current != declared always transitions to Reverted.
- Bounded sweep latency. Agent’s Failed event arrives within one HTTP RTT of the threshold elapsing. No 60 s + checkin-lag tail.
- No stale-probe gate satisfaction. Probe cache reset on ActivationComplete means the soak gate cannot be misled by results from the prior closure.
- Per-rollout timeline. Every transition timestamp is preserved on the HostRolloutRecord. Operators can answer “when exactly did this rollout start soaking?” without trawling logs.
RFC-0006: v0.2 control-plane architecture
Status. Accepted.
Depends on. RFC-0001 (fleet topology), RFC-0002 (reconciler), RFC-0003 (agent/CP protocol), RFC-0005 (event-driven host-rollout state).
Supersedes. RFC-0002 §5 (the v0.1 reconciler tick loop) and the scattered side-effect organisation of crates/nixfleet-agent/src/* and the pre-v0.2 nixfleet-control-plane/src/server/reconcile.rs.
Scope. How the per-host state machine (RFC-0005 §3) and the channel-level planner (this RFC §4) are organised so they remain pure, testable, and bug-resistant. Defines the actor-pattern runtime that wraps each pure core, the effect vocabulary they emit, and the explicit responsibility split between CP and agent. Does not change wire protocol (RFC-0005), trust model (RFC-0010), or topology (RFC-0001).
1. Problem statement
The pre-v0.2 codebase organises side effects as scattered mutable state:
- Agent. pipeline.rs calls switch-to-configuration directly. health.rs writes probe results into ProbeStateCache (RwLock). compliance.rs posts compliance events through its own Reporter. dispatch/rollback.rs mutates state in response to inbound RollbackSignal. Each module is its own actor; they communicate by writing into shared mutable state with no enforced ordering. Tests need to mock the whole I/O surface to assert any single invariant.
- CP. reconcile.rs interleaves DB writes, signature verification, action emission, and side-effect dispatch in a single 1000-line tick function. The health_sweep block consults now() directly. Channel-edges read DB state in one helper, planner reads DB state in another, dispatch endpoint reads DB state in a third, all three deriving subtly different views of the same underlying truth. Bugs hide in the variations (the v0.2.0 c3ab9d75 terminal_at fix is one example).
This produces a recurring class of defect — two readers of “the same state” disagree because they each construct their view differently from primitive DB rows. Every per-host bug we shipped in v0.2.0 polish (probe gate stale data, sweep timing drift, channel-edge over-hold) lives in this layer, not in the wire protocol or the trust model.
RFC-0005 makes the wire event-driven. This RFC makes the code event-driven, so the same bugs cannot reappear in the next layer down.
2. Design principles
- Functional core, imperative shell. Every state-affecting decision lives in a pure function: (state, input, now) → (new_state, [effects]). Effects are descriptive data, not executions. A separate runner interprets effects against real I/O. The pure core is #![forbid(unsafe_code)]-safe by construction and proptest-friendly by structure.
- One state, one mutator. Both agent and CP have exactly one task that mutates the state machine. Background workers (probes, manifest poller, long-poll listener) emit events into a single MPSC channel; the mutator consumes events serially. No locks on rollout state. No interleaved transitions.
- Effects-as-data. The state machine returns Vec<Effect>. Effects are an enum of concrete operations (write DB row, send HTTP request, log a metric, fire a systemd unit). The runner has one match-arm per variant. Adding a new effect = adding an enum variant + an arm; the compiler tells you what you forgot.
- Same code, both sides. The per-host state-machine reducer (RFC-0005 §3) is one crate (nixfleet-state-machine) used by both the agent runtime and the CP-mirror view. Identical transition semantics on both sides by construction: the compiler enforces what tests would otherwise have to.
- Explicit responsibility ledger. CP and agent each have a written list of “what I’m responsible for” (§5 below). Anything not on the list is the other side’s job. Cross-references prevent drift.
- Replay-friendly. Every decision has its inputs visible: state + event + now. An event log replayed through the reducer reproduces the exact transition sequence. Operators can answer “what would have happened if event X arrived 5 s later” without a live system.
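As a rough illustration of the "one state, one mutator" principle, the sketch below funnels events from several workers into a single channel that one loop drains serially. It uses std::sync::mpsc and plain threads rather than Tokio, and the names Event, Counters, and run_single_mutator are hypothetical, not taken from the nixfleet crates.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical events emitted by background workers.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    ProbeObserved { passed: bool },
    Heartbeat,
}

/// Hypothetical state owned exclusively by the mutator loop.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Counters {
    pub probes_passed: u32,
    pub probes_failed: u32,
    pub heartbeats: u32,
}

/// Pure transition: no I/O, no locks, no clock reads.
pub fn step(mut state: Counters, event: Event) -> Counters {
    match event {
        Event::ProbeObserved { passed: true } => state.probes_passed += 1,
        Event::ProbeObserved { passed: false } => state.probes_failed += 1,
        Event::Heartbeat => state.heartbeats += 1,
    }
    state
}

/// Spawns one worker thread per batch. Every worker sends into the same
/// channel, and only the receive loop in this function touches the state.
pub fn run_single_mutator(batches: Vec<Vec<Event>>) -> Counters {
    let (tx, rx) = mpsc::channel::<Event>();
    let mut workers = Vec::new();
    for batch in batches {
        let tx = tx.clone();
        workers.push(thread::spawn(move || {
            for event in batch {
                tx.send(event).expect("reducer loop is still receiving");
            }
        }));
    }
    // Drop the original sender so the loop below ends once all workers finish.
    drop(tx);

    let mut state = Counters::default();
    for event in rx {
        state = step(state, event);
    }
    for worker in workers {
        worker.join().expect("worker thread panicked");
    }
    state
}
```

Because the state never leaves the receive loop, no lock is needed around it; worker interleaving only affects the order in which events are consumed, not who is allowed to mutate.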
3. Per-host state machine (nixfleet-state-machine crate)
// crates/nixfleet-state-machine/src/lib.rs

/// Pure reducer. Same signature on agent and CP-mirror sides.
/// `now` is a parameter so tests can advance time deterministically.
pub fn step(
    state: HostRolloutState,
    event: Event,
    now: DateTime<Utc>,
    policy: &RolloutPolicy, // from signed manifest
) -> Result<(HostRolloutState, Vec<Effect>), TransitionError>;

pub enum Event {
    // Inputs the agent runtime synthesises from local activity:
    LocalActivationStarted { closure: ClosureHash, at: DateTime<Utc> },
    LocalActivationCompleted { observed: ClosureHash, exit_code: i32, at: DateTime<Utc> },
    LocalActivationFailed { exit_code: i32, stderr_tail: String, at: DateTime<Utc> },
    LocalProbeObserved { name: String, status: ProbeStatus, at: DateTime<Utc> },
    LocalSustainedFailureCrossed { threshold_secs: u64, at: DateTime<Utc> },
    LocalRollbackCompleted { reverted_to: ClosureHash, at: DateTime<Utc> },
    // Inputs the agent receives from CP via long-poll:
    DispatchReceived { rollout_id: RolloutId, target: ClosureHash, soak_due_at: DateTime<Utc> },
    // Inputs the CP runtime synthesises from inbound agent events (mirrors LocalXXX):
    RemoteDispatchAck { ... },
    RemoteActivationStarted { ... },
    // ... one per RFC-0005 §4.2 event
}

pub enum Effect {
    // Side-effect descriptions. Agent runtime executes Local*; CP runtime executes Remote*.
    LocalFireSwitch { target: ClosureHash },
    LocalFireRollbackTo { closure: ClosureHash },
    LocalResetProbeCache, // RFC-0005 §4.2 ActivationComplete
    LocalEmitEvent { payload: AgentEvent }, // outbound to CP
    RemoteQueueDispatch { host: HostId, rollout: RolloutId },
    RemoteRecordTransition { from: HostRolloutState, to: HostRolloutState, at: DateTime<Utc> },
    RemoteInsertQuarantine { channel: ChannelId, closure: ClosureHash },
    EmitMetric { name: &'static str, labels: Vec<(&'static str, String)> },
    EmitLog { level: Level, fields: HashMap<&'static str, String>, message: &'static str },
}
Properties enforced by the reducer’s type signature:
- Cannot mutate state outside step.
- Cannot perform I/O inside step (no async, no Result on Tokio types, no &mut Database).
- Cannot read now from chrono::Utc::now(); it must be a parameter.
- Cannot read manifest policy from anywhere except the policy: &RolloutPolicy argument.
Tests:
#[test]
fn soak_does_not_fire_before_first_probe() {
    let s0 = HostRolloutState::activating(...);
    let (s1, _) = step(s0, Event::LocalActivationCompleted { ... }, t0, &CANARY).unwrap();
    let (s2, _) = step(s1, Event::AdvanceTime(t0 + soakMinutes), t0 + soakMinutes, &CANARY).unwrap();
    assert_eq!(s2.state, HostState::Soaking); // NOT Converged - no probe observed yet
    let (s3, _) = step(s2, Event::LocalProbeObserved { status: Pass, .. }, t1, &CANARY).unwrap();
    assert_eq!(s3.state, HostState::Converged); // Now Converged
}

proptest! {
    #[test]
    fn no_event_sequence_violates_invariants(events in arb_event_sequence()) {
        let mut state = HostRolloutState::initial();
        for (event, now) in events {
            if let Ok((next, _)) = step(state.clone(), event, now, &arb_policy()) {
                assert!(invariant_current_matches_declared_when_converged(&next));
                assert!(invariant_no_negative_soak_window(&next));
                state = next;
            }
        }
    }
}
Every bug from v0.2.0 polish becomes a one-line property, so regression resistance comes for free.
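To make the soak-gate property concrete, here is a simplified, self-contained reducer that can actually be compiled and exercised. It is a sketch under stated assumptions, not the nixfleet-state-machine crate: time is plain u64 seconds instead of DateTime<Utc>, and the Host, Event::Tick, and Effect names are invented for illustration.

```rust
/// Simplified host state; the real crate's types are richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Host {
    Activating,
    Soaking { soak_due_at: u64, probe_seen: bool, passing: bool },
    Converged,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    ActivationCompleted,
    ProbeObserved { passed: bool },
    /// Hypothetical "time advanced" input used to re-evaluate the gate.
    Tick,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Effect {
    ResetProbeCache,
    EmitConverged,
}

/// Pure reducer: `now` and `soak_secs` are parameters, never read from a clock.
pub fn step(state: Host, event: Event, now: u64, soak_secs: u64) -> (Host, Vec<Effect>) {
    match (state, event) {
        (Host::Activating, Event::ActivationCompleted) => (
            Host::Soaking { soak_due_at: now + soak_secs, probe_seen: false, passing: false },
            // Results from the prior closure must not satisfy the new gate.
            vec![Effect::ResetProbeCache],
        ),
        (Host::Soaking { soak_due_at, .. }, Event::ProbeObserved { passed }) => {
            gate(soak_due_at, true, passed, now)
        }
        (Host::Soaking { soak_due_at, probe_seen, passing }, Event::Tick) => {
            gate(soak_due_at, probe_seen, passing, now)
        }
        // Any other pairing leaves the state untouched in this sketch.
        (state, _) => (state, vec![]),
    }
}

/// Soak gate: a probe was observed, it passes, and the soak window elapsed.
fn gate(soak_due_at: u64, probe_seen: bool, passing: bool, now: u64) -> (Host, Vec<Effect>) {
    if probe_seen && passing && now > soak_due_at {
        (Host::Converged, vec![Effect::EmitConverged])
    } else {
        (Host::Soaking { soak_due_at, probe_seen, passing }, vec![])
    }
}
```

Stepping this reducer with ActivationCompleted at t=100 and a 60 s soak, then Tick at t=200, leaves the host in Soaking because no probe has been seen; a passing ProbeObserved at t=201 is what moves it to Converged.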
4. The planner (nixfleet-reconciler crate, refactored)
CP’s reconciler is split into two layers, a pure planner and an impure applier, exactly mirroring the per-host pattern.
4.1 Pure planner
// crates/nixfleet-reconciler/src/planner.rs
pub fn plan_next(
    manifests: &SignedManifestSet, // verified, freshness-validated
    fleet_state: &FleetState,      // derived from event log; see §4.2
    quarantines: &QuarantineSet,
    now: DateTime<Utc>,
) -> Vec<PlanAction>;

pub enum PlanAction {
    OpenRollout { channel: ChannelId, target_ref: ChannelRef, rollout_id: RolloutId },
    QueueDispatch { host: HostId, rollout: RolloutId, target: ClosureHash, soak_due_at: DateTime<Utc> },
    MarkChannelTerminal { channel: ChannelId, rollout: RolloutId },
    InsertQuarantine { channel: ChannelId, closure: ClosureHash, reason: QuarantineReason },
    ClearStaleQuarantine { channel: ChannelId, closure: ClosureHash },
    RecordHaltLifted { channel: ChannelId },
}

pub struct FleetState {
    pub host_states: HashMap<HostId, HostRolloutState>,
    pub active_rollout_per_channel: HashMap<ChannelId, RolloutId>,
    pub rollouts: HashMap<RolloutId, RolloutSummary>,
}
plan_next is pure: no now from chrono, no DB reads, no HTTP. The runner builds FleetState from the event log + DB cache before each call.
4.2 Gates as pure boolean functions
// crates/nixfleet-reconciler/src/gates/
pub fn channel_edges_block(
    fleet: &FleetState,
    manifests: &SignedManifestSet,
    successor: &ChannelId,
) -> Option<GateBlock>;

pub fn wave_promotion_block(
    fleet: &FleetState,
    host: &HostId,
    rollout: &RolloutId,
) -> Option<GateBlock>;

pub fn disruption_budget_block(
    fleet: &FleetState,
    host: &HostId,
    manifest: &Manifest,
) -> Option<GateBlock>;

pub fn quarantine_block(
    quarantines: &QuarantineSet,
    channel: &ChannelId,
    target: &ClosureHash,
) -> Option<GateBlock>;
Each gate consults only its inputs. No more is_active_for_ordering() heuristics — FleetState.host_states.get(predecessor_host_in_predecessor_channel).is_converged() is a direct read.
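A minimal sketch of how gates shaped as "inputs in, Option<GateBlock> out" might compose. The types are deliberately simplified (strings instead of ChannelId / ClosureHash / HostId), and predecessor_block and first_block are hypothetical helpers, not functions named anywhere in this RFC.

```rust
use std::collections::{HashMap, HashSet};

/// Why a dispatch is being held. Variant names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub enum GateBlock {
    Quarantined { closure: String },
    PredecessorNotConverged { host: String },
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HostState {
    Soaking,
    Converged,
}

/// Blocks when the (channel, closure) pair is present in the quarantine set.
pub fn quarantine_block(
    quarantines: &HashSet<(String, String)>,
    channel: &str,
    target: &str,
) -> Option<GateBlock> {
    if quarantines.contains(&(channel.to_string(), target.to_string())) {
        Some(GateBlock::Quarantined { closure: target.to_string() })
    } else {
        None
    }
}

/// Blocks unless the predecessor host is known and Converged: a direct read
/// of the state map, with no heuristics.
pub fn predecessor_block(
    host_states: &HashMap<String, HostState>,
    predecessor: &str,
) -> Option<GateBlock> {
    match host_states.get(predecessor) {
        Some(HostState::Converged) => None,
        _ => Some(GateBlock::PredecessorNotConverged { host: predecessor.to_string() }),
    }
}

/// Combines gate results; the first block encountered wins.
pub fn first_block(gates: Vec<Option<GateBlock>>) -> Option<GateBlock> {
    gates.into_iter().flatten().next()
}
```

Since each gate is a plain function of its arguments, a caller can evaluate all of them against one snapshot and report the first reason for holding, without any gate re-reading storage on its own.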
4.3 Impure applier
// crates/nixfleet-control-plane/src/runtime.rs
pub async fn apply_plan(actions: Vec<PlanAction>, deps: &CpDeps) -> Result<()>;
One match per PlanAction variant. Writes to DB, queues HTTP responses for the next agent long-poll, fires metrics. The compiler ensures every variant is handled.
5. CP responsibility ledger
Every responsibility CP has in v0.2. Anything not on this list is not CP’s job.
| # | Responsibility | Mechanism | Source of truth |
|---|---|---|---|
| 1 | Verify and cache signed manifests | fleet-poll task fetches from forge, runs nixfleet-verify-artifact, persists to DB | forge’s CI release-signing key (RFC-0002 §3) |
| 2 | Open rollouts on new channel refs | Planner emits OpenRollout action | signed fleet.resolved.json |
| 3 | Queue per-host Dispatch (timing/wave signal only) | Planner emits QueueDispatch; runner places it on the agent’s next long-poll response to /v1/agent/dispatch | planner output |
| 4 | Enforce wave promotion + disruption budget + channel-edges | Pure gate functions (§4.2) | FleetState from event log |
| 5 | Maintain per-channel quarantine table | Apply InsertQuarantine / ClearStaleQuarantine actions in response to RollbackComplete events from agents | aggregated agent events |
| 6 | Append-only event log | runtime::ingest_event writes signed events to DB on receipt | inbound agent POST to /v1/agent/events |
| 7 | Heartbeat liveness monitoring | heartbeat-watcher task flags hosts with missed heartbeats | host_rollout_records.last_heartbeat_at |
| 8 | Serve operator API | /v1/operator/* endpoints read from event log + cache | event log |
| 9 | Serve agent endpoints | /v1/agent/dispatch long-poll, /v1/agent/events POST, /v1/agent/heartbeat POST, /v1/manifests/<ref> GET | various |
What CP explicitly does NOT do
| # | Non-responsibility | Why |
|---|---|---|
| N1 | Decide per-host rollback | Agent reads onHealthFailure policy from the signed manifest directly (RFC-0005 §2.1, §4.1) |
| N2 | Run the sustained-failure sweep | Agent times its own HEALTH_FAILURE_THRESHOLD_SECS from its own probe-cache (RFC-0005 §6) |
| N3 | Infer host state from checkin diffs | Agent emits explicit events (RFC-0005 §4.2) |
| N4 | Reset probe state on activation | Agent does it locally (LocalResetProbeCache effect) |
| N5 | Hold trust private keys | RFC-0010 §1.5 — exception: the CA-issuance key, TPM-bound in production per RFC-0010 §1.5.1 |
| N6 | Sign anything | RFC-0002 §3 — CP is a verified stateless distributor for manifests and events. Amendment (2026-05-17): CP does sign agent mTLS certs at /v1/enroll and /v1/agent/renew-cert; the signing material is TPM-resident in production (RFC-0010 §1.5.1). The “stateless distributor” claim continues to hold for the manifest + event pipeline. |
| N7 | Hold bootstrap-token authority | Bootstrap-token authority remains outside CP (org-root threshold per RFC-0010). Agent cert generation now happens inside CP at enroll/renew time per RFC-0010 §1.5.1; the signing key is TPM-bound in production. |
| N8 | Decide what closure a host should run | Signed manifest specifies it; agent verifies it; CP just routes |
The list is short by design. When in doubt: “could an agent figure this out from the signed manifest plus its own local state?” If yes, agent does it.
Amendment note (2026-05-17). N5/N6/N7 above were rewritten to reflect the CA-issuance signing path that landed in feat(cp,trust): cert issuance (commit 4808d4dc). The architectural intent (production-grade deployments hold no in-memory signing material) is preserved via the TPM-backed CaSigner backend; the file-backed backend is a dev convenience. RFC-0010 §1.5.1 is the canonical statement; an additive --strict enforcement closes the operator-facing gap. Adjacent claims in RFC-0005 §2.1 are amended in the same pass.
6. Agent responsibility ledger
| # | Responsibility | Mechanism | Source of truth |
|---|---|---|---|
| 1 | Verify signed manifest, hold canonical declared state | nixfleet-verify-artifact on every fetch (RFC-0003 §4.6) | manifest signature against nixfleet.trust.ciReleaseKey |
| 2 | Run health probes against /version, etc. | probe-worker background tasks | per-probe HTTP / TCP / exec result |
| 3 | Detect sustained probe failure | Local timer in the state machine reducer | LocalProbeObserved events |
| 4 | Decide rollback (from manifest policy) | Reducer reads policy: &RolloutPolicy arg, transitions to RollbackComplete automatically when onHealthFailure = "rollback-and-halt" | signed manifest |
| 5 | Execute switch-to-configuration (forward + rollback) | LocalFireSwitch / LocalFireRollbackTo effects → systemd-run | reducer-emitted effect |
| 6 | Emit signed events to CP | LocalEmitEvent effect → HTTP POST /v1/agent/events | reducer output |
| 7 | Persist outbound event queue across restarts | Durable on-disk queue (§7.2 of RFC-0005) | local fs |
| 8 | Heartbeat | heartbeat-worker task posts /v1/agent/heartbeat every 60 s | local clock + reducer’s last_event_seq |
| 9 | Reconcile on boot | Read /run/current-system, emit synthetic ActivationComplete if needed (RFC-0005 §9.5) | local filesystem |
7. Runtime topology
7.1 Agent runtime
┌────────────────────────────────────────────────┐
│ Agent process │
│ │
│ [probe-worker]──┐ │
│ [activation-w]──┼─►mpsc::Sender<Event>───┐ │
│ [longpoll-w]────┤ │ │
│ [heartbeat-w]───┘ ▼ │
│ ┌──────────────┐ │
│ │ Reducer │ │
│ │ loop │ │
│ ┌─────────────────────────►│ step() ────►│ │
│ │ └──────┬───────┘ │
│ │ │ │
│ │ effects: LocalFireSwitch, │ Vec<Eff> │
│ │ LocalEmitEvent, ... ▼ │
│ │ ┌──────────────┐ │
│ │ │ Runner │ │
│ └──────────────────────────│ apply() │ │
│ └──────────────┘ │
│ │
└────────────────────────────────────────────────┘
- One MPSC channel feeds the reducer loop.
- The reducer is single-task, single-threaded. No locks needed on state.
- Workers can run on any tokio threads; only the reducer is serial.
7.2 CP runtime
Same shape, different effect handlers:
┌────────────────────────────────────────────────┐
│ CP process │
│ │
│ [manifest-poll]──┐ │
│ [event-ingest]───┼─►mpsc::Sender<CpEvent>──┐ │
│ [heartbeat-rx]───┤ │ │
│ [op-api]─────────┘ ▼ │
│ ┌──────────────┐ │
│ │ Reducer │ │
│ │ loop │ │
│ ┌─────────────────────────►│ plan_next() │ │
│ │ └──────┬───────┘ │
│ │ │ │
│ │ actions: QueueDispatch, │ Vec<Act> │
│ │ InsertQuarantine, ... ▼ │
│ │ ┌──────────────┐ │
│ │ │ Applier │ │
│ └──────────────────────────│ apply() │ │
│ └──────────────┘ │
│ │
└────────────────────────────────────────────────┘
The CP reducer additionally maintains a mirror of per-host state, derived by running each inbound AgentEvent through the same nixfleet-state-machine::step() function the agent uses. Two consequences:
- CP and agent cannot disagree about a host’s state for any given event sequence — same code, same input.
- Tests that prove the per-host machine correct (proptest) automatically prove the CP mirror correct.
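The mirror property can be sketched as a fold: if the CP replays the same event log through the same step function the agent applied incrementally, both arrive at the same state. The state and event names below are simplified stand-ins chosen for illustration, not the RFC-0005 definitions.

```rust
/// Illustrative per-host states (not the RFC-0005 set).
#[derive(Debug, Clone, PartialEq)]
pub enum HostState {
    Idle,
    Activating,
    Soaking,
    Reverted,
}

/// Illustrative agent events.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    ActivationStarted,
    ActivationCompleted,
    RollbackCompleted,
}

/// Shared pure transition function, used unchanged by both sides.
pub fn step(state: HostState, event: &AgentEvent) -> Result<HostState, String> {
    match (&state, event) {
        (HostState::Idle, AgentEvent::ActivationStarted) => Ok(HostState::Activating),
        (HostState::Activating, AgentEvent::ActivationCompleted) => Ok(HostState::Soaking),
        (HostState::Soaking, AgentEvent::RollbackCompleted) => Ok(HostState::Reverted),
        _ => Err(format!("illegal transition: {:?} on {:?}", state, event)),
    }
}

/// CP-mirror view: replay the whole log from the initial state.
/// Stops at the first illegal transition and returns its error.
pub fn replay(log: &[AgentEvent]) -> Result<HostState, String> {
    log.iter().try_fold(HostState::Idle, |state, event| step(state, event))
}
```

An agent that called step once per event and a CP that called replay on the stored log cannot diverge, because there is only one transition table to disagree about.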
8. Crate layout (target end-state)
crates/
nixfleet-state-machine/ [NEW] Pure per-host reducer (RFC-0005 §3)
src/
lib.rs step(), state types, effect enum
transitions/ one module per state transition group
tests/ proptest invariants
nixfleet-reconciler/ [REFACTOR] Pure planner + gates
src/
planner.rs plan_next()
gates/ one module per gate (channel-edges, waves, budgets, quarantine)
tests/
nixfleet-control-plane/ [REFACTOR] Impure runner + DB + HTTP
src/
runtime/
mod.rs single-task reducer loop
applier.rs interprets PlanAction + Effect
workers/ manifest-poll, event-ingest, heartbeat-rx, op-api
server/ /v1/* HTTP routes
db/ cache (HostRolloutRecord, event log, quarantines)
nixfleet-agent/ [REFACTOR] Impure runner + workers + activation
src/
runtime/
mod.rs single-task reducer loop
applier.rs interprets Effect (LocalFireSwitch, LocalEmitEvent, ...)
workers/ probe, activation, longpoll, heartbeat
activation/ switch-to-configuration glue (called from applier)
compliance/ compliance probe runner (emits events)
enrollment.rs unchanged
nixfleet-proto/ [UNCHANGED + new event types] wire schemas (RFC-0005 §4)
nixfleet-verify-artifact/ [UNCHANGED] manifest signature verification
Boundaries:
- nixfleet-state-machine depends only on nixfleet-proto. No tokio, no reqwest, no rusqlite.
- nixfleet-reconciler depends only on nixfleet-state-machine + nixfleet-proto. Same purity restrictions.
- nixfleet-control-plane and nixfleet-agent are the only crates with I/O. Both consume the pure crates above.
Compile-time enforcement: nothing in nixfleet-state-machine’s Cargo.toml declares tokio/reqwest/rusqlite as a dependency. The boundary is mechanical.
9. Effect vocabulary
Definitive list — both agent and CP runtimes match on this enum:
pub enum Effect {
    // Agent-only effects (CP runner returns Error if it sees these):
    LocalFireSwitch { target: ClosureHash },
    LocalFireRollbackTo { closure: ClosureHash },
    LocalResetProbeCache,
    LocalEmitEvent { payload: AgentEvent, durable: bool }, // durable=true persists to disk first
    // CP-only effects:
    RemoteQueueDispatch { host: HostId, payload: Dispatch },
    RemoteInsertQuarantine { channel: ChannelId, closure: ClosureHash },
    RemoteClearStaleQuarantine { channel: ChannelId, closure: ClosureHash },
    RemoteOpenRolloutRecord { rollout: RolloutId, channel: ChannelId, target_ref: ChannelRef },
    RemoteAppendEventLog { event: AgentEvent, host: HostId },
    // Shared effects (both runners handle):
    RecordTransition { host: HostId, rollout: RolloutId, from: HostState, to: HostState, at: DateTime<Utc> },
    EmitMetric { name: &'static str, labels: Vec<(&'static str, String)>, value: f64 },
    EmitLog { level: Level, target: &'static str, message: &'static str, fields: HashMap<&'static str, String> },
}
Symmetric design: 4 local-only, 5 remote-only, 3 shared. The reducer knows from context which set it can emit; the runner has compile-time assurance every variant is handled.
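One way a CP-side runner could handle this enum is an exhaustive match that executes CP-only and shared effects and rejects agent-only ones. The sketch below uses a trimmed-down Effect with String fields and an in-memory CpRecorder in place of a real DB and metrics sink; both are assumptions made so the example compiles on its own.

```rust
/// Trimmed-down stand-in for the effect enum (four variants only).
#[derive(Debug, Clone, PartialEq)]
pub enum Effect {
    LocalFireSwitch { target: String },
    LocalResetProbeCache,
    RemoteInsertQuarantine { channel: String, closure: String },
    EmitMetric { name: &'static str, value: f64 },
}

/// Hypothetical in-memory sink standing in for DB and metrics backends.
#[derive(Debug, Default, PartialEq)]
pub struct CpRecorder {
    pub quarantines: Vec<(String, String)>,
    pub metrics: Vec<(&'static str, f64)>,
}

/// CP runner: one arm per variant, no wildcard arm, so adding a variant to
/// `Effect` fails compilation here until it is handled.
pub fn cp_apply(effect: Effect, out: &mut CpRecorder) -> Result<(), String> {
    match effect {
        Effect::LocalFireSwitch { .. } | Effect::LocalResetProbeCache => {
            Err("agent-only effect reached the CP runner".to_string())
        }
        Effect::RemoteInsertQuarantine { channel, closure } => {
            out.quarantines.push((channel, closure));
            Ok(())
        }
        Effect::EmitMetric { name, value } => {
            out.metrics.push((name, value));
            Ok(())
        }
    }
}
```

Avoiding a catch-all `_` arm is the design choice that gives the compile-time assurance: the match stops being exhaustive the moment the enum grows.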
RFC-0007: Multi-scope health probes
Status. Accepted.
Depends on. RFC-0001 (fleet topology + mkFleet API), RFC-0002 (reconciler + signed manifests), RFC-0005 (event-driven probe state), RFC-0006 (runtime architecture).
Supersedes. Sections of RFC-0002 referencing HealthGate.compliance_probes.required and Channel.compliance.{mode,strict} as channel-level booleans/enums; those fields are removed in favour of per-probe mode. Also supersedes the host_reports-as-canonical-storage references in RFC-0003, RFC-0009 (attestation events), RFC-0010 (HostUnquarantined), and RFC-0011 (StaleTargetRejected). Those event kinds land in event_log per RFC-0005 §4.3; gate-relevant subsets land additionally in the probe_failures derived view per §7.2.
Scope. The declarative shape operators use to define health probes across fleet, tag, and host scopes; the per-probe mode field that replaces the channel-level enforcement flag; the relationship between probe topology and the closure-hash signing chain. Does not cover runtime execution mechanics (those live in RFC-0005 + RFC-0006) or the agent-internal probe-runner pipeline.
1. Problem statement
Pre-v0.2 nixfleet had three places where probe-or-probe-like declarations lived, with three different shapes, lifecycles, and owners:
| Site | Scope | Shape | Who declares it |
|---|---|---|---|
| services.nixfleet-agent.healthChecks (NixOS module) | per-host | { http, tcp, exec } declarations | host author |
| fleet_resolved::HealthGate.compliance_probes.required: bool (signed manifest, channel-level) | per-channel | gate-enforcement flag | fleet author |
| crates/nixfleet-agent/src/compliance.rs (deleted) | per-host implicit | code-driven collector wiring | nobody, implicit |
The split has two operational consequences:
- Compliance gating was a special case. compliance_probes.required was its own concept disconnected from the regular probe machinery. Two parallel paths existed in the agent (regular health probes via health.rs + compliance via compliance.rs), with two parallel paths in the CP gate logic.
- There is no way to declare a probe for “every host in a tag” or “every host in the fleet.” Operators copy-paste the same probe declaration into every host’s NixOS module, or add it as a scope. Both work, but neither expresses the intent “this probe applies to every web-tagged host” directly.
This RFC unifies the three sites and adds explicit scope-level declarations.
2. Design principles
- Per-host operationally; multi-scope declaratively. Probes execute on the host they target (typically against localhost services). Operators declare them at whichever scope expresses the intent most cleanly: fleet-wide for cross-cutting probes like heartbeat, tag-scoped for service-class probes like nginx-version, host-scoped only when truly host-specific.
- Per-probe mode replaces channel-level enforcement. Each probe carries its own mode: enforce (wave-promotion gate consults the result), observe (results surface in event_log but do not gate), disabled (declared but not run). The channel-level compliance_probes.required: bool from the manifest is removed; the same expressiveness is achieved per-probe.
- Closure-driven, not manifest-driven. Probe topology is rendered into each host’s NixOS closure by _agent.nix; the closure hash is signed by CI as part of the standard manifest signing flow; the agent reads its effective probe set from disk. The signed manifest does not carry probe declarations directly; the closure hash transitively signs them. (See §5 for the full flow.)
- Compliance is a probe kind, not a parallel pipeline. A kind = "evidence" probe reads the latest evidence file produced by compliance-evidence-collector.service (existing systemd unit, operator-controlled cadence). The probe is a read-only consumer; production cadence lives with the collector unit. The wave-promotion gate consults mode == "enforce" evidence-probe results the same way it consults any other enforce-mode probe.
- Resolution is deterministic and visible at fleet-eval time. Operator-controlled precedence with explicit collision warnings. No silent shadowing.
3. Declaration model
3.1 Four scopes
{
  # Fleet-wide — applies to every host
  nixfleet.healthChecks = {
    heartbeat = {
      kind = "http";
      url = "http://localhost/health";
      intervalSeconds = 30;
      mode = "observe";
    };
    evidence-nis2 = {
      kind = "evidence";
      framework = "nis2-essential";
      intervalSeconds = 60;
      mode = "enforce";
    };
  };

  # Tag-scoped — applies to every host carrying the tag
  nixfleet.tags.web.healthChecks = {
    nginx-version = {
      kind = "http";
      url = "http://localhost/version";
      expectStatus = 200;
      intervalSeconds = 15;
      mode = "enforce";
    };
  };

  # Per-host — only this host
  nixfleet.hosts.lab.healthChecks = {
    cache-disk-space = {
      kind = "exec";
      command = "/run/current-system/sw/bin/check-disk /var/lib/attic";
      intervalSeconds = 300;
      mode = "enforce";
    };
  };
}
3.2 Probe kinds
Every probe carries kind (discriminator), intervalSeconds, mode, plus kind-specific fields.
| kind | Required fields | Optional fields |
|---|---|---|
| http | url | expectStatus (default 200), bodyContains, timeoutSecs (default 3) |
| tcp | host, port | connectTimeoutSecs (default 3) |
| exec | command | expectExitCode (default 0), timeoutSecs (default 10) |
| evidence | framework | evidencePath (default /var/lib/nixfleet-compliance/evidence.json) |
Validation at fleet-eval time refuses the manifest if any required field is absent, kind is unknown, or the same probe name appears twice at the same scope.
3.3 mode semantics
| mode | Agent behaviour | CP gate behaviour |
|---|---|---|
| enforce | Run probe at intervalSeconds; emit ProbeResult events | Wave-promotion gate consults the latest result; refuses promote on Fail (or Unknown past the grace window) |
| observe | Run probe at intervalSeconds; emit ProbeResult events | Records results in event_log for operator visibility; does NOT gate |
| disabled | Probe entry present in manifest but agent does not run it | Treated as absent for gate purposes |
disabled covers the temporary-suppression case (e.g., turning off a probe during incident response without removing it from fleet.nix).
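The mode table can be read as a filter over probe results. The sketch below is one possible rendering: the ProbeView struct and its past_grace flag are assumptions about how "Unknown past the grace window" might be tracked, not fields defined by this RFC.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Enforce,
    Observe,
    Disabled,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Pass,
    Fail,
    Unknown,
}

/// Hypothetical flattened view of one probe's latest result.
pub struct ProbeView {
    pub name: &'static str,
    pub mode: Mode,
    pub status: Status,
    /// Assumption: true once the grace window for an Unknown result elapsed.
    pub past_grace: bool,
}

/// Returns the names of probes that block wave promotion.
/// Only enforce-mode probes are consulted; observe and disabled never gate.
pub fn blocking_probes(probes: &[ProbeView]) -> Vec<&'static str> {
    probes
        .iter()
        .filter(|p| p.mode == Mode::Enforce)
        .filter(|p| match p.status {
            Status::Fail => true,
            Status::Unknown => p.past_grace,
            Status::Pass => false,
        })
        .map(|p| p.name)
        .collect()
}
```

An empty result means promotion may proceed; a failing observe-mode probe still shows up in the event log but never appears in this list.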
3.4 No channel-level mode override
Per-probe mode is the sole source of truth for the gate decision. The pre-v0.2 Channel.compliance.mode field is removed alongside HealthGate.compliance_probes.required (see §6 manifest schema delta). What §3.5 below introduces is channel-scoped declaration, not a channel-level mode override: an operator can say “all stable-channel hosts run this evidence probe set” without per-host tagging, but each probe still carries its own mode wherever it is declared and the gate consults that mode exclusively.
All probe kinds resolve through the same multi-scope hierarchy; evidence is not a special case at any scope.
3.5 Channel scope
Channel-scoped declarations sit between tag and host in the resolution order. Operators declare probes attached to a specific channel so all hosts assigned to that channel pick them up:
{
  # Channel-scoped — applies to every host whose `channel` is `stable`
  nixfleet.channels.stable.healthChecks = {
    evidence-nis2 = {
      kind = "evidence";
      framework = "nis2-essential";
      intervalSeconds = 60;
      mode = "enforce";
    };
  };
}
A host on stable resolves the same probe set it would have under fleet/tag/host scoping, plus any channel-scoped declarations. The multi-scope merge rule applies uniformly: later scopes override earlier ones on probe-name collision; collisions surface as mkFleet warnings (same shape as the existing tag-vs-fleet warnings).
Channel scope is a general declaration site for any probe kind (http, tcp, exec, evidence). Declaring an http health probe per channel is equally valid; this is not a compliance-specific affordance. The scope addresses the operator pattern “I want different probe sets on different channels without manually tagging every host,” which tag scope handled awkwardly when channel and tag groupings did not naturally align.
3.6 Compliance shorthand: capability layer vs policy layer (v0.2)
Compliance probes have a presence in two distinct layers, and conflating them is the source of most operator confusion. The layering is:
| Layer | Lives in | Surfaces | What it declares |
|---|---|---|---|
| L1 — capability | The host’s NixOS module config (services.nixfleet-compliance.*) | compliance-evidence-collector.service systemd unit, evidence-collector binary, /var/lib/nixfleet-compliance/evidence.json | Whether the host can produce evidence at all (collector unit present, controls available, host has the deps). |
| L2/L3 — policy | fleet.nix topology declarations (channels.<ch>.compliance.frameworks, plus refinements at nixfleet.compliance / tags.<t>.compliance / hosts.<h>.compliance) | evidence-<framework> probes synthesised into the host’s health-checks.json; gate decisions | Whether the agent consumes that evidence, under what mode, and with which per-control exemptions. |
The split is deliberate. L1 is a NixOS-module capability declaration — same character as enabling services.openssh or programs.zsh. L2/L3 is fleet topology — same character as channel assignments, tag membership, rollout policy. Conflating them produces two failure modes:
- Operators enable a framework at L2 without the collector unit at L1 → agent probes for missing evidence files; reports Fail.
- Operators enable the collector at L1 without declaring the framework at L2 → evidence is produced and rotting on disk; no one consumes it, no gate effect.
The framework keeps L1 and L2 deliberately separate (no auto-coupling): the NixOS module owns capability; fleet.nix owns policy; an operator opts into both explicitly.
3.7 Compliance scope hierarchy (v0.2)
The channel-scope compliance.frameworks shorthand desugars to evidence-<framework> probes synthesised into each host’s effective probe set (RFC-0007 §3.5 mechanism — the channel scope is the framework-set’s source of truth). On top of that, v0.2 adds per-framework refinement attrsets at fleet, tag, and host scope:
{
  # Fleet-wide compliance refinement
  nixfleet.compliance.frameworks.nis2-essential = {
    mode = "observe"; # downgrade default for rollout window
    reason = "Q2 audit window: observe mode while collectors stabilise";
    controlOverrides."access-control" = {
      mode = "enforce";
      reason = "Always-enforce, even during observe window";
    };
  };

  # Tag-scoped refinement
  nixfleet.tags.audit.compliance.frameworks.nis2-essential = {
    mode = "enforce"; # tag carriers go back to enforce
    reason = "Audit-tagged hosts: always-enforce";
  };

  # Channel-scope declaration (existing, RFC-0007 §3.5)
  nixfleet.channels.stable.compliance.frameworks = ["nis2-essential"];

  # Per-host refinement (RFC-0007 §3.5 + v0.2 framework-level extension)
  nixfleet.hosts.aether.compliance.frameworks.nis2-essential = {
    mode = "disabled"; # Darwin host, no collector available
    reason = "Aether is a Darwin developer host: no NixOS compliance collector";
  };
}
Precedence at synthesis time (broadest → most-specific, later wins for non-null/non-empty):
fleet < tag < channel < host
with three field-level merge rules:
| Field | Merge rule |
|---|---|
| mode | Most-specific non-null wins. Bare-string channel entries (frameworks = ["nis2-essential"]) contribute mode = null, i.e. they explicitly defer to a broader-scope mode if any, falling back to the channel’s compliance.mode default only when no scope declared a mode. Explicit channel-list-entry modes ({name = "nis2-essential"; mode = "enforce";}) DO contribute a definitive value at channel scope. |
| reason | Most-specific non-empty wins. Annotates ProbeSubResult.override_reason for downstream audit. |
| controlOverrides.<id> | Per-key deep merge: each scope’s entry for a given control ID replaces the same-keyed entry from broader scopes (host > channel > tag > fleet). |
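The three merge rules can be sketched as a fold over refinements ordered fleet, tag, channel, host. This is an illustration only: the Refinement struct is hypothetical, control overrides are reduced to a bare mode per control ID (the real entries also carry a reason), and the fallback to the channel's compliance.mode default is left out.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Enforce,
    Observe,
    Disabled,
}

/// Hypothetical per-scope refinement for one framework.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Refinement {
    /// None models a scope that defers (e.g. a bare-string channel entry).
    pub mode: Option<Mode>,
    /// Empty string models "no reason given at this scope".
    pub reason: String,
    /// Simplified: control ID mapped to a mode only.
    pub control_overrides: BTreeMap<String, Mode>,
}

/// Merges refinements given in broadest-to-most-specific order.
pub fn merge_refinements(scopes: &[Refinement]) -> Refinement {
    let mut merged = Refinement::default();
    for scope in scopes {
        // mode: most-specific non-null wins.
        if scope.mode.is_some() {
            merged.mode = scope.mode;
        }
        // reason: most-specific non-empty wins.
        if !scope.reason.is_empty() {
            merged.reason = scope.reason.clone();
        }
        // controlOverrides: per-key replace; other keys carry through.
        for (id, mode) in &scope.control_overrides {
            merged.control_overrides.insert(id.clone(), *mode);
        }
    }
    merged
}
```

With a fleet refinement in observe mode, a deferring channel entry, and a host refinement set to disabled, the merged mode is disabled while the fleet-level reason and control override still carry through, since the host scope supplied neither.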
Aether/Darwin shortcut. A mode = "disabled" at host scope produces an evidence-<framework> entry with mode = "disabled" in the host’s health-checks.json; the agent’s probe-runner worker skips disabled probes (per RFC-0007 §3.3). Closes the class of “exempt this single host from this framework without carving probe-shadow overrides under nixfleet.hosts.<h>.healthChecks.”
Silent no-op for un-enabled frameworks. Declaring a refinement at fleet/tag/host scope against a framework the channel hasn’t enabled (e.g. nixfleet.compliance.frameworks.iso27001 = { mode = "enforce"; }; when no channel includes iso27001 in its shorthand list) is a silent no-op — channel scope is the framework-set’s source of truth; broader scopes only refine. Operators who want to introduce a brand-new framework probe declare it explicitly under healthChecks (kind = “evidence”, framework = “…”), not via the compliance shorthand.
Aside — fleet.nix vs NixOS-state asymmetry. healthChecks lives wholly in fleet.nix: every probe declaration at every scope is a topology-layer artifact, transitively signed via the closure hash chain (§5). Compliance is asymmetric: the capability to produce evidence (services.nixfleet-compliance) is a NixOS-module declaration on the host, while the policy to consume it is fleet.nix topology. This is documented for the operator’s benefit — a probe declaration like nixfleet.compliance.frameworks.nis2-essential.mode = "disabled" doesn’t disable the collector; it disables the agent’s consumption. Disabling the collector itself is a NixOS-module change on the host. RFC-0004 §3 captures the broader pattern of where capability declarations belong (NixOS modules) vs where policy declarations belong (fleet.nix).
4. Resolution semantics
mkFleet computes the effective probe set for each host:
effective[host] = merge(
nixfleet.healthChecks, # fleet-wide
∪{nixfleet.tags.<tag>.healthChecks | tag ∈ host.tags}, # tag-scoped
nixfleet.channels.<host.channel>.healthChecks, # channel-scoped
nixfleet.hosts.<host>.healthChecks, # host-scoped
)
Precedence: host > channel > tag > fleet. A probe of the same name at a more specific ("lower") scope, i.e. one listed later in the merge above, wins outright - the declaration at the broader ("higher") scope is shadowed in full, not field-merged. This matches the precedence convention from RFC-0001 §“infra tag pin example”.
mkFleet access to host.tags: tags are already in scope of fleet-eval per RFC-0001 §3 (tag-driven scope inclusion). The resolver reuses the existing tag mechanism — no new fleet-eval graph traversal required.
Name collision policy: mkFleet emits a build-time warning when a lower-scope declaration shadows a higher-scope one, including the names of the overriding probe, the overridden scope, and the affected host. Silent shadowing is not permitted.
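The merge and the shadowing warning can be sketched together. This is an illustrative Rust model only; the actual resolver is a Nix function, and `Scope`, `Probe`, `ShadowWarning`, and `effective_probes` are hypothetical names. How two tags that declare the same probe name interact is not specified above; in this sketch a later layer simply shadows an earlier one.

```rust
use std::collections::BTreeMap;

/// Scope a declaration came from, ordered broadest to most specific.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scope {
    Fleet,
    Tag,
    Channel,
    Host,
}

/// Hypothetical stand-in for a probe declaration (only two fields shown).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Probe {
    pub kind: String,
    pub interval_seconds: u32,
}

/// Warning recorded when a more specific scope shadows a broader one.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ShadowWarning {
    pub probe: String,
    pub overridden_scope: Scope,
    pub host: String,
}

/// Merge per-scope probe maps for one host. `layers` must be ordered
/// broadest first; a same-named probe in a later layer replaces the
/// earlier one in full (no field-level merge) and records a warning.
pub fn effective_probes(
    host: &str,
    layers: &[(Scope, BTreeMap<String, Probe>)],
) -> (BTreeMap<String, Probe>, Vec<ShadowWarning>) {
    let mut effective: BTreeMap<String, (Scope, Probe)> = BTreeMap::new();
    let mut warnings = Vec::new();
    for (scope, probes) in layers {
        for (name, probe) in probes {
            // `insert` returns the previous entry, i.e. the shadowed one.
            if let Some((old_scope, _)) = effective.insert(name.clone(), (*scope, probe.clone())) {
                warnings.push(ShadowWarning {
                    probe: name.clone(),
                    overridden_scope: old_scope,
                    host: host.to_string(),
                });
            }
        }
    }
    let merged = effective.into_iter().map(|(n, (_, p))| (n, p)).collect();
    (merged, warnings)
}
```

Keeping the originating scope next to each entry is what makes the warning cheap to produce: the shadowed scope is already known at the moment of replacement.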
Validation that runs at fleet-eval time:
- Duplicate probe names within a single scope → eval error.
- Missing required field for the probe’s kind → eval error.
- Unknown `kind` value → eval error.
- `mode` value outside `enforce | observe | disabled` → eval error.
- An `intervalSeconds <= 0` → eval error (use `mode = "disabled"` to suppress).
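The five checks above can be modelled as a single pass over one scope's declarations. The sketch below is Rust for illustration only (fleet-eval itself runs in Nix). `RawProbe` and `validate_scope` are hypothetical, the lowercase kind spellings `http`, `tcp`, and `exec` are assumed, and the only required-field rule taken from this document is that an evidence probe names a `framework`.

```rust
use std::collections::BTreeSet;

/// Hypothetical raw probe declaration as an evaluator might see it.
pub struct RawProbe {
    pub name: String,
    pub kind: String,
    pub mode: String,
    pub interval_seconds: i64,
    /// Names of the fields present on the declaration, e.g. "framework".
    pub fields: Vec<String>,
}

// Required fields per kind. Only `evidence -> framework` comes from the
// text; the empty list for the other kinds is a placeholder.
const EVIDENCE_FIELDS: &[&str] = &["framework"];
const NO_FIELDS: &[&str] = &[];

fn required_fields(kind: &str) -> Option<&'static [&'static str]> {
    match kind {
        "evidence" => Some(EVIDENCE_FIELDS),
        "http" | "tcp" | "exec" => Some(NO_FIELDS),
        _ => None, // unknown kind
    }
}

/// Validate the declarations of a single scope; the first violation is
/// returned as an error message.
pub fn validate_scope(probes: &[RawProbe]) -> Result<(), String> {
    let mut seen = BTreeSet::new();
    for p in probes {
        if !seen.insert(p.name.as_str()) {
            return Err(format!("duplicate probe name `{}`", p.name));
        }
        let required = match required_fields(&p.kind) {
            Some(r) => r,
            None => return Err(format!("probe `{}`: unknown kind `{}`", p.name, p.kind)),
        };
        for field in required {
            if !p.fields.iter().any(|f| f.as_str() == *field) {
                return Err(format!("probe `{}`: missing required field `{}`", p.name, field));
            }
        }
        if !matches!(p.mode.as_str(), "enforce" | "observe" | "disabled") {
            return Err(format!("probe `{}`: invalid mode `{}`", p.name, p.mode));
        }
        if p.interval_seconds <= 0 {
            return Err(format!("probe `{}`: intervalSeconds must be > 0", p.name));
        }
    }
    Ok(())
}
```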
5. Closure flow (the signing chain)
fleet.nix (multi-scope declarations)
│
▼
mkFleet (fleet-eval-time resolver)
│
│ per-host effective probe set
▼
mkHost <host> # framework's existing config flow-back
│
│ injected into host's NixOS modules
▼
_agent.nix renders /etc/nixfleet/agent/health-checks.json
│
▼
host's NixOS closure # content-addressed
│
▼
manifest signs the closure hash # topology transitively signed
│
▼
agent reads /etc/nixfleet/agent/health-checks.json on activation
│
▼
probe runners execute against the effective set
The signed manifest declares each host’s closure_hash; the closure contains the rendered probe configuration; therefore the probe topology is cryptographically signed by the same key, with the same lifecycle, as the rest of the host’s configuration. No separate signing surface; no new wire path; the agent reads probes from its own disk — same place it reads everything else.
This is the same flow-back pattern mkFleet/mkHost already uses for scopes, channels, tags, pins, and compliance frameworks (RFC-0001 §3). Probes plug into the existing pattern rather than introducing a new manifest-payload type.
6. Manifest schema delta
Removed from fleet_resolved:
- HealthGate.compliance_probes (dead placeholder)
- Channel.compliance.mode (channel-level enforcement kill-switch)
- Channel.compliance.strict (channel-level probe-error tolerance flag)
All three are replaced by per-probe mode. The wave-promotion gate’s source of truth is now the probe_failures derived view written by the applier from ProbeResult events where the probe declaration had mode = "enforce" and status = "Fail", per the projection in §7.2.
Probe-error semantics: uniform strict = true behaviour. A probe that errors (nonzero exit code, malformed output, network timeout) counts as status = "Fail" regardless of probe kind or channel. Operators who want “tolerate probe errors” use per-probe mode = "observe". Observe-mode failures (whether genuine or erroneous) surface in event_log for visibility but do not gate wave promotion. The legacy Channel.compliance.strict = false affordance collapses into this one axis.
No additions to the manifest schema. Probe declarations live in the closure (rendered from the multi-scope merge), not in the signed manifest payload.
6.1 DB schema delta (not part of the wire manifest)
Removed: host_reports table (v0.1 artifact, no producer in v0.2, schema columns dead).
Added: probe_failures table — derived view of event_log, back-referenced via event_log_seq foreign key. Schema in §7.2.
7. Compliance as a probe kind
The compliance-evidence-collector.service is a systemd unit on each host that, on its own schedule (operator-configurable via services.compliance-evidence-collector.interval), produces a signed evidence file at /var/lib/nixfleet-compliance/evidence.json.
An evidence probe declaration tells the agent to consume the latest evidence file:
nixfleet.healthChecks.evidence-nis2 = {
kind = "evidence";
framework = "nis2-essential";
intervalSeconds = 60;
mode = "enforce";
};
The probe runner:
- Stats the evidence file; if `mtime` hasn’t advanced since last observation, reports the previous result without re-verification.
- On new `mtime`: reads the file, verifies its ed25519 signature against the host’s local SSH host key public half (loaded at agent startup, per RFC-0009 §5; same source as the agent’s evidence-signing identity), checks the framework’s pass condition.
- Emits `ProbeResult { name = "evidence-nis2", status = Pass | Fail, observed_at, sub_results }` via the existing event channel (RFC-0005 §4.2).
The agent does NOT receive the verifying pubkey via probe-config JSON — cfg.host_pubkey is loaded locally at startup from the host’s own SSH key infrastructure. Probe declarations carry no key material; key plumbing stays in the existing RFC-0009 path.
The probe runner does not invoke the collector. Collector cadence and probe cadence are independent. The probe is a read-only consumer; the heavy work (evidence collection + signing) stays with the systemd unit on its operator-controlled schedule.
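The mtime short-circuit described above amounts to a one-entry cache. A minimal sketch, assuming the caller supplies the file's modification time and a verification closure; `EvidenceCache` and `observe` are hypothetical names, and signature checking is deliberately left to the closure rather than shown.

```rust
use std::time::SystemTime;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Pass,
    Fail,
}

/// Remembers the last observed mtime and the result computed for it.
#[derive(Default)]
pub struct EvidenceCache {
    last: Option<(SystemTime, Status)>,
}

impl EvidenceCache {
    /// `mtime` is the evidence file's current modification time. `verify`
    /// stands in for "read the file, check the signature, check the pass
    /// condition"; it runs only when `mtime` has advanced.
    pub fn observe(&mut self, mtime: SystemTime, verify: impl FnOnce() -> Status) -> Status {
        if let Some((seen, status)) = self.last {
            if mtime <= seen {
                // File unchanged: report the previous result as-is.
                return status;
            }
        }
        let status = verify();
        self.last = Some((mtime, status));
        status
    }
}
```

Because the runner never triggers the collector, a stale file simply keeps reporting its last verified result until the collector's own schedule produces a new one.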
7.1 Per-control granularity in ProbeResult payload
ProbeResult events carry a kind-specific sub_results field. For kind = "evidence" probes, the payload preserves per-control accounting:
// in nixfleet-state-machine / OutboundAgentEvent::ProbeResult
struct ProbeResultPayload {
    name: String,
    status: ProbeStatus,          // Pass | Fail (aggregate across controls)
    observed_at: DateTime<Utc>,
    mode: ProbeMode,              // see §8.2 below
    sub_results: Option<Vec<ProbeSubResult>>,
}

struct ProbeSubResult {
    control_id: String,           // e.g., "nis2.art21.a"
    status: ProbeStatus,          // Pass | Fail per individual control
    framework: String,            // e.g., "nis2-essential"
    article: Option<String>,      // e.g., "art.21.a"
}
For HTTP/TCP/exec probes, sub_results is None. For evidence probes, it carries one entry per control evaluated by the framework. The aggregate status is Pass iff every sub_result.status == Pass.
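The aggregation rule is small enough to state as code. A sketch using a trimmed-down `ProbeSubResult` (only the two fields the rule needs); `aggregate` and `failing_controls` are hypothetical helper names. Note that under a literal reading of "Pass iff every sub-result passes", an empty list aggregates to Pass.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProbeStatus {
    Pass,
    Fail,
}

/// Trimmed-down sub-result: just what the aggregation rule needs.
pub struct ProbeSubResult {
    pub control_id: String,
    pub status: ProbeStatus,
}

/// Aggregate status: Pass iff every sub-result is Pass.
/// An empty slice is vacuously Pass under this reading.
pub fn aggregate(sub_results: &[ProbeSubResult]) -> ProbeStatus {
    if sub_results.iter().all(|s| s.status == ProbeStatus::Pass) {
        ProbeStatus::Pass
    } else {
        ProbeStatus::Fail
    }
}

/// Control IDs that failed, in their original order.
pub fn failing_controls(sub_results: &[ProbeSubResult]) -> Vec<&str> {
    sub_results
        .iter()
        .filter(|s| s.status == ProbeStatus::Fail)
        .map(|s| s.control_id.as_str())
        .collect()
}
```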
This preserves operator and auditor visibility into which controls fail on which host. Without sub_results, the gate would collapse to “host X compliance failing” and the per-control story would only be reachable via the raw evidence file. With it, /v1/deferrals and CP-side projections can report host = web-01, failing controls = [nis2.art21.a, iso27001.A.5.1] directly from event_log.
7.2 CP-side projection rebuild
The compliance_wave gate previously consumed db::reports::outstanding_compliance_events_by_rollout, a projection built over the v0.1-era host_reports table. That input pipeline has no producer in v0.2 (agent::compliance::* was removed). Under “v0.2 is a full rewrite, opt for optimal shapes,” host_reports itself is also deleted in this RFC. The v0.1 schema is suboptimal for v0.2 query patterns: signature_status is dead, report_json duplicates event_log.payload, event_id UNIQUE is redundant with event_log.seq monotonicity.
Replacement pipeline:
- `event_log` is the sole canonical store. Inbound `ProbeResult` events land in `event_log` with `kind = 'agent_event'` (RFC-0005 §4.3). Append-only audit; consumed by `/v1/rollouts/{id}/events` and replay tooling.
- A new `probe_failures` derived view carries the typed denormalization the gate needs:
CREATE TABLE probe_failures (
event_log_seq INTEGER PRIMARY KEY REFERENCES event_log(seq), -- back-ref to canonical
rollout_id TEXT NOT NULL,
host_id TEXT NOT NULL,
probe_name TEXT NOT NULL,
control_id TEXT, -- NULL for non-evidence probes
framework TEXT, -- NULL for non-evidence probes
observed_at TEXT NOT NULL
);
CREATE INDEX idx_probe_failures_by_rollout_host_control
ON probe_failures(rollout_id, host_id, control_id);
- Single writer. The applier’s `RemoteAppendEventLog` effect handler, on detecting `ProbeResult { mode = "enforce", status = "Fail" }`, writes the `event_log` row AND the per-`sub_result` `probe_failures` rows in one transaction. No two-writer divergence; no shadow state. For probes without sub_results (HTTP/TCP/exec aggregate fail), one `probe_failures` row with `control_id = NULL`. For evidence probes, one row per failing control.
- Re-derivable from canonical. `event_log_seq` as a back-reference foreign key means `probe_failures` is provably derivable from `event_log`. If the table is ever lost (DB rebuild, schema rev), a walk over `event_log` reconstructs it. Soft state, hard reference.
- Gate reads `probe_failures` via indexed `(rollout_id, host_id, control_id)`. The projection `db::probe_failures::outstanding_failing_enforce_probes_by_rollout` returns `HashMap<RolloutId, HashMap<HostId, usize>>` where the count is `COUNT(DISTINCT control_id)` per `(rollout, host)`.
- `FleetState` field rename: `outstanding_compliance_events` becomes `outstanding_failing_enforce_probes` (same shape; the name reflects the new gate-input model, which covers enforce-mode probes generally, not specifically compliance).
Gate logic unchanged: refuse to promote a wave if any host in the wave has outstanding failures from earlier waves. Only the input pipeline changes — one canonical store (event_log), one derived view (probe_failures), one writer (the applier), one consumer (the gate).
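The single-writer rule implies a deterministic mapping from one `ProbeResult` to zero or more `probe_failures` rows. The sketch below models only that mapping, in memory; it is not the applier's code. `FailureRow` and `failure_rows` are hypothetical, and the `rollout_id`, `host_id`, `observed_at`, and `event_log_seq` columns are omitted for brevity.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProbeStatus {
    Pass,
    Fail,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProbeMode {
    Enforce,
    Observe,
}

pub struct SubResult {
    pub control_id: String,
    pub framework: String,
    pub status: ProbeStatus,
}

pub struct ProbeResult {
    pub name: String,
    pub status: ProbeStatus,
    pub mode: ProbeMode,
    pub sub_results: Option<Vec<SubResult>>,
}

/// Subset of a `probe_failures` row (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub struct FailureRow {
    pub probe_name: String,
    pub control_id: Option<String>,
    pub framework: Option<String>,
}

/// Rows to write for one result: nothing unless the probe is enforce-mode
/// and failing; one NULL-control row for probes without sub-results; one
/// row per failing control for evidence probes.
pub fn failure_rows(r: &ProbeResult) -> Vec<FailureRow> {
    if r.mode != ProbeMode::Enforce || r.status != ProbeStatus::Fail {
        return Vec::new();
    }
    match &r.sub_results {
        None => vec![FailureRow {
            probe_name: r.name.clone(),
            control_id: None,
            framework: None,
        }],
        Some(subs) => subs
            .iter()
            .filter(|s| s.status == ProbeStatus::Fail)
            .map(|s| FailureRow {
                probe_name: r.name.clone(),
                control_id: Some(s.control_id.clone()),
                framework: Some(s.framework.clone()),
            })
            .collect(),
    }
}
```

Since the function depends only on the event payload, replaying `event_log` through it reproduces the same rows, which is the re-derivability property the list above relies on.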
8. Per-rollout enforce-probe-set discovery
For the wave-promotion gate to distinguish “this enforce-mode probe hasn’t reported yet — hold the wave” from “no enforce-mode probes are declared — advance,” the CP must know the set of enforce-mode probes the agent is expected to run. The agent provides this two ways (belt-and-braces):
8.1 Topology declaration on activation
After every LocalActivationCompleted (RFC-0005 §4.2), the agent’s probe worker re-reads /etc/nixfleet/agent/health-checks.json and emits one event:
Event::LocalProbeTopologyDeclared {
    rollout_id: RolloutId,
    probes: Vec<ProbeDecl>, // (name, kind, mode) for every declared probe
}
CP receives the corresponding outbound variant and writes one event_log row with kind = 'agent_event' carrying the topology. The CP-side projection knows, per (rollout_id, hostname), which probes the agent has committed to running.
Deterministic: the same closure produces the same LocalProbeTopologyDeclared payload every time. Replay-friendly: an event_log walk reconstructs the topology without needing access to the closure on disk.
8.2 Mode field on every ProbeResult
Each ProbeResult event also carries the probe’s mode (~4 bytes per event):
struct ProbeResultPayload {
    // ... (see §7.1)
    mode: ProbeMode, // enforce | observe | disabled
}
mode = "disabled" never appears in results (disabled probes don’t run); the field’s purpose is to let the CP correlate a result to the topology declaration without joining tables.
8.3 Gate-side reconciliation
For each (rollout_id, hostname), the gate has:
- The topology declaration: set of `(name, mode)` the agent declared.
- The stream of `ProbeResult`s with timestamps.
Gate logic for “this wave is safe to advance”:
- Every probe with declared `mode = "enforce"` has a `ProbeResult` with `status = Pass` and `observed_at >= activation_completed_at`.
- No probe with `mode = "enforce"` has the most recent `ProbeResult.status = Fail`.
Absence handling: a missing LocalProbeTopologyDeclared (e.g., agent crashed before emitting) holds the wave with reason "awaiting probe topology" until either the topology arrives or operator intervention clears it. Defensive against silent gate-bypass on agent crash.
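The two gate conditions plus the absence rule can be sketched as one pure function per host. This is an illustrative model, not the CP's gate code: `host_gate`, `Observation`, and `GateDecision` are hypothetical, timestamps are plain integers, and every hold reason other than "awaiting probe topology" is invented for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Enforce,
    Observe,
    Disabled,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Pass,
    Fail,
}

/// One ProbeResult reduced to what the gate needs.
pub struct Observation {
    pub probe: String,
    pub observed_at: u64,
    pub status: Status,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    Advance,
    Hold(String),
}

/// Decide for one (rollout, host). `topology` is `None` when no
/// LocalProbeTopologyDeclared has arrived yet.
pub fn host_gate(
    topology: Option<&[(String, Mode)]>,
    results: &[Observation],
    activation_completed_at: u64,
) -> GateDecision {
    let topology = match topology {
        Some(t) => t,
        None => return GateDecision::Hold("awaiting probe topology".to_string()),
    };
    for (name, mode) in topology {
        if *mode != Mode::Enforce {
            continue; // observe/disabled probes never gate
        }
        // Condition 2: the most recent result must not be a failure.
        let latest = results
            .iter()
            .filter(|o| &o.probe == name)
            .max_by_key(|o| o.observed_at);
        if let Some(o) = latest {
            if o.status == Status::Fail {
                return GateDecision::Hold(format!("enforce probe {} is failing", name));
            }
        }
        // Condition 1: a Pass observed at or after activation must exist.
        let passed = results.iter().any(|o| {
            &o.probe == name && o.status == Status::Pass && o.observed_at >= activation_completed_at
        });
        if !passed {
            return GateDecision::Hold(format!("awaiting pass from enforce probe {}", name));
        }
    }
    GateDecision::Advance
}
```

An empty topology (the agent declared no probes) advances immediately, while a missing topology holds; that distinction is the reason the declaration event exists.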
9. Operator workflow
Adding a probe
- Cross-cutting: declare under `nixfleet.healthChecks` in `fleet.nix`.
- Service-class: declare under `nixfleet.tags.<tag>.healthChecks`.
- Host-specific: declare under `nixfleet.hosts.<host>.healthChecks`.
nixos-rebuild/CI signs the new closure; fleet rollout dispatches it; agents activate; probes begin running. Standard push flow — no separate manifest-republish step.
Changing a probe
Edit the declaration at its current scope, push. The rebuilt closure has the new probe shape; the agent re-reads /etc/nixfleet/agent/health-checks.json on ActivationCompleted (RFC-0005 §4.2) and respawns runners with the new declarations.
Removing a probe
Delete the declaration; push. The next closure activation drops it from effective[host]; the agent’s LocalResetProbeCache effect kills the existing runner; the probe stops reporting. Any in-flight event_log rows for the probe remain (append-only audit log; RFC-0005 §4.3).
Disabling a probe temporarily
Change the probe’s mode to "disabled", push. The probe entry stays in the declaration (so the change is auditable in fleet.nix history), but the agent doesn’t run it and CP doesn’t gate on it. Re-enable by changing mode back.
Re-tagging a host
Add or remove a tag on a host. The merge changes; the host’s effective probe set changes accordingly on the next closure activation. Standard tag-membership semantics from RFC-0001.
A new fleet rollout under v0.2 picks up the new shape automatically on the next push; no manual wipe required beyond the standard fresh-DB story.
RFC-0008: Rollout-level state machine and uniform derived-view discipline
Status. Accepted.
Depends on. RFC-0005 (event-driven host-rollout state), RFC-0006 (control-plane architecture), RFC-0007 (multi-scope health probes), RFC-0004 (architectural patterns).
Supersedes. Ad-hoc rollout lifecycle bookkeeping previously held in independent-table writes of rollouts and quarantined_closures; both become derived views written by the applier in the same transaction as the canonical event_log append.
Scope. Two reinforcing changes: (1) elevate rollout lifecycle to a pure state machine in nixfleet-state-machine parallel to RFC-0005’s per-host machine; (2) make every applier-written CP DB table a derived view with event_log_seq foreign-key back to canonical state.
1. Problem statement
Two reinforcing architectural gaps surfaced during the v0.2 fold’s architectural-reviewer audit (RFC-0004 §4):
1.1 Rollout lifecycle is a state machine, but isn’t modeled as one
crates/nixfleet-control-plane/src/db/rollouts.rs carries rollout lifecycle as scattered boolean methods and SQL UPDATEs:
is_superseded(&self) -> bool
is_terminal(&self) -> bool
is_finished(&self) -> bool
record_active_rollout(&self, rollout_id, channel) -> Result<()>
supersede_status(&self, rollout_id) -> Result<Option<SupersedeStatus>>
mark_terminal(&self, rollout_id, now) -> Result<usize>
set_current_wave(&self, rollout_id, wave) -> Result<usize>
superseded_rollout_ids() -> Result<Vec<String>>
finished_rollout_ids() -> Result<Vec<String>>
prune_finished_rollouts(&self, retention_hours) -> Result<(usize, usize)>
States are implicit (intersections of booleans). Transitions live at applier call sites — no single function answers “what are the legal rollout transitions and what triggers them?” This is the same disease the per-host state had pre-RFC-0005 (RFC-0004 §1). No proptest invariants, no replay tooling, no audit-trail of rollout-level state changes (the event_log carries per-host events only).
1.2 Two CP tables remain shadow state, not derived views
After RFC-0007’s probe_failures introduction, the CP DB tables divide into the following classes (RFC-0004 §2.4):
| Class | Tables (post-RFC-0007) |
|---|---|
| Reducer state cache | host_rollout_records |
| Canonical event log | event_log |
| Outbound queue | dispatch_queue |
| Derived view (event_log_seq FK-back) | probe_failures |
| Applier-written, no FK-back (shadow state) | rollouts, quarantined_closures |
| Security-critical lookup (TTL lifecycle) | token_replay, cert_revocations (justified separate; see §6) |
The two shadow-state tables work the same way host_reports did before RFC-0007 deleted it: applier writes them, gates read them, but there is no FK-back to event_log proving derivability. If a future bug ever desynchronizes them from event_log, divergence is silent until a query surfaces it — exactly the v0.2.0-era bug class the cycle is replacing.
2. Design goals
- Rollout lifecycle becomes a pure state machine. Same `step(state, event, now) → (state, Vec<Effect>)` discipline as RFC-0005 §3 per-host state. Lives in `nixfleet-state-machine` alongside the host state machine. Proptest invariants. Replay-friendly.
- Every applier-written CP table becomes a derived view. `rollouts` and `quarantined_closures` gain `event_log_seq` foreign-key primary references; applier co-writes the canonical event_log row and the derived-view row in a single transaction.
- One canonical store, derived views provably re-derivable. If any derived-view table is lost (DB rebuild, schema migration), a walk over `event_log` reconstructs it. The reducer state cache (host_rollout_records) and the outbound queue (dispatch_queue) are explicit exceptions: they hold work-in-flight state that isn’t pure derivation.
- Rollout-level events captured in `event_log`. Today, only per-host events land there. After this RFC, rollout-level transitions (`RolloutOpened`, `RolloutTerminal`, `RolloutSuperseded`) also land, giving operators and replay tools a complete chronological view at both granularities.
- No reducer composition headaches. The rollout state machine consumes a subset of per-host events as inputs (it sees `HostStateChanged` events emitted by the per-host applier) but operates on its own state. The two reducers run sequentially in the same applier transaction; no cross-mutator hazards.
3. Rollout state machine
┌──────────────────────────────────────────┐
▼ │
┌─────────┐ ┌──────────┐ ┌────────────┐ ┌─────────────┴───┐
│ Opening │───▶│ Active │───▶│ Converging │───▶│ Terminal │
└─────────┘ └──────────┘ └────────────┘ └─────────────────┘
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Reverted │ │
│ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Failed │ │
│ └─────────────┘ │
│ │
└─── superseded ─────────────┐ │
▼ ▼
┌────────────────────────────────────┐
│ Superseded │
└──────────────┬─────────────────────┘
│
▼
┌─────────────┐
│ Pruned │
└─────────────┘
Eight states:
| State | Meaning | Entered by | Exited by |
|---|---|---|---|
| `Opening` | Channel-refs poll detected new ref; rollout opened; no hosts dispatched yet | RolloutOpened event | First HostJoined event (→ Active) or SuccessorOpened (→ Superseded, rare) |
| `Active` | At least one host is in-flight (Pending/Activating/Soaking per RFC-0005) | First HostJoined event | All in-flight hosts reach Soaked or Converged (→ Converging); or any host enters Failed/Reverted (→ Reverted/Failed) |
| `Converging` | All dispatched hosts reached Soaked; later waves remain to dispatch | All current-wave hosts reach Soaked | Next wave dispatched (→ Active); all hosts in all waves Converged (→ Terminal) |
| `Terminal` | All hosts in all waves are Converged; channel-edges may release | All hosts Converged | SuccessorOpened (→ Superseded) or retention expiry (→ Pruned) |
| `Reverted` | Any host reached Reverted via rollback-and-halt policy | First host Reverted event | Manual OperatorClearance (rare) or SuccessorOpened (→ Superseded) |
| `Failed` | Any host stuck in Failed state without rollback (e.g., halt-only policy) | First host Failed event with policy != rollback-and-halt | Manual OperatorClearance or SuccessorOpened |
| `Superseded` | A newer rollout for the same channel opened | SuccessorOpened event | Retention expiry (→ Pruned) |
| `Pruned` | Retention timeout elapsed; rollout no longer actionable | RetentionExpired event | Row persists (table remains re-derivable from event_log); physical row deletion deferred to v0.3 retention-compaction. The in-memory state-machine instance is freed; the DB row stays for audit. |
Invariants enforced by the reducer:
- `Terminal ⇒ ∀ host ∈ rollout: state == Converged`.
- `Reverted ⇒ ∃ host ∈ rollout: state == Reverted` AND no host is currently in-flight on the original target.
- A `RolloutOpened` event for `(channel, ref)` where the channel’s `active_rollout_id != None` is a structural error → reducer returns `TransitionError::SupersessionExpected` (the planner must emit `SuccessorOpened` first).
- `Superseded` is terminal-for-ordering but not terminal-for-pruning. Channel-edges treat `Superseded` like `Terminal`; retention treats them differently.
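The state table can be encoded as a single transition function. The sketch below is a simplification of the `step(state, event, now) → (state, Vec<Effect>)` shape named in §2: it drops `now` and effects, collapses the per-host conditions into a hypothetical `Trigger` enum, and encodes only transitions whose target the table states explicitly (so `OperatorClearance`, whose target state is not given, is left out). `IllegalTransition` is likewise a made-up error type.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RolloutState {
    Opening,
    Active,
    Converging,
    Terminal,
    Reverted,
    Failed,
    Superseded,
    Pruned,
}

/// Hypothetical, pre-aggregated triggers derived from the table's
/// "Entered by" / "Exited by" columns.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Trigger {
    HostJoined,       // first host joined
    WaveSoaked,       // all current-wave hosts reached Soaked
    WaveDispatched,   // next wave dispatched
    AllConverged,     // all hosts in all waves Converged
    HostReverted,     // a host reached Reverted
    HostFailed,       // a host is stuck in Failed without rollback
    SuccessorOpened,  // newer rollout opened for the same channel
    RetentionExpired, // retention timeout elapsed
}

#[derive(Debug, PartialEq, Eq)]
pub struct IllegalTransition {
    pub from: RolloutState,
    pub trigger: Trigger,
}

/// Pure transition function: no clock, no I/O.
pub fn step(state: RolloutState, trigger: Trigger) -> Result<RolloutState, IllegalTransition> {
    use RolloutState::*;
    use Trigger::*;
    let next = match (state, trigger) {
        (Opening, HostJoined) => Active,
        (Active, WaveSoaked) => Converging,
        (Active, HostReverted) => Reverted,
        (Active, HostFailed) => Failed,
        (Converging, WaveDispatched) => Active,
        (Converging, AllConverged) => Terminal,
        (Opening | Terminal | Reverted | Failed, SuccessorOpened) => Superseded,
        (Terminal | Superseded, RetentionExpired) => Pruned,
        // Anything not listed in the table is rejected in this sketch.
        _ => return Err(IllegalTransition { from: state, trigger }),
    };
    Ok(next)
}
```

Having every legal edge in one `match` is the property §1.1 found missing: the question "what are the legal rollout transitions?" has exactly one place to look.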
4. Rollout-level events
All events are CP-internal (emitted by the applier as it processes per-host events). They do NOT cross the agent ↔ CP wire — agents only emit per-host events per RFC-0005 §4.2; CP synthesizes rollout-level events from those inputs.
Stored in event_log with kind = 'rollout_event' (new value alongside the existing agent_event | plan_action | effect | gate_decision | verify_outcome | manifest_poll).
pub enum RolloutEvent {
    RolloutOpened {
        rollout_id: RolloutId,
        channel: ChannelId,
        target_ref: ChannelRef,
        at: DateTime<Utc>,
    },
    HostJoined {
        rollout_id: RolloutId,
        host_id: HostId,
        wave: u32,
        at: DateTime<Utc>,
    },
    HostStateChanged {
        rollout_id: RolloutId,
        host_id: HostId,
        from: HostRolloutState,
        to: HostRolloutState,
        at: DateTime<Utc>,
    },
    WaveAdvanced {
        rollout_id: RolloutId,
        from_wave: u32,
        to_wave: u32,
        at: DateTime<Utc>,
    },
    RolloutTerminal {
        rollout_id: RolloutId,
        at: DateTime<Utc>,
    },
    SuccessorOpened {
        superseded_rollout_id: RolloutId,
        successor_rollout_id: RolloutId,
        at: DateTime<Utc>,
    },
    RetentionExpired {
        rollout_id: RolloutId,
        at: DateTime<Utc>,
    },
    OperatorClearance {
        rollout_id: RolloutId,
        operator: String,
        reason: String,
        at: DateTime<Utc>,
    },
}
These mirror the existing PlanAction outputs (RFC-0006 §4.1) but with explicit state-machine semantics. The applier emits a RolloutEvent into the rollout reducer for each relevant per-host transition, then writes the resulting effects.
5. Rollout-level effects
pub enum RolloutEffect {
    RecordRolloutTransition {
        rollout_id: RolloutId,
        from: RolloutState,
        to: RolloutState,
        at: DateTime<Utc>,
    },
    UpdateCurrentWave {
        rollout_id: RolloutId,
        wave: u32,
    },
    InsertQuarantineFromRollout {
        channel: ChannelId,
        closure_hash: ClosureHash,
        triggering_event_log_seq: i64,
    },
    SchedulePruning {
        rollout_id: RolloutId,
        delay: Duration,
    },
}
The applier interprets these effects against the rollouts derived-view table. Each effect produces one event_log row (the triggering RolloutEvent) AND one or more derived-view writes, in a single SQL transaction.
6. Derived-view discipline (Lever B)
6.1 The rule
A CP DB table is derived if and only if:
1. The applier is its only writer.
2. Every row carries an `event_log_seq INTEGER REFERENCES event_log(seq)` column (or a compound key including one). The FK is the proof obligation for re-derivability.
3. The derived-view row is co-written by the applier in tight temporal coupling with the canonical `event_log` append. Target shape: single SQL transaction (atomic). Current v0.2 shape (matches `probe_failures` in RFC-0007 §7.2): the event_log writer is a fire-and-forget bounded-mpsc task, so the applier inserts the derived-view row with `event_log_seq = NULL` and tightens to NOT NULL once the writer gains synchronous seq return. The eventual-consistency window between the event_log row landing and the derived-view row landing is bounded (single-applier-task ordering) and operator-observable via the prune-timer’s audit metric.
4. Walking `event_log` chronologically can reproduce the table from empty.
The looser current shape (item 3) preserves invariants 1, 2, and 4. What is deferred is only the atomicity guarantee against a crash between the mpsc-send and the derived-view insert. Operators monitor this window via the prune-timer metric; a follow-up tightens it to true single-transaction.
6.2 Tables and their classifications post-RFC-0008
| Table | Class | Notes |
|---|---|---|
| `event_log` | Canonical | Append-only audit; sole source-of-truth |
| `host_rollout_records` | Reducer state cache | Per-host state machine cache; rebuilt from event_log on cold start |
| `dispatch_queue` | Outbound queue | Work-in-flight, not derivation |
| `probe_failures` | Derived view | Already conforms (RFC-0007 §7.2) |
| `rollouts` | Derived view (RFC-0008 §6.3) | Migrated from independent-write to applier-co-write with event_log_seq FK |
| `quarantined_closures` | Derived view (RFC-0008 §6.4) | Migrated similarly |
| `token_replay` | Security lookup (exception) | TTL-pruned; different lifecycle than event_log audit. Justified separate. |
| `cert_revocations` | Security lookup (exception) | Same as token_replay. |
The two security-lookup tables are the documented exceptions. Any future applier-written table must conform to the derived-view rule.
6.3 rollouts migration
The rollout_id is content-addressed from (channel, channel_ref) via the canonical format "{channel}@{channel_ref}". Constructed only via RolloutId::new(channel, channel_ref); the newtype’s private inner field prevents ad-hoc construction (same no-public-constructor pattern as Verified<T> per RFC-0006 §3, with a test-only escape hatch under #[cfg(any(test, feature = "test-helpers"))]). The format choice is operator-visible (appears in CLI output, the event_log payload, and rollout-event tag bodies) and matches the existing display_name convention.
display_name vs RolloutId. Both carry the <channel>@<X> shape but they are NOT interchangeable. RolloutId ({channel}@{channel_ref}) is the primary key: full channel_ref (typically a 40-char git SHA), wire-validated by the CP route, persisted in rollouts.rollout_id, and the only value that resolves to a manifest at GET /v1/rollouts/<rolloutId>. display_name ({channel}@{short-ci-commit}) is a producer-supplied, human-skimmable label carried inside the manifest payload — usable in operator surfaces, never used for lookup or equality. The display_name field is retained for compatibility with the v0.1 rendering convention and may go away in a future schema bump.
Rationale: two channels can share a channel_ref (the architectural point of multi-channel cascading from a single git push). rollout_id = channel_ref alone collides in that topology; rollout_id = channel alone violates the content-addressed property of the rest of the cycle. The composite encoding preserves both: unique per (channel, channel_ref) AND deterministic across replays. Re-derivability from event_log walks (RFC-0004 §2.4) holds because the identity is reproducible from the canonical-format inputs alone.
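A minimal sketch of the identity scheme, assuming plain string inputs. The real `RolloutId::new` may take typed `ChannelId` / `ChannelRef` arguments or validate its input; the signature and the `as_str` accessor below are assumptions, and only the `{channel}@{channel_ref}` format and the private inner field come from the text above.

```rust
use std::fmt;

/// Rollout identity in the canonical `{channel}@{channel_ref}` format.
/// The inner field is private, so code outside the defining module can
/// only obtain a value through `RolloutId::new`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct RolloutId(String);

impl RolloutId {
    /// Deterministic: the same inputs always yield the same ID.
    pub fn new(channel: &str, channel_ref: &str) -> Self {
        RolloutId(format!("{}@{}", channel, channel_ref))
    }

    /// Borrow the canonical string form (hypothetical accessor).
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for RolloutId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Two channels built from the same ref yield distinct IDs, while recomputing the ID for the same pair yields an equal value, which is what replay needs.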
New schema:
CREATE TABLE rollouts (
rollout_id TEXT PRIMARY KEY,
channel TEXT NOT NULL,
target_ref TEXT NOT NULL,
state TEXT NOT NULL
CHECK (state IN ('Opening', 'Active', 'Converging', 'Terminal',
'Reverted', 'Failed', 'Superseded', 'Pruned')),
current_wave INTEGER NOT NULL DEFAULT 0,
-- FK columns are NULL-able under the v0.2 derived-view shape (matches
-- probe_failures per §6.1 item 3 + RFC-0007 §7.2): the bounded-mpsc
-- event_log writer is fire-and-forget so the applier doesn't know
-- `seq` at co-write time. A follow-up tightens these to NOT NULL when
-- the writer gains synchronous seq return.
opened_event_log_seq INTEGER REFERENCES event_log(seq),
last_transition_event_log_seq INTEGER REFERENCES event_log(seq),
opened_at TEXT NOT NULL,
terminal_at TEXT,
superseded_at TEXT
);
CREATE INDEX rollouts_channel_state ON rollouts(channel, state);
CREATE INDEX rollouts_in_flight ON rollouts(state)
WHERE state IN ('Opening', 'Active', 'Converging', 'Reverted', 'Failed');
Every state column update carries a corresponding event_log row whose seq becomes the new last_transition_event_log_seq. The boolean methods (is_superseded, is_terminal, is_finished) collapse into a single state enum read.
6.4 quarantined_closures migration
New schema:
CREATE TABLE quarantined_closures (
channel TEXT NOT NULL,
closure_hash TEXT NOT NULL,
quarantined_at TEXT NOT NULL,
-- NULL-able under the v0.2 derived-view shape; tightens to NOT NULL
-- with the same writer-side change as rollouts + probe_failures.
-- See §6.1 item 3.
triggering_event_log_seq INTEGER REFERENCES event_log(seq),
PRIMARY KEY (channel, closure_hash)
);
CREATE INDEX quarantined_closures_active ON quarantined_closures(channel);
The triggering_event_log_seq points at the RollbackComplete event (RFC-0005 §4.2) that produced the quarantine. Re-derivability: walk event_log for RollbackComplete events, group by (channel, target_closure_hash), write one row per group with the lowest seq as the trigger.
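The re-derivation walk described above can be sketched over an in-memory log. `LoggedEvent`, `EventKind`, and `rebuild_quarantine` are hypothetical stand-ins for the real event_log row types, and the `quarantined_at` column is omitted.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for an event_log row.
pub struct LoggedEvent {
    pub seq: i64,
    pub kind: EventKind,
}

pub enum EventKind {
    RollbackComplete {
        channel: String,
        target_closure_hash: String,
    },
    /// Every other event kind; ignored by this walk.
    Other,
}

/// Rebuild the quarantine table from the log: one entry per
/// (channel, closure_hash), keyed to the lowest triggering seq.
pub fn rebuild_quarantine(log: &[LoggedEvent]) -> BTreeMap<(String, String), i64> {
    let mut rows = BTreeMap::new();
    for ev in log {
        if let EventKind::RollbackComplete { channel, target_closure_hash } = &ev.kind {
            let key = (channel.clone(), target_closure_hash.clone());
            rows.entry(key)
                // Keep the earliest seq seen for this group.
                .and_modify(|seq: &mut i64| *seq = (*seq).min(ev.seq))
                .or_insert(ev.seq);
        }
    }
    rows
}
```

Taking the minimum rather than the first occurrence makes the result independent of the order in which the log slice is traversed.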
7. Reducer composition
The rollout reducer and the host reducer both consume per-host events but with different concerns:
agent posts ProbeResult
│
▼
applier receives event
│
├─▶ host reducer: step(host_state, event, now) → (new_host_state, host_effects)
│ │
│ └─▶ applier writes event_log + probe_failures + host_rollout_records
│
└─▶ rollout reducer: step(rollout_state, RolloutEvent::HostStateChanged{...}, now)
│ → (new_rollout_state, rollout_effects)
│
└─▶ applier writes event_log (kind='rollout_event') + rollouts derived view
Both run in the same applier transaction. No new MPSC; no second mutator. The host reducer’s output is the rollout reducer’s input. Order is deterministic (host first, then rollout aggregates).
The two reducers remain in nixfleet-state-machine:
crates/nixfleet-state-machine/src/
lib.rs — exports both step() functions
host/ — existing per-host reducer (RFC-0005 §3)
state.rs, event.rs, effect.rs, transitions/...
rollout/ — NEW per-rollout reducer (RFC-0008 §3)
state.rs, event.rs, effect.rs, transitions/...
Cargo.toml purity contract unchanged: no tokio, no reqwest, no rusqlite, no chrono::Utc::now(). Both reducers are pure functions of their inputs.
8. Operator-visible improvements
- `/v1/rollouts/{id}/events` (RFC-0007 §7.2) becomes richer: it now surfaces rollout-level transitions in addition to per-host events. Operators see the full chronological story.
- `/v1/rollouts` (existing): can project rollout state from the new `state` enum column instead of computing it from booleans. The query simplifies.
- Audit replay: an auditor walking `event_log` chronologically reconstructs rollout-level state evolution without needing CP-internal knowledge. Today they would need to know that `record_active_rollout` SQL writes correspond to “rollout opened”, which is opaque.
- No silent shadow-state drift: by construction, `rollouts` and `quarantined_closures` can’t disagree with `event_log`; they’re written in the same transaction with FK-back.
RFC-0009: Hardware-rooted trust
Status. Draft.
Targets. v0.3.
Depends on. RFC-0001, RFC-0003, ../design/architecture.md §4 (trust roots) / §5 (failure cases).
Scope. Anchor v0.2’s signed-evidence chain in hardware. Move host signing keys into the TPM, bind agenix-style secret decryption to PCR state, add boot measurements as a probe class with the same signature semantics as runtime probes. Out of scope: confidential computing (SEV/TDX), TPM 1.2, ARM SBCs without a TPM (soft-key fallback only).
1. Motivation
../design/architecture.md §5 names the residual risk verbatim:
Host is compromised (root on the target machine). Attacker can: read secrets decrypted for that host, forge probe outputs signed with that host’s key.
v0.2 relies on the host’s SSH host key (/etc/ssh/ssh_host_ed25519_key) for both signing probe outputs and decrypting agenix secrets. That key is on disk. Disk extraction or root post-boot grants the attacker the same signing capability the host has - a forged ComplianceFailureSignedPayload is indistinguishable from a real one to the offline auditor (nixfleet-verify-artifact probe).
Closing this gap is the v0.3 thesis. The trust property RFC-0009 establishes:
A signed artifact from a host is also a proof that the host’s measured boot state matched declared expectations at the moment of signing.
2. Design principle
The TPM does not become a trust authority. It becomes a verifier of conditions for cryptographic operation. Every property v0.2 verifies cryptographically continues to be verified cryptographically; the TPM ensures those operations cannot be performed under conditions other than the intended ones.
The four trust roots in ../design/architecture.md §4 do not change. What changes is the per-host signing key: it is now generated inside the TPM, sealed against a declared PCR set, and cannot be exported. The control plane gains no new authority.
3. What already exists in v0.2
impls/keyslots/tpm/ ships a working TPM2 keyslot abstraction:
- nixfleet.keyslots.tpm.keys.<name> - first-boot oneshot creates a primary key, evicts to a persistent handle, exports the public half.
- pcrPolicy = [ "0" "2" "4" "7" ] - bind the auth policy to a chosen PCR set; signing fails on PCR mismatch.
- algorithm = "ecdsa-p256" | "ed25519" - both supported. ecdsa-p256 is the realistic default (commodity TPMs rarely implement ed25519).
- Per-key tpm-sign-<name> shell wrapper that consumers (CI runner, agent) invoke to sign a file.
- Idempotent across impermanence wipes - re-extracts pubkey from the persisted handle.
RFC-0009 does not reimplement any of this. It extends the existing surface with a single concept (host-identity binding) and adds the missing wire and verification machinery (boot evidence, PCR-bound secret recipients, expected-PCR derivation).
4. Components
4.1 TPM-bound host identity
Add one option to the existing keyslots.tpm.keys.<name> schema:
nixfleet.keyslots.tpm.keys.host-identity = {
  handle = "0x81010003";
  algorithm = "ecdsa-p256";
  pcrPolicy = [ "0" "2" "4" "7" "8" "9" "11" "12" "13" "14" ];
  enrollAsHostIdentity = true; # NEW - RFC-0009
};
When enrollAsHostIdentity = true:
- The keyslot’s exported pubkey (pubkey.raw) becomes the host’s signing identity. nixfleet.host.signingPubkeyFile (new contract field on contracts/host-spec.nix) resolves to the keyslot’s pubkey.raw path.
- The agent uses the keyslot’s tpm-sign-host-identity wrapper instead of evidence_signer.rs’s file-backed ed25519 path. Probe outputs are TPM-signed.
- The keyslot’s pubkey is what mkFleet references in hosts.<name>.pubkey (existing field per RFC-0001 §2.1, currently a bare OpenSSH-format string - extended to allow either an inline string or { source = "tpm-keyslot/host-identity"; }).
Exactly one keyslot per host may set enrollAsHostIdentity = true. mkFleet asserts this at evaluation time.
The host’s SSH host key is not removed. It continues to anchor agenix decryption (until §4.3.1 lands) and SSH transport. The TPM-bound key takes over the v0.2 signing role; ssh and agenix continue using the SSH key during the migration window. After §4.3.1 ships, agenix can opt into TPM unsealing per-secret.
The same keyslot mechanism gains a second consumer beyond signing: LUKS volume-key unsealing. A keyslot may declare enrollAsLuksUnsealer = { devices = [ "root" "data" ]; } and at first boot bind those LUKS devices to its PCR policy via systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=<policy>. The same pcrPolicy field gates both uses; signing and unsealing share one cryptographic condition. §4.3.2 describes the feature surface that consumes this.
4.2 Boot measurement chain
UEFI Secure Boot + systemd-stub + systemd-measure produce a deterministic PCR trace from firmware through the kernel command line. The closure declaration knows the kernel hash, initrd hash, and cmdline; CI computes the expected PCR set per host and emits it as part of fleet.resolved.json.
This is an additive RFC-0001 schema extension (RFC-0001 §4.1 shape, additional optional field per host):
"hosts": {
  "water-plant-01": {
    "system": "x86_64-linux",
    "closureHash": "sha256-...",
    "tags": ["..."],
    "channel": "stable",
    "pubkey": "ssh-ed25519 AAAA...",
    "expectedBootEvidence": {
      "pcrPolicy": { "pcrs": [0,2,4,7,8,9,11,12,13,14], "algorithm": "sha256" },
      "expectedDigest": "sha256:9f4a2e...",
      "firmwareGeneration": 3
    }
  }
}
The expectedDigest is a deterministic function of the closure’s bootable inputs (kernel, initrd, cmdline) and the host’s declared firmwareGeneration. mkFleet produces it; CI signs the whole artifact. Hosts without expectedBootEvidence are pre-§4.4 hosts that never enrolled into attestation - verification gating is opt-in per host.
firmwareGeneration is a manual integer in fleet.nix (hosts.<name>.firmwareGeneration = 3). Default 1. Operator bumps after testing new firmware on a staging host and capturing the new PCR digest. The framework refuses to make firmware drift silent: a host whose measured PCRs disagree with its declared expectedDigest is flagged regardless of whether the difference is malicious or a legitimate firmware update.
Manual is the v0.3 pick because it is one line of code and forces a human acknowledgment of every firmware change. Failure mode: an operator who runs a firmware update without bumping firmwareGeneration sees every host on that hardware drift into AttestationDrift until they bump. Auto-derivation - a capture tool that writes a checked-in firmware-evidence/<hostname>.json that mkFleet reads - is the natural follow-up to remove the forget-failure mode; it is in §10 open questions and not v0.3 scope.
Tooling: nix run .#capture-boot-evidence -- --hostname water-plant-01 runs on a staging host, reads its current PCR quote, and emits a fragment ready to paste into fleet.nix (or to commit via a follow-up CLI). No mechanism for “trust whatever the host reports” - the operator always reviews.
4.3 PCR-bound secrets
The host holds several classes of secret material whose decryption must be gated on boot state: application secrets (agenix-style files), the LUKS volume key for the system disk, and any data-volume keys. All are treated uniformly as PCR-bound secrets: encrypted at rest, sealed against a declared PCR policy, unsealed by the TPM only when the boot chain matches.
The unifying property:
A PCR-bound secret is decryptable only on a host whose boot measurements match the closure-derived expectation declared for that secret.
A tampered kernel, modified initrd, or unauthorized cmdline produces a PCR mismatch, which produces a TPM authorization failure, which produces a decryption failure - for every secret on that host, including the disk that holds its own filesystem. An attacker who modifies the boot chain to extract secrets at runtime does not get past the unlock step.
4.3.1 Application secrets (agenix-style)
agenix recipient declarations gain a TPM-unsealing variant. The contract addition lives in impls/secrets/ (already the home of identity-path resolution per ../design/architecture.md §9.4):
age.secrets.cluster-token = {
  file = ./secrets/cluster-token.age;
  recipients = [
    { type = "host-tpm";
      host = "water-plant-01";
      pcrPolicy = "@boot"; } # references expectedBootEvidence above
  ];
};
@boot resolves at evaluation time to the host’s declared expectedBootEvidence.pcrPolicy. Custom PCR sets (pcrPolicy = { pcrs = [0 7]; algorithm = "sha256"; }) are accepted for secrets that need different boot-state binding than the host-identity key.
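The resolution rule above can be sketched as follows. This is a minimal illustration, not the mkFleet implementation (which runs at Nix evaluation time): the type and function names are hypothetical, and the choice to reject @boot on a host with no expectedBootEvidence is an assumption the text does not state.

```rust
/// A PCR selection, mirroring the shape of expectedBootEvidence.pcrPolicy.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PcrPolicy {
    pub pcrs: Vec<u8>,
    pub algorithm: String,
}

/// What a secret recipient may declare: the symbolic "@boot" or a custom set.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PcrPolicyRef {
    Boot,
    Custom(PcrPolicy),
}

/// Resolve a recipient's policy reference against the host's declared boot policy.
/// Assumption: "@boot" on a host that declares no expectedBootEvidence is an error.
pub fn resolve_pcr_policy(
    reference: &PcrPolicyRef,
    host_boot_policy: Option<&PcrPolicy>,
) -> Result<PcrPolicy, String> {
    match reference {
        // A custom set is used verbatim, independent of the host-identity policy.
        PcrPolicyRef::Custom(policy) => Ok(policy.clone()),
        // "@boot" copies whatever the host declared for its boot evidence.
        PcrPolicyRef::Boot => host_boot_policy
            .cloned()
            .ok_or_else(|| "@boot used but host declares no expectedBootEvidence".to_string()),
    }
}
```

Resolving once, at declaration time, keeps a single source of truth for the PCR set: a secret bound to @boot follows the host's policy whenever that policy changes.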
Encryption produces an age stanza wrapping the secret to a TPM-policy recipient. Decryption succeeds only when the PCR state at unseal time matches the policy. The control plane never sees plaintext or the unsealing condition.
Implementation note: this requires extending the agenix decryption path. Two options on the table - a small wrapper around age that invokes the TPM keyslot’s wrapper for stanza decryption, or a clevis-style integration. Pick at implementation time; the wire/declaration shape above is what consumers depend on.
4.3.2 Full-disk encryption
The host’s LUKS volume keys are treated as PCR-bound secrets with the same @boot policy. The feature surface aggregates the keyslot wiring (§4.1’s enrollAsLuksUnsealer), the operator-facing escrow record, and the soft-encryption flag:
nixfleet.diskEncryption = {
  enable = true;
  pcrPolicy = "@boot"; # symbolic reference to the host-identity keyslot's pcrPolicy
  devices = [ "root" "data" ]; # LUKS device names (boot.initrd.luks.devices.<name>)
  recovery = {
    method = "paper"; # paper | yubikey | split-shares | external-kms
    location = "vault-b/safe-3"; # documented; never on-host
    rotated = "2026-05-14"; # ISO 8601
  };
  allowSoftEncryption = false; # true permits passphrase-only on TPM-less hosts; flagged in evidence
};
At install time, disko declares the LUKS layout and nixos-anywhere formats accordingly - no new provisioning path. After first boot, systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=<policy> (driven by the keyslot’s enrollAsLuksUnsealer activation) binds the declared volume keys to the TPM with the keyslot’s PCR policy. Subsequent boots auto-unlock when PCRs match; no human at boot, no external KMS, no key drift between attestation and decryption.
/boot is not encrypted - it cannot be, the firmware must read it - but its integrity is verified by Secure Boot signatures via lanzaboote. The combination is the standard pattern: integrity-verified boot partition + PCR-bound encrypted root.
4.3.3 Encrypted swap
Two acceptable patterns:
- Random-key swap. Swap partition rekeyed on every boot with a kernel-random key never written to disk. Simple, no escrow needed, but loses hibernate-to-disk. Default.
- PCR-bound swap. LUKS swap unlocked from the same TPM policy. Required for hibernate support. Recovery key shares the escrow procedure of the root key.
The compliance probe (see nixfleet-compliance _encryption-at-rest, rule EAR-03) accepts either pattern.
4.3.4 Recovery flow
The TPM is the only entity that can unseal a volume key for a host. If the TPM fails (hardware failure, motherboard replacement, deliberate PCR change from a firmware update the operator did not pre-test), the host is unbootable until a recovery key is supplied at the LUKS unlock prompt.
The recovery key:
- Is generated at enrollment, never on a routine boot.
- Is escrowed off-host in one of the supported methods (paper backup, YubiKey slot, Shamir-split shares, external KMS). Format and location are operator’s choice; the framework records the choice and a rotation date, not the material.
- Is single-purpose: it unlocks the LUKS keyslot for a one-time boot, after which the operator either re-enrolls TPM unlock (if the boot chain is now the new expected state) or rotates the volume key entirely (if the host’s identity has changed).
This is the same procedure shape as RFC-0010’s org-root recovery: documented, witnessed, infrequent, observable. The volume-key escrow is one more entry in the same escrow inventory.
4.3.5 Hosts without a TPM
Edge devices and some ARM SBCs lack a TPM 2.0. The framework allows passphrase-only LUKS on these hosts when nixfleet.diskEncryption.allowSoftEncryption = true, with a soft_encryption: true attribute surfaced in the compliance probe’s evidence JSON. Compliance frameworks that require TPM binding (NIS2 essential, ANSSI BP-028 reinforced+) flag these hosts as non-compliant via per-rule severity gating; the operator either declares a rationale exception (per the governance engine) or replaces the hardware.
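As a rough sketch of the decision this paragraph describes: the enum and function names below are hypothetical, and two behaviours are assumptions rather than statements from the text (a host with a TPM always uses TPM binding, and a TPM-less host without the opt-in is refused outright).

```rust
/// How a host's LUKS volumes end up being unlocked (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum DiskUnlock {
    /// Volume key sealed to the TPM under the keyslot's PCR policy.
    TpmBound,
    /// Passphrase-only LUKS; the probe evidence carries soft_encryption: true.
    SoftPassphrase,
    /// No TPM and no allowSoftEncryption opt-in: configuration rejected (assumption).
    Refused,
}

/// Pick the unlock mode from hardware capability and the allowSoftEncryption flag.
pub fn plan_disk_unlock(has_tpm: bool, allow_soft_encryption: bool) -> DiskUnlock {
    match (has_tpm, allow_soft_encryption) {
        (true, _) => DiskUnlock::TpmBound,
        (false, true) => DiskUnlock::SoftPassphrase,
        (false, false) => DiskUnlock::Refused,
    }
}

/// Value of the soft_encryption attribute surfaced in compliance evidence.
pub fn soft_encryption_flag(plan: &DiskUnlock) -> bool {
    matches!(plan, DiskUnlock::SoftPassphrase)
}
```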
4.4 Boot-state probe class
The agent collects boot measurements via tpm2 quote with a control-plane-issued nonce, signs the quote with the TPM-bound host key (§4.1), and includes it in the checkin payload.
This rides RFC-0003 §4.1 POST /agent/checkin - boot state can drift between activations (firmware updates without reboot are rare but possible), and the cost of carrying a fresh quote on every checkin is negligible. The schema extension to CheckinRequest (additive, Option<T> + serde(default) per nixfleet-proto convention):
pub struct CheckinRequest {
    // ... existing fields ...
    pub boot_evidence: Option<BootEvidence>, // NEW - RFC-0009
}

pub struct BootEvidence {
    pub pcr_quote: Vec<u8>,       // TPM2_Quote output
    pub pcr_signature: Vec<u8>,   // host-key signature over pcr_quote || nonce
    pub nonce: [u8; 32],          // CP-issued, anti-replay (delivered in prior checkin response)
    pub measured_digest: Digest,  // computed locally from quote
    pub firmware_generation: u32, // host's declared generation
}
The CP-side response (CheckinResponse) gains a next_attestation_nonce: Option<[u8; 32]> field (optional, following the same additive convention and matching the table in §6). The agent caches the value and echoes it back as boot_evidence.nonce on its next checkin. Anti-replay is bounded by the freshness window (RFC-0011).
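One way to picture the CP-side half of this exchange is a ledger holding the single outstanding nonce per host. The sketch below is illustrative only: NonceLedger and its methods are hypothetical names, nonce generation is left to the caller, and the RFC-0011 freshness window is not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical CP-side bookkeeping: one outstanding attestation nonce per host.
pub struct NonceLedger {
    outstanding: HashMap<String, [u8; 32]>,
}

impl NonceLedger {
    pub fn new() -> Self {
        Self { outstanding: HashMap::new() }
    }

    /// Record the nonce sent as next_attestation_nonce in a checkin response.
    /// Issuing again for the same host replaces the previous nonce.
    pub fn issue(&mut self, hostname: &str, nonce: [u8; 32]) {
        self.outstanding.insert(hostname.to_string(), nonce);
    }

    /// Accept the nonce echoed in boot_evidence.nonce at most once.
    /// Returns false for an unknown host, a wrong nonce, or a replayed one.
    pub fn consume(&mut self, hostname: &str, echoed: &[u8; 32]) -> bool {
        let matches = self.outstanding.get(hostname) == Some(echoed);
        if matches {
            // Single use: a second checkin carrying the same nonce is rejected.
            self.outstanding.remove(hostname);
        }
        matches
    }
}
```

Removing the nonce on first use is what turns the echoed value into replay protection: a captured quote cannot be presented a second time.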
Verification happens twice. Agent-side: a sanity check that the locally-measured digest matches the locally-quoted one (catches local tooling failures, not malice). CP-side: compare measured_digest against expectedBootEvidence.expectedDigest from the host’s fleet.resolved entry; verify the host-key signature on the quote+nonce; emit one of three outcomes:
- AttestationOK - digest matches. Soft-recorded; no action.
- AttestationDrift - digest mismatches but signature is valid. The host is honestly reporting an unexpected boot state. New ReportEvent::AttestationDrift { hostname, expected, measured } (additive wire variant per the RFC-0003 idiom). Triggers RFC-0010 §4 quarantine if persistent.
- AttestationInvalid - signature does not verify. Host is lying or impersonating. Same ReportEvent payload but distinct status; triggers immediate quarantine, not threshold-based.
Hosts with no expectedBootEvidence declared in fleet.resolved and no boot_evidence in their checkin are pre-attestation hosts; the CP records attestation_status = none and proceeds normally. Migration is per-host, not all-or-nothing.
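The CP-side outcomes can be summarised in a small classifier. This is a sketch under stated assumptions: the type names are hypothetical, signature verification is reduced to a boolean computed elsewhere, the signature is checked before the digest, and the mixed cases (evidence without a declared expectation, or the reverse) return None because the text does not specify them.

```rust
/// Outcome recorded by the CP for one checkin (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum AttestationStatus {
    /// attestation_status = none: a pre-attestation host.
    NotEnrolled,
    Ok,
    Drift,
    Invalid,
}

/// The parts of boot_evidence the classifier needs (hypothetical type).
pub struct ReceivedEvidence {
    /// Result of verifying the host-key signature over quote + nonce.
    pub signature_valid: bool,
    pub measured_digest: String,
}

/// Classify one checkin against the host's fleet.resolved entry.
pub fn classify(
    expected_digest: Option<&str>,
    evidence: Option<&ReceivedEvidence>,
) -> Option<AttestationStatus> {
    match (expected_digest, evidence) {
        (None, None) => Some(AttestationStatus::NotEnrolled),
        (Some(expected), Some(ev)) => Some(if !ev.signature_valid {
            // A bad signature overrides any digest comparison.
            AttestationStatus::Invalid
        } else if ev.measured_digest == expected {
            AttestationStatus::Ok
        } else {
            AttestationStatus::Drift
        }),
        // Not specified in the text; the caller must decide.
        _ => None,
    }
}
```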
4.5 Closure-derived expectations
mkFleet is extended to produce, per host, the expectedBootEvidence block deterministically from:
- The host’s configuration.config.system.build.toplevel (kernel + initrd + cmdline reachable from there).
- The host’s firmwareGeneration integer.
- The PCR set declared in nixfleet.keyslots.tpm.keys.host-identity.pcrPolicy (single source of truth for which PCRs matter for this host).
This generalizes a property the framework already has: closure hashes are deterministic functions of inputs. Boot-evidence prediction is the same property applied to TPM measurements. Implementation detail - the digest computation may need to call out to a small Rust helper (nixfleet-pcr-predict) because pure-Nix PCR prediction for systemd-stub measurements is non-trivial; that helper runs as a derivation builder, not at agent runtime.
5. Trust analysis
Properties added.
- Extraction-resistant host signing keys: stealing a disk yields no usable signing capability.
- Boot-state proof in the evidence chain: every probe signature now also attests “the boot chain at signing time matched the declared expectation.”
- Secret access bound to boot state: a tampered kernel produces a TPM authorization failure before plaintext is reachable.
- Detectable kernel/initrd tampering before secrets are decrypted, before probes are signed.
Properties not added.
- Protection against runtime tampering after a successful unsealed boot. Root post-boot operates within the unsealed-key scope until reboot. Mitigation requires confidential computing (AMD SEV / Intel TDX) - a future RFC.
- Protection against TPM-bus physical attacks. Out of scope; well-funded attackers with sustained physical access can attack the bus. Confidential computing again.
- Trust in the TPM manufacturer beyond what the EK certificate verification policy specifies. RFC-0010 picks a default policy.
Failure cases (per ../design/architecture.md §5 idiom).
- TPM unavailable on a host. Enrollment refuses unless enrollAsHostIdentity is left unset; the resulting deployment continues using the v0.2 SSH-host-key path. Visible in fleet status as signingBackend: ssh-host-key (vs tpm-keyslot/host-identity).
- PCR drift from firmware update. Operator tests new firmware on a staging host, runs nix run .#capture-boot-evidence, bumps firmwareGeneration, commits. CI reissues fleet.resolved.json. Until then, agents on updated firmware emit AttestationDrift; if persistent, RFC-0010 §4 quarantines them.
- TPM hardware failure. Host is re-enrolled as a new host (new EK, new keyslot pubkey, new mTLS cert via the existing enrollment flow). The old record is retired. Procedure documented in RFC-0010 §7 rotation runbooks.
- Operator forgot to declare a legitimate change. Same as drift: visible, blocking, recoverable. The framework refuses silent acceptance.
6. Wire-protocol additions
| Artifact | Addition | Type |
|---|---|---|
| fleet.resolved.json host entries | expectedBootEvidence: Option<...> | additive (RFC-0001 §4.1, no version bump) |
| CheckinRequest | boot_evidence: Option<BootEvidence> | additive (RFC-0003 §4.1, no version bump) |
| CheckinResponse | next_attestation_nonce: Option<[u8; 32]> | additive (RFC-0003 §4.1, no version bump) |
| ReportEvent | AttestationDrift, AttestationInvalid | additive variants (RFC-0003 §4.3) |
| HostStatusEntry (CP /v1/hosts) | attestation_status, boot_state_age | additive |
PROTOCOL_MAJOR_VERSION does not change. Pre-RFC-0009 agents and CPs interoperate with RFC-0009 components transparently - they simply lack attestation enforcement.
7. Migration
Per-host opt-in.
- Enable nixfleet.keyslots.tpm on the host. First-boot generates a host-identity keyslot at the declared handle.
- Capture initial expected-boot-evidence: nix run .#capture-boot-evidence -- --hostname <h>. Commit the fragment.
- Set enrollAsHostIdentity = true on the keyslot, change hosts.<h>.pubkey to reference the keyslot. Commit.
- CI rebuilds, signs new fleet.resolved.json. Agent on next checkin starts including boot_evidence.
- CP starts logging attestation outcomes; advisory only until the operator promotes to enforcement (via RFC-0010 §4 quarantine policy).
There is no fleet-wide flag day. The framework supports a mixed fleet (some hosts attested, some not) indefinitely - the per-host expectedBootEvidence field’s presence is the per-host opt-in.
8. Work items
- expectedBootEvidence schema + mkFleet derivation. Schema lands in RFC-0001’s evaluation contract; mkFleet produces the field for hosts that have declared enrollAsHostIdentity. CI’s nixfleet-release signs over the new field. Deliverable: fleet.resolved.json for an attestation-opted host carries a valid expectedBootEvidence block; bit-flipping it fails verification.
- Host-identity keyslot. enrollAsHostIdentity flag + agent’s switch from SSH-key signing to TPM-wrapper signing for probe outputs. Deliverable: a host with the flag set produces probe-output signatures that the offline auditor (nixfleet-verify-artifact probe) verifies against the TPM-derived pubkey, and an attempt to sign a fake probe with on-disk material is rejected by the same auditor.
- Boot-evidence collection (advisory). Agent collects + signs PCR quote on every checkin; CP logs AttestationOK / Drift / Invalid to host_reports (the existing SQLite table covered by the CP-resident-state recovery profile in docs/design/architecture.md §6). No gating yet. Deliverable: a tampered kernel produces a visible AttestationDrift event in fleet status.
- PCR-bound secrets. Two consumers of the same TPM keyslot policy: agenix-style application secrets (§4.3.1) and LUKS volume keys (§4.3.2). One PCR-binding implementation shared across both. Deliverable: a secret encrypted to a host’s @boot PCR policy fails to decrypt on a tampered boot chain, AND the host’s root filesystem fails to auto-unlock on the same tamper, without any agent-side or CP-side intervention. The escrow declaration (§4.3.4) is documented as part of the operator workflows in RFC-0010.
Enforcement (boot-evidence as a wave-promotion gate) is the subject of RFC-0010 §4 - kept separate so the mechanism (this RFC) and the policy (lifecycle RFC) ship independently.
9. Falsifiable done criteria
- Disk extraction from a host enrolled with the host-identity keyslot yields no usable signing capability; an attempt to use the on-disk material to sign a fake probe is rejected by nixfleet-verify-artifact probe against the host’s TPM-derived pubkey.
- Booting a host with boot-evidence collection active and a modified kernel or initrd produces a PCR mismatch that the CP detects on the next checkin and emits as AttestationDrift.
- A secret encrypted to a host’s @boot PCR policy fails to decrypt when the boot chain is modified, without operator intervention.
- A host with TPM hardware failure can be re-enrolled and rejoin the fleet under the documented procedure (RFC-0010 §7) in under 30 minutes.
- A firmware update that the operator has tested and declared via firmwareGeneration rolls out without triggering attestation drift.
- A v0.3 host’s disk extracted and mounted on another machine produces no readable filesystem; the LUKS keyslot cannot be unsealed without either the original TPM under matching PCRs or the recovery key.
- A deliberate kernel modification on a v0.3 host prevents the root filesystem from auto-unlocking on next boot; the host enters the recovery-key prompt.
- The documented recovery procedure restores boot in under 15 minutes given access to the escrowed material.
10. Open questions
- PCR set defaults. Proposal [0 2 4 7 8 9 11 12 13 14] covers firmware + secure-boot databases + bootloader + kernel + initrd + cmdline. Tighter sets are possible (just [0 7] for firmware + secure-boot DBs) but lose useful attestation surface. Lean: declared-per-host with a [0 2 4 7 8 9 11 12 13 14] default.
- Auto-derived firmwareGeneration. Manual is v0.3; the natural follow-up is a capture tool that writes a checked-in evidence file mkFleet reads, removing the operator-forgot-to-bump failure mode. Out of scope for v0.3 but on the v0.4 shortlist.
- PCR prediction tooling. nixfleet-pcr-predict as a small Rust derivation builder vs calling out to systemd-measure directly. Lean: wrap systemd-measure for v0.3, replace with native code only if reproducibility issues appear.
- agenix integration shape. Wrapper around age vs Clevis-style pluggable backend. Lean: wrapper for v0.3 (smaller surface), Clevis if a customer needs it.
- ARM SBC compatibility. Many target-vertical edge devices are ARM without TPM. SSH-host-key fallback exists indefinitely; OP-TEE-backed identity is a separate future RFC, not blocking v0.3.
- EK certificate verification policy. Manufacturer-chain vs. self-signed inventory at enrollment time. RFC-0010 §3 picks a default.
11. One-sentence summary
The host’s signing key lives in the TPM, the boot chain is measured into the signature, and the same boot-chain match gates every secret on the host - application secrets, application data, and the root filesystem itself; a tampered host cannot speak in the fleet’s evidence chain, and a stolen disk yields no readable filesystem.
RFC-0010: Trust lifecycle
Status. Draft. Targets. v0.3. Depends on. RFC-0001, RFC-0003, RFC-0009, ../design/architecture.md §4. Scope. Specify the lifecycle of every key, credential, and authorization in the v0.2/v0.3 trust model: how each is created, held, used, rotated, and retired. Add (a) EK-bound bootstrap tokens, (b) host-attestation quarantine policy, (c) opt-in threshold-signed channels, (d) tested key-rotation runbooks. Most of this RFC is documentation + small tooling; the only meaningful new mechanism is multi-signer release coordination.
1. Motivation
../design/architecture.md §4 describes the trust model statically - four roots, derivation rules, verification posture. What is documented and tested:
- Pre-announced rotation slots (current/previous/successor/retireAt) on the trust contract - contracts/trust.nix enforces the paired-options invariant.
- Bootstrap tokens with hostname + pubkey-fingerprint scoping, single-use via the token_replay SQLite table, signed by the org root key - RFC-0003 §4.5.
- Closure-hash quarantine after activation failure (ClosureQuarantined event, agent-side state-dir record).
- Cert revocation via the signed revocations.json sidecar replayed into cert_revocations on every reconcile tick.
What is missing and what auditors ask for first:
- How operators physically hold the four root keys.
- How the org root key is generated, witnessed, and escrowed.
- How CI uses its release key without holding it directly.
- How a host’s first mTLS cert is bound to its actual hardware (not just the operator’s claim about it).
- What happens to a host that persistently fails attestation (RFC-0009 §4.4) or probes - beyond the existing closure-level quarantine.
- Tested rotation procedures for each of the four root keys.
The documentation gap is the larger work. The mechanism additions are: EK binding on bootstrap tokens (small), host-attestation quarantine policy (small, reuses cert-lifetime as revocation horizon), threshold-signed channels (the only nontrivial new mechanism).
1.5 Trust-root wiring (v0.2 baseline)
The lifecycle work in this RFC sits on top of an existing v0.2 wiring path that carries declared trust roots from the Nix layer to the runtime verify call. That path is the load-bearing assumption every subsequent section makes; this section captures the shape so readers do not need to reconstruct it from source.
Declarations live under nixfleet.trust.{ciReleaseKey,cacheKeys,orgRootKey} in the Nix layer (modules/contracts/trust.nix). Each entry is a KeySlot with current, previous, and rejectBefore fields - current is the active key, previous covers the rotation grace window, and rejectBefore is the compromise-incident switch that refuses any artifact whose meta.signedAt predates the cutoff regardless of which key produced the signature. The CP-host NixOS module (modules/scopes/nixfleet/_control-plane.nix) materialises the declared attrset as /etc/nixfleet/cp/trust.json at activation time and passes --trust-file on the CP binary’s command line. Agents follow the same pattern through /etc/nixfleet/agent/trust.json. The on-disk file is world-readable because it contains only public material; schemaVersion: 1 is required at the top level and binaries refuse to start on unknown versions.
At runtime the CP deserialises the file into proto::TrustConfig, and on every fleet.resolved load calls slot.active_keys() to get the &[TrustedPubkey] slice handed to reconciler::verify_artifact. The verify function iterates the slice and matches on each entry’s algorithm tag, which is what makes cross-algorithm rotation work end-to-end - the same call site verifies ed25519-signed and ecdsa-p256-signed artifacts as long as both keys are present in the active slot pair. The CP never holds trust private keys; the org root, CI release key, and attic signing key all live with operator hardware or CI signing tooling outside the CP host. Rotation happens declaratively in fleet.nix and reaches the CP via the normal nixos-rebuild activation path; no separate trust-state replication channel exists, and the CP is reconstructible from git plus agent check-ins by design (docs/design/contracts.md §IV).
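The slot semantics described above (an active key pair plus a rejectBefore cutoff) can be sketched as follows. The shapes are assumptions for illustration: the real proto::TrustConfig types are not shown in this document, timestamps are taken to be Unix seconds, and the signature check is passed in as a closure so that no cryptography library is implied.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Algorithm {
    Ed25519,
    EcdsaP256,
}

/// A public key tagged with its algorithm (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TrustedPubkey {
    pub algorithm: Algorithm,
    pub public: String,
}

/// current = active key, previous = rotation grace window,
/// reject_before = compromise-incident cutoff (assumed Unix seconds).
pub struct KeySlot {
    pub current: Option<TrustedPubkey>,
    pub previous: Option<TrustedPubkey>,
    pub reject_before: Option<u64>,
}

impl KeySlot {
    /// Keys a verifier may accept right now: current first, then previous.
    pub fn active_keys(&self) -> Vec<&TrustedPubkey> {
        self.current.iter().chain(self.previous.iter()).collect()
    }

    /// Artifacts signed before the cutoff are refused regardless of key.
    pub fn accepts_timestamp(&self, signed_at: u64) -> bool {
        match self.reject_before {
            Some(cutoff) => signed_at >= cutoff,
            None => true,
        }
    }
}

/// Accept an artifact if its timestamp passes the cutoff and any active key
/// verifies it; `sig_ok` stands in for the per-algorithm signature check.
pub fn verify_with_slot(
    slot: &KeySlot,
    signed_at: u64,
    sig_ok: impl Fn(&TrustedPubkey) -> bool,
) -> bool {
    slot.accepts_timestamp(signed_at) && slot.active_keys().into_iter().any(|key| sig_ok(key))
}
```

Because the loop tries every active key and each key carries its own algorithm tag, a slot holding an ecdsa-p256 current key and an ed25519 previous key verifies artifacts from either during the grace window.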
1.5.1 Amendment (2026-05-17) — CA-issuance signing key
The three trust roots above (ciReleaseKey, cacheKeys, orgRootKey) remain outside the CP host. A fourth signing key has since been added to CP’s responsibility surface: the fleet CA issuance key, used by /v1/enroll and /v1/agent/renew-cert to sign agent mTLS client certs. This key was introduced in feat(cp,trust): cert issuance (commit 4808d4dc) but the trust-model wording in this section, in RFC-0005 §2.1, and in RFC-0006 §6 N5/N6 was not amended at the time. This subsection is the canonical statement; the other RFCs reference it.
CP supports two backends for this key, selected at startup by build_signer_from_args (crates/nixfleet-control-plane/src/auth/issuance.rs:264):
| Backend | Flags | In-memory key material |
|---|---|---|
| TPM-backed (production-grade) | --tpm-ca-pubkey-raw + --tpm-ca-sign-wrapper | None. CP holds a 64-byte raw P-256 pubkey + a path to a tpm-sign-<name> wrapper. Each signing op shells out to tpm2_sign; the TPM key is created with fixedtpm \| fixedparent \| sensitivedataorigin \| sign attributes - non-exportable, optionally PCR-bound (see impls/keyslots/tpm/default.nix). |
| File-backed (dev / fallback) | --fleet-ca-key | An agenix-decrypted PEM. make_key_pair() reads + parses the file per issuance; the private key transits CP memory at each call. |
The runtime precedence in build_signer_from_args is that TPM wins when both flag sets are supplied. With the TPM backend, the §3.3 blast-radius claim (“SSH access to the CP host has the same blast radius as SSH access to any production NixOS box”) holds in full - compromising the CP filesystem yields no usable signing material; the attacker would additionally need to subvert the TPM hardware policy. With the file backend, that blast radius extends to the agenix-encrypted CA key PEM, whose at-rest protection reduces to the operator’s age/SOPS posture; the activated key plus the agenix identity together yield a working fleet-CA forge.
Production fleets SHOULD configure the TPM backend. The runtime precedence enforces it whenever both backends are present, but does not refuse to start when only the file backend is configured. Per RFC-0004 §1 invariant 3 (mechanical trust over advisory trust), v0.2.x adds an additive --strict enforcement gate that refuses file-only CA configurations without an explicit --allow-file-ca-key opt-in. This converts the operator-facing recommendation above into a startup-time check, in the same shape as --strict’s existing gates on revocations_required and bootstrap_nonces_required.
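The precedence and the --strict gate can be sketched as one selection function. This is not build_signer_from_args, whose signature is not shown here; the struct, enum and error names are hypothetical, and treating a half-specified TPM flag pair as an error is an assumption.

```rust
/// Parsed CA-related flags (hypothetical struct mirroring the flags in the table).
#[derive(Default)]
pub struct CaArgs {
    pub tpm_ca_pubkey_raw: Option<String>,
    pub tpm_ca_sign_wrapper: Option<String>,
    pub fleet_ca_key: Option<String>,
    pub strict: bool,
    pub allow_file_ca_key: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CaBackend {
    Tpm { pubkey_raw: String, sign_wrapper: String },
    File { key_path: String },
}

#[derive(Debug, PartialEq, Eq)]
pub enum SelectError {
    NoBackendConfigured,
    /// Assumption: only one of the two TPM flags was supplied.
    IncompleteTpmFlags,
    /// --strict without --allow-file-ca-key refuses a file-only configuration.
    FileOnlyRefusedByStrict,
}

pub fn select_ca_backend(args: &CaArgs) -> Result<CaBackend, SelectError> {
    // The TPM backend wins whenever it is fully specified.
    match (&args.tpm_ca_pubkey_raw, &args.tpm_ca_sign_wrapper) {
        (Some(pubkey_raw), Some(sign_wrapper)) => {
            return Ok(CaBackend::Tpm {
                pubkey_raw: pubkey_raw.clone(),
                sign_wrapper: sign_wrapper.clone(),
            });
        }
        (None, None) => {}
        _ => return Err(SelectError::IncompleteTpmFlags),
    }
    // File backend is the fallback, gated by --strict.
    match &args.fleet_ca_key {
        Some(_) if args.strict && !args.allow_file_ca_key => {
            Err(SelectError::FileOnlyRefusedByStrict)
        }
        Some(key_path) => Ok(CaBackend::File { key_path: key_path.clone() }),
        None => Err(SelectError::NoBackendConfigured),
    }
}
```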
The CA-issuance key never appears in TrustConfig; it is not a verifier key. It is a signer key, and that distinction is why the original “CP never holds trust private keys” claim was not literally violated by 4808d4dc (the key is not a trust private key in the original sense — it does not appear in any trust slot, does not sign artifacts, and the manifest pipeline does not consult it). The amendment is necessary because operator-facing language (“CP signs nothing”, “CP holds no signing key”) read as a stronger universal than the original technical claim warranted; the universal must now be qualified or made mechanical via --strict.
2. Design principle
Every authorization in the system has an explicit lifecycle: who creates it, where it lives, how long it lasts, how it is revoked, what happens when it is lost. No silent state, no implicit trust, no procedure that exists only in a single operator’s head.
When in doubt: prefer hardware-bound, short-lived, narrow-scope, observably-revocable.
3. Operator workflow specification
Three operator roles. Each has a documented hardware requirement, a stated maximum lifetime, and a defined revocation path. Procedures live in docs/runbooks/ (new directory - documentation work item); ceremony tooling in tools/keys/ (new directory - documentation work item).
3.1 Release operator
- What they hold. A YubiKey 5+ enrolled as a release-channel signer (PIV slot 9c, ECDSA P-256 - interoperates with the existing ciReleaseKey slot type that already supports ecdsa-p256).
- What they do. Touch the YubiKey to authorize a release-signing operation. Holds no decryption capability. The CI runner’s signing process blocks on operator touch; without it, nothing is signed as a release.
- Default rotation cadence. 12 months; new YubiKey enrolled, old removed from nixfleet.trust.ciReleaseKey via the existing current -> previous rotation slot pattern.
- Loss procedure. Revoke from nixfleet.trust.ciReleaseKey.previous immediately; successor becomes current if pre-announced, else operator runs an out-of-band rotation ceremony.
3.2 Org root operator
- What they hold. One Shamir share of the org root key, default 2-of-3 threshold. Each share lives on a hardware token (X25519 key on YubiKey or equivalent - same hardware family as 3.1 but a distinct slot).
- What they do. Reconstruct the threshold to sign bootstrap tokens (host enrollment), CI key rotation envelopes, and major trust changes. Org-root signatures are timestamped and committed to a transparency log - for v0.3, an append-only file in the fleet repo (trust/transparency.log); future iterations may integrate with an external transparency service.
- Default rotation cadence. 24 months for individual shares, on a major incident for the root itself.
- Single share lost. Routine; below threshold has no impact. Share is reissued at the next routine ceremony.
- Threshold lost. Catastrophic. Re-genesis ceremony + full fleet re-enrollment. Documented as a 24-48 hour recovery procedure. The framework does not pretend this is fast.
3.3 Infrastructure operator
- What they hold. An SSH key for break-glass access to the coordinator-class hosts.
- What they do. Diagnose and recover the CP host itself. Not part of the framework’s trust chain - the CP holds no secrets, so SSH access to the CP host has the same blast radius as SSH access to any production NixOS box.
- Default rotation cadence. 12 months. Mentioned for completeness because audit will ask.
4. Active host-attestation quarantine
When a host’s RFC-0009 boot attestation or runtime probes fail persistently, the CP stops issuing fresh mTLS certs to it. Existing certs are short-lived (default 30-day per RFC-0003 §2; renewed at 50% TTL); within one renewal cycle the host falls out of the active fleet view.
This is a distinct state machine from closure-hash quarantine (ClosureQuarantined). To keep the two clearly separate:
| Mechanism | Trigger | Scope | Origin |
|---|---|---|---|
| ClosureQuarantined | Same closure_hash fails activation 24h | Per-closure, per-host | v0.2 baseline |
| HostAttestationQuarantined | Persistent attestation drift or probe failure | Per-host, all closures | RFC-0010 |
Closure quarantine prevents wasted activation cycles on a known-broken release. Attestation quarantine declares “this host is no longer trusted to act in the fleet.” Different lifecycles, different operator surfaces.
4.1 Declarative thresholds
channel.production.attestationQuarantine = {
  attestationFailureThreshold = 3; # consecutive AttestationDrift / Invalid
  probeFailureThreshold = 5; # consecutive non-Pass under enforce mode
  unquarantine = "manual"; # or "auto-after-N-successes"
  autoUnquarantineSuccesses = 10;
};
Default off per channel. Operators tune thresholds for their environment before promoting to default-on (likely a v0.4 default).
4.2 State recovery classification
Per docs/design/architecture.md §6’s soft/hard recovery taxonomy (CP-resident state by recovery profile): HostAttestationQuarantined is soft state. After CP rebuild, repeated attestation failures from the same host re-trigger the quarantine within the threshold window. No signed-artifact replay needed because the trigger is observable from agent inputs.
The quarantine threshold configuration is hard state (lives in fleet.resolved.json, signed). The quarantine occurrence record is soft (rebuilt from continued failures).
4.3 Visibility
A quarantined host stays in /v1/hosts output as quarantined since <timestamp>, reason <attestation-drift|probe-failure>, observable to operators and auditors. The framework prefers visible failure to silent eviction.
4.4 Reversibility
For unquarantine = "manual", an operator runs:
nix run .#unquarantine-host -- --hostname <h> --reason "<rationale>"
(matches the no-big-CLI convention - flake app, not a binary subcommand). The action is logged in the host_reports ring with event_kind = HostUnquarantined. For unquarantine = "auto", the CP resumes cert issuance after N consecutive successful checkins with passing attestation.
This is policy on top of v0.2’s existing short-cert design and §1 cert-revocation infrastructure. No new revocation channel is required - the cert lifetime is the revocation horizon, the same way it is for explicit revocations.
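The threshold and unquarantine rules in §4.1 and §4.4 can be sketched as a small per-host counter. This is a minimal illustration, not the CP's implementation: the type names (`Checkin`, `Policy`, `HostState`) are hypothetical, and it assumes that any healthy check-in resets both failure counters and that a failure resets the success streak.

```rust
/// Outcome of one check-in as seen by a hypothetical quarantine tracker.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Checkin {
    Healthy,
    AttestationFailure, // AttestationDrift / Invalid
    ProbeFailure,       // non-Pass under enforce mode
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Unquarantine {
    Manual,
    AutoAfter(u32), // mirrors autoUnquarantineSuccesses
}

pub struct Policy {
    pub attestation_failure_threshold: u32,
    pub probe_failure_threshold: u32,
    pub unquarantine: Unquarantine,
}

#[derive(Default)]
pub struct HostState {
    attestation_failures: u32,
    probe_failures: u32,
    successes: u32,
    pub quarantined: bool,
}

impl HostState {
    /// Feed one check-in result and update the quarantine flag.
    pub fn observe(&mut self, policy: &Policy, event: Checkin) {
        match event {
            Checkin::Healthy => {
                // Assumption: a healthy check-in breaks both failure streaks.
                self.attestation_failures = 0;
                self.probe_failures = 0;
                self.successes += 1;
            }
            Checkin::AttestationFailure => {
                self.attestation_failures += 1;
                self.successes = 0;
            }
            Checkin::ProbeFailure => {
                self.probe_failures += 1;
                self.successes = 0;
            }
        }

        if self.attestation_failures >= policy.attestation_failure_threshold
            || self.probe_failures >= policy.probe_failure_threshold
        {
            self.quarantined = true;
        } else if self.quarantined {
            // Manual mode never clears the flag here; an operator action does.
            if let Unquarantine::AutoAfter(n) = policy.unquarantine {
                if self.successes >= n {
                    self.quarantined = false;
                }
            }
        }
    }

    /// Manual path: the operator clears the flag and all counters.
    pub fn operator_unquarantine(&mut self) {
        *self = HostState::default();
    }
}
```

Because the counters are rebuilt purely from observed check-ins, losing `HostState` on a CP rebuild only delays the quarantine by one threshold window, which matches the soft-state classification in §4.2.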
5. EK-bound bootstrap tokens
Bootstrap tokens already exist (RFC-0003 §4.5, nixfleet mint-token subcommand, BootstrapToken + TokenClaims in nixfleet-proto). RFC-0010 extends the token claims with one field:
pub struct TokenClaims {
    pub hostname: String,
    pub pubkey_fingerprint: String,
    pub expected_ek_fingerprint: Option<String>, // NEW - RFC-0010
    pub channel: String,
    pub expiry: DateTime<Utc>,
    pub nonce: [u8; 32],
}
When expected_ek_fingerprint is set:
- Operator records the host’s TPM EK pubkey via OOB tooling when the hardware is unboxed (typed into fleet.nix next to the host’s other declarations).
- nixfleet mint-token includes the EK fingerprint in the signed claims.
- The agent’s enrollment flow (POST /v1/enroll) presents an EK quote alongside the bootstrap token + CSR.
- The CP verifies: token signature against orgRootKey, token unused (existing token_replay), CSR pubkey matches pubkey_fingerprint, EK in the quote matches expected_ek_fingerprint. Mismatch on any of these -> 403 + EnrollmentFailed event.
expected_ek_fingerprint = None is the v0.2-compatible behavior. Per-host opt-in; once a host enrolls with EK binding, future re-enrollments require a token bound to the same EK (or a fresh token signed after the operator records the new EK following hardware replacement).
This closes “rogue host enrolls itself given a leaked bootstrap token”: even with the token, the attacker would need either the original host’s TPM (impractical) or a token re-issued after the operator recorded the attacker’s EK (operator action, audit-trail visible).
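The CP-side checks listed above, including the v0.2-compatible `None` case, might be ordered as in the following sketch. The `Claims` and `EnrollRequest` shapes and the `EnrollError` variants are hypothetical simplifications; signature verification and the token_replay lookup are assumed to have been done elsewhere and are passed in as booleans.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum EnrollError {
    BadTokenSignature,
    TokenReplayed,
    CsrKeyMismatch,
    EkQuoteMissing,
    EkMismatch,
}

/// Trimmed view of the signed token claims relevant to enrollment.
pub struct Claims<'a> {
    pub pubkey_fingerprint: &'a str,
    pub expected_ek_fingerprint: Option<&'a str>,
}

/// What the agent presents, plus pre-computed check results (assumption).
pub struct EnrollRequest<'a> {
    pub token_signature_valid: bool, // checked against orgRootKey
    pub token_already_used: bool,    // result of the token_replay lookup
    pub csr_pubkey_fingerprint: &'a str,
    pub quoted_ek_fingerprint: Option<&'a str>,
}

/// Any Err maps to a 403 plus an EnrollmentFailed event.
pub fn verify_enrollment(claims: &Claims, req: &EnrollRequest) -> Result<(), EnrollError> {
    if !req.token_signature_valid {
        return Err(EnrollError::BadTokenSignature);
    }
    if req.token_already_used {
        return Err(EnrollError::TokenReplayed);
    }
    if req.csr_pubkey_fingerprint != claims.pubkey_fingerprint {
        return Err(EnrollError::CsrKeyMismatch);
    }
    match (claims.expected_ek_fingerprint, req.quoted_ek_fingerprint) {
        // No EK binding in the token: v0.2-compatible behavior.
        (None, _) => Ok(()),
        // Token is EK-bound but the agent sent no quote.
        (Some(_), None) => Err(EnrollError::EkQuoteMissing),
        (Some(expected), Some(quoted)) if expected == quoted => Ok(()),
        (Some(_), Some(_)) => Err(EnrollError::EkMismatch),
    }
}
```

The EK comparison runs last so that a leaked token presented from the wrong hardware is rejected before any certificate is issued, while unbound tokens keep working unchanged.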
6. Threshold-signed channels
Opt-in per channel. A channel declares:
channel.gov-prod = {
  releaseSigners.threshold = "2-of-3";
  releaseSigners.signers = [
    { name = "alice"; pubkey = "ssh-ed25519 AAAA..."; }
    { name = "bob"; pubkey = "ssh-ed25519 AAAA..."; }
    { name = "charlie"; pubkey = "ssh-ed25519 AAAA..."; }
  ];
};
For releases targeting this channel, CI refuses to publish until N hardware-key signatures (N being the channel's threshold, 2 in the example above) have been collected on the same canonical bytes.
6.1 Mechanism
The current nixfleet-release pipeline calls a single --sign-cmd hook (../design/architecture.md §10.3). Threshold signing extends this with a multi-process signing session:
- CI evaluates the fleet, builds closures, canonicalizes fleet.resolved.json (existing pipeline through step 7).
- Instead of calling --sign-cmd directly, CI writes a signing session to disk: signing-sessions/<session-id>/canonical.json plus a metadata.json describing which signers must sign, the diff against the previous release, and the build provenance.
- The signing session is published via the existing CI artifact mechanism (Forgejo Actions artifact, or pushed to a known location).
- Each signer runs nix run .#sign-release -- --session <session-id> on their own workstation. The CLI fetches the session artifact, displays a per-artifact summary (changed hosts, changed compliance frameworks, diff against the previous release on the channel), prompts for YubiKey touch, signs the canonical bytes, uploads the signature back.
- When N signatures have arrived, a CI follow-up job stitches them into the release artifact (fleet.resolved.json + fleet.resolved.threshold.sig containing the N signatures + a manifest of which signer signed which).
- The CP verifies on fetch: each signature in the threshold sig matches a signer in the channel’s releaseSigners, the count meets the threshold.
The CI release key (the existing ciReleaseKey) continues to sign automation-friendly artifacts (revocations, rollout manifests). Threshold signing applies only to fleet.resolved.json for opted-in channels.
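The CP-side check in the last step of §6.1 could look roughly like the sketch below. It is an illustration under assumptions: `Signer`, `SignatureEntry`, and both function names are hypothetical, the cryptographic check is injected as a closure (`verify_one`) rather than tied to a particular signature library, and counting each signer at most once is a choice made here that the RFC text does not spell out.

```rust
use std::collections::HashSet;

pub struct Signer {
    pub name: String,
    pub pubkey: String,
}

pub struct SignatureEntry {
    pub signer: String,
    pub signature: Vec<u8>,
}

/// Parses a threshold spec such as "2-of-3" into (required, total).
pub fn parse_threshold(spec: &str) -> Option<(usize, usize)> {
    let (k, n) = spec.split_once("-of-")?;
    let k: usize = k.parse().ok()?;
    let n: usize = n.parse().ok()?;
    if k >= 1 && k <= n {
        Some((k, n))
    } else {
        None
    }
}

/// Returns true when every presented signature belongs to a declared signer,
/// verifies over `canonical`, and the number of distinct signers meets
/// `required`.
pub fn verify_threshold<F>(
    canonical: &[u8],
    signers: &[Signer],
    required: usize,
    signatures: &[SignatureEntry],
    verify_one: F,
) -> bool
where
    F: Fn(&str, &[u8], &[u8]) -> bool,
{
    let mut seen: HashSet<&str> = HashSet::new();
    for entry in signatures {
        let signer = match signers.iter().find(|s| s.name == entry.signer) {
            Some(s) => s,
            None => return false, // not in releaseSigners
        };
        if !verify_one(&signer.pubkey, canonical, &entry.signature) {
            return false;
        }
        // Assumption: a signer submitting twice still counts once.
        seen.insert(signer.name.as_str());
    }
    seen.len() >= required
}
```

Rejecting the whole set on a single unknown or invalid signature follows the "each signature ... matches a signer" wording; a more lenient verifier that skips bad entries would be a different policy.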
6.2 Failure cases
- Signer YubiKey lost. That signer is removed from the channel’s releaseSigners; the channel continues with the remaining signers until a replacement is enrolled. If fewer signers remain than the threshold requires, the channel cannot release until replacement.
- Signer collusion at threshold. If enough signers to meet the threshold collude, they can sign a malicious release. The framework does not prevent this; it makes it visible (every signature is in the transparency log) and rare (hardware-key requirement). Mitigation is organizational, not cryptographic.
- Signing session expires. Sessions have a 7-day default expiry (per-channel override). Stale sessions are deleted; CI emits a SigningSessionExpired event.
6.3 Out of scope for v0.3
Web-based review UX. v0.3 ships the CLI-based session viewer; richer UI is a separate project (the framework’s scope stops at “the protocol exists and works from a terminal”).
7. Key rotation runbooks
One runbook per root key, each tested in a microvm.nix scenario under tests/harness/scenarios/key-rotation/ (new directory). Runbooks live at docs/runbooks/<key>-rotation.md (new directory).
7.1 CI release key rotation
Uses the existing ciReleaseKey.successor + retireAt mechanism. Operator:
- Generates a new key (typically on a fresh YubiKey).
- Sets nixfleet.trust.ciReleaseKey.successor = { algorithm; public; } and retireAt = "<RFC3339>" in the flake. Commits.
- CP verifiers begin accepting both current and successor during the overlap window.
- After retireAt, the reconciler emits Action::RotateTrustRoot; operator’s tooling rotates current -> previous, successor -> current in the next commit.
- Old key removed from previous after the 30-day grace window per CONTRACTS §II #1.
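The slot movement in the steps above can be sketched as a small value type. `ReleaseKeySlots` and its methods are hypothetical names, keys are reduced to opaque strings, and the sketch assumes that a key sitting in `previous` is still accepted by verifiers until it is dropped at the end of the grace window.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ReleaseKeySlots {
    pub previous: Option<String>,
    pub current: String,
    pub successor: Option<String>,
}

impl ReleaseKeySlots {
    /// Keys a verifier accepts right now: current, then successor (overlap
    /// window), then previous (grace window, assumption).
    pub fn accepted(&self) -> Vec<&str> {
        let mut keys = vec![self.current.as_str()];
        keys.extend(self.successor.as_deref());
        keys.extend(self.previous.as_deref());
        keys
    }

    /// The commit made after retireAt: current -> previous, successor -> current.
    /// Without a declared successor the slots are returned unchanged.
    pub fn rotate(mut self) -> Self {
        if let Some(next) = self.successor.take() {
            let old = std::mem::replace(&mut self.current, next);
            self.previous = Some(old);
        }
        self
    }

    /// End of the grace window: the old key leaves the trust set.
    pub fn drop_previous(mut self) -> Self {
        self.previous = None;
        self
    }
}
```

At no point in the sequence is the accepted set empty, and the old key is never removed in the same commit that promotes the new one.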
7.2 Attic cache key rotation
- Generate new attic key on the cache host.
- Stand up a parallel attic publishing closures under the new key (existing nixfleet.trust.cacheKeys already a list - both keys present during overlap).
- Trigger a CI rebuild that re-pushes all in-use closures to the new cache.
- Once all hosts have converged on closures signed by the new key, remove the old key from cacheKeys and decommission the old attic.
7.3 Org root key rotation
Catastrophic procedure (24-48 hour recovery). Re-genesis ceremony per §3.2; new threshold shares distributed to operators; bootstrap tokens going forward signed with new key; old key kept valid for in-flight tokens until expiry; then revoked. All hosts re-enrolled with new bootstrap tokens issued under the new root.
7.4 Host TPM key rotation
Operator-initiated re-enrollment (TPM hardware change) or scheduled (every N years per policy).
- New TPM keyslot generated on the host (existing first-boot flow).
- Operator captures new pubkey + EK; updates fleet.nix; mints new bootstrap token bound to new EK.
- Host re-enrolls; old mTLS cert revoked via revocations.json.
- Old host record retired in dispatch_history.
Each runbook has a microvm scenario (tests/harness/scenarios/key-rotation/<key>.nix) that exercises the procedure end-to-end. Scenarios are part of the nightly test fabric, not the per-PR fast suite.
8. Trust analysis
Lifecycle properties added.
- Each authorization has a documented creation procedure, holding requirement, rotation cadence, and revocation path.
- Bootstrap is single-use, time-bounded, and (with EK binding) hardware-bound.
- Quarantine is observable, reversible, and bounded (one cert renewal cycle to take effect; no new infrastructure needed).
- Threshold signing distributes release authority without introducing a new central authority.
Failure cases not stated above.
- Operator collusion at any threshold. The framework does not prevent it. Mitigation is organizational (separation of duties), not cryptographic.
- Quarantine misclassification (host healthy but attestation flapping due to legitimate issue). Operator unquarantines with rationale logged; investigation drives a fix to the policy or the host. The framework prefers visible failure to silent passing.
9. Migration
Most of RFC-0010 is additive documentation. Mechanism additions are per-host or per-channel opt-in:
- Bootstrap-token EK binding is opt-in via expected_ek_fingerprint. Pre-RFC-0010 tokens (and re-issued tokens for hosts whose hardware was provisioned without EK capture) work unchanged.
- Host-attestation quarantine is opt-in per channel via attestationQuarantine block. Default off.
- Threshold-signed channels are opt-in per channel. Default is single-signer (the existing ciReleaseKey flow). Operators opt in by declaring releaseSigners.
- Operator workflow documentation is the bulk of the work and applies retroactively; no code change required.
10. Work items
- Operator workflow documentation. Runbooks for the four key types in docs/runbooks/; ceremony scripts in tools/keys/; hardware compatibility matrix; transparency-log file format. No new Rust code.
- EK-bound bootstrap tokens. Token-claims field, nixfleet mint-token subcommand flag, EK-quote verification at /v1/enroll, single-use enforcement against EK fingerprint.
- Active host-attestation quarantine. attestationQuarantine channel schema (RFC-0001 additive), CP-side state machine and cert-issuance hook, observable status, unquarantine-host flake app, microvm scenario.
- Threshold-signed channels. Channel schema additions (releaseSigners), signing-session protocol, sign-release CLI flake app, CP-side multi-signature verification.
- Key rotation runbooks tested. Each rotation procedure has a microvm scenario that runs in the nightly suite.
These work items are largely independent - the documentation tooling unblocks the rest; runbook validation depends on the mechanisms shipping first.
11. Falsifiable done criteria
- Each of the four root keys has a documented rotation runbook, executed end-to-end in a microvm scenario within the last quarter.
- A bootstrap token’s second use is rejected by the CP (existing v0.2 behavior, retained); a token presented by a host whose EK quote does not match expected_ek_fingerprint is rejected before any cert is issued.
- A persistently failing host is removed from the active fleet view within one mTLS cert renewal cycle, observably and reversibly.
- A threshold-signed release tagged with N−1 signatures is rejected by the CP; the same release with N signatures from valid signers verifies.
- The org root key can be reconstructed from its threshold shares in an air-gapped session, reproducing the exact public key from the recorded share material plus the documented procedure.
- An auditor handed a hostname can produce, from records alone, the full enrollment chain: bootstrap token (signed by org root at time T), EK fingerprint, first mTLS cert issuance, all subsequent rotations.
12. Open questions
- Quarantine auto-recovery threshold. For unquarantine = "auto", what value of N is right? Probably channel-specific; defaults could be 10 for production, 3 for staging.
- Bootstrap token expiry default. 7 days for hardware in transit but not yet racked. Per-channel override allowed. Worth tightening for environments with short logistics windows.
- Threshold-signing session storage. Pushing signing sessions through the existing Forgejo Actions artifact path is the simplest answer, but it means signers need network reach to the forge. Air-gap channels (RFC-0012) need a different transport - likely a bundle in/out of the air-gap. Defer to RFC-0012 v0.4 cycle.
- Transparency log target. Git-tracked append-only file is sufficient for v0.3. v0.4+ may integrate with a public transparency service if customer environments require it.
13. One-sentence summary
Every authorization in the system has a documented birth, life, and death - and a host that lies about its boot state stops being part of the fleet within one cert cycle, observably and reversibly.
RFC-0011: Freshness window policy
Status. Draft.
Targets. v0.3.
Depends on. RFC-0001 (channel schema), RFC-0003 (agent protocol), ../design/architecture.md §5.
Scope. Make replay protection explicit, machine-checkable, and recoverable. Adds: explicit freshness fields on the agent-target wire, time-source policy per channel, operator visibility for stalled channels and long windows, TimeSourceUnavailable event class.
1. Motivation
../design/architecture.md §5 names the threat:
Control plane host is compromised. Attacker can: refuse to serve updates (DoS), serve stale-but-valid targets (replay). Mitigation: agents refuse to accept targets older than a configurable freshness window signed by CI.
v0.2 implements most of this. The CP enforces freshnessWindowMinutes on meta.signedAt at fetch time. The agent has freshness.rs and the fleet-harness-stale-target scenario verifies it. mkFleet requires freshnessWindow per channel and enforces the cross-field invariant freshnessWindow ≥ 2 × signingIntervalMinutes.
What is missing:
- Explicit freshness on the wire. The agent currently derives the window from its local freshness.rs configuration. A channel that needs a tighter window for some hosts has no clean mechanism. The window should ride the signed target.
- Time-source policy. v0.2 trusts the host’s local clock implicitly. A maliciously-skewed clock (or just an unsynchronized one) silently breaks the protection.
- Operator visibility. A channel approaching its window expiry should warn before agents start refusing. A long window is a configuration smell that should be visible in fleet status.
- Hard floor. v0.2 has no absolute lower bound on freshnessWindow. The existing invariant catches freshnessWindow = "30m" on a channel whose signingIntervalMinutes = 60, but it is relative to the signing interval: a freshnessWindow = "5m" on a channel that signs every 2 minutes passes the invariant and produces near-useless replay protection.
This RFC fills those four gaps. Most of v0.2’s freshness machinery is reused; the additions are surface area, not core mechanism.
2. Schema additions
2.1 Channel-level
channels.production = {
  rolloutPolicy = "canary-conservative";
  signingIntervalMinutes = 60;
  freshnessWindow = 1440; # already required, unchanged
  freshnessHardFloorMinutes = 60; # NEW - see §2.3, default 60
  timeSource = { # NEW - see §4
    ntp = [ "time.cloudflare.com" "time.nist.gov" ];
    maxSkewSeconds = 300;
  };
};
channels.gov-prod = {
  # ...
  freshnessWindow = 1440;
  timeSource = {
    signedTime = {
      provider = "roughtime";
      url = "https://time.gov.example/roughtime";
      pubkey = "...";
    };
    fallback.ntp = [ "internal-ntp.example" ];
    maxSkewSeconds = 60;
  };
};
2.2 Defaults
| Field | Online channels | Air-gap channels (RFC-0012) |
|---|---|---|
| freshnessWindow | required, suggested 24h | required, suggested 90d |
| freshnessHardFloorMinutes | 60 (1h) | 60 (1h) - same; air-gap windows are about the upper bound |
| timeSource.maxSkewSeconds | 300 (5min) | 60 (1min) - air-gap typically uses signed-time, can be tighter |
| timeSource | NTP defaults to ["time.cloudflare.com" "time.nist.gov"] | no NTP default - operator declares signed-time or internal NTP explicitly |
2.3 Hard floor
freshnessWindow < freshnessHardFloorMinutes is rejected at mkFleet evaluation time with a clear error. The floor is per-channel-overridable (rare - for example a channel with signingIntervalMinutes = 5 may want freshnessHardFloorMinutes = 15).
There is no hard ceiling. Long windows are sometimes correct (frozen audit channels, compliance-locked baselines). The framework adds friction (§5) instead of forbidding them.
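The two evaluation-time checks on a channel's window (the existing cross-field invariant from §1 and the new hard floor) amount to the arithmetic below. The real enforcement lives in mkFleet's Nix evaluation; this Rust version is only an illustration, with hypothetical names and all values taken in minutes.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum FreshnessConfigError {
    /// freshnessWindow < freshnessHardFloorMinutes (new in RFC-0011).
    BelowHardFloor { window: u64, floor: u64 },
    /// freshnessWindow < 2 x signingIntervalMinutes (existing v0.2 invariant).
    BelowSigningInvariant { window: u64, minimum: u64 },
}

pub fn check_freshness_window(
    window_minutes: u64,
    signing_interval_minutes: u64,
    hard_floor_minutes: u64,
) -> Result<(), FreshnessConfigError> {
    if window_minutes < hard_floor_minutes {
        return Err(FreshnessConfigError::BelowHardFloor {
            window: window_minutes,
            floor: hard_floor_minutes,
        });
    }
    let minimum = signing_interval_minutes.saturating_mul(2);
    if window_minutes < minimum {
        return Err(FreshnessConfigError::BelowSigningInvariant {
            window: window_minutes,
            minimum,
        });
    }
    // Deliberately no upper bound: long windows are flagged, not rejected.
    Ok(())
}
```

With the defaults, `check_freshness_window(30, 5, 60)` fails on the floor, while the per-channel override from the example above (`signingIntervalMinutes = 5`, `freshnessHardFloorMinutes = 15`) accepts a 15-minute window.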
3. Wire-protocol additions
3.1 Agent target shape
RFC-0003 §4.1 CheckinResponse.target.activate gains explicit freshness fields:
pub struct ActivateBlock {
    // ... existing fields ...
    pub fleet_resolved_signed_at: DateTime<Utc>, // NEW
    pub freshness_window_seconds: u64, // NEW
    pub freshness_hard_floor_seconds: u64, // NEW
    pub time_source: TimeSourcePolicy, // NEW (per-channel snapshot)
}
These fields are projections from meta.signedAt and the channel’s freshnessWindow / freshnessHardFloorMinutes / timeSource. The CP copies them from the signed fleet.resolved.json into the per-host target response. The CI signature on fleet.resolved.json covers them; the CP cannot widen the window or weaken the time-source policy.
Pre-RFC-0011 agents ignore the new fields (existing serde(default) convention). Post-RFC-0011 agents enforce them in addition to whatever local config they may carry - local config is used only as a fallback when target fields are absent (i.e., when serving from a pre-RFC-0011 CP).
3.2 Agent verification
On every checkin response with a target, the agent verifies, in order:
1. Existing v0.2 verifications: rollout-manifest signature, content-address, host membership (RFC-0003 §4.1).
2. Time-source freshness: establish local time within time_source.maxSkewSeconds of the configured time source (§4). On failure: emit TimeSourceUnavailable, refuse to evaluate freshness, hold the current generation, do not converge to the new target.
3. Freshness: now() - fleet_resolved_signed_at <= freshness_window_seconds. On failure: emit StaleTargetRejected, refuse to converge, hold the current generation.
4. Existing v0.2 activation flow if 1-3 pass.
The agent does not stop working on freshness or time-source failures. It stays on the current generation, continues running existing services, and continues to phone home. Only convergence to new targets is blocked. Freshness failure is a control-plane-trust signal, not a host-health problem.
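Steps 2 and 3 of the verification order can be sketched as a pure decision function. This is a simplified illustration, not the agent's freshness.rs: timestamps are plain Unix seconds instead of DateTime<Utc>, the measured clock skew is passed in as an `Option` (with `None` meaning no source could be reached), the event payloads are trimmed, and treating a target signed slightly in the future as age zero is an assumption made here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    TimeSourceUnavailable,
    StaleTargetRejected {
        observed_age_seconds: u64,
        freshness_window_seconds: u64,
    },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Hand over to the existing v0.2 activation flow.
    ProceedToActivation,
    /// Stay on the current generation and report the event.
    HoldCurrentGeneration(Event),
}

pub struct TargetFreshness {
    pub fleet_resolved_signed_at: u64, // Unix seconds (assumption)
    pub freshness_window_seconds: u64,
    pub max_skew_seconds: u64,
}

pub fn evaluate(
    target: &TargetFreshness,
    now: u64,
    measured_skew_seconds: Option<u64>,
) -> Decision {
    // Step 2: without a trustworthy clock, freshness is not evaluated at all.
    match measured_skew_seconds {
        Some(skew) if skew <= target.max_skew_seconds => {}
        _ => return Decision::HoldCurrentGeneration(Event::TimeSourceUnavailable),
    }

    // Step 3: age of the signed target against the signed window.
    let age = now.saturating_sub(target.fleet_resolved_signed_at);
    if age > target.freshness_window_seconds {
        return Decision::HoldCurrentGeneration(Event::StaleTargetRejected {
            observed_age_seconds: age,
            freshness_window_seconds: target.freshness_window_seconds,
        });
    }

    Decision::ProceedToActivation
}
```

Both failure paths return the same "hold" decision; only the reported event differs, which mirrors the point that neither failure stops the host's running services.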
3.3 Event additions (RFC-0003 §4.3)
Two new ReportEvent variants, additive:
- StaleTargetRejected { observed_age_seconds, signing_timestamp, freshness_window_seconds }
- TimeSourceUnavailable { configured_sources, last_attempt_at, last_error }
Both are unsigned (operator-surface, no fleet gate reads them - matching the existing ActivationDeferred / ClosureQuarantined precedent per the v0.2 changelog).
4. Time-source policy
The agent does not trust the CP for time (pull-only model, RFC-0003 §1). It validates its local clock against an independent source declared per channel.
4.1 NTP source
timeSource = {
  ntp = [ "time.cloudflare.com" "time.nist.gov" ];
  maxSkewSeconds = 300;
};
Agent behavior: verify that the host’s chronyd / systemd-timesyncd reports synchronized within maxSkewSeconds of one of the declared sources. If the host has its own timesync daemon configured (likely - most NixOS hosts do), the agent reads the daemon’s reported skew rather than running its own NTP query.
4.2 Signed-time source
For high-trust environments and air-gap (RFC-0012), a signed-time service is preferable:
timeSource = {
  signedTime = {
    provider = "roughtime"; # or "tlsdate" - pluggable
    url = "https://time.gov.example/roughtime";
    pubkey = "...";
  };
  fallback.ntp = [ "internal-ntp.example" ]; # optional
  maxSkewSeconds = 60;
};
Roughtime is the recommended protocol (open spec, deployable). v0.3 ships a generic signed-time fetcher with Roughtime support and a documented adapter pattern for other providers.
4.3 Failure mode
If the agent cannot establish a time source within maxSkewSeconds:
- Refuses to evaluate freshness - neither accepts nor rejects targets.
- Logs TimeSourceUnavailable event with last-attempt details.
- Continues running the current generation - services do not stop.
This is preferred to silent acceptance: an agent with an unverifiable clock is in an undefined state, and undefined-state agents do not move forward. Operators see this in fleet status (§5) as “unable to assess freshness.”
5. Operator visibility
The CP’s /v1/hosts and /v1/channels/<name> endpoints, and the nixfleet status CLI, surface:
- Current target’s age. now - fleet_resolved_signed_at.
- Distance from window expiry. e.g., “expires in 3h17m”.
- Hosts that have rejected the current target as stale. Count and list. Sourced from the host_reports ring (StaleTargetRejected events).
- Hosts with TimeSourceUnavailable. Count and list.
A channel that has stalled (no new commits for > 75% of freshnessWindow) emits a StaleChannelWarning to operators before the window expires. This is preventive: operators see the warning and either commit a no-op refresh or extend the window with rationale, before agents start refusing.
Channels with freshnessWindow > 7d (online) or > 90d (air-gap) are flagged in fleet status as long-freshness-window - confirm rationale. This is friction by design: long windows are sometimes correct but are also the most common configuration mistake in this area.
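The status signals described in this section reduce to a few comparisons. The sketch below assumes all durations are in seconds and uses hypothetical function names; the 75%, 7d, and 90d figures are the ones stated above.

```rust
const DAY: u64 = 24 * 60 * 60;

/// True once a channel has gone more than 75% of its window without a new
/// commit. Integer math in u128 avoids both rounding and overflow.
pub fn stale_channel_warning(seconds_since_last_commit: u64, window_seconds: u64) -> bool {
    (seconds_since_last_commit as u128) * 4 > (window_seconds as u128) * 3
}

/// True when the window exceeds 7d (online) or 90d (air-gap).
pub fn long_freshness_window(window_seconds: u64, air_gap: bool) -> bool {
    let limit = if air_gap { 90 * DAY } else { 7 * DAY };
    window_seconds > limit
}

/// Human-readable distance from window expiry, or None once expired.
pub fn expires_in(now: u64, signed_at: u64, window_seconds: u64) -> Option<String> {
    let age = now.saturating_sub(signed_at);
    let remaining = window_seconds.checked_sub(age)?;
    Some(format!(
        "expires in {}h{:02}m",
        remaining / 3600,
        (remaining % 3600) / 60
    ))
}
```

For a 24h window the warning fires after 18h without a commit, leaving 6h to push a no-op refresh before the first agent rejects the target.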
6. Edge cases
- Skewed local clock. maxSkewSeconds enforcement catches it. Operator runs NTP/chrony correction; until corrected, the host stays on its current generation.
- Channel intentionally stalled (frozen for audit). Operator extends the window: channels.audit-frozen.freshnessWindow = 60d with a rationale comment. The framework does not assume frozen channels are unintentional.
- Air-gap import older than the channel’s freshness window. RFC-0012 §6 covers air-gap freshness; the bundle’s signing time is what matters.
- Wave promotion when freshness will expire mid-rollout. The reconciler warns at rollout-start time if target.age + estimated_rollout_duration > freshness_window. The operator either restarts the rollout against a fresh target or extends the window.
- Agent online but unable to reach NTP (firewalled environment). Falls into TimeSourceUnavailable. Operator either provides an internal NTP source or moves the channel to signedTime.
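The rollout-start warning in the wave-promotion case is a single comparison; a minimal sketch follows, with a hypothetical function name and all durations in seconds. Treating arithmetic overflow as "will expire" is a defensive choice made here.

```rust
/// True when the target would age past its freshness window before the
/// rollout is expected to finish.
pub fn rollout_outlives_window(
    target_age_seconds: u64,
    estimated_rollout_seconds: u64,
    freshness_window_seconds: u64,
) -> bool {
    match target_age_seconds.checked_add(estimated_rollout_seconds) {
        Some(total) => total > freshness_window_seconds,
        None => true, // overflow: certainly past any window
    }
}
```

For example, a target that is already 20h old with a 6h rollout estimate trips the warning on a 24h window, while the same rollout against a 2h-old target does not.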
7. Trust analysis
What this RFC adds. A concrete, machine-checkable replay-protection contract. A compromised CP serving a stale-but-valid target now fails closed within freshnessWindow of the original signing time - without operator action, regardless of agent-local configuration drift.
What it does not add. Protection against an attacker who can also tamper with the agent’s time source. Mitigation is the channel-declarable time-source policy: high-trust channels use signed time, low-trust channels use public NTP. The operator has explicit visibility into which choice each channel made (it’s in the channel definition, git-tracked, in the protected-branch review path).
Failure mode worth stating. A misconfigured freshnessWindow = "1y" on a production channel turns this protection off in practice. The hard floor (default 1h) prevents the obvious mistake; long-window flagging in fleet status is the protection against the less-obvious one. Channel definitions are git-tracked; freshnessWindow should be in the protected-branch review path.
8. Deliverable
This RFC is small enough to land as one deliverable.
- Freshness hardening. Five sub-pieces, parallelizable:
  - Schema additions (freshnessHardFloorMinutes, timeSource); mkFleet enforcement.
  - Wire-shape additions (AgentTarget freshness fields); CP populates from signed source.
  - Agent enforcement of freshness_window_seconds from the target (in addition to local config).
  - Time-source policy: NTP synchronization check via the host’s existing timesync daemon; TimeSourceUnavailable event class.
  - Operator-visibility additions: StaleChannelWarning, long-window flagging, nixfleet status columns.
Roughtime / signed-time-source support is a separate sub-piece (optional for v0.3) - substantial dependency, useful but not blocking the rest. Lean: ship NTP-based time-source in v0.3, Roughtime as v0.3.x or v0.4 follow-up.
9. Falsifiable done criteria
- A CP that serves an artificially-aged fleet.resolved.json (timestamp moved back beyond freshness_window_seconds) is detected by every agent that polls it; agents emit StaleTargetRejected; no host converges.
- An agent whose host has a maliciously-skewed local clock (forward by > maxSkewSeconds) refuses to evaluate freshness and emits TimeSourceUnavailable.
- A channel that has stalled past 75% of its window emits a StaleChannelWarning visible in fleet status before any agent has rejected.
- The hard floor cannot be bypassed: a freshnessWindow = "30m" declaration on a channel with the default freshnessHardFloorMinutes = 60 is rejected at evaluation time with a clear error.
- A channel with freshnessWindow > 7d (online) is flagged in fleet status with the long-window indicator.
10. Open questions
- Default maxSkewSeconds. 5 minutes balances NTP accuracy against replay window. Open to tightening (1 minute) for high-trust channels - already the suggested air-gap default.
- Time-source-daemon coupling. Reading skew from chronyd vs systemd-timesyncd requires backend-specific code. v0.3 ships systemd-timesyncd support (most NixOS hosts) + a documented extension pattern. Chrony adapter as needed.
- Roughtime adoption. Open spec but limited deployment. v0.3 ships the integration; whether to recommend it depends on whether customer environments have a Roughtime server reachable (most don’t yet).
- Stale-channel warning threshold. 75% is a guess. Worth surveying after a quarter of operation against real channels.
11. One-sentence summary
Every signed target carries an expiry the agent independently verifies against a declared time source; a CP that lies about freshness - or a host whose clock is wrong - is detected within a poll cycle, no operator action required.
RFC-0012: Air-gapped operation
Status. Draft.
Targets. v0.3.
Depends on. ../design/architecture.md (especially §5 control-plane failure case), RFC-0001 (channel schema), RFC-0003 (agent protocol), RFC-0011 (freshness in air-gap).
Scope. First-class deployment mode for environments with no internet egress: energy operators, water utilities, defense-adjacent contractors, healthcare critical systems. The trust model already supports this - every artifact is self-verifying. This RFC makes the workflow explicit, the sovereign-cache transport role explicit, and ships the bundle tooling.
1. Motivation
The v0.2 trust model is air-gap-ready by accident: closures are content-addressed and signed by attic, fleet.resolved.json is signed by CI, agents verify everything against pinned trust roots, the CP holds no secrets. None of these properties depend on internet reach.
What is missing is the workflow. An operator running a regulated air-gap site needs to know:
- How releases enter the air-gap (transport, format, verification).
- What the sovereign cache’s role is (re-signing? transport-only?). This matters: re-signing means a new trust root inside the air-gap, transport-only means the existing trust roots cover everything.
- How freshness applies when bundles are days or weeks old by design (RFC-0011 cross-reference).
- How key rotation crosses the air-gap (RFC-0010 §7 cross-reference).
The mechanism is small. The contract is the bulk of the work.
2. Non-goals
- Two-way air-gap. Telemetry, support bundles, or any reverse channel from the air-gap is the customer’s responsibility. RFC-0012 covers the inbound path only.
- Real-time sync. By definition not possible. Channels in air-gap mode update at human cadence.
- Auto-discovery of new releases from inside the air-gap. The operator pulls a bundle, validates it, imports it. There is no automatic mechanism that bridges the gap.
3. Model
Three environments connected by signed bundles:
online build env               air-gap entry point            air-gapped fleet
─────────────────              ──────────────────             ─────────────────
Forgejo + CI                   signed-bundle inbox            sovereign attic
attic + signing keys ───────▶  verification host      ────▶   control plane
fleet.resolved + closures      bundle import tool             agents
                               (validates, signs receipt)
- The build environment is unchanged from v0.2.
- The air-gap entry point is a documented station (typically a kiosk machine with a known boot image) that accepts bundles from approved media (USB, one-way data diode, signed optical media), verifies them against the configured trust roots, and pushes them into the sovereign cache.
- The sovereign cache is just attic, run inside the air-gapped environment.
3.1 Sovereign attic is transport-only
This is the load-bearing decision. The sovereign attic does not re-sign closures. Closures imported from a bundle keep their original attic-key signatures (the same key that signed them in the online build environment, declared in nixfleet.trust.cacheKeys). The sovereign attic forwards bytes; it does not re-attest to them.
Consequence: agents inside the air-gap trust the same cacheKeys they would trust online. There is no “sovereign cache key” trust root to manage. A compromised sovereign attic cannot inject malicious closures because it cannot produce signatures under a key the agents trust.
Operationally this means: the sovereign attic’s own internal signing key (attic generates one per instance) is unused by the framework - agents never check it. Setting attic up in a “no signing required” mode is the recommended deployment.
If a customer wants their sovereign cache to also re-sign for defense-in-depth (e.g., to prove “this closure passed the air-gap import check”), that can be layered on top - the agent supports multiple cacheKeys simultaneously per the existing v0.2 contract. Out of scope for the framework’s recommended deployment.
4. Bundle format
A bundle is a signed tarball containing:
bundle-2026-05-14.tar
├── manifest.json # bundle metadata, signed by CI release key
├── manifest.json.sig
├── fleet/
│ ├── fleet.resolved.json # signed per RFC-0001
│ ├── fleet.resolved.json.sig
│ ├── revocations.json # signed per the CP-resident-state recovery-profile policy in docs/design/architecture.md §6
│ └── revocations.json.sig
├── rollouts/
│ ├── <rolloutId>.json # signed per RFC-0002 §4.4
│ └── <rolloutId>.json.sig
├── closures/
│ └── <hash>.nar.xz # closure tarballs (already attic-signed inline; no separate .sig file)
└── import-instructions.md # operator-readable procedure (humans, not machines)
The manifest declares: which channels this bundle updates, the previous channel-pointer it expects (so out-of-order imports are detected), the CI commit range, and the bundle’s expiry. Bundles older than the channel’s air-gap freshness window (RFC-0011) are rejected at import.
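A minimal sketch of the two manifest-level checks described above, assuming the manifest is already signature-verified and parsed. The field names (channels, expectedPreviousPointer, ciCommitRange, expiresAt) are hypothetical; this RFC excerpt does not fix the manifest schema.

```python
from datetime import datetime, timezone
from typing import List


def check_manifest(manifest: dict, current_pointer: str, now: datetime) -> List[str]:
    """Return a list of rejection reasons; an empty list means the manifest passes."""
    errors = []
    # Out-of-order detection: the bundle must build on the pointer we hold now.
    if manifest["expectedPreviousPointer"] != current_pointer:
        errors.append("out-of-order: previous channel pointer mismatch")
    # Expiry is compared against the air-gap clock supplied by the caller.
    if now >= datetime.fromisoformat(manifest["expiresAt"]):
        errors.append("expired")
    return errors


manifest = {
    "channels": ["stable"],
    "expectedPreviousPointer": "ptr-41",   # hypothetical pointer id
    "ciCommitRange": "abc123..def456",     # hypothetical commit range
    "expiresAt": "2026-06-13T00:00:00+00:00",
}
now = datetime(2026, 5, 14, 10, 23, tzinfo=timezone.utc)

in_order = check_manifest(manifest, "ptr-41", now)
skipped = check_manifest(manifest, "ptr-40", now)
```

Here in_order is empty, while skipped reports the pointer mismatch. Returning every reason rather than stopping at the first is a design choice of this sketch, not a requirement of the RFC.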
The bundle has its own signature on the manifest, in addition to the per-artifact signatures. Reasoning: early failure detection at verify time, before any artifact is opened. A tampered bundle fails the manifest signature check immediately rather than being detected piecewise as artifacts are extracted.
Closure signatures live inline in the .nar.xz archive (per attic’s existing format); no separate .sig file in the bundle for closures. Per-closure verification happens at sovereign-attic import time AND at agent-fetch time, against the same cacheKeys trust root.
5. Tooling
A new crate, nixfleet-bundle, matches the existing nixfleet-release / nixfleet-verify-artifact pattern (single-purpose binary, no daemon, no state):
# online build environment
nixfleet-bundle export \
--channel stable \
--since <previous-bundle-ref> \
--output ./bundle-2026-05-14.tar
# air-gap entry point (offline)
nixfleet-bundle verify ./bundle-2026-05-14.tar
nixfleet-bundle import ./bundle-2026-05-14.tar \
--sovereign-cache https://attic.internal.example
bundle verify is a separate command from import deliberately: in higher-security environments the verification host and the import host are different machines with different access policies. A combined non-interactive form (nixfleet-bundle apply) is provided for one-way diode setups where verify-then-import in two steps is impractical.
The framework also exposes flake apps that wrap the binary for the common cases: nix run .#bundle-export -- --channel stable, nix run .#bundle-verify -- ./bundle.tar, etc. The binary is the lower-level interface; the flake apps are operator ergonomics.
6. Freshness in air-gap
Channels in air-gap mode declare an explicit longer freshness window per RFC-0011, plus an air-gap-specific staleness for the bundle itself:
channels.airgap-prod = {
  airgap.enabled = true;
  airgap.maxStaleness = "30d"; # bundle import freshness
  freshnessWindow = 129600; # 90d in minutes - CI-signing-time freshness
  timeSource = {
    signedTime = { provider = "roughtime"; url = "..."; pubkey = "..."; };
    fallback.ntp = [ "internal-ntp.example" ];
    maxSkewSeconds = 60;
  };
};
Two timestamps matter:
- Bundle signing time - when CI produced the artifacts. Compared against the channel’s freshnessWindow per RFC-0011; this is the agent’s replay-protection contract.
- Bundle import time - when the sovereign cache received the bundle. Compared against airgap.maxStaleness; this is the operator’s “are we current?” contract.
The agent uses signing time, not import time, for freshness verification. Import time is operator metadata recorded in the import receipt (§7) and surfaced in fleet status; it does not gate convergence.
Agents that have been offline since before the most recent import use the import time as a recovery anchor (i.e., for computing “how long have we been operating on a stale view of the channel”); operators see this as a per-host staleness indicator distinct from the channel-level freshness window.
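The two contracts can diverge, which a short worked example makes concrete. This sketch uses the values from the channel declaration above (129600 minutes, i.e. 90 × 24 × 60, and a 30-day staleness limit). The function names and the inclusive boundary comparison are assumptions.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW_MINUTES = 129600    # freshnessWindow: 90 * 24 * 60
MAX_STALENESS = timedelta(days=30)   # airgap.maxStaleness = "30d"


def agent_accepts(signed_at: datetime, now: datetime) -> bool:
    """Agent-side replay protection: gates convergence on CI signing time."""
    return now - signed_at <= timedelta(minutes=FRESHNESS_WINDOW_MINUTES)


def operator_is_current(imported_at: datetime, now: datetime) -> bool:
    """Operator-side indicator: does not gate convergence, only reports staleness."""
    return now - imported_at <= MAX_STALENESS


signed_at = datetime(2026, 5, 1, tzinfo=timezone.utc)
imported_at = datetime(2026, 5, 14, tzinfo=timezone.utc)
now = datetime(2026, 7, 1, tzinfo=timezone.utc)

still_fresh = agent_accepts(signed_at, now)          # 61 days since signing
is_current = operator_is_current(imported_at, now)   # 48 days since import
```

With these dates the agent still accepts the target (61 days is inside the 90-day window), while the operator view flags the channel as stale (48 days exceeds 30).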
For time source: air-gap channels MUST NOT use the public NTP defaults from RFC-0011 §4 (Cloudflare/NIST aren’t reachable). The framework refuses to evaluate an air-gap channel without an explicit timeSource declaration. Recommended: a signed-time service (Roughtime or equivalent) with an internal NTP fallback. Internal NTP-only is acceptable for less stringent environments.
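The evaluation-time refusal can be expressed as a one-rule validator. The real enforcement is described as living in mkFleet on the Nix side; the Python below only sketches the rule. The dict layout, validate_channel, and ChannelConfigError are hypothetical.

```python
class ChannelConfigError(ValueError):
    """Raised when a channel declaration violates an air-gap rule."""


def validate_channel(name: str, channel: dict) -> None:
    """Reject air-gap channels that would silently inherit public NTP defaults."""
    airgap = channel.get("airgap", {})
    if airgap.get("enabled") and "timeSource" not in channel:
        raise ChannelConfigError(
            f"channel {name!r}: air-gap channels must declare an explicit timeSource"
        )


# An online channel may rely on defaults; an air-gap channel may not.
validate_channel("stable", {"freshnessWindow": 60})
validate_channel(
    "airgap-prod",
    {
        "airgap": {"enabled": True, "maxStaleness": "30d"},
        "timeSource": {"fallback": {"ntp": ["internal-ntp.example"]}},
    },
)
```

An internal-NTP-only timeSource passes here, matching the note that it is acceptable for less stringent environments.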
7. Control plane in air-gap
The CP runs inside the air-gap with no special configuration. It polls the sovereign cache (or receives a webhook from nixfleet-bundle import) for new channel pointers. Its signature verification continues as in v0.2: it verifies CI signatures on fleet.resolved.json, regardless of whether the bundle came over the internet or via USB.
The CP in air-gap holds the same trust roots as the online version. Trust origins (org root, CI release key, attic cache key) are deployed into the air-gap at the same enrollment time as the rest of the infrastructure. RFC-0010 §7 rotation procedures apply with one additional step: “rotation envelope traverses the air-gap as a bundle.”
The import receipt is a small signed JSON written by nixfleet-bundle import to a known location:
{
  "bundleSha256": "...",
  "importedAt": "2026-05-14T10:23:00Z",
  "operator": "alice",
  "verifiedSignatures": [ "ciReleaseKey:...", "atticKey:..." ]
}
The receipt is signed by the import operator’s key (an SSH key registered for this purpose; it is not part of the framework’s trust chain and serves purely as operator-facing accountability). It is surfaced in fleet status alongside channel staleness.
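A sketch of how the receipt body could be assembled before signing, using the four fields from the example. The signing step itself is omitted. The canonical serialization (sorted keys, compact separators) is an assumption of this sketch, not something the RFC specifies.

```python
import hashlib
import json
from typing import Iterable


def build_receipt(
    bundle_bytes: bytes, imported_at: str, operator: str, verified: Iterable[str]
) -> dict:
    """Assemble the receipt fields; bundleSha256 is computed over the raw bundle."""
    return {
        "bundleSha256": hashlib.sha256(bundle_bytes).hexdigest(),
        "importedAt": imported_at,
        "operator": operator,
        "verifiedSignatures": list(verified),
    }


def canonical_bytes(receipt: dict) -> bytes:
    """Deterministic byte form to sign (assumed: sorted keys, no extra whitespace)."""
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")


receipt = build_receipt(
    b"example bundle bytes",           # placeholder for the tarball contents
    "2026-05-14T10:23:00Z",
    "alice",
    ["ciReleaseKey:...", "atticKey:..."],
)
to_sign = canonical_bytes(receipt)
```

Signing a deterministic byte form means the same receipt always produces the same signed payload, regardless of the key order in which the tool built it.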
8. Operator procedure (compact form)
1. online: nixfleet-bundle export --channel <c> --since <prev> --output bundle.tar
2. transfer bundle to air-gap entry point via approved media
3. air-gap entry point: nixfleet-bundle verify bundle.tar
   - verifies manifest signature against trusted CI key
   - verifies fleet.resolved.json + revocations.json + each rollout manifest signature
   - verifies bundle expiry vs current air-gap clock
   - verifies channel pointer expectations (previous-pointer matches)
4. air-gap entry point: nixfleet-bundle import bundle.tar
   - re-verifies (idempotent; survives operator running verify on a different host)
   - pushes closures into sovereign attic (no re-signing - pass-through)
   - publishes fleet/revocations/rollout artifacts to a path the CP polls
   - records signed import receipt
5. CP on next poll picks up the new channel pointer, reconciles normally
6. agents on next poll fetch the new target, fetch closures from sovereign attic, activate
The full chain, online commit to first agent activation, is human-paced (typically minutes to hours depending on operator process) but is end-to-end signature-verified at every step.
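The verify step in the procedure above amounts to an ordered, fail-fast sequence of checks. The sketch below shows that control flow only: each check is a placeholder callable, and run_verify is a hypothetical name, not the nixfleet-bundle API.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_verify(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks in order and stop at the first failure.

    Returns (ok, names of the checks that were actually executed).
    """
    ran = []
    for name, check in checks:
        ran.append(name)
        if not check():
            return False, ran
    return True, ran


# Placeholder outcomes: the manifest signature fails, the rest would pass.
tampered_bundle = [
    ("manifest signature", lambda: False),
    ("artifact signatures", lambda: True),
    ("bundle expiry", lambda: True),
    ("previous channel pointer", lambda: True),
]
ok, ran = run_verify(tampered_bundle)
```

For the tampered bundle, only the manifest check runs. This is the early-failure property the manifest signature exists to provide: no artifact is opened after a failed manifest check.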
9. Failure cases
- Bundle signature invalid. Rejected at verify; never enters the sovereign cache.
- Bundle expired. Rejected at verify; operators must re-export.
- Out-of-order bundle (skips an expected previous channel pointer). Rejected unless --allow-skip is passed with a rationale; logged.
- Sovereign cache compromised. Closures still verify against pinned cacheKeys on agents; an attacker who replaces a closure cannot make agents accept it. DoS is possible (delete or block fetch); fleet stalls until the cache is restored from re-imported bundles.
- Operator imports a bundle to the wrong channel. Channel-pointer signatures bind to channel name; mismatched bundle is rejected at verify.
- Bundle imported but never reaches agents (network partition inside air-gap). Agents cache last known target and continue running; the new target activates when the partition heals.
- Time-source unavailable inside air-gap. Per RFC-0011 §4.3: agents refuse to evaluate freshness, hold current generation, emit TimeSourceUnavailable. Operator either restores the signed-time service or extends the channel’s freshnessWindow with rationale.
- Import operator’s signing key compromised. Import receipts under that key become untrustworthy; subsequent imports use a new key. The receipts are accountability metadata, not part of the agent-verification chain - no agent action required.
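The out-of-order case is the only failure in the list with an operator override, so its decision rule is worth spelling out. In this sketch the function name, the return values, and the audit-log shape are hypothetical. The only behavior taken from the text is that a skip requires both the --allow-skip flag and a rationale, and is logged.

```python
from typing import List, Optional


def pointer_decision(
    expected_previous: str,
    current_pointer: str,
    allow_skip: bool = False,
    rationale: Optional[str] = None,
    audit_log: Optional[List[str]] = None,
) -> str:
    """Decide whether a bundle may advance the channel pointer."""
    if expected_previous == current_pointer:
        return "accept"
    # A skip needs both the flag and a non-empty rationale.
    if allow_skip and rationale and rationale.strip():
        if audit_log is not None:
            audit_log.append(
                f"skip {current_pointer} -> expects {expected_previous}: {rationale.strip()}"
            )
        return "accept-with-skip"
    return "reject"


log: List[str] = []
normal = pointer_decision("ptr-41", "ptr-41")
rejected = pointer_decision("ptr-41", "ptr-39")
no_reason = pointer_decision("ptr-41", "ptr-39", allow_skip=True, rationale="  ")
overridden = pointer_decision(
    "ptr-41", "ptr-39", allow_skip=True,
    rationale="bundle for ptr-40 lost in transit", audit_log=log,
)
```

Treating a blank rationale as a rejection is an assumption of this sketch. It keeps the override from becoming a silent default.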
10. Trust analysis
Properties retained from v0.2.
- Every artifact is self-verifying against pinned trust roots.
- The CP holds no secrets and forges no trust.
- A compromised sovereign cache cannot inject malicious closures.
Properties added.
- Documented bundle format with a manifest signature for early-failure detection.
- Explicit operator workflow with a verified import receipt.
- Explicit air-gap freshness contract that does not weaken the online freshness contract.
What this RFC does not protect against.
- A compromised CI release key signing a malicious bundle. RFC-0010 (threshold-signed channels) is the answer for high-stakes air-gap deployments.
- An operator importing a malicious bundle whose signatures verify because the attacker has the keys. Same as above.
- A leaked import receipt key being used to fake an import. Fix: rotate the receipt key.
11. Deliverable
nixfleet-bundle crate + bundle format + verify/import + air-gap channel schema. Single deliverable, all sub-pieces tightly coupled:
- Crate scaffold; bundle manifest types in nixfleet-proto.
- bundle export (online side).
- bundle verify (offline, no network).
- bundle import (offline, writes to sovereign attic + CP-polled paths).
- Air-gap channel schema (airgap.enabled, airgap.maxStaleness); mkFleet enforcement of explicit timeSource for air-gap channels.
- microvm.nix scenario simulating the full pipeline (online build -> bundle -> offline verify -> import -> agent activation).
12. Falsifiable done criteria
- A complete air-gap workflow can be demonstrated end-to-end: online commit -> bundle export -> physical transfer (simulated as cp in the microvm scenario) -> verify -> import -> agent activation, with every step independently signature-verifiable.
- A bundle with one bit flipped in any signed component is rejected at verify.
- A CP operating in air-gap can complete a full reconcile cycle with no DNS, no NTP egress, and no internet-bound traffic of any kind.
- The sovereign cache can be lost and rebuilt from re-imported bundles without fleet impact beyond fetch latency.
- An auditor inside the air-gap can produce the full provenance chain for any host’s current closure: which bundle imported it, when, who approved the import, what CI commit produced it.
- An air-gap channel declared without an explicit timeSource is rejected at evaluation time with a clear error.
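The bit-flip criterion lends itself to an exhaustive test over a small payload. The sketch below uses HMAC-SHA256 from the standard library purely as a stand-in for the real signature scheme. The framework's signatures are asymmetric, and nothing here reflects its actual key handling. All names are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in for a signature: HMAC-SHA256 (symmetric, illustration only)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(key, payload), signature)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def every_single_bit_flip_rejected(key: bytes, payload: bytes, signature: bytes) -> bool:
    """Check the done criterion for every bit position in the payload."""
    return all(
        not verify(key, flip_bit(payload, i), signature)
        for i in range(len(payload) * 8)
    )


key = b"test-only-key"
payload = b'{"channels":["stable"]}'   # placeholder signed component
signature = sign(key, payload)

untampered_ok = verify(key, payload, signature)
all_flips_rejected = every_single_bit_flip_rejected(key, payload, signature)
```

In the microvm scenario the same loop shape could run against each signed component of a real bundle, with the stand-in replaced by the actual verifier.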
13. Open questions
- Telemetry from inside the air-gap. Some customers want a one-way channel for “fleet healthy” beacons exfiltrated for upstream support. Out of scope here; deserves its own spec. Likely solution: a signed daily summary written to a documented path, picked up by the customer’s existing one-way egress process.
- Diode-friendly tooling. Some environments use one-way data diodes that prohibit bidirectional handshakes. The combined bundle apply command should be testable without any return path; verify this with a customer who actually uses diodes before declaring done.
- Bundle compression and partial transfer. For very large fleets, full closure transfer over USB media may be impractical. Worth specifying a partial-bundle format (delta against previous) before the first large-fleet pilot. Defer to v0.4 unless a customer asks.
- Threshold signing across the air-gap. RFC-0010 §6 signing sessions assume a forge-reachable transport. Air-gap threshold signing needs a session-bundle round trip. RFC-0010 §12 lists this as an open question; resolution is a v0.4 cycle.
14. One-sentence summary
The air-gap is a USB cable’s worth of latency between commit and convergence - every artifact still self-verifies against the same trust roots, the sovereign cache forwards bytes without re-signing, and the workflow is documented as a first-class deployment mode rather than a clever derivation.