Agent Options

All options live under services.nixfleet-agent. The module is included automatically by mkHost and is disabled by default.

Top-level options

| Option | Type | Default | Description |
|---|---|---|---|
| enable | bool | false | Enable the NixFleet fleet management agent. |
| controlPlaneUrl | str | – (required when enabled) | URL of the NixFleet control plane. Example: "https://fleet.example.com". |
| machineId | str | config.networking.hostName | Machine identifier reported to the control plane. |
| pollInterval | int | 60 | Steady-state poll interval in seconds. The control plane may override this for individual cycles via a poll_hint field in the desired-generation response (set to 5 during active rollouts), letting the agent react to new deploys within seconds without reducing the steady-state polling rate. |
| retryInterval | int | 30 | Retry interval in seconds after a failed poll (network error, CP not ready, fetch failure, bootstrap race). Shorter than pollInterval so the agent recovers quickly from transient failures without flooding the CP. |
| cacheUrl | nullOr str | null | Global binary cache URL for fetching closures. Resolution order: (1) per-generation cache_url from the release entry; (2) this option if set; (3) if neither is set, the agent verifies the store path exists locally via nix path-info; the path must be pre-pushed out-of-band. Example: "http://cache:5000". |
| dbPath | str | "/var/lib/nixfleet/state.db" | Path to the SQLite state database. |
| dryRun | bool | false | When true, check and fetch but do not apply generations. |
| tags | listOf str | [] | Tags for grouping this machine in fleet operations. Passed via NIXFLEET_TAGS environment variable. |
| healthInterval | int | 60 | Seconds between continuous health reports to the control plane. |
| allowInsecure | bool | false | Allow insecure HTTP connections to the control plane. Development only. |
| tls.clientCert | nullOr str | null | Path to client certificate PEM file for mTLS authentication. Example: "/run/secrets/agent-cert.pem". |
| tls.clientKey | nullOr str | null | Path to client private key PEM file for mTLS authentication. Example: "/run/secrets/agent-key.pem". |
| metricsPort | nullOr port | null | Port for agent Prometheus metrics HTTP listener. Null disables metrics. |
| metricsOpenFirewall | bool | false | Open the metrics port in the firewall. Only effective when metricsPort is set. |
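
As an illustration of how these options combine, here is a minimal sketch of a host configuration that enables the agent with mTLS and a global binary cache. The URLs and secret paths reuse the example values from the table. The tag names are placeholders, and how the certificate files end up under /run/secrets is assumed to be handled elsewhere.

```nix
{ ... }:
{
  services.nixfleet-agent = {
    enable = true;

    # Required once the agent is enabled.
    controlPlaneUrl = "https://fleet.example.com";

    # machineId is left unset, so it defaults to config.networking.hostName.

    # Steady-state polling; the control plane may shorten individual
    # cycles through poll_hint during active rollouts.
    pollInterval = 60;
    # Shorter retry after a failed poll.
    retryInterval = 30;

    # Fallback cache; a per-generation cache_url from the release entry
    # takes precedence over this value.
    cacheUrl = "http://cache:5000";

    # Placeholder tag names, chosen for illustration only.
    tags = [ "web" "production" ];

    tls = {
      clientCert = "/run/secrets/agent-cert.pem";
      clientKey = "/run/secrets/agent-key.pem";
    };
  };
}
```

For a first rollout, adding dryRun = true lets the agent check and fetch generations without applying them.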

healthChecks.systemd

List of systemd unit health checks.

| Sub-option | Type | Default | Description |
|---|---|---|---|
| units | listOf str | | Systemd units that must be active. |

Example:

services.nixfleet-agent.healthChecks.systemd = [
  { units = ["nginx.service" "postgresql.service"]; }
];

healthChecks.http

List of HTTP endpoint health checks.

| Sub-option | Type | Default | Description |
|---|---|---|---|
| url | str | | URL to GET. |
| interval | int | 5 | Check interval in seconds. |
| timeout | int | 3 | Timeout in seconds. |
| expectedStatus | int | 200 | Expected HTTP status code. |

Example:

services.nixfleet-agent.healthChecks.http = [
  { url = "http://localhost:8080/health"; }
  { url = "https://localhost:443"; expectedStatus = 200; timeout = 5; }
];

healthChecks.command

List of custom command health checks.

| Sub-option | Type | Default | Description |
|---|---|---|---|
| name | str | | Check name. |
| command | str | | Shell command (exit 0 = healthy). |
| interval | int | 10 | Check interval in seconds. |
| timeout | int | 5 | Timeout in seconds. |

Example:

services.nixfleet-agent.healthChecks.command = [
  {
    name = "disk-space";
    command = "test $(df --output=pcent / | tail -1 | tr -d ' %') -lt 90";
    interval = 30;
    timeout = 5;
  }
];
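
The three check types can be declared side by side. The sketch below relies on the documented defaults wherever a sub-option is omitted. The unit name, URL, check name, and command are placeholders chosen for illustration, not values the module requires.

```nix
{ ... }:
{
  services.nixfleet-agent.healthChecks = {
    # The listed unit must be active.
    systemd = [
      { units = [ "nginx.service" ]; }
    ];

    # All defaults: checked every 5 s, 3 s timeout, status 200 expected.
    http = [
      { url = "http://localhost:8080/health"; }
    ];

    # Healthy when the command exits 0; interval defaults to 10 s and
    # timeout to 5 s because neither is set here.
    command = [
      {
        name = "nix-daemon-socket";
        command = "test -S /nix/var/nix/daemon-socket/socket";
      }
    ];
  };
}
```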

Prometheus Metrics

When metricsPort is set, the agent starts a Prometheus HTTP listener on that port. Null (the default) disables the listener.

Metrics exposed:

| Metric | Description |
|---|---|
| nixfleet_agent_state | Current phase of the deploy cycle (idle, checking, fetching, applying, verifying, reporting, rolling_back) encoded as a label |
| nixfleet_agent_poll_duration_seconds | Duration of the last poll cycle |
| nixfleet_agent_last_poll_timestamp_seconds | Unix timestamp of the last completed poll |
| nixfleet_agent_health_check_duration_seconds | Duration of the last health check run |
| nixfleet_agent_health_check_status | Result of the last health check (1 = healthy, 0 = unhealthy) |
| nixfleet_agent_generation_info | Nix store path of the current active generation (as a label) |

Metrics are served in the standard Prometheus text format at GET /metrics.

Example configuration:

services.nixfleet-agent = {
  enable = true;
  controlPlaneUrl = "https://fleet.example.com";
  metricsPort = 9101;
  metricsOpenFirewall = true;
};
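
On the monitoring side, a Prometheus server managed by NixOS could scrape the agent roughly as sketched below. This assumes the standard NixOS services.prometheus module. The job name and the target host name web-01 are placeholders, and port 9101 matches the metricsPort used in the example configuration.

```nix
{ ... }:
{
  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        # Placeholder job name.
        job_name = "nixfleet-agent";
        # The agent serves metrics at GET /metrics, which is also the
        # Prometheus default; spelled out here for clarity.
        metrics_path = "/metrics";
        static_configs = [
          {
            # Placeholder host; the port is the agent's metricsPort.
            targets = [ "web-01:9101" ];
          }
        ];
      }
    ];
  };
}
```

The target is only reachable from another machine if the agent host permits it, for example through metricsOpenFirewall = true.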

Systemd service

The agent runs as a privileged root systemd service:

| Setting | Value |
|---|---|
| Target | multi-user.target |
| After | network-online.target, nix-daemon.service |
| Restart | always (30s delay) |
| StateDirectory | nixfleet |
| NoNewPrivileges | true |
| PATH | ${config.nix.package}/bin:${pkgs.systemd}/bin |
| Environment | XDG_CACHE_HOME=/var/lib/nixfleet/.cache |

Hardening rationale. The agent runs switch-to-configuration as a subprocess, which needs full system access (/dev, /home, cgroups, kernel modules). Sandboxing options such as PrivateDevices or ProtectHome would break these operations. The threat model is equivalent to running sudo nixos-rebuild switch as a daemon. NoNewPrivileges = true is kept to prevent setuid escalation.

  • nix is in PATH for nix copy and nix path-info.
  • XDG_CACHE_HOME points into the state directory so that the nix metadata cache persists on impermanent hosts.

Health check configuration is written to /etc/nixfleet/health-checks.json and passed via --health-config.

On impermanent hosts, /var/lib/nixfleet is automatically persisted (including the XDG cache subdirectory).
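
To make the settings above concrete, they correspond roughly to a unit definition like the following sketch. This is not the module's actual source: agentPackage and the binary name bin/nixfleet-agent are hypothetical stand-ins. The NixOS path option also adds sbin directories and a few default packages, so it only approximates the PATH value listed in the table.

```nix
{ config, pkgs, ... }:
let
  # Hypothetical stand-in for the agent package; the real attribute
  # name is not stated on this page.
  agentPackage = pkgs.nixfleet-agent;
in
{
  systemd.services.nixfleet-agent = {
    wantedBy = [ "multi-user.target" ];
    after = [ "network-online.target" "nix-daemon.service" ];

    # Puts nix (for nix copy and nix path-info) and the systemd tools on PATH.
    path = [ config.nix.package pkgs.systemd ];

    # Keeps the nix metadata cache inside the state directory.
    environment.XDG_CACHE_HOME = "/var/lib/nixfleet/.cache";

    serviceConfig = {
      # Hypothetical binary name; the --health-config flag and the JSON
      # path are the ones documented above.
      ExecStart = "${agentPackage}/bin/nixfleet-agent --health-config /etc/nixfleet/health-checks.json";
      Restart = "always";
      RestartSec = 30;
      StateDirectory = "nixfleet"; # creates /var/lib/nixfleet
      NoNewPrivileges = true;
      # No PrivateDevices or ProtectHome on purpose:
      # switch-to-configuration needs full system access.
    };
  };
}
```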