NixFleet
Declarative NixOS fleet management with staged rollouts and automatic rollback.
NixFleet combines a thin configuration framework (mkHost) with an optional orchestration layer (agent + control plane) for fleet-wide deployments. The framework builds on standard NixOS tooling - it doesn’t replace nixos-rebuild or nixos-anywhere, it adds reproducible multi-host configuration and health-driven deployment safety on top.
Start with the Quick Start.
Guide
NixFleet documentation, from first deployment to fleet-wide operations.
Getting Started
- Quick Start - define hosts, deploy, enable fleet orchestration
- Design Guarantees - properties that hold across every deployment
- Installation - remote install, rebuild, macOS, ISO, VM testing
Core Concepts
- Defining Hosts - the mkHost API, hostSpec flags, scopes
- Deploying - standard tools, control plane, agent, rollouts
- Operating - fleet status, rollback, impermanence
- Extending - custom scopes, secrets, templates
Quick Start
Define a fleet, deploy your first host, and enable orchestration - all in 15 minutes.
Prerequisites
- Nix with flakes enabled (`experimental-features = nix-command flakes` in `~/.config/nix/nix.conf`)
- SSH access to at least one target machine (root login or nixos-anywhere compatible)
1. Create a Fleet
Create a new directory and initialize a flake.nix:
# flake.nix
{
inputs = {
nixfleet.url = "github:arcanesys/nixfleet";
nixpkgs.follows = "nixfleet/nixpkgs";
};
outputs = {nixfleet, ...}: {
nixosConfigurations.web-01 = nixfleet.lib.mkHost {
hostName = "web-01";
platform = "x86_64-linux";
hostSpec = {
timeZone = "UTC";
};
modules = [
nixfleet.scopes.roles.server
./hosts/web-01/hardware-configuration.nix
./hosts/web-01/disk-config.nix
{
nixfleet.operators = {
primaryUser = "deploy";
users.deploy = {
isAdmin = true;
sshAuthorizedKeys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... you@workstation"
];
};
};
}
];
};
nixosConfigurations.web-02 = nixfleet.lib.mkHost {
hostName = "web-02";
platform = "x86_64-linux";
hostSpec = {
timeZone = "UTC";
};
modules = [
nixfleet.scopes.roles.server
./hosts/web-02/hardware-configuration.nix
./hosts/web-02/disk-config.nix
{
nixfleet.operators = {
primaryUser = "deploy";
users.deploy = {
isAdmin = true;
sshAuthorizedKeys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... you@workstation"
];
};
};
}
];
};
};
}
Each call to mkHost returns a full nixosSystem. The server role imports the base, operators, firewall, secrets, monitoring, and impermanence scopes. The operators scope manages user accounts - primaryUser is the identity anchor for Home Manager, secrets, and impermanence paths. The framework also injects disko and the fleet agent/control-plane service modules (disabled by default).
Tip: Run `git init && git add -A` before any nix command. Flakes only see files tracked by git.
2. Deploy the First Host
Use standard NixOS tooling. No custom scripts.
# Fresh install (wipes disk, installs NixOS)
nixos-anywhere --flake .#web-01 root@192.168.1.10
# Subsequent rebuilds
nixos-rebuild switch --flake .#web-01 --target-host root@192.168.1.10
Repeat for web-02. At this point you have two independently managed NixOS machines. Everything below is optional.
3. Enable Fleet Orchestration
Add the control plane to web-01 and the fleet agent to both hosts. Create a shared module:
# modules/fleet-agent.nix
{config, ...}: {
services.nixfleet-agent = {
enable = true;
controlPlaneUrl = "http://web-01:8080";
tags = ["web"];
healthChecks.http = [
{
url = "http://localhost:80/health";
interval = 5;
timeout = 3;
expectedStatus = 200;
}
];
};
}
Then add the control plane to web-01:
# modules/control-plane.nix
{
services.nixfleet-control-plane = {
enable = true;
listen = "0.0.0.0:8080";
openFirewall = true;
};
}
Extract the operators config into a shared module so both hosts use the same user definition:
# modules/operators.nix
{
nixfleet.operators = {
primaryUser = "deploy";
users.deploy = {
isAdmin = true;
sshAuthorizedKeys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... you@workstation"
];
};
};
}
Include all modules in your mkHost calls:
nixosConfigurations.web-01 = nixfleet.lib.mkHost {
hostName = "web-01";
platform = "x86_64-linux";
modules = [
nixfleet.scopes.roles.server
./hosts/web-01/hardware-configuration.nix
./hosts/web-01/disk-config.nix
./modules/fleet-agent.nix
./modules/control-plane.nix
./modules/operators.nix
];
};
nixosConfigurations.web-02 = nixfleet.lib.mkHost {
hostName = "web-02";
platform = "x86_64-linux";
modules = [
nixfleet.scopes.roles.server
./hosts/web-02/hardware-configuration.nix
./hosts/web-02/disk-config.nix
./modules/fleet-agent.nix
./modules/operators.nix
];
};
Rebuild both hosts to activate the agent and control plane.
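Assuming each hostname resolves over SSH, a simple loop covers both hosts:
```shell
# Rebuild both hosts so the agent (and, on web-01, the control plane) start.
for h in web-01 web-02; do
  nixos-rebuild switch --flake ".#$h" --target-host "root@$h"
done
```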
4. Deploy to the Fleet
First-time setup - create a config file and bootstrap the admin API key:
nixfleet init \
--control-plane-url https://cp.example.com:8080 \
--ca-cert ./fleet-ca.pem \
--cache-url http://cache.example.com:5000 \
--push-to ssh://root@cache.example.com
nixfleet bootstrap
This writes .nixfleet.toml to the repo and saves the API key to ~/.config/nixfleet/credentials.toml. Subsequent commands run with no flags.
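The resulting file is small. A sketch of what it might contain - the key names here are illustrative assumptions mirroring the init flags, not a documented schema:
```toml
# .nixfleet.toml (illustrative - actual key names may differ)
control-plane-url = "https://cp.example.com:8080"
ca-cert = "./fleet-ca.pem"
cache-url = "http://cache.example.com:5000"
push-to = "ssh://root@cache.example.com"
```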
Now deploy - the one-command form builds all targeted hosts, pushes them to the cache, registers a release, and triggers a canary rollout:
nixfleet deploy --push-to ssh://root@cache.example.com --tags web --strategy canary --wait
Or split it into explicit steps if you want to inspect or replay the release:
nixfleet release create --push-to ssh://root@cache.example.com
# Output: Release rel-abc123 created (2 hosts)
nixfleet deploy --release rel-abc123 --tags web --strategy canary --wait
The --strategy flag controls rollout behavior:
- all-at-once - deploy to every matching host simultaneously (default)
- canary - deploy to one host first, verify health, then continue
- staged - deploy in configurable batch sizes (--batch-size 1,25%,100%)
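For example, a staged rollout of an existing release, using the batch sizes shown above:
```shell
# One host first, then a quarter of the fleet, then everything remaining.
nixfleet deploy --release rel-abc123 --tags web \
  --strategy staged --batch-size 1,25%,100% --wait
```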
The agent checks health (http://localhost:80/health) after each switch and, on failure, automatically rolls back to the previous generation. The control plane verifies that each machine reports the new current_generation - not a rolled-back one - before accepting health as proof of successful deployment.
5. Check Fleet Status
nixfleet status
nixfleet status --json
Next Steps
- Design Guarantees - properties that hold across every NixFleet deployment
- Installation - detailed install methods, ISO builds, troubleshooting
- Rollouts - batch sizes, failure thresholds, pause/resume
- The mkHost API - all parameters and what the framework injects
Design Guarantees
These are not features you enable. They are properties that emerge from the architecture.
| Property | What it means | How the architecture delivers it |
|---|---|---|
| Reproducibility | Same configuration produces an identical system, every time, on any machine. | The Nix store is content-addressed - every package is identified by a cryptographic hash of its inputs. flake.lock pins every dependency to an exact revision. The follows chain ensures nixpkgs, home-manager, disko, and impermanence all resolve to one consistent version. |
| Immutability | Running systems cannot drift from their declared configuration. | The Nix store is read-only - no process can modify installed software in place. With optional ephemeral root (impermanence), the entire root filesystem is wiped and recreated from configuration on every boot, eliminating accumulated state. |
| Atomic rollback | Recover from any deployment in seconds, not minutes. | NixOS generations are atomic filesystem switches - the previous generation remains intact in the Nix store. The fleet agent auto-rolls back on health check failure. Manual rollback is a single command: nixfleet rollback --host web-01 --ssh. |
| Auditability | Every change to every system is traceable to a commit. | Configuration is Git-native - the entire system state is defined in version-controlled Nix files. The control plane maintains a deployment audit log, a release registry (immutable manifests of per-host store paths), and a rollout event timeline for every host. Releases can be diffed with nixfleet release diff <A> <B>. |
| Supply chain integrity | The complete dependency tree of every system is known and verifiable. | flake.lock records the cryptographic hash of every input. Builds are reproducible - the same inputs always produce the same output hash. No implicit dependencies, no untracked downloads during build. |
| Graceful degradation | The fleet survives a control plane outage without disruption. | The architecture uses a polling model - agents independently pull desired state on a configurable interval (default: 60s, with a poll_hint-driven fast path of 5s during active rollouts, and 30s retries on transient failures). If the control plane is unreachable, agents continue running their last-known-good generation. There is no single point of failure; each host is a self-contained NixOS system that operates independently. |
These properties hold whether you use the full orchestration layer or just mkHost with standard NixOS commands.
Installation
NixFleet uses standard NixOS/Darwin tooling for installation. No custom deploy scripts.
NixOS - Remote Install
Install a fresh machine over SSH using nixos-anywhere:
nixos-anywhere --flake .#web-01 root@192.168.1.10
The target machine needs SSH access and must be booted into a NixOS installer or any Linux with kexec support. nixos-anywhere handles disk partitioning (via disko), NixOS installation, and the first boot.
Options
# Provision extra files (e.g. host keys, pre-generated secrets)
nixos-anywhere --flake .#web-01 --extra-files ./secrets root@192.168.1.10
# Build on the remote machine (useful for aarch64 targets without cross-compilation)
nixos-anywhere --flake .#web-01 --build-on-remote root@192.168.1.10
NixOS - Rebuild
For machines already running NixOS:
# Local rebuild
sudo nixos-rebuild switch --flake .#web-01
# Remote rebuild
nixos-rebuild switch --flake .#web-01 --target-host root@192.168.1.10
macOS
For Darwin hosts (Apple Silicon or Intel), use nix-darwin:
darwin-rebuild switch --flake .#macbook
The mkHost function detects aarch64-darwin or x86_64-darwin platforms and calls darwinSystem instead of nixosSystem, injecting the appropriate Darwin core module and Home Manager integration.
Custom ISO
Build an installer ISO with your fleet’s SSH keys and base configuration pre-baked:
nix build .#iso
The resulting ISO is written to result/iso/. Flash it to USB and boot target machines for a known-good starting point before running nixos-anywhere.
VM Testing
Test host configurations in QEMU before deploying to real hardware.
Prerequisites: Your fleet must set nixfleet.isoSshKeys with a public key whose private half is on your machine (~/.ssh/id_ed25519.pub). The sshAuthorizedKeys in your hostSpec should use the same key. VM commands SSH into the ISO installer using this key - if it doesn’t match, SSH will hang.
# Install a host into a persistent VM disk (build ISO + nixos-anywhere)
nix run .#build-vm -- -h web-01
# Start the installed VM as a headless daemon
nix run .#start-vm -- -h web-01
# Full VM test cycle (build, install, reboot, verify, cleanup)
nix run .#test-vm -- -h web-01
See VM Tests for details on writing VM test assertions.
Troubleshooting
SSH connection refused
nixos-anywhere requires root SSH access on the target. Verify:
ssh root@192.168.1.10 echo ok
If the target is a fresh installer image, root login is usually enabled by default. For existing systems, ensure services.openssh.enable = true and users.users.root.openssh.authorizedKeys.keys includes your public key.
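On an existing NixOS system, the two settings mentioned above look like this in any module:
```nix
# Allow root SSH logins for nixos-anywhere / nixos-rebuild --target-host.
{
  services.openssh.enable = true;
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... you@workstation"
  ];
}
```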
Build fails with “path not found”
Flakes only see files tracked by git. If you just created or moved files:
git add -A
Then retry the build.
Missing state on impermanent hosts
Hosts with nixfleet.impermanence.enable = true wipe root on every boot. If a service loses state after reboot, its data directory must be added to the persistence configuration. The agent and control plane modules handle this automatically - their state directories (/var/lib/nixfleet, /var/lib/nixfleet-cp) are persisted when impermanence is active.
For other services, add persist paths in your modules:
environment.persistence."/persist".directories = [
"/var/lib/my-service"
];
The mkHost API
mkHost is the single entry point for defining hosts in NixFleet. It is a closure over framework inputs (nixpkgs, home-manager, disko, impermanence, microvm) that returns a standard nixosSystem or darwinSystem.
The result is a standard NixOS/Darwin system configuration. All existing NixOS tooling (nixos-rebuild, nixos-anywhere, darwin-rebuild) works unchanged.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| hostName | string | yes | Machine hostname |
| platform | string | yes | x86_64-linux, aarch64-linux, aarch64-darwin, x86_64-darwin |
| stateVersion | string | no | NixOS/Darwin state version (default: "24.11") |
| hostSpec | attrset | no | Host configuration flags. See hostSpec |
| modules | list | no | Additional NixOS/Darwin modules |
| isVm | bool | no | Inject QEMU VM hardware (default: false) |
For the full parameter reference, injected module order, return types, Home Manager integration, and exports, see the mkHost API reference.
Examples
Single host
The simplest pattern. One machine, one repo, no fleet infrastructure.
# flake.nix
{
inputs = {
nixfleet.url = "github:arcanesys/nixfleet";
nixpkgs.follows = "nixfleet/nixpkgs";
};
outputs = {nixfleet, ...}: {
nixosConfigurations.myhost = nixfleet.lib.mkHost {
hostName = "myhost";
platform = "x86_64-linux";
hostSpec = {
userName = "alice";
timeZone = "US/Eastern";
locale = "en_US.UTF-8";
sshAuthorizedKeys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA..."
];
};
modules = [
./hardware-configuration.nix
./disk-config.nix
];
};
};
}
Deploy with standard NixOS tooling:
nixos-anywhere --flake .#myhost root@192.168.1.50 # fresh install
sudo nixos-rebuild switch --flake .#myhost # local rebuild
Multi-host fleet with org defaults
Define shared defaults in a let binding and merge per-host overrides. This example uses flake-parts.
# fleet.nix (flake-parts module)
{config, ...}: let
mkHost = config.flake.lib.mkHost;
acme = {
userName = "deploy";
timeZone = "America/New_York";
locale = "en_US.UTF-8";
keyboardLayout = "us";
};
in {
flake.nixosConfigurations = {
dev-01 = mkHost {
hostName = "dev-01";
platform = "x86_64-linux";
hostSpec = acme;
modules = [
nixfleet-scopes.scopes.roles.workstation
{ nixfleet.impermanence.enable = true; }
./hosts/dev-01/hardware.nix
./hosts/dev-01/disk-config.nix
];
};
prod-web-01 = mkHost {
hostName = "prod-web-01";
platform = "x86_64-linux";
hostSpec = acme;
modules = [
nixfleet-scopes.scopes.roles.server
./hosts/prod-web-01/hardware.nix
./hosts/prod-web-01/disk-config.nix
];
};
};
}
Batch hosts from a template
Standard Nix. Generate 50 identical edge devices with builtins.genList, then merge with named hosts.
# fleet.nix (flake-parts module)
{config, ...}: let
mkHost = config.flake.lib.mkHost;
acme = {
userName = "deploy";
timeZone = "America/New_York";
locale = "en_US.UTF-8";
};
edgeHosts = builtins.listToAttrs (map (i: {
name = "edge-${toString i}";
value = mkHost {
hostName = "edge-${toString i}";
platform = "aarch64-linux";
hostSpec = acme;
modules = [
nixfleet-scopes.scopes.roles.endpoint
./hosts/edge/common-hardware.nix
./hosts/edge/disk-config.nix
];
};
}) (builtins.genList (i: i + 1) 50));
namedHosts = {
control-plane = mkHost {
hostName = "control-plane";
platform = "x86_64-linux";
hostSpec = acme;
modules = [ nixfleet-scopes.scopes.roles.server ./hosts/control-plane/hardware.nix ];
};
};
in {
flake.nixosConfigurations = namedHosts // edgeHosts;
}
No special batch API needed - mkHost is a plain function, and Nix handles the rest.
Key points
- hostSpec values use lib.mkDefault, so modules you pass in modules can override them. hostName is the exception - it is set without mkDefault and always matches the hostName parameter.
- isDarwin is auto-detected from the platform parameter. You never set it manually.
- VM mode (isVm = true) adds QEMU hardware, SPICE agent, DHCP, and software GL - useful for testing with nix run .#build-vm and nix run .#start-vm.
hostSpec Configuration
hostSpec is a NixOS module option that holds host identity data. It is the primary mechanism for identifying hosts in NixFleet.
Every module injected by mkHost - core, scopes, Home Manager - can read config.hostSpec to adapt behavior. Scope activation is driven by nixfleet.<scope>.enable options (set by roles from nixfleet-scopes), not by hostSpec flags.
Options
Data fields: userName (required), hostName (auto-set), home (computed), timeZone, locale, keyboardLayout, sshAuthorizedKeys, networking, secretsPath, hashedPasswordFile, rootHashedPasswordFile.
Platform flag: isDarwin (auto-set by mkHost).
For the full option reference with types, defaults, and descriptions, see hostSpec Options.
Accessing hostSpec in modules
hostSpec is available in any NixOS, Darwin, or Home Manager module injected by mkHost:
# In a NixOS/Darwin module
{config, lib, ...}: let
hS = config.hostSpec;
in {
services.myapp.dataDir = "${hS.home}/data";
networking.firewall.enable = lib.mkIf config.nixfleet.firewall.enable true;
}
# In a Home Manager module
{config, lib, ...}: let
hS = config.hostSpec;
in {
programs.git.userName = lib.mkIf (!hS.isDarwin) "linux-user";
}
Home Manager modules receive hostSpec because mkHost imports the hostSpec module into the HM evaluation and passes the effective hostSpec values.
Extending hostSpec in fleet repos
The framework defines only the options above. Fleet repos add their own flags as plain NixOS modules:
# modules/hostspec-extensions.nix (in your fleet repo)
{lib, ...}: {
options.hostSpec = {
isDev = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Enable development tools and Docker.";
};
isGraphical = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Enable graphical desktop (audio, fonts, display manager).";
};
};
}
Then use them in fleet-level scopes:
# modules/scopes/dev.nix (in your fleet repo)
{config, lib, pkgs, ...}: let
hS = config.hostSpec;
in {
config = lib.mkIf hS.isDev {
virtualisation.docker.enable = true;
environment.systemPackages = with pkgs; [gcc gnumake];
};
}
Include the extension module in your mkHost calls via the modules parameter. No framework changes needed.
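Wiring it together might look like this - a sketch in which dev-01 and the file paths are illustrative:
```nix
nixosConfigurations.dev-01 = nixfleet.lib.mkHost {
  hostName = "dev-01";
  platform = "x86_64-linux";
  hostSpec = {
    userName = "alice";
    isDev = true;  # flag defined by the extension module below
  };
  modules = [
    ./modules/hostspec-extensions.nix  # declares options.hostSpec.isDev
    ./modules/scopes/dev.nix           # activates on hostSpec.isDev
    ./hosts/dev-01/hardware-configuration.nix
  ];
};
```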
Org defaults pattern
Define shared defaults in a let binding and merge per-host:
let
orgDefaults = {
userName = "deploy";
timeZone = "America/New_York";
locale = "en_US.UTF-8";
keyboardLayout = "us";
sshAuthorizedKeys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... ops-team"
];
};
in {
web-01 = mkHost {
hostName = "web-01";
platform = "x86_64-linux";
hostSpec = orgDefaults;
modules = [nixfleet-scopes.scopes.roles.server ./hosts/web-01/hardware.nix];
};
}
All hostSpec values passed to mkHost use lib.mkDefault, so modules in the modules list can override them if needed.
Cross-Platform
NixFleet supports NixOS and macOS from a single API. mkHost detects the platform from the platform parameter and builds the appropriate system type.
Supported platforms
| Platform | System builder | Init system | Notes |
|---|---|---|---|
| x86_64-linux | nixosSystem | systemd | Full feature set |
| aarch64-linux | nixosSystem | systemd | Full feature set (ARM servers, edge devices) |
| aarch64-darwin | darwinSystem | launchd | Apple Silicon Macs |
| x86_64-darwin | darwinSystem | launchd | Intel Macs |
Automatic platform detection
mkHost sets hostSpec.isDarwin based on the platform parameter. You never set it manually. The home option also auto-computes:
- Linux: /home/&lt;userName&gt; is used as-is: /home/<userName>
- Darwin: /Users/<userName>
What differs by platform
| Concern | NixOS | Darwin |
|---|---|---|
| Core module | _nixos.nix - boot, systemd-boot, NetworkManager, polkit, SSH | _darwin.nix - system defaults, TouchID sudo, dock management |
| User config | users.users.<name>.isNormalUser | users.users.<name>.home, .isHidden |
| Services | systemd services (systemd.services.*) | launchd agents (launchd.agents.*) |
| Impermanence | Btrfs root wipe, /persist bind mounts | Not applicable |
| Base scope packages | ifconfig, netstat, xdg-utils (system) | dockutil, mas (system) |
| Home Manager | HM NixOS module + impermanence HM module | HM Darwin module (no impermanence) |
| Nix daemon | Managed by NixOS (nix.gc.automatic, etc.) | Determinate-compatible (nix.enable = false) |
| Trusted users | @admin + user (non-server) | @admin + user |
Platform guards in modules
Use hostSpec.isDarwin (or pkgs.stdenv) for platform-specific logic:
# Using hostSpec (available in all mkHost modules)
{config, lib, ...}: let
hS = config.hostSpec;
in {
config = lib.mkIf (!hS.isDarwin) {
# Linux-only configuration
services.openssh.enable = true;
};
}
# Using stdenv (standard Nix pattern)
{lib, pkgs, ...}: {
home.packages = lib.optionals pkgs.stdenv.isLinux [pkgs.strace]
++ lib.optionals pkgs.stdenv.isDarwin [pkgs.darwin.apple_sdk.frameworks.Security];
}
Both approaches work. hostSpec.isDarwin is preferred in NixFleet modules because it is available without pkgs and is consistent with the hostSpec-driven activation pattern.
Scopes and platform support
Not all framework scopes apply to both platforms:
| Scope | NixOS | Darwin |
|---|---|---|
| base | NixOS module + HM module | Darwin module + HM module |
| impermanence | NixOS module + HM module | Not included |
| nixfleet-agent | NixOS service (systemd) | Not available |
| nixfleet-control-plane | NixOS service (systemd) | Not available |
The agent and control-plane services are NixOS-only (systemd). macOS hosts are managed through standard darwin-rebuild and do not participate in fleet orchestration.
Design principle
Prefer simple platform-specific implementations over complex cross-platform abstractions. If a feature only makes sense on one platform, keep it there. The framework handles the platform split at the mkHost level - individual modules should stay focused on their target platform rather than adding conditionals for every difference.
Mixed fleet example
let
org = {
userName = "ops";
timeZone = "UTC";
sshAuthorizedKeys = ["ssh-ed25519 AAAA..."];
};
in {
# NixOS server
web-01 = mkHost {
hostName = "web-01";
platform = "x86_64-linux";
hostSpec = org;
modules = [nixfleet-scopes.scopes.roles.server ./hosts/web-01/hardware.nix];
};
# macOS developer laptop
dev-mac = mkHost {
hostName = "dev-mac";
platform = "aarch64-darwin";
hostSpec = org;
modules = [./hosts/dev-mac/extras.nix];
};
# ARM edge device
sensor-01 = mkHost {
hostName = "sensor-01";
platform = "aarch64-linux";
hostSpec = org;
modules = [nixfleet-scopes.scopes.roles.endpoint ./hosts/sensor/hardware.nix];
};
}
All three hosts share org defaults and use the same mkHost call. The framework selects the right system builder, core module, and scope set based on platform.
Scopes & Roles
NixFleet uses a scope system to compose host configurations. Scopes ship in the nixfleet-scopes companion repository - a standalone collection of infrastructure modules, roles, and disk templates that work with any NixFleet-managed host.
Scopes are NixOS modules that self-activate based on configuration flags. Each scope wraps its config block in lib.mkIf so it produces no configuration when its condition is false. Options live under nixfleet.*. Roles compose scopes and set defaults with lib.mkDefault - consumers override with lib.mkForce when needed.
Repository: github.com/arcanesys/nixfleet-scopes - MIT licensed, works standalone or via the inputs.nixfleet.scopes re-export.
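A minimal scope following this pattern might look like the following sketch (the motd scope is hypothetical, not part of nixfleet-scopes):
```nix
{config, lib, ...}: {
  options.nixfleet.motd.enable = lib.mkEnableOption "fleet MOTD banner";
  # Wrapped in mkIf: the scope contributes nothing unless its flag is set.
  config = lib.mkIf config.nixfleet.motd.enable {
    users.motd = "This host is managed by NixFleet.";
  };
}
```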
Framework Service Scopes
These ship with NixFleet and are auto-included by mkHost (disabled by default).
| Scope | Options | Description |
|---|---|---|
| Agent | services.nixfleet-agent.* | Deploy cycle daemon - polls CP, applies generations, reports health |
| Agent (Darwin) | services.nixfleet-agent.* | macOS variant using launchd |
| Control Plane | services.nixfleet-control-plane.* | Axum HTTP with mTLS, SQLite, RBAC for fleet orchestration |
| Cache Server | services.nixfleet-cache-server.* | Harmonia-based Nix binary cache serving from local store |
| Cache | services.nixfleet-cache.* | Nix substituter pointing to fleet cache |
| MicroVM Host | services.nixfleet-microvm-host.* | MicroVM hypervisor with bridge networking, DHCP, and NAT |
The impermanence scope from nixfleet-scopes is also auto-imported by mkHost. It is inert unless nixfleet.impermanence.enable is set.
Infrastructure Scopes
From nixfleet-scopes. Import via roles or individually.
| Scope | Namespace | Description |
|---|---|---|
| base | nixfleet.base | Universal CLI tools (ifconfig, netstat, xdg-utils). Darwin and HM variants available. |
| operators | nixfleet.operators | Multi-user management - primary user, SSH keys, sudo, shell, HM routing, role groups |
| firewall | nixfleet.firewall | nftables backend, SSH rate limiting (5/min), drop logging, microVM bridge forwarding |
| secrets | nixfleet.secrets | Backend-agnostic identity paths for agenix/sops-nix, boot ordering, key validation |
| backup | nixfleet.backup | Timer scaffolding with restic and borgbackup backends, pre/post hooks, health pings |
| monitoring | nixfleet.monitoring | Prometheus node exporter with fleet-tuned collector defaults |
| monitoring-server | nixfleet.monitoring.server | Prometheus server with scrape configs, retention, and built-in alert rules |
| impermanence | nixfleet.impermanence | Btrfs root wipe + system persist paths (/etc/nixos, /var/lib/systemd, /var/log, etc.) |
| home-manager | nixfleet.home-manager | HM integration - useGlobalPkgs/useUserPackages defaults, fans out profileImports to HM-enabled operators |
| disko | nixfleet.disko | Disko NixOS module injection (inert without disko.devices) |
| o11y | nixfleet.o11y | Metrics remote-write (vmagent to VictoriaMetrics/Mimir) + journal log shipping |
| vpn | nixfleet.vpn | Profile-driven VPN framework with wireguard driver |
| compliance | nixfleet.compliance | Filesystem integration for compliance evidence - persists evidence dir, sets configurationRevision |
| generation-label | nixfleet.generationLabel | Rich boot entry labels from flake metadata (date, rev, deterministic codename) |
| remote-builders | nixfleet.distributedBuilds | Cross-platform distributed build delegation (handles Determinate Nix on Darwin) |
| hardware | nixfleet.hardware | Auto-imports hardware sub-modules: microcode, bluetooth, nvidia, wake-on-LAN, memory/zram, legacy boot |
| terminal-compat | nixfleet.terminalCompat | Terminfo for modern terminals (kitty, alacritty) + headless tools (curl, wget, unzip) |
Platform variants exist for: base (Darwin, HM), operators (Darwin), backup (Darwin), impermanence (HM), home-manager (Darwin).
Operators
The operators scope manages user accounts declaratively. One operator is designated primaryUser - the identity anchor for Home Manager, secrets, and impermanence paths.
Each operator (users.<name>) supports:
- isAdmin - adds wheel group (sudo access)
- sshAuthorizedKeys - SSH public keys for authorized_keys
- shell - login shell (default: bash)
- homeManager.enable - apply the profile's HM stack to this operator
- hashedPassword / hashedPasswordFile - password authentication
- extraGroups - additional groups on top of roleGroups
Top-level options:
- primaryUser - identity anchor (auto-detected when only one operator exists)
- roleGroups - groups added to all operators (set by roles, e.g. workstation adds networkmanager/video/audio/docker)
- rootSshKeys - root SSH access, independent of operator accounts
- mutableUsers - allow imperative passwd changes (default: false)
nixfleet.operators = {
primaryUser = "alice";
users.alice = {
isAdmin = true;
sshAuthorizedKeys = [ "ssh-ed25519 AAAA... alice@workstation" ];
homeManager.enable = true;
shell = pkgs.zsh;
};
users.bob = {
sshAuthorizedKeys = [ "ssh-ed25519 BBBB... bob@laptop" ];
};
rootSshKeys = config.nixfleet.operators._adminSshKeys;
};
Roles
Roles compose scopes with sensible defaults. Import one role per host.
| Role | Type | Scopes imported | Key defaults |
|---|---|---|---|
| server | Headless | base, operators, firewall, secrets, monitoring, impermanence, o11y, generation-label, terminal-compat, hardware | Firewall on, secrets on, monitoring on, o11y metrics on, no user key, no roleGroups |
| workstation | Interactive | base, operators, firewall, secrets, home-manager, backup, impermanence, o11y, generation-label, terminal-compat, hardware | Firewall on, secrets on, HM on, o11y metrics on, zram swap, roleGroups: networkmanager/video/audio/docker |
| endpoint | Locked-down | base, operators, secrets, impermanence | Secrets on with user key enabled. Consumer provides firewall, HM, and hardware. |
| microvm-guest | VM guest | base, operators, impermanence | Minimal - host owns firewall, backup, and networking |
Disk Templates
Pre-built disko configurations for common partition layouts.
| Template | Boot | Filesystem | Impermanence |
|---|---|---|---|
| btrfs | UEFI | btrfs | No |
| btrfs-bios | Legacy BIOS | btrfs | No |
| btrfs-impermanence | UEFI | btrfs | Yes |
| btrfs-impermanence-bios | Legacy BIOS | btrfs | Yes |
| ext4 | UEFI | ext4 | No |
| luks-btrfs-impermanence | UEFI | LUKS + btrfs | Yes |
Access via inputs.nixfleet-scopes.scopes.disk-templates.<name>.
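Usage in a host definition might look like this - a sketch in which the main disk name and device path are assumptions about the template's disko layout:
```nix
modules = [
  inputs.nixfleet-scopes.scopes.disk-templates.btrfs-impermanence
  # Point the template at the target disk (device path is an assumption).
  { disko.devices.disk.main.device = "/dev/nvme0n1"; }
];
```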
What Belongs Where
| Content | Belongs in |
|---|---|
| Framework API (mkHost) | nixfleet |
| Service modules (agent, CP, cache, microvm) | nixfleet |
| Infrastructure scopes and roles | nixfleet-scopes |
| Disk templates | nixfleet-scopes |
| Compliance controls and frameworks | nixfleet-compliance |
| Opinionated fleet scopes (dev, graphical, theming) | Your fleet repo |
| Hardware configs and dotfiles | Your fleet repo |
Compliance
NixFleet’s compliance layer ships in the nixfleet-compliance companion repository - a standalone collection of regulatory controls, framework presets, and evidence probes for NixOS hosts.
Each control enforces a security measure and produces machine-readable evidence via probes. Evidence is collected on a schedule and written to /var/lib/nixfleet-compliance/evidence.json. The governance engine lets fleet operators set enforcement levels, host-type scoping, and per-rule exceptions with mandatory rationale.
Repository: github.com/arcanesys/nixfleet-compliance - MIT licensed, works standalone or alongside nixfleet and nixfleet-scopes.
Quick Start
{
inputs.compliance.url = "github:arcanesys/nixfleet-compliance";
# In your mkHost modules:
modules = [
compliance.nixosModules.nis2
{
compliance.frameworks.nis2 = {
enable = true;
entityType = "essential";
};
}
];
}
Frameworks
| Framework | Regulation | Controls | Differentiation |
|---|---|---|---|
| NIS2 | Directive 2022/2555 | 12 | essential vs important |
| DORA | Regulation 2022/2554 | 9 | critical provider vs standard |
| ISO 27001 | ISO/IEC 27001:2022 | 14 | full vs partial scope |
| ANSSI | BP-028 v2.0 | 7 | minimal / intermediary / reinforced / high |
Controls
| Control | What it enforces |
|---|---|
| access-control | SSH key-only auth, root login disabled, idle session timeout |
| asset-inventory | Host, service, and network inventory from running system |
| audit-logging | Journald persistence, auditd with execve tracking, log retention |
| authentication | MFA policy, PAM modules, SSH certificate auth |
| backup-retention | Backup service verification, last backup age, retention compliance |
| baseline-hardening | Kernel sysctl, IOMMU, filesystem permissions (ANSSI R7-R14) |
| change-management | System rebuild freshness, generation frequency |
| disaster-recovery | Generation retention, RTO target, recovery test interval |
| encryption-at-rest | LUKS verification, encrypted swap, tmpfs /tmp |
| encryption-in-transit | TLS minimum version, certificate inventory and expiry |
| incident-response | Rollback readiness, journal availability, alert retention |
| key-management | SSH host key age and algorithm, LUKS key slots, rotation policy |
| network-segmentation | Firewall status, VLAN detection, interface inventory |
| secure-boot | EFI support, secure boot status, signed unified kernel images |
| supply-chain | flake.lock pinning, SBOM generation, nixpkgs staleness |
| vulnerability-mgmt | Nixpkgs freshness, scan interval, critical vulnerability blocking |
Governance
| Option | Values | Description |
|---|---|---|
| enforceMode | enforce, report | Enforce applies NixOS config and runs probes; report only runs probes |
| level | minimal, standard, strict, paranoid | Rules above this severity threshold are auto-disabled |
| hostType | server, workstation, appliance | Rules scope themselves to matching host types |
| excludes | list of tags | Tag-based rule exclusions (e.g., ["no-ipv6"]) |
| exceptions | attrs with rationale | Per-rule exceptions with mandatory reason, included in audit report |
compliance.governance = {
enforceMode = "enforce";
level = "standard";
hostType = "server";
exceptions.BH-07 = {
rationale = "IPv6 required for internal mesh networking";
};
};
Evidence Collection
Probes run on a configurable schedule - hourly for essential/critical entities, daily for important/standard - and produce JSON. The compliance-check CLI runs all probes interactively:
compliance-check # colored summary
VERBOSE=1 compliance-check # detailed JSON per control
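The schema of evidence.json is not documented here; purely as an illustration (every field name below is an assumption, not the real format), per-control evidence could take a shape like:

```json
{
  "collected_at": "2025-01-01T00:00:00Z",
  "controls": [
    { "id": "access-control", "passed": true },
    { "id": "secure-boot", "passed": false }
  ]
}
```

Whatever the actual shape, the point is that evidence is machine-readable JSON, so it can be shipped to an auditor or asserted on in CI.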
Framework Mappings
See the per-framework mapping documents for detailed article-by-article regulatory coverage.
NixOS Advantage
NixOS provides unique compliance properties. flake.lock is a cryptographically verifiable supply chain manifest - every input is pinned by hash. Content-addressing makes binary tampering detectable. Impermanence prevents malware persistence by wiping the root filesystem on every reboot. Declarative configuration means the audit configuration IS the actual running configuration - there is no drift between what was approved and what is deployed.
Standard Tools
NixFleet builds on standard NixOS tooling. Every host produced by mkHost is a regular nixosSystem or darwinSystem output, so the standard deployment commands work unchanged.
Fresh install (with disk partitioning)
nixos-anywhere --flake .#hostname root@192.168.1.42
Disko partitions the disk according to the host’s disk config, then installs the NixOS closure.
Local rebuild
sudo nixos-rebuild switch --flake .#hostname
Remote rebuild
nixos-rebuild switch --flake .#hostname --target-host root@192.168.1.42
Evaluates locally, copies the closure to the target, and activates it.
macOS rebuild
darwin-rebuild switch --flake .#hostname
When to reach for more
These commands work because mkHost returns standard nixosSystem/darwinSystem outputs. The orchestration layer (control plane + agent) is additive - use it when your fleet grows beyond manual rebuilds.
Control Plane
The control plane is a lightweight HTTP server that coordinates fleet deployments. It provides:
- Machine registry - agents auto-register on first report; machines are tracked with tags and lifecycle states
- Rollout orchestration - staged, canary, and all-at-once deployment strategies with health-check gates
- Tag storage - group machines by role, environment, or any arbitrary label
- Deployment audit log - every action (deploy, rollback, tag change, lifecycle transition) is recorded
- REST API - all operations available programmatically at `/api/v1/`
Enabling the service
services.nixfleet-control-plane = {
enable = true;
listen = "0.0.0.0:8080";
dbPath = "/var/lib/nixfleet-cp/state.db";
openFirewall = true;
};
Options
See Control Plane Options for the full option reference including TLS, metrics, and systemd service details.
Verify
systemctl status nixfleet-control-plane
curl http://localhost:8080/health
What it manages
Machines auto-register when the agent sends its first report to the control plane. Each machine has:
- A unique ID (defaults to hostname)
- Tags for grouping (`web`, `prod`, `eu-west`, etc.)
- A lifecycle state (`pending` → `provisioning` → `active` ↔ `maintenance` → `decommissioned`)
Releases are immutable manifests mapping each host to its built Nix store path. A release captures “what the flake means for each host at a point in time”. Created via nixfleet release create, they can be inspected, diffed, listed, and referenced by rollouts multiple times (e.g., staging then prod, or rollback to a previous release). See CLI reference.
Rollouts coordinate fleet-wide deployments across batches with health gates between each batch. Every rollout references a release - the CP resolves each target machine’s store path from the release entries at batch execution time. See Rollouts for details.
Audit events record every mutation (deployment, rollback, tag change, lifecycle transition) with actor, timestamp, and detail. Query them with:
curl http://localhost:8080/api/v1/audit # JSON
curl http://localhost:8080/api/v1/audit/export # CSV
Monitoring
The /metrics endpoint is available on the CP’s listen address with no extra configuration. It is always active when the service is running.
Add a scrape target to your Prometheus configuration:
scrape_configs:
- job_name: nixfleet-control-plane
static_configs:
- targets: ["fleet.example.com:8080"]
See Control Plane Options for the full list of exposed metrics.
Security
The control plane uses two independent auth layers: the TLS layer (authentication) and the API layer (authorization).
| Layer | Mechanism | Who | Purpose |
|---|---|---|---|
| TLS | mTLS client certs | Agents + admin clients | Authenticate the connection |
| API | API keys (role-gated) | Admin clients only | Authorize specific operations |
API keys have one of three roles: admin (full access), deploy (create releases and rollouts), readonly (read-only). The bootstrap endpoint creates an admin key.
Configuration
services.nixfleet-control-plane = {
enable = true;
tls.cert = "/run/secrets/cp-cert.pem"; # enables HTTPS
tls.key = "/run/secrets/cp-key.pem";
tls.clientCa = "/run/secrets/fleet-ca.pem"; # enables required mTLS
};
When tls.clientCa is set, all connections must present a valid client certificate:
- Agents authenticate via client cert alone (no API key)
- Admin clients require both a client cert AND an API key (`Authorization: Bearer <key>`)
See Control Plane Options for full TLS option details.
Bootstrap
On first deployment, create the initial admin key via the bootstrap endpoint (only works when no keys exist):
curl -X POST https://cp-host:8080/api/v1/keys/bootstrap \
--cacert fleet-ca.pem --cert client-cert.pem --key client-key.pem \
-H 'Content-Type: application/json' -d '{"name":"admin"}'
# Returns: {"key":"nfk-...","name":"admin","role":"admin"}
Save the returned key - it’s only shown once. Subsequent calls return 409 Conflict.
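Per the Security section, admin API calls carry the key in an Authorization: Bearer header. A minimal sketch of wiring that up, assuming the key was saved into an environment variable (the `nfk-example` value and the `/api/v1/machines` path in the comment are illustrative):

```shell
# Export the key once after bootstrap, then reuse it for every admin call.
NIXFLEET_KEY="nfk-example"
auth_header="Authorization: Bearer ${NIXFLEET_KEY}"
echo "$auth_header"
# Later, e.g.:
#   curl -H "$auth_header" --cacert fleet-ca.pem \
#        --cert client-cert.pem --key client-key.pem \
#        https://cp-host:8080/api/v1/machines
```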
Production recommendation: Always enable TLS. Set `tls.clientCa` to require mTLS from all clients. Admin clients need both a client certificate and an API key.
Persistence
State is stored in a single SQLite database at dbPath. On impermanent NixOS hosts, the module automatically persists /var/lib/nixfleet-cp across reboots.
A background task cleans up health reports older than 24 hours to prevent unbounded database growth.
Agent
The agent runs on each managed host as a systemd service. It polls the control plane for a desired generation, applies changes when a mismatch is detected, runs health checks, reports status, and automatically rolls back on failure.
Enabling the agent
services.nixfleet-agent = {
enable = true;
controlPlaneUrl = "https://fleet.example.com";
tags = ["web" "prod" "eu-west"];
pollInterval = 60;
healthInterval = 60;
healthChecks = {
systemd = [{ units = ["nginx.service" "postgresql.service"]; }];
http = [{
url = "http://localhost:8080/health";
expectedStatus = 200;
timeout = 3;
interval = 5;
}];
command = [{
name = "disk-space";
command = "test $(df --output=pcent / | tail -1 | tr -d '% ') -lt 90";
timeout = 5;
interval = 10;
}];
};
};
Agent options
See Agent Options for the full option reference including TLS, metrics, health checks, and systemd service details.
Deploy cycle
On every poll tick the agent runs a single sequential deploy cycle (`run_deploy_cycle`) to completion - no cooperative state machine, no interruptible transitions:
- Check - `GET /api/v1/machines/<id>/desired-generation` returns `{hash, cache_url, poll_hint}`. If `hash` matches `/run/current-system`, the cycle reports "up-to-date" and returns. If `poll_hint` is set (active rollout), the next poll is scheduled at that shorter interval.
- Fetch - if the generation differs, the agent runs `nix copy --from <cache_url> <hash>`. With no cache URL, it falls back to `nix path-info` to verify the path was pre-pushed out-of-band.
- Apply - runs `<hash>/bin/switch-to-configuration switch` as a subprocess. The agent is a privileged root service - sandboxing is minimal because `switch-to-configuration` needs access to `/dev`, `/home`, `/root`, cgroups, and kernel modules to do its job.
- Verify - runs all configured health checks. If any fail, the agent transitions to rollback.
- Report - posts a `Report` to the CP with `current_generation`, `success`, and `message`. The executor uses `current_generation` to verify the machine has actually applied the new generation before accepting health-gated completion.
On any failure (network, fetch, apply, or verify), the cycle returns PollOutcome::Failed and the main loop reschedules the next poll to retryInterval (30s by default) instead of the full pollInterval. This handles bootstrap races (agent polls before the CP has a release), transient network failures, and flaky fetches.
Periodic health reports run on a separate healthInterval tick (default 60s) independent of the deploy cycle. The executor only counts a health report toward batch completion when the machine’s current_generation matches the desired store path from the release entry.
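The Check step above boils down to a string comparison between the desired hash and the current system symlink. A minimal sketch with stubbed values (the real agent is a compiled service; `desired` stands in for the HTTP response and `current` for `readlink -f /run/current-system`):

```shell
# Stubbed inputs - in production these come from the CP and the local symlink.
desired="/nix/store/abc-nixos-system"
current="/nix/store/abc-nixos-system"

if [ "$desired" = "$current" ]; then
  outcome="up-to-date"     # nothing to do; report and wait for the next poll
else
  outcome="deploy-needed"  # proceed to Fetch / Apply / Verify / Report
fi
echo "$outcome"
```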
Health checks
Three types of health check are supported, all configured declaratively in Nix:
Systemd units
Verify that critical systemd units are in the active state.
healthChecks.systemd = [{
units = ["nginx.service" "postgresql.service"];
}];
HTTP endpoints
Send a GET request and verify the response status code.
| Suboption | Type | Default | Description |
|---|---|---|---|
| url | string | - (required) | URL to GET |
| expectedStatus | int | 200 | Expected HTTP status code |
| timeout | int | 3 | Timeout in seconds |
| interval | int | 5 | Check interval in seconds |
healthChecks.http = [{
url = "http://localhost:3000/healthz";
expectedStatus = 200;
timeout = 5;
}];
Custom commands
Run an arbitrary shell command. Exit code 0 means healthy.
| Suboption | Type | Default | Description |
|---|---|---|---|
| name | string | - (required) | Check name (used in reports) |
| command | string | - (required) | Shell command to execute |
| timeout | int | 5 | Timeout in seconds |
| interval | int | 10 | Check interval in seconds |
healthChecks.command = [{
name = "disk-space";
command = "test $(df --output=pcent / | tail -1 | tr -d '% ') -lt 90";
timeout = 5;
}];
Continuous health reporting
The agent sends periodic health reports at healthInterval (default: 60s), independent of deploy cycles. The CP uses these to track fleet health, evaluate rollout health gates, and surface issues in nixfleet status.
Prometheus Metrics
Enable the agent metrics listener by setting metricsPort:
services.nixfleet-agent = {
enable = true;
controlPlaneUrl = "https://fleet.example.com";
metricsPort = 9101;
metricsOpenFirewall = true;
};
Scrape from Prometheus at http://agent-host:9101/metrics. See Agent Options for the full list of exposed metrics.
Registration & tags
Agents auto-register on first report (gated by mTLS). Tags from services.nixfleet-agent.tags sync on every report - change the NixOS config, rebuild, and the CP picks up the new tags automatically. Admins can pre-register machines via nixfleet machines register <id>.
nixfleet machines list # verify enrollment
nixfleet machines list --tags prod # filter by tag
Persistence
Agent state is stored in a SQLite database at dbPath. On impermanent NixOS hosts, the module automatically persists /var/lib/nixfleet across reboots.
Security
Configure mTLS via the NixOS module options tls.clientCert and tls.clientKey. Set allowInsecure = true for dev-only HTTP mode.
The systemd service runs without sandboxing because switch-to-configuration needs full system access. See Agent Options - Systemd service for the full hardening rationale.
Binary Cache
A fleet binary cache means agents fetch closures from your own infrastructure instead of rebuilding or pulling from cache.nixos.org on every deploy.
NixFleet ships with harmonia as the default cache server. Harmonia serves paths directly from the local Nix store over HTTP - no separate storage backend, database, or push protocol. Paths are signed on-the-fly using the host’s Nix signing key.
Server setup
Enable the cache server on a dedicated host (or any always-on fleet member):
services.nixfleet-cache-server = {
enable = true;
port = 5000; # default
openFirewall = true;
signingKeyFile = "/run/secrets/cache-signing-key";
};
Generating a signing key
nix-store --generate-binary-cache-key cache.fleet.example.com secret-key.pem public-key.pem
Store secret-key.pem as an encrypted secret (agenix/sops). Note the public-key.pem contents - clients need it.
Populating the cache
Harmonia serves whatever is in the local Nix store. To populate it, copy closures to the cache host after building:
# Push closures to the cache host's Nix store
nixfleet release create --push-to ssh://root@cache.fleet.example.com
# Or with nix copy directly
nix copy --to ssh://root@cache.fleet.example.com /nix/store/...
Client setup
Enable on agent hosts to configure Nix substituters:
services.nixfleet-cache = {
enable = true;
cacheUrl = "http://cache.fleet.example.com:5000";
publicKey = "cache.fleet.example.com:AAAA...="; # contents of public-key.pem
};
This adds cacheUrl to nix.settings.substituters and the public key to nix.settings.trusted-public-keys.
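In other words, the module's effect is roughly the following plain Nix settings (illustrative - the exact merge semantics belong to the module itself):

```nix
nix.settings = {
  substituters = [ "http://cache.fleet.example.com:5000" ];
  trusted-public-keys = [ "cache.fleet.example.com:AAAA...=" ];
};
```

If you prefer not to use the module, setting these two options by hand should achieve the same substituter behavior.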
Agent fetch workflow
When a deploy is triggered, each agent resolves the closure from substituters in order:
- `cacheUrl` from `services.nixfleet-cache` (or `services.nixfleet-agent.cacheUrl`)
- Default Nix substituters (`cache.nixos.org`, etc.)
Agents automatically benefit from the fleet cache once the client module is enabled and the signing key is trusted - no additional configuration on the agent side is needed.
To override the cache URL per-deploy from the CLI:
nixfleet deploy --tags web --release REL-xxx --cache-url http://cache.fleet.example.com:5000
Advanced: custom cache backends
For Attic, Cachix, or other cache backends that need a custom push command, use the --push-hook CLI flag:
# Attic example
nixfleet release create --push-to ssh://root@cache --push-hook "attic push fleet {}"
# Cachix example
nixfleet release create --push-hook "cachix push my-cache {}"
The {} placeholder is replaced with each store path. When combined with --push-to, the hook runs on the remote host via SSH. Without --push-to, it runs locally.
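The `{}` substitution can be sketched in shell (illustrative only - the CLI performs this internally; `hook` and `path` are example values):

```shell
hook='attic push fleet {}'
path=/nix/store/abc-nixos-system

# Replace the {} placeholder with the store path ("|" delimiter because
# the path itself contains slashes).
cmd=$(printf '%s\n' "$hook" | sed "s|{}|$path|")
echo "$cmd"
```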
Fleet repos that want Attic can add it as their own flake input and configure it via plain NixOS modules.
Attic and upstream dependencies
Attic is a push-only cache - it does not proxy upstream caches like cache.nixos.org. When you push a closure with attic push, Attic skips store paths that already exist in upstream caches to save bandwidth and storage. This means your private cache may not have every path needed to fetch a full closure.
The agent handles this automatically: if nix copy --from <cache_url> fails (e.g. a dependency like kmod exists on cache.nixos.org but not in your Attic cache), it falls back to nix-store --realise which uses the system-configured substituters. Your custom-built paths are still served from LAN (Attic), while standard nixpkgs dependencies fall through to cache.nixos.org.
For air-gapped fleets (no WAN access), you must push complete closures including all upstream dependencies. Use nix copy --to instead of attic push - it copies every path regardless of upstream availability:
# Push complete closure (all paths, no upstream skip)
nix copy --to http://cache:8081/fleet /nix/store/...-nixos-system-...
# Or via SSH to the cache host's store (harmonia serves it directly)
nix copy --to ssh://root@cache /nix/store/...-nixos-system-...
See also
- Cache Options - full option reference
- Secrets - managing the signing key with agenix
Rollouts
A rollout is a fleet-wide deployment coordinated by the control plane. Instead of pushing new code to every machine at once and hoping for the best, rollouts deploy in batches with health-check gates between each batch. If something breaks, the rollout pauses or reverts automatically.
Every rollout targets a release - an immutable CP-managed manifest mapping each host to its built Nix store path. This enables per-host deployment in heterogeneous fleets where every machine’s closure is different (different hardware, hostSpec, modules, certificates). You create a release once (nixfleet release create), then trigger one or more rollouts against it.
The two-step flow
nixfleet release create --push-to ssh://root@cache # build + push + register
nixfleet deploy --release rel-abc123 --tags web --strategy canary --wait
Or use the convenience shorthand - nixfleet deploy with --push-to / --copy implicitly creates a release first:
nixfleet deploy --push-to ssh://root@cache --tags web --strategy canary --wait
Both forms do the same thing. The explicit form is useful when you want to deploy the same release multiple times (e.g., staging then prod, or rolling forward then back).
Strategies
All-at-once
Deploy to every targeted machine simultaneously. No batching, no gates. Suitable for dev/staging environments or non-critical updates.
nixfleet deploy --release rel-abc123 --tags staging --strategy all-at-once
Canary
Deploy to a single machine first. If that machine passes health checks within the timeout, deploy to all remaining machines. Suitable for production environments where you want a quick smoke test.
nixfleet deploy --release rel-abc123 --tags prod --strategy canary \
--health-timeout 120 --wait
This creates two batches: batch 0 with 1 machine, batch 1 with the rest.
Staged
Define explicit batch sizes for fine-grained control. Batch sizes can be absolute numbers or percentages.
nixfleet deploy --release rel-abc123 --tags prod --strategy staged \
--batch-size 1,25%,100% \
--health-timeout 300 --wait
This creates three batches:
- Batch 0: 1 machine (canary)
- Batch 1: 25% of remaining machines
- Batch 2: all remaining machines (100%)
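For a concrete fleet of 12 targeted machines, the `1,25%,100%` spec can be worked through as follows (assumption: percentages apply to the machines remaining after earlier batches, and fractional results round up - the actual rounding rule is not specified above):

```shell
total=12
b0=1                                   # batch 0: absolute size 1 (canary)
remaining=$((total - b0))              # 11 machines left
b1=$(( (remaining * 25 + 99) / 100 ))  # 25% of 11, rounded up -> 3
remaining=$((remaining - b1))          # 8 machines left
b2=$remaining                          # 100% of what is left
echo "$b0 $b1 $b2"
```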
How rollouts work
- Create - The CLI posts a rollout to the control plane with the `release_id`, target filter (tags or hosts), and strategy. The CP loads the release entries, intersects them with the target machine set (machines not in the release are skipped with a warning), randomizes the order, and splits them into batches.
- Execute batches - The rollout executor (a background task in the CP) processes batches sequentially:
  - For each machine in the current batch, looks up the per-host store path from the release entries
  - Captures the machine's current generation into the batch's `previous_generations` map (for per-machine rollback)
  - Sets the desired generation on each machine via the internal `generations` table
  - Returns `poll_hint: 5` in the agent's next desired-generation response so agents react within seconds instead of waiting the full `pollInterval`
  - Agents poll, detect the mismatch, fetch the closure, apply, run health checks, and report back with their new `current_generation`
- Health gate - The executor evaluates each machine's health by verifying TWO conditions:
  - The machine's latest report's `current_generation` matches the desired store path from the release entry (proves the agent actually applied the new generation)
  - A health report with `all_passed = true` has been received since the batch started

  This two-step gate prevents false-positive completion from stale health reports: a health report from a previous generation cannot count toward the new batch.
- Complete or fail - When all batches succeed, the rollout status moves to `completed`. If a health gate fails, the rollout transitions to `paused` or `failed` depending on the `--on-failure` setting.
Health gates
After each batch deploys, the control plane waits for agents to report health. The gate evaluates based on two parameters:
- `--health-timeout` (default: `300` seconds) - Maximum time to wait for health reports after a batch deploys. Machines that do not report within this window are marked as timed out. Set this higher than `pollInterval` so agents have time to notice the deploy (or rely on `poll_hint` to react within 5s).
- `--failure-threshold` (default: `0`) - Maximum number of unhealthy/timed-out machines before triggering the failure action. `0` means zero tolerance - any single failure pauses the rollout. Can be absolute (`"3"`) or a percentage of the batch (`"30%"`).
When the threshold is exceeded:
- `--on-failure pause` (default) - The rollout pauses. Investigate, fix the issue, then resume with `nixfleet rollout resume <id>`. Machines in the failed batch that did deploy are left in place (the agent already rolled back individually if its own health checks failed).
- `--on-failure revert` - The rollout fails and the CP reads each completed batch's `previous_generations` map, reverting every machine in those batches to the store path it was running before the rollout started. Each machine rolls back to its OWN previous state - not a single shared generation - which is the correct behavior for heterogeneous fleets.
CLI flags
See CLI reference - deploy for the full flag list with defaults and descriptions.
Monitoring rollouts
Stream progress in real time with --wait:
nixfleet deploy --release rel-abc123 --tags prod --strategy canary --wait
If --on-failure pause triggers, --wait exits immediately with an actionable message instead of blocking until timeout:
Rollout r-xxx paused: batch 1 health check failed (2/3 unhealthy)
Resume with: nixfleet rollout resume r-xxx
Monitor with: nixfleet rollout status r-xxx --watch
List rollouts:
nixfleet rollout list
nixfleet rollout list --status running
nixfleet rollout list --status paused
Inspect a specific rollout with per-batch and per-machine detail:
nixfleet rollout status <rollout-id>
Managing rollouts
Resume a paused rollout (after investigating and fixing the issue):
nixfleet rollout resume <rollout-id>
Cancel a rollout (stops further batches, leaves already-deployed machines as-is):
nixfleet rollout cancel <rollout-id>
SSH fallback
For environments without a control plane (small fleets, bootstrapping, or air-gapped networks), the CLI can deploy directly over SSH without using a release:
nixfleet deploy --ssh --hosts "web*" --flake .
This builds each matching host’s closure locally, copies it to the target via nix-copy-closure, and runs switch-to-configuration switch. No rollout orchestration, no release manifest, no health gates - just a direct push. Useful for initial bootstrap or quick one-off deploys.
Worked example: canary deploy to production
Step 1 - build all production hosts and register a release. If you use harmonia as a binary cache, --push-to ssh:// copies the closures to the cache host’s /nix/store where harmonia serves them immediately:
nixfleet release create \
--flake . \
--hosts 'web-*,db-*' \
--push-to ssh://root@cache
Output includes the release ID, for example rel-abc123-....
Step 2 - deploy with canary strategy, 2-minute health timeout, auto-pause on failure:
nixfleet deploy \
--release rel-abc123 \
--tags prod,web \
--strategy canary \
--health-timeout 120 \
--failure-threshold 1 \
--on-failure pause \
--wait
What happens:
- The CP loads the release entries, filters by `prod` AND `web` tags, intersects with the release's host list (skipping any tagged machine not in the release), and randomizes the order.
- Batch 0: 1 machine receives its per-host store path as desired. The CP starts returning `poll_hint=5` in the agent's desired-generation response.
- Within ~5s, the agent polls, sees the mismatch, fetches the closure via `nix copy --from http://cache:5000`, runs `switch-to-configuration switch`, runs health checks, reports back.
- The CP verifies the agent's report shows the new `current_generation` (not a stale report from before the deploy), then waits for a passing health report.
- If healthy within 120s: Batch 1 deploys to all remaining machines in parallel.
- If unhealthy: the rollout pauses. The canary machine's agent has already rolled back locally. Run `nixfleet rollout status <id>` to investigate, then `nixfleet rollout resume <id>` or `nixfleet rollout cancel <id>`.
Step 3 - same release, different environment:
# Same release, redeploy to a different subset with a different strategy
nixfleet deploy --release rel-abc123 --tags staging --strategy all-at-once --wait
Fleet Status
Day-2 operations for monitoring your fleet through the CLI and control plane.
Fleet overview
nixfleet status
Shows a summary of all machines known to the control plane: hostname, current generation (from the agent’s most recent report), desired generation (from the active rollout’s release entry, if any), lifecycle state, last report time, and tags.
For machine-readable output:
nixfleet status --json
Listing machines
nixfleet machines list
Filter by tag:
nixfleet machines list --tags prod
nixfleet machines list --tags web
Tags
Tags group machines for targeted deployments and filtering. They can be set in two places.
Via NixOS configuration
Declare tags in the agent service config. These are baked into the system closure and reported on every poll:
services.nixfleet-agent = {
enable = true;
controlPlaneUrl = "https://fleet.example.com";
tags = ["prod" "web" "region-eu"];
};
Tags are stored in the control plane database. NixOS-configured tags (from services.nixfleet-agent.tags) are reported by the agent on every poll and synced to the control plane.
Machine lifecycle
Every machine has a lifecycle state that determines how the control plane treats it.
| State | Description |
|---|---|
| pending | Pre-registered, no agent report yet |
| provisioning | Install in progress |
| active | Agent reporting normally |
| maintenance | Manually paused |
| decommissioned | Removed from fleet |
Lifecycle is informational - rollouts target machines by tag or hostname regardless of lifecycle state. Use lifecycle to track operational status and filter with nixfleet machines list.
Transitions
Not all transitions are valid. The control plane enforces these rules:
pending --> provisioning --> active
pending --> active (agent reports directly)
pending --> decommissioned (never used)
provisioning --> pending (reset)
active <--> maintenance (pause/resume)
active --> decommissioned (retire)
maintenance --> decommissioned (retire while paused)
Invalid transitions (e.g., decommissioned to active, or active to pending) are rejected by the control plane.
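The transition rules above form a small allow-list, which can be sketched as a lookup (states and edges taken from the diagram; this is not the control plane's actual code):

```shell
# Returns "yes" if the transition appears in the allowed-edges diagram, else "no".
is_valid() {
  case "$1:$2" in
    pending:provisioning|pending:active|pending:decommissioned|\
    provisioning:active|provisioning:pending|\
    active:maintenance|maintenance:active|\
    active:decommissioned|maintenance:decommissioned)
      echo yes ;;
    *)
      echo no ;;
  esac
}

is_valid active maintenance      # yes - pause for maintenance
is_valid decommissioned active   # no  - decommissioned is terminal
```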
Changing lifecycle state
Use the control plane API directly:
curl -X PATCH "$NIXFLEET_CONTROL_PLANE_URL/api/v1/machines/web-01/lifecycle" \
-H "Content-Type: application/json" \
-d '{"lifecycle": "maintenance"}'
When the control plane is unavailable
The CLI’s status and machines list commands require a running control plane. If the CP is down:
- Agents continue running with their last-known generation
- Agents do not receive new deployments
- Use SSH for direct machine access (`ssh root@hostname`)
- Use standard NixOS tools for local inspection (`nixos-rebuild list-generations`, `systemctl status`)
Rollback
Four mechanisms exist for rolling back, from fully automatic to fully manual.
1. Automatic (agent health checks)
When the agent applies a new generation, it runs the configured health checks (systemd units, HTTP endpoints, custom commands). If any check fails, the agent automatically:
- Rolls back to the previous generation (`switch-to-configuration switch`)
- Reports the failure to the control plane with `success: false`
- Includes the rollback reason in the report message
No operator action required. During a rollout, this failure report triggers the rollout’s health gate, which may pause or revert the entire rollout depending on --on-failure settings.
2. Rollout-level revert (on_failure = revert)
When a rollout is created with --on-failure revert and a later batch fails, the control plane reads each completed batch’s previous_generations map (captured at batch start) and sets each machine’s desired generation back to the store path it was running BEFORE the rollout started. This is per-machine - each host reverts to its own previous state, not a single shared generation. The rollout status becomes failed and agents pull the revert on their next poll (within ~5s due to poll_hint).
This is the correct rollback mechanism for heterogeneous fleets where each machine has a unique closure.
3. Manual via CLI (SSH mode)
nixfleet rollback is an SSH-only operation - it switches a single machine to a previous generation directly over SSH, bypassing the control plane.
# Rollback to the previous generation (reads from system-1-link on the target)
nixfleet rollback --host web-01 --ssh
# Rollback to a specific store path
nixfleet rollback --host web-01 --ssh --generation /nix/store/abc123-nixos-system
This runs switch-to-configuration switch on the target via SSH. Useful when the control plane is unreachable or during bootstrap before the agent is running.
For CP-driven rollback of a bad deploy discovered after health checks pass, deploy an older release:
git checkout <old-commit>
nixfleet release create --push-to ssh://root@cache
git checkout -
nixfleet deploy --release <old-id> --tags prod --wait
4. Manual via NixOS
Standard NixOS rollback mechanisms work regardless of NixFleet.
Command-line rollback
# On the target machine
sudo nixos-rebuild switch --rollback
# Or switch to a specific generation
sudo nix-env -p /nix/var/nix/profiles/system --switch-generation 42
sudo /nix/var/nix/profiles/system/bin/switch-to-configuration switch
Boot menu
systemd-boot lists previous generations at boot. Select an older entry to boot into a previous configuration. This is the last resort when SSH access is unavailable or the current generation fails to boot.
When to use which
| Scenario | Mechanism |
|---|---|
| Deployment health check fails | Automatic (agent rolls back per-machine) |
| Mid-rollout batch failure with --on-failure revert | Automatic (CP reverts completed batches from per-machine previous_generations) |
| Bad deploy discovered after health checks pass | Create a release pointing at the old closures, nixfleet deploy --release <old> |
| Control plane is down | SSH rollback (nixfleet rollback --host <h> --ssh) or NixOS boot menu |
| Machine won’t boot | Boot menu (select previous generation) |
| Rollout affecting multiple machines | nixfleet rollout cancel + individual rollbacks if needed |
Impermanence
Impermanent hosts wipe their root filesystem on every boot. Only explicitly persisted paths survive. This eliminates configuration drift and forces every piece of state to be declared.
What ephemeral root gives you
- No drift - the root filesystem is always a clean slate. Undeclared state cannot accumulate.
- Forced explicitness - if you forget to persist something, you notice on the next reboot. No hidden state.
- Reproducibility - two machines with the same closure and the same persisted data behave identically.
How the btrfs wipe works
On boot, an initrd script runs before the root filesystem is mounted:
- Mounts the btrfs partition by label (`root`)
- Renames the current `@root` subvolume to `old_roots/<timestamp>`
- Deletes old root snapshots older than 30 days (recursive subvolume deletion)
- Creates a fresh `@root` subvolume
- Unmounts
The /persist filesystem is marked neededForBoot = true so it is available during early boot before the wipe completes.
What the framework persists
System-level (/persist)
| Path | Purpose |
|---|---|
| /etc/nixos | NixOS configuration |
| /etc/NetworkManager/system-connections | WiFi/VPN connections |
| /var/lib/systemd | systemd state (timers, journals) |
| /var/lib/nixos | NixOS UID/GID maps |
| /var/log | System logs |
| /etc/machine-id | Stable machine identity (file) |
User-level (/persist via Home Manager)
The framework persists common user paths. Fleet repos extend this list with their own application state via scope-aware persistence (see below).
| Path | Purpose |
|---|---|
| .keys | Encryption/decryption keys |
| .local/share/nix | Nix user state |
| .ssh/known_hosts | SSH known hosts (file) |
The framework also persists paths for tools included in the base scope (shell history, plugin state, CLI auth). See modules/scopes/_impermanence.nix for the full list.
User-level mounts are hidden (hideMounts = true) to keep ls output clean.
Service-level (auto-persist)
The agent and control plane modules automatically persist their state directories when impermanence is enabled:
- Agent: `/var/lib/nixfleet` (SQLite state database)
- Control plane: `/var/lib/nixfleet-cp` (SQLite state database)
No manual configuration needed. The service modules detect nixfleet.impermanence.enable and add the persist entries.
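The detection pattern inside a service module presumably looks like the following sketch (the option name `services.nixfleet-agent.enable` and the exact module code are assumptions based on the options shown elsewhere in these docs):

```nix
# Illustrative shape of the auto-persist wiring inside the agent module.
{ config, lib, ... }: {
  config = lib.mkIf config.services.nixfleet-agent.enable {
    environment.persistence."/persist".directories =
      lib.mkIf config.nixfleet.impermanence.enable [ "/var/lib/nixfleet" ];
  };
}
```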
Scope-aware persistence
Persist paths belong next to the program they support, not in a centralized list. When you write a scope that installs a program with state, co-locate the persist declaration:
{config, lib, pkgs, ...}: let
hS = config.hostSpec;
in {
config = lib.mkIf hS.isGraphical {
programs.firefox.enable = true;
# Persist Firefox profile alongside its config
home.persistence."/persist" = lib.mkIf config.nixfleet.impermanence.enable {
directories = [".mozilla/firefox"];
};
};
}
This prevents the persistence list from drifting out of sync with installed programs.
Opting in
Enable nixfleet.impermanence.enable (or use a role that sets it) and use a btrfs disk layout with separate persist subvolumes:
nixfleet.lib.mkHost {
hostName = "myhost";
platform = "x86_64-linux";
hostSpec = {
userName = "alice";
};
modules = [
nixfleet-scopes.scopes.roles.workstation
{ nixfleet.impermanence.enable = true; }
# Use the framework's btrfs-impermanence disko template
nixfleet.diskoTemplates.btrfs-impermanence
./hardware-configuration.nix
];
}
The framework provides two disko templates:
- `diskoTemplates.btrfs` - standard btrfs layout without impermanence
- `diskoTemplates.btrfs-impermanence` - btrfs layout with `@root`, `@persist`, and `@nix` subvolumes
Ownership and activation
The framework runs an activation script that ensures /persist/home/<userName> exists with correct ownership. If a .keys directory exists in the persist home, it is recursively chowned to the primary user.
Custom Scopes
Scopes are plain NixOS/HM modules that self-activate based on enable options. The framework provides base, impermanence, and the service modules. Your fleet repo adds scopes for everything else.
Step 1: Define a hostSpec flag
Extend hostSpec in your fleet repo with a plain NixOS module:
# modules/host-spec-extensions.nix
{lib, ...}: {
options.hostSpec.isDev = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Enable development tools.";
};
}
Include this module in your mkHost modules list (or use an import-tree pattern).
Step 2: Create the scope module
Write a NixOS module that activates only when the flag is true:
# modules/scopes/dev.nix
{config, lib, pkgs, ...}: let
hS = config.hostSpec;
in {
config = lib.mkIf hS.isDev {
virtualisation.docker.enable = true;
environment.systemPackages = with pkgs; [gcc gnumake];
};
}
Step 3: Add Home Manager config
If the scope needs user-level configuration, use the HM module pattern. You can define it as a separate module or combine it with the NixOS module depending on your import strategy.
In a multi-module pattern (returned as an attrset):
# modules/scopes/dev.nix
{
nixos = {config, lib, pkgs, ...}: let
hS = config.hostSpec;
in {
config = lib.mkIf hS.isDev {
virtualisation.docker.enable = true;
};
};
homeManager = {config, lib, pkgs, ...}: let
hS = config.hostSpec;
in {
home.packages = lib.optionals hS.isDev (with pkgs; [
nodejs
python3
rustup
]);
};
}
Step 4: Add persist paths
If the scope installs programs with state on impermanent hosts, co-locate the persistence declaration:
{config, lib, pkgs, ...}: let
hS = config.hostSpec;
in {
config = lib.mkIf hS.isDev {
virtualisation.docker.enable = true;
# Persist Docker data on impermanent hosts
environment.persistence."/persist".directories =
lib.mkIf config.nixfleet.impermanence.enable [
"/var/lib/docker"
];
};
}
For user-level persistence (in an HM module):
home.persistence."/persist" = lib.mkIf config.nixfleet.impermanence.enable {
directories = [".cargo" ".rustup" ".npm"];
};
Step 5: Import in mkHost
Add the scope module to your host definitions:
nixfleet.lib.mkHost {
hostName = "workstation";
platform = "x86_64-linux";
hostSpec = {
userName = "alice";
isDev = true;
};
modules = [
./modules/host-spec-extensions.nix
./modules/scopes/dev.nix
./hardware-configuration.nix
];
}
If you use an import-tree or similar auto-discovery pattern, the scope is picked up automatically without explicit imports.
Conventions
- One concern per scope - `dev`, `graphical`, `desktop`, not `dev-and-graphical`
- `lib.mkIf` on enable options - scopes produce no config when their enable is false
- Co-locate persistence - persist paths live in the scope that needs them
- Framework vs fleet - generic infrastructure (base, impermanence, agent, CP) belongs in NixFleet. Opinionated tools and theming belong in your fleet repo.
Secrets
NixFleet provides a secrets wiring scope that handles identity path management, impermanence persistence, and boot ordering. Fleet repos bring their own backend (agenix, sops-nix) and wire it to the framework.
Enabling the secrets scope
nixfleet.secrets.enable = true;
The scope computes config.nixfleet.secrets.resolvedIdentityPaths based on its options:
- Servers (`enableUserKey = false`, the default for the server role): host SSH key only (`/etc/ssh/ssh_host_ed25519_key`)
- Workstations (`enableUserKey = true`, the default for the workstation role): host SSH key + user key fallback (`~/.keys/id_ed25519`)
On impermanent hosts, identity keys are automatically persisted.
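One plausible way such a computed option could be derived - illustrative only, not the framework's source; the home-directory convention and `enableUserKey` option are taken from the description above:

```nix
# Sketch: host key always; user key appended when enableUserKey is set.
config.nixfleet.secrets.resolvedIdentityPaths =
  [ "/etc/ssh/ssh_host_ed25519_key" ]
  ++ lib.optional config.nixfleet.secrets.enableUserKey
    "/home/${config.hostSpec.userName}/.keys/id_ed25519";
```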
agenix example
# flake.nix inputs
inputs.agenix.url = "github:ryantm/agenix";
inputs.agenix.inputs.nixpkgs.follows = "nixfleet/nixpkgs";
# In your host modules
{inputs, config, ...}: {
imports = [inputs.agenix.nixosModules.default];
# Use framework-computed identity paths
age.identityPaths = config.nixfleet.secrets.resolvedIdentityPaths;
age.secrets.root-password.file = "${inputs.secrets}/org/root-password.age";
hostSpec = {
hashedPasswordFile = config.age.secrets.root-password.path;
rootHashedPasswordFile = config.age.secrets.root-password.path;
};
}
sops-nix example
# flake.nix inputs
inputs.sops-nix.url = "github:Mic92/sops-nix";
inputs.sops-nix.inputs.nixpkgs.follows = "nixfleet/nixpkgs";
# In your host modules
{inputs, config, ...}: {
imports = [inputs.sops-nix.nixosModules.sops];
sops = {
defaultSopsFile = ./secrets/secrets.yaml;
# sops-nix also uses age keys - resolvedIdentityPaths works here too
age.keyFile = builtins.head config.nixfleet.secrets.resolvedIdentityPaths;
};
sops.secrets.root-password.neededForUsers = true;
hostSpec = {
hashedPasswordFile = config.sops.secrets.root-password.path;
rootHashedPasswordFile = config.sops.secrets.root-password.path;
};
}
Extension points
hostSpec provides three options for wiring secrets into the framework:
| Option | Type | Purpose |
|---|---|---|
secretsPath | nullOr str | Hint for the path to your secrets repo/directory. |
hashedPasswordFile | nullOr str | Path to a hashed password file for the primary user. |
rootHashedPasswordFile | nullOr str | Path to a hashed password file for root. |
When hashedPasswordFile or rootHashedPasswordFile is non-null, the core NixOS module sets users.users.<name>.hashedPasswordFile accordingly.
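A sketch of that wiring - the exact core-module code is an assumption, but the effect matches the description above:

```nix
# Illustrative: map hostSpec password files onto users.users.* options.
{ config, lib, ... }: let
  hS = config.hostSpec;
in {
  users.users.${hS.userName}.hashedPasswordFile =
    lib.mkIf (hS.hashedPasswordFile != null) hS.hashedPasswordFile;
  users.users.root.hashedPasswordFile =
    lib.mkIf (hS.rootHashedPasswordFile != null) hS.rootHashedPasswordFile;
}
```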
Bootstrapping
New machines need a decryption key before they can decrypt secrets. Two approaches:
--extra-files (nixos-anywhere)
Pass the key during initial install:
mkdir -p /tmp/extra/etc/ssh
cp /path/to/ssh_host_ed25519_key /tmp/extra/etc/ssh/ssh_host_ed25519_key
chmod 600 /tmp/extra/etc/ssh/ssh_host_ed25519_key
nixos-anywhere --flake .#myhost --extra-files /tmp/extra root@192.168.1.50
The build-vm and test-vm apps do this automatically when a key is found at ~/.keys/id_ed25519 or ~/.ssh/id_ed25519. You can also pass a key explicitly with --identity-key PATH. For real hardware, pass --extra-files to nixos-anywhere to inject the key during install.
The secrets scope’s nixfleet-host-key-check service auto-generates the host key at /etc/ssh/ssh_host_ed25519_key on first boot if the key is missing, so bootstrapping without a pre-provisioned key is safe.
Generate on target
SSH into the machine and the host key will be generated automatically by nixfleet-host-key-check before sshd starts. Alternatively, generate one manually and add it to your secrets configuration:
ssh root@192.168.1.50
ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -N ""
Then extract the public key, add it to your secrets configuration (e.g., secrets.nix for agenix), and re-encrypt the affected secrets.
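For agenix, the extracted public key then goes into `secrets.nix` alongside your operator keys. A hedged sketch following the agenix convention (key strings and secret name are hypothetical):

```nix
# secrets.nix - recipients for each encrypted secret.
let
  web-01 = "ssh-ed25519 AAAAC3Nza... root@web-01";          # host public key
  operators = [ "ssh-ed25519 AAAAC3Nza... you@workstation" ];
in {
  "org/root-password.age".publicKeys = operators ++ [ web-01 ];
}
```

After adding the key, re-encrypt the affected secrets (e.g., `agenix --rekey`) so the new host can decrypt them.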
Key placement on impermanent hosts
On impermanent hosts, the secrets scope automatically persists:
- `/etc/ssh/ssh_host_ed25519_key` (and `.pub`)
- The user key directory (`~/.keys`) when `enableUserKey` is true
The impermanence scope also persists ~/.keys independently, providing defense in depth.
Templates & Patterns
NixFleet ships flake templates for common fleet structures. Initialize a new project with:
nix flake init -t github:arcanesys/nixfleet
Available templates
| Template | Command | Description |
|---|---|---|
default / standalone | nix flake init -t nixfleet | Single NixOS machine, no flake-parts |
fleet | nix flake init -t nixfleet#fleet | Multi-host fleet with flake-parts |
batch | nix flake init -t nixfleet#batch | Batch of identical hosts from a template |
standalone
Minimal setup for a single machine. No flake-parts, no import-tree. Just nixfleet + one mkHost call:
{
inputs = {
nixfleet.url = "github:arcanesys/nixfleet";
nixpkgs.follows = "nixfleet/nixpkgs";
};
outputs = {nixfleet, ...}: {
nixosConfigurations.myhost = nixfleet.lib.mkHost {
hostName = "myhost";
platform = "x86_64-linux";
hostSpec = {
userName = "alice";
timeZone = "US/Eastern";
locale = "en_US.UTF-8";
sshAuthorizedKeys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA..."
];
};
modules = [
./hardware-configuration.nix
./disk-config.nix
];
};
};
}
fleet
Multi-host fleet using flake-parts for structure. Imports NixFleet’s flakeModules for apps, tests, formatter, and ISO generation.
batch
Generate many identical hosts from a template. Useful for edge devices, kiosks, or lab machines where the only difference between hosts is the hostname and network config.
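The batch pattern can be sketched with `lib.genAttrs` - one template module stamped out per hostname. Host names and the template path are hypothetical:

```nix
# Sketch: identical hosts differing only by hostname.
outputs = {nixfleet, nixpkgs, ...}: {
  nixosConfigurations = nixpkgs.lib.genAttrs ["kiosk-01" "kiosk-02" "kiosk-03"]
    (hostName: nixfleet.lib.mkHost {
      inherit hostName;
      platform = "x86_64-linux";
      modules = [ ./hosts/kiosk-template ];  # shared config for every kiosk
    });
};
```

Per-host differences (static IPs, for example) can be layered on with an extra module selected by hostname.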
The follows chain
Every template uses this pattern:
inputs = {
nixfleet.url = "github:arcanesys/nixfleet";
nixpkgs.follows = "nixfleet/nixpkgs";
};
The follows directive means your fleet uses the same nixpkgs revision that NixFleet was tested against. This is important because:
- Consistency - framework modules, core config, and your fleet code all evaluate against the same package set
- No diamond dependency - without `follows`, you would have two separate nixpkgs evaluations (NixFleet’s and yours), doubling memory usage and causing subtle version mismatches
- Tested combination - NixFleet’s CI validates against its pinned nixpkgs
NixFleet’s own follows chain
NixFleet pins and follows these inputs internally:
- nixpkgs (nixos-unstable)
- darwin follows nixpkgs
- home-manager follows nixpkgs
- disko follows nixpkgs
- impermanence follows nixpkgs
- lanzaboote follows nixpkgs
- microvm follows nixpkgs
- nixos-anywhere follows nixpkgs, flake-parts, disko, treefmt-nix
- treefmt-nix follows nixpkgs
All major inputs share a single nixpkgs, ensuring consistent package versions throughout the dependency tree.
When to follow vs pin independently
| Scenario | Recommendation |
|---|---|
| Standard fleet | Follow NixFleet’s nixpkgs (follows = "nixfleet/nixpkgs") |
| Need a specific nixpkgs fix not yet in NixFleet | Pin your own nixpkgs, accept potential mismatches, update NixFleet soon |
| Fleet-specific inputs (secrets tool, hardware modules) | Follow your fleet’s nixpkgs for consistency |
| NixFleet’s bundled inputs (disko, HM, etc.) | Always use the versions bundled in NixFleet - they are tested together |
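The independent-pin scenario from the table looks like this in a fleet flake - a deliberate trade of consistency for a newer nixpkgs, to be reverted to `follows` once NixFleet catches up:

```nix
# Independent nixpkgs pin (accepts potential version mismatches).
inputs = {
  nixfleet.url = "github:arcanesys/nixfleet";
  nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";  # your own pin, no follows
};
```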
Disko templates
NixFleet also provides reusable disk layout templates, separate from flake templates:
| Template | Import path | Description |
|---|---|---|
btrfs | nixfleet.diskoTemplates.btrfs | Standard btrfs layout |
btrfs-impermanence | nixfleet.diskoTemplates.btrfs-impermanence | Btrfs with @root, @persist, @nix subvolumes for impermanence |
Use them in your mkHost modules list:
modules = [
nixfleet.diskoTemplates.btrfs-impermanence
./hardware-configuration.nix
];
CLI
Flat reference for all nixfleet CLI commands and flags.
Global options
| Flag | Env var | Default | Description |
|---|---|---|---|
--control-plane-url | NIXFLEET_CONTROL_PLANE_URL | http://localhost:8080 | Control plane URL |
--api-key | NIXFLEET_API_KEY | "" | API key for control plane authentication |
--client-cert | NIXFLEET_CLIENT_CERT | "" | Client certificate for mTLS authentication |
--client-key | NIXFLEET_CLIENT_KEY | "" | Client key for mTLS authentication |
--ca-cert | NIXFLEET_CA_CERT | "" | CA certificate for TLS verification (uses system trust store if omitted) |
--json | - | false | Output structured JSON (on commands that produce tables/detail views) |
--config | - | - | Path to .nixfleet.toml (default: walk up from cwd) |
-v, --verbose | - | 0 | Verbosity: -v shows INFO milestones + subprocess rolling window + progress bar; -vv shows raw passthrough (debug) |
Logging is controlled via RUST_LOG (overrides -v/--verbose when set).
Configuration sources
The CLI reads connection settings from four layers, in priority order (highest wins):
- CLI flags (`--control-plane-url`, `--api-key`, …)
- Environment variables (`NIXFLEET_*` shown above)
- `~/.config/nixfleet/credentials.toml` - user-level API keys, keyed by CP URL (auto-saved by `nixfleet bootstrap`)
- `.nixfleet.toml` - repo-level config, from `--config <path>` or discovered by walking up from cwd
This means the same CLI commands run with no flags from any fleet repo, inheriting the repo’s connection settings and the user’s bootstrapped credentials. See .nixfleet.toml format below.
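The layering amounts to a first-non-empty lookup. The sketch below illustrates the precedence rule only - the real CLI implements this internally:

```shell
# Illustrative: highest-priority non-empty source wins
# (flag > env var > credentials.toml > .nixfleet.toml).
resolve() {
  for v in "$@"; do
    if [ -n "$v" ]; then
      printf '%s\n' "$v"
      return 0
    fi
  done
  return 1
}

# flag unset, env set, credentials unset, repo config set -> env wins
resolve "" "https://cp-staging:8080" "" "https://cp.example.com:8080"
```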
mTLS example (with config file):
# One-time setup (creates .nixfleet.toml)
nixfleet init \
--control-plane-url https://cp-01:8080 \
--ca-cert modules/_config/fleet-ca.pem \
--client-cert '/run/agenix/agent-${HOSTNAME}-cert' \
--client-key '/run/agenix/agent-${HOSTNAME}-key' \
--cache-url http://cache:5000 \
--push-to ssh://root@cache
# Bootstrap first admin key (auto-saves to ~/.config/nixfleet/credentials.toml)
nixfleet bootstrap
# Subsequent commands: no flags needed
nixfleet machines list
nixfleet release create
nixfleet deploy --release rel-abc123 --hosts 'web-*' --wait
deploy
Deploy config to fleet hosts.
nixfleet deploy [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--release <ID> | string | – | Deploy an existing release (required for rollout mode unless using --push-to / --copy) |
--push-to <URL> | string | – | Build all hosts, push to a Nix binary cache URL, and register a release implicitly (e.g., ssh://root@cache, s3://bucket) |
--hook | bool | false | Use hook mode: push via [cache.hook] push-cmd instead of nix copy. Requires [cache.hook] in .nixfleet.toml or --hook-push-cmd |
--hook-push-cmd <CMD> | string | – | Override hook push command ({} = store path). Requires --hook |
--hook-url <URL> | string | – | Override hook cache URL for agents to pull from. Requires --hook |
--copy | bool | false | Build all hosts, push to each target via nix-copy-closure (no binary cache needed), and register a release implicitly |
--hosts <PATTERN> | string (comma-separated or repeatable) | * | Host glob patterns. In SSH mode: hosts to deploy. In rollout mode: target machines directly (alternative to --tags) |
--tags <TAG> | string (comma-separated or repeatable) | – | Target machines by tag - filters both the release build and rollout targeting (only hosts with a matching services.nixfleet-agent.tags value are built) |
--dry-run | bool | false | Build closures and show plan, do not push or register |
--ssh | bool | false | SSH fallback mode: build locally, copy via SSH, run switch-to-configuration (no CP needed) |
--target <SSH> | string | – | SSH target override (e.g., root@192.168.1.10). Only valid with --ssh and a single host. |
--flake <REF> | string | . | Flake reference |
--strategy <STRATEGY> | string | all-at-once | Rollout strategy: canary, staged, all-at-once |
--batch-size <SIZES> | string (comma-separated) | – | Batch sizes (e.g., 1,25%,100%) |
--failure-threshold <N> | string | 0 | Max unhealthy machines per batch before pausing/reverting. Accepts absolute count or percentage (e.g. 30%) |
--on-failure <ACTION> | string | pause | Action on batch failure: pause (stop and wait for rollout resume) or revert (roll back to previous generation) |
--health-timeout <SECS> | u64 | 300 | Seconds to wait for health reports per batch |
--wait | bool | false | Stream rollout progress; exits non-zero if rollout pauses or fails |
--wait-timeout <SECS> | u64 | 300 | Timeout in seconds for --wait (0 = wait forever) |
--cache-url <URL> | string | – | Binary cache URL for agents to fetch closures from (overrides the release’s cache_url) |
Modes:
- SSH mode (`--ssh`): Builds locally, copies closures via SSH, activates on target. No control plane required. Platform-aware: NixOS hosts use `switch-to-configuration switch`, Darwin hosts use `nix-env --set` + `activate` (auto-detected from the host’s platform).

  Note: `--ssh` deploys directly via `nix-copy-closure` and activation, bypassing the control plane entirely. Lifecycle state is not checked - a machine in `maintenance` will still receive the deploy. Use `--ssh` as an emergency escape hatch when the CP is unavailable, not as a routine deployment method.

  Darwin SSH deploy requirements: SSH deploy to Darwin hosts connects as `$USER@host` (not `root@` - macOS disables root SSH login). This requires:
  - Username match: The operator’s local username must exist on the Darwin target with SSH key access. Override with `--target user@host` for single-host deploys if usernames differ.
  - Passwordless sudo: Activation requires root. The target must allow passwordless sudo for `nix-env` and the activation script:

        # nix-darwin: security.sudo.extraConfig
        s33d ALL=(root) NOPASSWD: /nix/var/nix/profiles/default/bin/nix-env *
        s33d ALL=(root) NOPASSWD: /nix/store/*/activate

  - SSH key access: The operator’s SSH public key must be in the target user’s authorized keys.

  For production mixed-fleet deploys, prefer the CP rollout path - the agent runs as root (launchd daemon), pulls from cache, and activates locally with no SSH user/sudo requirements.
- Rollout mode (requires a release): Creates a rollout on the control plane with the specified strategy. Specify an existing release with `--release <ID>`, or use `--push-to <url>` / `--hook` / `--copy` to build + push + register implicitly in one command.
- Hook mode (`--hook`): Uses `[cache.hook] push-cmd` from `.nixfleet.toml` to push closures (e.g., `attic push mycache {}`). Overrides `--push-to` and uses `[cache.hook] url` as the cache URL for agents. Flags `--hook-push-cmd` and `--hook-url` override the config values.
- Targeting: Use `--tags <TAG>` or `--hosts <pattern>` to select machines. Both are intersected with the release’s host list (machines not in the release are skipped with a warning).
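The `--failure-threshold` flag accepts either an absolute count or a percentage. The mapping to a per-batch machine count presumably works like the sketch below - integer truncation is an assumption, since the docs don't state the rounding rule:

```shell
# Hypothetical resolution of --failure-threshold against a batch size.
threshold_count() {
  t=$1 batch=$2
  case $t in
    *%) p=${t%\%}; echo $(( p * batch / 100 )) ;;  # percentage of the batch
    *)  echo "$t" ;;                                # absolute count as-is
  esac
}

threshold_count 30% 10   # a 10-machine batch tolerates 3 unhealthy machines
threshold_count 2 10     # absolute count passes through unchanged
```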
init
Create a .nixfleet.toml config file in the current directory. Run this once per fleet repo to set the connection and deploy defaults.
nixfleet init [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--control-plane-url <URL> | string | – (required) | Control plane URL |
--ca-cert <PATH> | string | – | CA certificate path (relative to config file or absolute) |
--client-cert <PATH> | string | – | Client certificate path (supports ${HOSTNAME} expansion) |
--client-key <PATH> | string | – | Client key path (supports ${HOSTNAME} expansion) |
--cache-url <URL> | string | – | Default binary cache URL for agents |
--push-to <URL> | string | – | Default push destination for release create |
--hook-url <URL> | string | – | Hook mode cache URL (e.g., http://cache:8081/mycache for Attic) |
--hook-push-cmd <CMD> | string | – | Hook mode push command ({} = store path, e.g., attic push mycache {}) |
--strategy <STRATEGY> | string | – | Default deploy strategy (canary, staged, all-at-once) |
--on-failure <ACTION> | string | – | Default deploy failure action (pause, revert) |
After init, run nixfleet bootstrap to create and auto-save the first admin API key.
release create
Build host closures, distribute them, and register a release manifest in the control plane. A release is an immutable mapping of hostnames to built store paths that subsequent rollouts can target.
nixfleet release create [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--flake <REF> | string | . | Flake reference |
--hosts <PATTERN> | string | * | Host glob pattern or comma-separated list |
--push-to <URL> | string | – | Push closures to this Nix cache URL via nix copy --to (e.g., ssh://root@cache, s3://bucket) |
--hook | bool | false | Use hook mode: push via [cache.hook] push-cmd instead of nix copy |
--hook-push-cmd <CMD> | string | – | Override hook push command ({} = store path). Requires --hook |
--hook-url <URL> | string | – | Override hook cache URL. Requires --hook |
--copy | bool | false | Push closures directly to each target host via nix-copy-closure (no binary cache) |
--cache-url <URL> | string | – | Override the cache URL recorded in the release (defaults to --push-to URL, or config file) |
--eval-only | bool | false | Evaluate config.system.build.toplevel.outPath without building. Assumes closures are already in the cache (e.g., CI-built). Incompatible with --push-to, --hook, --copy |
--dry-run | bool | false | Build and show the manifest without pushing or registering |
--allow-dirty | bool | false | Skip the dirty working tree check |
Output prints the release ID, host count, and per-host store paths. Use the ID with nixfleet deploy --release <ID>.
release list
List recent releases.
nixfleet release list [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--limit <N> | u32 | 20 | Number of releases to show (newest first) |
--host <HOSTNAME> | string | – | Filter releases to those containing entries for this hostname |
release show
Show a release’s full metadata and per-host entries.
nixfleet release show <ID>
| Argument | Type | Description |
|---|---|---|
<ID> | string | Release ID |
release diff
Diff two releases: added hosts, removed hosts, changed store paths, unchanged.
nixfleet release diff <ID_A> <ID_B>
| Argument | Type | Description |
|---|---|---|
<ID_A> | string | First release ID |
<ID_B> | string | Second release ID |
release delete
Delete a release. Fails with exit code 1 if the release is still referenced by a rollout - the control plane returns 409 in that case to prevent breaking rollout history.
nixfleet release delete <RELEASE_ID>
| Argument | Type | Description |
|---|---|---|
<RELEASE_ID> | string | ID of the release to delete |
Exit codes:
- `0` - release deleted (CP returned 204)
- `1` - release still referenced by a rollout (CP returned 409), release not found (CP returned 404), or another non-2xx status
status
Show fleet status from the control plane.
nixfleet status [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--stale-threshold <SECS> | u64 | 600 | Seconds without a report before a machine is marked stale |
--watch | bool | false | Continuously refresh the display (clears screen, Ctrl+C to exit). Incompatible with --json |
--interval <SECS> | u64 | 2 | Refresh interval in seconds (requires --watch) |
Outputs a table of all machines. Pass --json (global flag) for structured JSON output.
rollback
Roll back a single machine to a previous generation via SSH. Activates the previous generation directly on the target, then notifies the control plane so the desired generation stays in sync.
nixfleet rollback --host <HOST> --ssh [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--host <HOST> | string | – (required) | Target host name |
--generation <PATH> | string | – | Store path to roll back to (default: previous generation from system-1-link) |
--target | string | - | SSH target override (e.g. root@192.168.1.10) |
--darwin | bool | false | Target is a Darwin (macOS) host - uses $USER@host, sudo activate instead of switch-to-configuration |
Rollback always operates via SSH. The --ssh flag is accepted for backwards compatibility but hidden from --help. For CP-driven rollback, use --on-failure revert on rollouts, or deploy an older release. After a successful rollback, the CP is notified (best-effort) so nixfleet status shows the machine in sync.
Darwin rollback: Use --darwin for macOS hosts. This runs nix-env --set + activate instead of switch-to-configuration:
nixfleet rollback --host aether --ssh --darwin
host add
Scaffold a new host.
nixfleet host add --hostname <NAME> [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--hostname <NAME> | string | – (required) | Host name for the new machine |
--org <ORG> | string | my-org | Organization name |
--role <ROLE> | string | workstation | Host role (workstation, server, edge, kiosk) |
--platform <PLATFORM> | string | x86_64-linux | Target platform |
--target <SSH> | string | – | SSH target to fetch hardware config (e.g., root@192.168.1.42) |
rollout list
List rollouts.
nixfleet rollout list [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--status <STATUS> | string | – | Filter by status (e.g., running, paused, completed) |
--sort <FIELD> | string | created | Sort by: created (newest first), status, strategy |
rollout status
Show rollout detail with batch breakdown.
nixfleet rollout status <ID> [FLAGS]
| Argument/Flag | Type | Default | Description |
|---|---|---|---|
<ID> | string | – | Rollout ID |
--wait | bool | false | Block until rollout completes, fails, is cancelled, or pauses. Exits non-zero on failure or pause |
--wait-timeout <SECS> | u64 | 300 | Timeout in seconds for --wait (0 = wait forever) |
--watch | bool | false | Continuously refresh the display (clears screen, Ctrl+C to exit). Incompatible with --wait and --json |
--interval <SECS> | u64 | 2 | Refresh interval in seconds (requires --watch) |
rollout resume
Resume a paused rollout.
nixfleet rollout resume <ID>
| Argument | Type | Description |
|---|---|---|
<ID> | string | Rollout ID |
rollout cancel
Cancel a rollout.
nixfleet rollout cancel <ID>
| Argument | Type | Description |
|---|---|---|
<ID> | string | Rollout ID |
bootstrap
Create the first admin API key. Only works when no keys exist in the control plane.
nixfleet bootstrap [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--name <NAME> | string | admin | Name for the admin key |
--save-key <KEY> | string | – | Save an existing API key without calling the CP (for setting up additional machines) |
Output: Human-friendly info to stderr, raw key to stdout. Scriptable:
API_KEY=$(nixfleet bootstrap)
Returns exit code 1 with an error message if keys already exist (409).
Note: No --api-key needed (chicken-and-egg). mTLS is still required when the CP has --client-ca set.
Multi-machine setup: Bootstrap once on your primary machine, then use --save-key on additional machines to share the same API key without re-bootstrapping:
# On the primary machine:
nixfleet bootstrap
# On additional machines (same fleet):
nixfleet bootstrap --save-key nfk-abc123...
completions
Generate a shell completion script.
nixfleet completions <SHELL>
| Argument | Type | Description |
|---|---|---|
<SHELL> | string | Target shell: zsh, bash, or fish |
Source the output in your shell profile:
# zsh
nixfleet completions zsh > ~/.zsh/completions/_nixfleet
# bash
nixfleet completions bash > /etc/bash_completion.d/nixfleet
# fish
nixfleet completions fish > ~/.config/fish/completions/nixfleet.fish
machines register
Register a machine with the control plane (admin endpoint).
nixfleet machines register <ID> [FLAGS]
| Argument/Flag | Type | Description |
|---|---|---|
<ID> | string | Machine ID |
--tags <TAG> | string (comma-separated or repeatable) | Initial tags |
Agents auto-register on first health report, so manual registration is optional. Use this to pre-register machines before they come online.
machines list
List machines.
nixfleet machines list [FLAGS]
| Flag | Type | Default | Description |
|---|---|---|---|
--tags <TAG> | string (comma-separated or repeatable) | – | Filter by tags (machines matching any listed tag are shown) |
--watch | bool | false | Refresh the list on an interval (clears screen, Ctrl+C to exit). Incompatible with --json |
--interval <SECS> | u64 | 2 | Refresh interval in seconds (requires --watch) |
machines set-lifecycle
Change a machine’s lifecycle state.
nixfleet machines set-lifecycle <ID> <STATE>
| Argument | Type | Description |
|---|---|---|
<ID> | string | Machine ID |
<STATE> | string | Lifecycle state: active, pending, provisioning, maintenance, decommissioned |
Only active machines participate in rollouts. Machines in maintenance or
decommissioned state are excluded even when explicitly targeted by hostname.
Use maintenance to temporarily remove a machine from fleet operations without
deregistering it.
machines clear-desired
Clear a machine’s stale desired generation. Use this when an agent is stuck polling for a generation that will never be fulfilled (e.g., after a cancelled rollout).
nixfleet machines clear-desired <ID>
| Argument | Type | Description |
|---|---|---|
<ID> | string | Machine ID |
Exit codes:
- `0` - desired generation cleared (CP returned 204)
- `1` - machine not found (CP returned 404), or another non-2xx status
machines notify-deploy
Notify the control plane of an out-of-band deploy (e.g. SSH). Sets the machine’s desired generation to the deployed store path so nixfleet status shows the machine in sync once the agent confirms.
Called automatically by deploy --ssh after a successful switch. Also available manually for other out-of-band deploy workflows.
nixfleet machines notify-deploy <ID> <STORE_PATH>
| Argument | Type | Description |
|---|---|---|
<ID> | string | Machine ID |
<STORE_PATH> | string | Store path that was deployed |
Requires deploy or admin role.
rollout delete
Delete a terminal rollout (completed, cancelled, or failed). The control plane rejects deletion of active rollouts with 409.
nixfleet rollout delete <ID>
| Argument | Type | Description |
|---|---|---|
<ID> | string | Rollout ID |
Exit codes:
- `0` - rollout deleted (CP returned 204)
- `1` - rollout is still active (CP returned 409), rollout not found (CP returned 404), or another non-2xx status
Operation logs
All CLI operations (deploy, release create, rollout commands) write persistent logs to:
~/.local/state/nixfleet/logs/
Each operation creates a JSONL file with timestamped entries covering subprocess invocations (command, stdout, stderr, exit code), tracing events, and host context. Logs are written regardless of verbosity level.
.nixfleet.toml format
Committed to the fleet repo root. Discovered by walking up from the CLI’s current working directory. All fields optional - CLI flags and environment variables always override.
[control-plane]
url = "https://cp.example.com:8080"
ca-cert = "modules/_config/fleet-ca.pem" # relative to config file location
[tls]
client-cert = "/run/agenix/agent-${HOSTNAME}-cert"
client-key = "/run/agenix/agent-${HOSTNAME}-key"
[cache]
url = "http://cache.example.com:5000" # default --cache-url for rollouts
push-to = "ssh://root@cache.example.com" # default --push-to for release create
[cache.hook] # used when --hook is passed
url = "http://cache.example.com:8081/mycache" # overrides cache.url for the release
push-cmd = "attic push mycache {}" # {} is replaced with the store path
[deploy]
strategy = "staged" # default rollout strategy
health-timeout = 300 # default health timeout in seconds
failure-threshold = "0"
on-failure = "pause"
Environment variable expansion: values support ${VAR} expansion. ${HOSTNAME} and ${HOST} fall back to the gethostname() syscall if not set in the environment (so they work from zsh where $HOST is a shell builtin, not exported). This lets the same .nixfleet.toml work across every fleet host when agent cert paths follow a per-hostname convention.
Relative paths (like ca-cert = "modules/_config/fleet-ca.pem") are resolved relative to the .nixfleet.toml location, not the CLI’s working directory.
~/.config/nixfleet/credentials.toml format
User-level, mode 600, not checked into any repo. Written automatically by nixfleet bootstrap and keyed by CP URL to support multiple clusters.
["https://cp.example.com:8080"]
api-key = "nfk-73c713cc..."
["https://cp-staging.example.com:8080"]
api-key = "nfk-abc..."
On impermanent NixOS hosts, add .config/nixfleet to home-manager persistence so the credentials file survives reboots.
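With the home-manager impermanence module, that persistence entry could look like the sketch below (the persistence root and username are illustrative; adjust to your fleet's convention):

```nix
# home-manager module for the operator
# (assumes the impermanence HM module is imported)
{
  home.persistence."/persist/home/deploy" = {
    directories = [".config/nixfleet"];
  };
}
```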
mkHost API
Parameters
nixfleet.lib.mkHost {
hostName = "myhost";
platform = "x86_64-linux";
stateVersion = "24.11"; # optional
hostSpec = { ... }; # optional
modules = [ ... ]; # optional
isVm = false; # optional
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
hostName | string | yes | – | Machine hostname. Forced into hostSpec.hostName (not overridable). |
platform | string | yes | – | Target platform: x86_64-linux, aarch64-linux, aarch64-darwin, x86_64-darwin. |
stateVersion | string | no | "24.11" | NixOS state version (set with lib.mkDefault). Not used for Darwin - consumers set it in their host modules. |
hostSpec | attrset | no | {} | Host configuration flags. Values are set with lib.mkDefault (overridable by modules). hostName is always forced to match the parameter. |
modules | list | no | [] | Additional NixOS or Darwin modules appended after framework modules. |
isVm | bool | no | false | Inject QEMU VM hardware config (virtio disk, SPICE, DHCP, software GL). NixOS only. |
Return type
- Linux platforms (`x86_64-linux`, `aarch64-linux`): returns the result of `nixpkgs.lib.nixosSystem`.
- Darwin platforms (`aarch64-darwin`, `x86_64-darwin`): returns the result of `darwin.lib.darwinSystem`.
Platform detection is automatic based on platform.
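A Darwin host therefore uses the same call shape; only `platform` and the flake output it is assigned to change. The `darwinConfigurations` output name follows nix-darwin convention and the values below are illustrative:

```nix
darwinConfigurations.mac-01 = nixfleet.lib.mkHost {
  hostName = "mac-01";
  platform = "aarch64-darwin";
  hostSpec = {
    userName = "deploy"; # required hostSpec field
    timeZone = "UTC";
  };
  modules = [./hosts/mac-01/default.nix];
};
```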
Injected modules
mkHost injects framework modules before user-provided modules. These are mechanism-only - no opinions about packages, services, or user environment.
NixOS (Linux)
- `system.stateVersion` (mkDefault)
- `nixpkgs.hostPlatform` set to `platform`
- hostSpec module (option declarations)
- hostSpec values set with `lib.mkDefault` (overridable by consumer modules)
- `hostSpec.hostName` forced to match the `hostName` parameter
- Impermanence scope from nixfleet-scopes (declares options only - inert unless `nixfleet.impermanence.enable = true`)
- Core NixOS module (`_nixos.nix`)
- Agent service module (disabled by default)
- Control plane service module (disabled by default)
- Cache server service module (disabled by default)
- Cache client module (disabled by default)
- MicroVM host module (disabled by default)
- User-provided `modules`
When isVm = true, additionally injects:
- QEMU disk config and hardware configuration
- SPICE agent (`services.spice-vdagentd.enable`)
- Forced DHCP (`networking.useDHCP = lib.mkForce true`)
- Software GL (`LIBGL_ALWAYS_SOFTWARE`, mesa)
Why impermanence is auto-imported: NixFleet’s internal service modules (agent, control-plane, microvm-host) conditionally contribute to environment.persistence. The NixOS module system validates option paths even inside lib.mkIf false, so the impermanence scope must be present to declare those options. The scope is inert (zero cost) until explicitly enabled.
Darwin (macOS)
- `nixpkgs.hostPlatform` set to `platform`
- hostSpec module (option declarations)
- hostSpec values set with `lib.mkDefault` (overridable by consumer modules)
- `hostSpec.hostName` forced to match the `hostName` parameter
- `hostSpec.isDarwin = true`
- Core Darwin module (`_darwin.nix`)
- Agent Darwin module (disabled by default)
- User-provided `modules`
NOT auto-included
These are consumer responsibilities - import them via roles or explicitly in modules:
- disko - disk partitioning (import from nixfleet-scopes or use `diskoTemplates`)
- base scope - opinionated system defaults (import from nixfleet-scopes)
- home-manager - user environment management (import from nixfleet-scopes)
- operators scope - multi-user inventory (import from nixfleet-scopes)
- All other infrastructure scopes - firewall, secrets, backup, monitoring, etc.
The typical pattern is to import a role, which bundles the relevant scopes:
modules = [
inputs.nixfleet.scopes.roles.workstation # includes base, HM, operators, etc.
./hardware-configuration.nix
];
Framework inputs
Framework inputs are passed via specialArgs = {inherit inputs;}. Modules can access them as the inputs argument. These are NixFleet’s own inputs (nixpkgs, home-manager, disko, impermanence, etc.), not fleet-level inputs.
Home Manager
Home Manager is a scope from nixfleet-scopes. It is not auto-injected by mkHost.
Import it via a role (workstation and endpoint roles include it) or manually:
modules = [
nixfleet.scopes.home-manager
{ nixfleet.home-manager.enable = true; }
];
The scope fans out profileImports to all operators with homeManager.enable = true.
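A minimal sketch of the per-operator side, assuming the operators scope from nixfleet-scopes is also imported (the exact option paths follow the conventions shown elsewhere in this reference):

```nix
{
  nixfleet.home-manager.enable = true;
  # Operators scope (from nixfleet-scopes): any operator with
  # homeManager.enable = true receives the scope's profileImports.
  nixfleet.operators.users.deploy.homeManager.enable = true;
}
```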
Scope re-exports
NixFleet re-exports nixfleet-scopes so consumers do not need a separate flake input:
# These are equivalent:
inputs.nixfleet-scopes.scopes.roles.workstation
inputs.nixfleet.scopes.roles.workstation
Available under inputs.nixfleet.scopes:
- `scopes.roles.*` - workstation, server, endpoint, microvm-guest
- `scopes.base` - opinionated system defaults
- `scopes.home-manager` - HM integration
- `scopes.impermanence` - impermanence support
- `scopes.disk-templates.*` - disko disk layouts
- All other nixfleet-scopes exports
Exports
All exports from the NixFleet flake:
| Export | Access path | Description |
|---|---|---|
lib.mkHost | inputs.nixfleet.lib.mkHost | Host definition function |
lib.mkVmApps | inputs.nixfleet.lib.mkVmApps | VM helper apps generator |
nixosModules.nixfleet-core | inputs.nixfleet.nixosModules.nixfleet-core | Raw core NixOS module (without mkHost) |
scopes | inputs.nixfleet.scopes | Re-export of nixfleet-scopes (no separate input needed) |
diskoTemplates | inputs.nixfleet.diskoTemplates | Alias for scopes.disk-templates |
flakeModules.apps | inputs.nixfleet.flakeModules.apps | VM lifecycle apps (for fleet repos) |
flakeModules.tests | inputs.nixfleet.flakeModules.tests | Eval and VM test infrastructure (for fleet repos) |
flakeModules.iso | inputs.nixfleet.flakeModules.iso | ISO builder (for fleet repos) |
flakeModules.formatter | inputs.nixfleet.flakeModules.formatter | Treefmt config - alejandra + shfmt (for fleet repos) |
templates.default | nix flake init -t nixfleet | Single-host template (same as standalone) |
templates.standalone | nix flake init -t nixfleet#standalone | Single NixOS machine |
templates.batch | nix flake init -t nixfleet#batch | Batch of identical hosts from a template |
templates.fleet | nix flake init -t nixfleet#fleet | Multi-host fleet with flake-parts |
hostSpec Options
All options declared in the framework’s hostSpec module. Fleet repos can extend hostSpec with additional options via plain NixOS modules.
Data fields
| Option | Type | Default | Description |
|---|---|---|---|
hostName | str | – (required) | The hostname of the host. Set automatically by mkHost. |
userName | str | – (required) | The username of the primary user. |
home | str | /home/<userName> (Linux) or /Users/<userName> (Darwin) | Home directory path. Computed from userName and isDarwin. |
timeZone | str | "UTC" | IANA timezone (e.g., Europe/Paris). |
locale | str | "en_US.UTF-8" | System locale. |
keyboardLayout | str | "us" | XKB keyboard layout. |
networking | attrsOf anything | {} | Attribute set of networking information (e.g., { interface = "enp3s0"; }). |
sshAuthorizedKeys | listOf str | [] | SSH public keys added to authorized_keys for both the primary user and root. |
secretsPath | nullOr str | null | Hint for secrets repo path. Framework-agnostic - no tool coupling. |
hashedPasswordFile | nullOr str | null | Path to hashed password file for the primary user. When non-null, sets users.users.<userName>.hashedPasswordFile. |
rootHashedPasswordFile | nullOr str | null | Path to hashed password file for root. When non-null, sets users.users.root.hashedPasswordFile. |
Platform flag
| Option | Type | Default | Description |
|---|---|---|---|
isDarwin | bool | false | Darwin (macOS) host. Set automatically by mkHost for Darwin platforms. |
Note: Earlier revisions of NixFleet had `isMinimal`, `isImpermanent`, and `isServer` flags here. These have been removed; their roles are now played by scope enable options (`nixfleet.impermanence.enable`, `nixfleet.firewall.enable`, etc.) set by roles in nixfleet-scopes.
Extending hostSpec
Fleet repos add custom flags via plain NixOS modules:
{lib, ...}: {
options.hostSpec = {
isDev = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Enable development tools.";
};
isGraphical = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Enable graphical environment.";
};
};
}
Include the extension module in your mkHost modules list. Framework-level hostSpec options and fleet-level extensions merge naturally through the NixOS module system.
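Other modules can then branch on the custom flag like any built-in hostSpec option:

```nix
# Enable dev tooling only on hosts that set hostSpec.isDev = true
{config, lib, pkgs, ...}: {
  config = lib.mkIf config.hostSpec.isDev {
    environment.systemPackages = [pkgs.git pkgs.gdb];
  };
}
```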
Agent Options
All options under services.nixfleet-agent. The module is auto-included by mkHost and disabled by default.
Top-level options
| Option | Type | Default | Description |
|---|---|---|---|
enable | bool | false | Enable the NixFleet fleet management agent. |
controlPlaneUrl | str | – (required when enabled) | URL of the NixFleet control plane. Example: "https://fleet.example.com". |
machineId | str | config.networking.hostName | Machine identifier reported to the control plane. |
pollInterval | int | 60 | Steady-state poll interval in seconds. The control plane may override this for individual cycles via a poll_hint field in the desired-generation response (set to 5 during active rollouts), letting the agent react to new deploys within seconds without reducing the steady-state polling rate. |
retryInterval | int | 30 | Retry interval in seconds after a failed poll (network error, CP not ready, fetch failure, bootstrap race). Shorter than pollInterval so the agent recovers quickly from transient failures without flooding the CP. |
cacheUrl | nullOr str | null | Global binary cache URL for fetching closures. Resolution order: (1) per-generation cache_url from the release entry; (2) this option if set; (3) if neither is set, the agent verifies the store path exists locally via nix path-info - the path must be pre-pushed out-of-band. Example: "http://cache:5000". |
dbPath | str | "/var/lib/nixfleet/state.db" | Path to the SQLite state database. |
dryRun | bool | false | When true, check and fetch but do not apply generations. |
tags | listOf str | [] | Tags for grouping this machine in fleet operations. Passed via NIXFLEET_TAGS environment variable. |
healthInterval | int | 60 | Seconds between continuous health reports to the control plane. |
allowInsecure | bool | false | Allow insecure HTTP connections to the control plane. Development only. |
tls.clientCert | nullOr str | null | Path to client certificate PEM file for mTLS authentication. Example: "/run/secrets/agent-cert.pem". |
tls.clientKey | nullOr str | null | Path to client private key PEM file for mTLS authentication. Example: "/run/secrets/agent-key.pem". |
metricsPort | nullOr port | null | Port for agent Prometheus metrics HTTP listener. Null disables metrics. |
metricsOpenFirewall | bool | false | Open the metrics port in the firewall. Only effective when metricsPort is set. |
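Putting the table together, a production agent with mTLS, tags, and a fleet cache might look like this (URLs and secret paths illustrative):

```nix
services.nixfleet-agent = {
  enable = true;
  controlPlaneUrl = "https://fleet.example.com";
  tags = ["web" "production"];
  cacheUrl = "http://cache:5000";
  tls.clientCert = "/run/secrets/agent-cert.pem";
  tls.clientKey = "/run/secrets/agent-key.pem";
};
```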
healthChecks.systemd
List of systemd unit health checks.
| Sub-option | Type | Default | Description |
|---|---|---|---|
units | listOf str | – | Systemd units that must be active. |
Example:
services.nixfleet-agent.healthChecks.systemd = [
{ units = ["nginx.service" "postgresql.service"]; }
];
healthChecks.http
List of HTTP endpoint health checks.
| Sub-option | Type | Default | Description |
|---|---|---|---|
url | str | – | URL to GET. |
interval | int | 5 | Check interval in seconds. |
timeout | int | 3 | Timeout in seconds. |
expectedStatus | int | 200 | Expected HTTP status code. |
Example:
services.nixfleet-agent.healthChecks.http = [
{ url = "http://localhost:8080/health"; }
{ url = "https://localhost:443"; expectedStatus = 200; timeout = 5; }
];
healthChecks.command
List of custom command health checks.
| Sub-option | Type | Default | Description |
|---|---|---|---|
name | str | – | Check name. |
command | str | – | Shell command (exit 0 = healthy). |
interval | int | 10 | Check interval in seconds. |
timeout | int | 5 | Timeout in seconds. |
Example:
services.nixfleet-agent.healthChecks.command = [
{
name = "disk-space";
command = "test $(df --output=pcent / | tail -1 | tr -d ' %') -lt 90";
interval = 30;
timeout = 5;
}
];
Prometheus Metrics
When metricsPort is set, the agent starts a Prometheus HTTP listener on that port. Null (the default) disables the listener.
Metrics exposed:
| Metric | Description |
|---|---|
nixfleet_agent_state | Current phase of the deploy cycle (idle, checking, fetching, applying, verifying, reporting, rolling_back) encoded as a label |
nixfleet_agent_poll_duration_seconds | Duration of the last poll cycle |
nixfleet_agent_last_poll_timestamp_seconds | Unix timestamp of the last completed poll |
nixfleet_agent_health_check_duration_seconds | Duration of the last health check run |
nixfleet_agent_health_check_status | Result of the last health check (1 = healthy, 0 = unhealthy) |
nixfleet_agent_generation_info | Nix store path of the current active generation (as a label) |
Metrics are served in the standard Prometheus text format at GET /metrics.
Example configuration:
services.nixfleet-agent = {
enable = true;
controlPlaneUrl = "https://fleet.example.com";
metricsPort = 9101;
metricsOpenFirewall = true;
};
Systemd service
The agent runs as a privileged root systemd service:
| Setting | Value |
|---|---|
| Target | multi-user.target |
| After | network-online.target, nix-daemon.service |
| Restart | always (30s delay) |
| StateDirectory | nixfleet |
| NoNewPrivileges | true |
| PATH | ${config.nix.package}/bin:${pkgs.systemd}/bin |
| Environment | XDG_CACHE_HOME=/var/lib/nixfleet/.cache |
Hardening rationale. The agent runs switch-to-configuration as a subprocess, which needs full system access (/dev, /home, cgroups, kernel modules). Sandboxing (e.g. PrivateDevices, ProtectHome) would break these operations. The threat model is equivalent to sudo nixos-rebuild switch as a daemon. NoNewPrivileges = true is kept to prevent setuid escalation.
- `nix` is in `PATH` for `nix copy` and `nix path-info`.
- `XDG_CACHE_HOME` points into the state directory so the nix metadata cache persists on impermanent hosts.
Health check configuration is written to /etc/nixfleet/health-checks.json and passed via --health-config.
On impermanent hosts, /var/lib/nixfleet is automatically persisted (including the XDG cache subdirectory).
Control Plane Options
All options under services.nixfleet-control-plane. The module is auto-included by mkHost and disabled by default.
Options
| Option | Type | Default | Description |
|---|---|---|---|
enable | bool | false | Enable the NixFleet control plane server. |
listen | str | "0.0.0.0:8080" | Address and port to listen on. |
dbPath | str | "/var/lib/nixfleet-cp/state.db" | Path to the SQLite state database. |
openFirewall | bool | false | Open the control plane port in the firewall. The port is parsed from the listen value. |
tls.cert | nullOr str | null | Path to TLS certificate PEM file. Enables HTTPS when set (requires tls.key). Example: "/run/secrets/cp-cert.pem". |
tls.key | nullOr str | null | Path to TLS private key PEM file. Example: "/run/secrets/cp-key.pem". |
tls.clientCa | nullOr str | null | Path to client CA PEM file. When set, all TLS connections must present a valid client certificate signed by this CA (required mTLS). Admin clients must present both a client cert and an API key. Example: "/run/secrets/fleet-ca.pem". |
Prometheus Metrics
The control plane exposes a GET /metrics endpoint on its listen address. No separate port or additional configuration is required - the endpoint is always available when the service is running.
No authentication is required for /metrics (same as /health). Restrict access at the network level if needed.
Metrics exposed:
| Metric | Description |
|---|---|
nixfleet_fleet_size | Total number of registered machines |
nixfleet_machines_by_lifecycle | Machine count grouped by lifecycle state (label: lifecycle) |
nixfleet_machine_last_seen_timestamp_seconds | Unix timestamp of each machine’s last report (label: machine_id) |
nixfleet_http_requests_total | HTTP request count by method, path, and status code |
nixfleet_http_request_duration_seconds | HTTP request latency histogram |
nixfleet_rollouts_total | Rollout count by status (label: status) |
nixfleet_rollouts_active | Number of currently active rollouts (created, running, or paused) |
Example:
curl http://localhost:8080/metrics
Systemd service
| Setting | Value |
|---|---|
| Target | multi-user.target |
| After | network-online.target |
| Restart | always (10s delay) |
| StateDirectory | nixfleet-cp |
| NoNewPrivileges | true |
| ProtectHome | true |
| PrivateTmp | true |
| PrivateDevices | true |
| ProtectKernelTunables | true |
| ProtectKernelModules | true |
| ProtectControlGroups | true |
| ReadWritePaths | /var/lib/nixfleet-cp |
Example
services.nixfleet-control-plane = {
enable = true;
listen = "0.0.0.0:8080";
openFirewall = true;
};
On impermanent hosts, /var/lib/nixfleet-cp is automatically persisted.
Secrets Options
This module is provided by nixfleet-scopes. It is documented here as part of the NixFleet ecosystem reference.
All options under nixfleet.secrets. The module is auto-included by mkHost and disabled by default. Enable with nixfleet.secrets.enable = true.
Options
| Option | Type | Default | Description |
|---|---|---|---|
enable | bool | false | Enable NixFleet secrets wiring (identity paths, persist, boot ordering). |
identityPaths.hostKey | nullOr str | "/etc/ssh/ssh_host_ed25519_key" | Primary decryption identity (host SSH key). Used on all hosts. |
identityPaths.userKey | nullOr str | "<home>/.keys/id_ed25519" | Fallback decryption identity (user key). Used on workstations only. Computed from hostSpec.home. |
identityPaths.enableUserKey | bool | true | Include user key in resolved paths. The server role overrides this to false. |
identityPaths.extra | listOf str | [] | Additional identity paths appended to the resolved list. |
resolvedIdentityPaths | listOf str | (computed) | Read-only. Computed identity paths. Fleet modules pass this to agenix/sops. |
resolvedIdentityPaths computation
The computed list is:
1. `hostKey` (if non-null)
2. `userKey` (if `enableUserKey` is true and `userKey` is non-null)
3. Each entry in `extra`
resolvedIdentityPaths is always computed, even when the scope is disabled, so fleet modules can read it without requiring nixfleet.secrets.enable.
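The rules above amount to roughly the following (an illustrative sketch, not the module source; `cfg` stands for `config.nixfleet.secrets`):

```nix
# Ordered list: host key first, then the optional user key, then extras
resolvedIdentityPaths =
  lib.optional (cfg.identityPaths.hostKey != null) cfg.identityPaths.hostKey
  ++ lib.optional
    (cfg.identityPaths.enableUserKey && cfg.identityPaths.userKey != null)
    cfg.identityPaths.userKey
  ++ cfg.identityPaths.extra;
```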
Systemd service
When enable = true and identityPaths.hostKey is non-null:
| Setting | Value |
|---|---|
| Unit | nixfleet-host-key-check.service |
| Type | oneshot |
| WantedBy | multi-user.target |
| Before | sshd.service |
| Condition | ConditionPathExists = !<hostKey> (runs only if key is missing) |
| Action | Generates ed25519 SSH key at identityPaths.hostKey |
A non-fatal activation script (nixfleet-secrets-check) warns at activation if any identity path is missing.
Impermanence
On impermanent hosts (nixfleet.impermanence.enable = true), the scope automatically adds to environment.persistence."/persist":
- files: `hostKey` and `hostKey.pub`
- directories: parent directory of `userKey` (when `enableUserKey` is true)
Example
{config, ...}: {
nixfleet.secrets = {
enable = true;
# Defaults are sufficient for most hosts.
# Servers: resolvedIdentityPaths = ["/etc/ssh/ssh_host_ed25519_key"]
# Workstations: resolvedIdentityPaths = ["/etc/ssh/ssh_host_ed25519_key" "~/.keys/id_ed25519"]
};
# Wire to agenix
age.identityPaths = config.nixfleet.secrets.resolvedIdentityPaths;
}
To add a hardware security key as an extra identity:
nixfleet.secrets.identityPaths.extra = ["/run/user/1000/gnupg/S.gpg-agent.ssh"];
Backup Options
This module is provided by nixfleet-scopes. It is documented here as part of the NixFleet ecosystem reference.
All options under nixfleet.backup. The module is auto-included by mkHost and disabled by default. Enable with nixfleet.backup.enable = true.
The backup scope is backend-agnostic. It creates the systemd timer and service skeleton. Set backend to "restic" or "borgbackup" to use a built-in backend, or set systemd.services.nixfleet-backup.serviceConfig.ExecStart directly to use any other tool.
Options
| Option | Type | Default | Description |
|---|---|---|---|
enable | bool | false | Enable NixFleet backup scaffolding (timer, health, persistence). |
backend | nullOr enum ["restic" "borgbackup"] | null | Backup backend. Null = fleet sets ExecStart manually. |
paths | listOf str | ["/persist"] | Directories to back up. |
exclude | listOf str | ["/persist/nix" "*.cache"] | Patterns to exclude from backup. |
schedule | str | "daily" | Systemd calendar expression (e.g., daily, weekly, *-*-* 02:00:00). |
retention.daily | int | 7 | Number of daily snapshots to keep. |
retention.weekly | int | 4 | Number of weekly snapshots to keep. |
retention.monthly | int | 6 | Number of monthly snapshots to keep. |
healthCheck.onSuccess | nullOr str | null | URL to GET on successful backup (e.g., Healthchecks.io ping URL). |
healthCheck.onFailure | nullOr str | null | URL to GET on backup failure. |
preHook | lines | "" | Shell commands to run before backup. |
postHook | lines | "" | Shell commands to run after successful backup. |
stateDirectory | str | "/var/lib/nixfleet-backup" | Directory for backup state/cache. Persisted on impermanent hosts. |
restic backend options
Active when backend = "restic". The restic package is added to environment.systemPackages automatically.
| Option | Type | Default | Description |
|---|---|---|---|
restic.repository | str | "" | Restic repository URL or path. Example: "/mnt/backup/restic". |
restic.passwordFile | str | "" | Path to file containing the repository password. Example: "/run/secrets/restic-password". |
borgbackup backend options
Active when backend = "borgbackup". The borgbackup package is added to environment.systemPackages automatically.
| Option | Type | Default | Description |
|---|---|---|---|
borgbackup.repository | str | "" | Borg repository path or ssh://user@host/path. |
borgbackup.passphraseFile | nullOr str | null | Path to file containing the repository passphrase. Null = repokey without passphrase. |
borgbackup.encryption | str | "repokey" | Borg encryption mode (repokey, repokey-blake2, none, etc.). |
Systemd timer
| Setting | Value |
|---|---|
| Unit | nixfleet-backup.timer |
| WantedBy | timers.target |
| OnCalendar | value of schedule |
| Persistent | true (catch up on missed runs) |
| RandomizedDelaySec | 1h (stagger across fleet) |
Systemd service
| Setting | Value |
|---|---|
| Unit | nixfleet-backup.service |
| Type | oneshot |
| After | network-online.target |
| Wants | network-online.target |
| StateDirectory | nixfleet-backup |
| ExecStart | (set by fleet module) |
After a successful backup run, the service writes status.json to stateDirectory:
{"lastRun": "2025-01-15T02:00:00+00:00", "status": "success", "hostname": "web-01"}
When healthCheck.onFailure is set, a companion nixfleet-backup-failure.service is registered as the OnFailure handler.
Impermanence
On impermanent hosts (nixfleet.impermanence.enable = true), the scope automatically persists stateDirectory.
Example - restic (built-in backend)
nixfleet.backup = {
enable = true;
backend = "restic";
paths = ["/persist/home" "/persist/var/lib"];
schedule = "*-*-* 03:00:00";
retention = { daily = 7; weekly = 4; monthly = 3; };
healthCheck.onSuccess = "https://hc-ping.com/your-uuid-here";
restic = {
repository = "s3:s3.amazonaws.com/my-bucket/backups";
passwordFile = "/run/secrets/restic-password";
};
};
Example - borgbackup (built-in backend)
nixfleet.backup = {
enable = true;
backend = "borgbackup";
paths = ["/persist/home" "/persist/var/lib"];
schedule = "weekly";
retention = { daily = 7; weekly = 4; monthly = 6; };
borgbackup = {
repository = "ssh://backup-user@backup-host/var/backups/myhost";
passphraseFile = "/run/secrets/borg-passphrase";
encryption = "repokey-blake2";
};
};
Example - custom backend (manual ExecStart)
{config, pkgs, ...}: {
nixfleet.backup = {
enable = true;
paths = ["/persist/home" "/persist/var/lib"];
schedule = "*-*-* 03:00:00";
retention = { daily = 7; weekly = 4; monthly = 3; };
healthCheck.onSuccess = "https://hc-ping.com/your-uuid-here";
};
systemd.services.nixfleet-backup.serviceConfig.ExecStart = let
resticCmd = "${pkgs.restic}/bin/restic";
repo = "s3:s3.amazonaws.com/my-bucket/backups";
in ''
${resticCmd} -r ${repo} backup \
${builtins.concatStringsSep " " config.nixfleet.backup.paths} \
${builtins.concatStringsSep " " (map (p: "--exclude=${p}") config.nixfleet.backup.exclude)} \
--forget \
--keep-daily ${toString config.nixfleet.backup.retention.daily} \
--keep-weekly ${toString config.nixfleet.backup.retention.weekly} \
--keep-monthly ${toString config.nixfleet.backup.retention.monthly}
'';
}
Cache Options
Options for services.nixfleet-cache-server and services.nixfleet-cache. Both modules are auto-included by mkHost and disabled by default.
The cache server uses harmonia, which serves paths directly from the local Nix store over HTTP. No separate storage backend, database, or push protocol is needed.
services.nixfleet-cache-server
| Option | Type | Default | Description |
|---|---|---|---|
enable | bool | false | Enable the NixFleet binary cache server (harmonia). |
port | port | 5000 | Port to listen on. |
openFirewall | bool | false | Open the cache server port in the firewall. |
signingKeyFile | str | - (required) | Path to the Nix signing key file for on-the-fly signing. Must be readable by the harmonia user (set age.secrets.<name>.owner = "harmonia" with agenix, or sops.secrets.<name>.owner = "harmonia" with sops-nix). Example: "/run/secrets/cache-signing-key". |
services.nixfleet-cache
| Option | Type | Default | Description |
|---|---|---|---|
enable | bool | false | Enable the NixFleet binary cache client. |
cacheUrl | str | - (required) | URL of the binary cache server. Example: "https://cache.fleet.example.com". |
publicKey | str | - (required) | Cache signing public key in name:base64 format. Example: "cache.fleet.example.com:AAAA...=". |
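A typical pairing, using only the options above (hostnames and key values illustrative):

```nix
# On the cache host
services.nixfleet-cache-server = {
  enable = true;
  port = 5000;
  openFirewall = true;
  signingKeyFile = "/run/secrets/cache-signing-key"; # readable by "harmonia"
};

# On every client host
services.nixfleet-cache = {
  enable = true;
  cacheUrl = "http://cache.fleet.example.com:5000";
  publicKey = "cache.fleet.example.com:AAAA...="; # public half of the signing key
};
```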
Systemd service (server)
| Setting | Value |
|---|---|
| Unit | nixfleet-cache-server.service |
| WantedBy | multi-user.target |
| After | network-online.target, nix-daemon.service |
| Restart | always (10s delay) |
| NoNewPrivileges | true |
| ProtectHome | true |
| PrivateTmp | true |
| PrivateDevices | true |
| ProtectKernelTunables | true |
| ProtectKernelModules | true |
| ProtectControlGroups | true |
Harmonia is stateless - it serves directly from the local Nix store. No state directory or persistence configuration is needed.
Using a different cache backend
Fleet repos that need Attic, Cachix, or another cache backend can add them as their own flake input and configure them via plain NixOS modules. The --push-hook CLI flag supports custom push commands for any backend.
MicroVM Host Options
All options under services.nixfleet-microvm-host. The module is auto-included by mkHost and disabled by default. Enable with services.nixfleet-microvm-host.enable = true.
The module imports the upstream microvm.nixosModules.host module. MicroVMs themselves are defined via the standard microvm.vms option from the microvm.nix framework; this module only provides the bridge networking, DHCP, and NAT infrastructure for the host.
Options
| Option | Type | Default | Description |
|---|---|---|---|
enable | bool | false | Enable the NixFleet MicroVM host. |
bridge.name | str | "nixfleet-br0" | Bridge interface name for microVM networking. |
bridge.address | str | "10.42.0.1/24" | Bridge IP address with CIDR prefix. |
dhcp.enable | bool | true | Run a dnsmasq DHCP server on the bridge. |
dhcp.range | str | "10.42.0.10,10.42.0.254,1h" | DHCP range in dnsmasq format (start,end,lease-time). |
What the module configures
When enabled, the module:
- Creates a systemd-networkd bridge interface (`bridge.name`) with the given IP address.
- Enables `net.ipv4.ip_forward` for NAT.
- Configures `networking.nat` with the bridge as an internal interface so microVMs can reach the outside.
- Optionally starts dnsmasq on the bridge with the configured DHCP range and the bridge IP as the default router.
Impermanence
On impermanent hosts (nixfleet.impermanence.enable = true), the module automatically persists /var/lib/microvms across reboots.
Example
services.nixfleet-microvm-host = {
enable = true;
bridge.address = "10.42.0.1/24";
dhcp.range = "10.42.0.10,10.42.0.100,12h";
};
# Define a microVM using the upstream microvm.nix API
microvm.vms.my-vm = {
config = { ... };
};
Monitoring Options
This module is provided by nixfleet-scopes. It is documented here as part of the NixFleet ecosystem reference.
All options under nixfleet.monitoring.nodeExporter. The module is auto-included by mkHost and disabled by default. Enable with nixfleet.monitoring.nodeExporter.enable = true.
Options
| Option | Type | Default | Description |
|---|---|---|---|
nodeExporter.enable | bool | false | Enable Prometheus node exporter with fleet-tuned defaults. |
nodeExporter.port | port | 9100 | Port for node exporter metrics endpoint. |
nodeExporter.openFirewall | bool | false | Open the node exporter port in the firewall. |
nodeExporter.enabledCollectors | listOf str | (see below) | Collectors to enable. Fleet repos can override. |
nodeExporter.disabledCollectors | listOf str | (see below) | Collectors to disable. |
Default enabled collectors
| Collector | Metrics |
|---|---|
systemd | Systemd unit state and timing |
filesystem | Disk usage per mountpoint |
cpu | CPU utilization |
meminfo | Memory usage |
netdev | Network interface statistics |
diskstats | Disk I/O statistics |
loadavg | System load averages |
pressure | Linux PSI (pressure stall information) |
time | System time and NTP sync status |
Default disabled collectors
| Collector | Reason |
|---|---|
textfile | Requires external file management - opt-in per host |
wifi | Irrelevant on servers |
infiniband | Not used in typical fleets |
nfs | Not used in typical fleets |
zfs | Framework uses btrfs |
Systemd service
The scope delegates to NixOS’s services.prometheus.exporters.node module. The resulting service is prometheus-node-exporter.service.
Example
nixfleet.monitoring.nodeExporter = {
enable = true;
port = 9100;
openFirewall = true; # allow Prometheus scrape from monitoring host
};
To add a collector not in the default set:
{options, ...}: {
  # Referencing config.nixfleet.monitoring.nodeExporter.enabledCollectors inside
  # its own definition would be infinitely recursive, so read the declared
  # default via the options argument instead.
  nixfleet.monitoring.nodeExporter.enabledCollectors =
    options.nixfleet.monitoring.nodeExporter.enabledCollectors.default ++ ["textfile"];
}
Fleet repos that use a Prometheus stack typically scrape all hosts on port 9100. Pair with a firewall rule on the monitoring host to restrict access to the scrape network.
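On the monitoring host, a scrape job via the standard NixOS Prometheus module might look like this (targets illustrative):

```nix
services.prometheus.scrapeConfigs = [{
  job_name = "node";
  static_configs = [{targets = ["web-01:9100" "web-02:9100"];}];
}];
```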
Firewall Scope
This module is provided by nixfleet-scopes. It is documented here as part of the NixFleet ecosystem reference.
The firewall scope applies SSH rate limiting, connection drop logging, and the nftables backend to all non-minimal hosts. It has no user-configurable options.
Activation
The scope activates when nixfleet.firewall.enable = true. Roles like server and workstation set this automatically. Minimal roles (endpoint, microvm-guest) leave it disabled by default.
What it provides
nftables backend
Sets networking.nftables.enable = true. This is the forward-compatible choice: Linux 6.17+ drops the ip_tables kernel module. Fleet repos using networking.firewall.extraCommands (iptables syntax) will receive an assertion failure at evaluation time, forcing migration before the kernel forces it.
SSH rate limiting
Adds nftables input rules that accept at most 5 new SSH connections per minute per source IP and drop the rest:
tcp dport 22 ct state new limit rate 5/minute accept
tcp dport 22 ct state new drop
This limits brute-force attempts without blocking legitimate access.
Drop logging
Enables networking.firewall.logRefusedConnections and networking.firewall.logReversePathDrops. Dropped packets appear in the system journal under kernel, making it straightforward to diagnose connectivity issues and detect port scans.
No user-configurable options
The firewall scope is intentionally opinionated. These settings are appropriate for any production NixOS host and require no per-host tuning. Fleet repos needing custom firewall rules add them via standard NixOS options (networking.firewall.extraInputRules, networking.firewall.allowedTCPPorts, etc.) alongside the scope.
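As a sketch of that pattern (the networking.* options are standard NixOS; only nixfleet.firewall.enable comes from the scope), a host module might combine the scope with its own rules:

```nix
{
  # Normally set by the server/workstation role.
  nixfleet.firewall.enable = true;

  # Fleet-specific rules go through standard NixOS options.
  networking.firewall.allowedTCPPorts = [ 443 ];
  networking.firewall.extraInputRules = ''
    # nftables syntax - iptables-style extraCommands would trip the scope's assertion
    ip saddr 10.0.0.0/8 tcp dport 9100 accept
  '';
}
```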
Core NixOS Module
Everything configured by _nixos.nix, imported automatically by mkHost for Linux platforms.
Nixpkgs
| Setting | Value |
|---|---|
allowUnfree | true |
allowBroken | false |
allowInsecure | false |
allowUnsupportedSystem | true |
Nix settings
| Setting | Value |
|---|---|
nixPath | [] (mkDefault) |
allowed-users | [<userName>] |
trusted-users | ["@admin"] + <userName> (unless the server role is active) |
substituters | ["https://nix-community.cachix.org" "https://cache.nixos.org"] |
trusted-public-keys | nix-community + cache.nixos.org keys |
auto-optimise-store | true |
experimental-features | nix-command flakes |
gc.automatic | true |
gc.dates | weekly |
gc.options | --delete-older-than 7d |
Boot
| Setting | Value |
|---|---|
loader.systemd-boot.enable | true |
loader.systemd-boot.configurationLimit | 42 |
loader.efi.canTouchEfiVariables | true |
initrd.availableKernelModules | xhci_pci, ahci, nvme, usbhid, usb_storage, sd_mod |
kernelPackages | linuxPackages_latest |
kernelModules | ["uinput"] |
Localization
| Setting | Source |
|---|---|
time.timeZone | hostSpec.timeZone |
i18n.defaultLocale | hostSpec.locale |
console.keyMap | hostSpec.keyboardLayout (mkDefault) |
Networking
| Setting | Value |
|---|---|
hostName | hostSpec.hostName |
useDHCP | false |
networkmanager.enable | true |
firewall.enable | true |
| Interface DHCP | Enabled for hostSpec.networking.interface when set |
Programs
| Program | Setting |
|---|---|
gnupg.agent | Enabled with SSH support |
dconf | Enabled |
git | Enabled |
zsh | Enabled, completion disabled (managed by HM) |
Security
| Setting | Value |
|---|---|
polkit.enable | true |
sudo.enable | true |
| Sudo NOPASSWD | reboot for wheel group |
Users
Primary user (hostSpec.userName)
| Setting | Value |
|---|---|
isNormalUser | true |
extraGroups | wheel + audio, video, docker, git, networkmanager (if groups exist) |
shell | zsh |
openssh.authorizedKeys.keys | hostSpec.sshAuthorizedKeys |
hashedPasswordFile | hostSpec.hashedPasswordFile (when non-null) |
Root
| Setting | Value |
|---|---|
openssh.authorizedKeys.keys | hostSpec.sshAuthorizedKeys |
hashedPasswordFile | hostSpec.rootHashedPasswordFile (when non-null) |
SSH hardening
| Setting | Value |
|---|---|
services.openssh.enable | true |
PermitRootLogin | prohibit-password |
PasswordAuthentication | false |
KbdInteractiveAuthentication | false |
Other services
| Setting | Value |
|---|---|
services.printing.enable | false |
services.xserver.xkb.layout | hostSpec.keyboardLayout (mkDefault) |
hardware.ledger.enable | true |
System packages
git, inetutils
State version
system.stateVersion = "24.11" (mkDefault)
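Only a few of these values use mkDefault (nixPath, console.keyMap, stateVersion); the rest are plain definitions, so a fleet module that disagrees needs lib.mkForce. A hedged sketch:

```nix
{ lib, ... }: {
  # Plain definitions in the core module - need mkForce to override.
  boot.loader.systemd-boot.configurationLimit = lib.mkForce 10;
  nix.gc.options = lib.mkForce "--delete-older-than 14d";

  # mkDefault-backed, so an ordinary definition wins.
  system.stateVersion = "25.05";
}
```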
Core Darwin Module
Everything configured by _darwin.nix, imported automatically by mkHost for Darwin platforms.
Nixpkgs
| Setting | Value |
|---|---|
allowUnfree | true |
allowBroken | false |
allowInsecure | false |
allowUnsupportedSystem | true |
Nix settings
| Setting | Value |
|---|---|
nix.enable | false (Determinate installer compatible) |
trusted-users | ["@admin" "<userName>"] |
substituters | ["https://nix-community.cachix.org" "https://cache.nixos.org"] |
trusted-public-keys | nix-community + cache.nixos.org keys |
auto-optimise-store | true |
experimental-features | nix-command flakes |
Programs
| Program | Setting |
|---|---|
zsh | Enabled, completion disabled (managed by HM) |
Users
| Setting | Value |
|---|---|
users.users.<userName>.name | <userName> |
users.users.<userName>.home | hostSpec.home |
users.users.<userName>.isHidden | false |
users.users.<userName>.shell | zsh |
TouchID sudo
| Setting | Value |
|---|---|
security.pam.services.sudo_local.touchIdAuth | true |
| PAM config | pam_reattach.so (ignore_ssh) + pam_tid.so |
TouchID works for sudo in terminal sessions, including through tmux via pam_reattach.
System defaults
NSGlobalDomain
| Key | Value |
|---|---|
AppleShowAllExtensions | true |
ApplePressAndHoldEnabled | false |
KeyRepeat | 2 |
InitialKeyRepeat | 15 |
com.apple.mouse.tapBehavior | 1 |
com.apple.sound.beep.feedback | 0 |
Dock
| Key | Value |
|---|---|
autohide | true |
show-recents | false |
launchanim | true |
orientation | bottom |
tilesize | 48 |
Finder
| Key | Value |
|---|---|
AppleShowAllExtensions | true |
AppleShowAllFiles | true |
ShowPathbar | true |
_FXSortFoldersFirst | true |
_FXShowPosixPathInTitle | false |
Trackpad
| Key | Value |
|---|---|
Clicking | true |
TrackpadThreeFingerDrag | true |
Dock management
The module includes a local.dock option for declarative Dock management using dockutil:
| Option | Type | Default | Description |
|---|---|---|---|
local.dock.enable | bool | true | Enable dock management |
local.dock.entries | listOf submodule | – (readOnly) | Dock entries |
Each entry has:
| Sub-option | Type | Default |
|---|---|---|
path | str | – |
section | str | "apps" |
options | str | "" |
The activation script diffs current Dock state against the declared entries and only resets when they differ.
Other
| Setting | Value |
|---|---|
system.stateVersion | 4 |
system.checks.verifyNixPath | false |
system.primaryUser | <userName> |
hostSpec.isDarwin | true |
Apps
Flake apps provided by NixFleet. Available via nix run .#<app>. VM lifecycle apps (build-vm, start-vm, stop-vm, clean-vm, test-vm) are exported via nixfleet.lib.mkVmApps for fleet repos.
validate
Runs the full validation suite: formatting, eval tests, host builds, and optionally VM tests.
nix run .#validate # format + flake check + eval + hosts (fast)
nix run .#validate -- --rust # + cargo test + clippy + rust package builds
nix run .#validate -- --vm # + every vm-* check (slow)
nix run .#validate -- --all # everything
| Flag | What it adds to the base |
|---|---|
| (none) | format + flake check + eval + hosts only |
--rust | + cargo test + clippy + rust package builds |
--vm | + every vm-* check (dynamically discovered) |
--all | everything |
See Testing Overview for the full check list, duration estimates, and how to drill into specific failures.
build-vm
Install a host into a persistent QEMU disk via nixos-anywhere. Linux and macOS.
nix run .#build-vm -- -h web-02
nix run .#build-vm -- -h web-02 --rebuild
nix run .#build-vm -- --all
Steps:
- Build custom ISO
- Create disk image at ~/.local/share/nixfleet/vms/<HOST>.qcow2
- Boot QEMU from ISO (headless, SSH forwarded)
- Install via nixos-anywhere
- Stop ISO VM
If a disk already exists, the install is skipped unless --rebuild is specified. If a key is found at ~/.keys/id_ed25519 or ~/.ssh/id_ed25519, it is provisioned into the VM for secrets decryption.
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
-h <HOST> | string | – | Host config to install |
--all | bool | – | Install all hosts in nixosConfigurations |
--rebuild | bool | – | Wipe and reinstall existing disk |
--identity-key <PATH> | string | – | Path to identity key for secrets decryption |
--ssh-port <N> | string | auto | Override SSH port (default: auto-assigned by index) |
--ram <MB> | string | 4096 | RAM in MB |
--cpus <N> | string | 2 | CPU count |
--disk-size <S> | string | 20G | Disk size |
start-vm
Start an installed VM. Runs headless by default; use --display for graphical output. Linux and macOS.
nix run .#start-vm -- -h web-02
nix run .#start-vm -- -h web-02 --display gtk --ram 4096
nix run .#start-vm -- --all
Boots from the existing disk created by build-vm. SSH is forwarded to a per-host port (auto-assigned by sorted nixosConfigurations index, base 2201).
When --display is spice or gtk, the VM runs in the foreground (no daemonize). Closing the viewer window stops the VM. SPICE mode provides clipboard sharing via the SPICE agent.
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
-h <HOST> | string | – | Host to start |
--all | bool | – | Start all installed VMs (headless only) |
--ssh-port <N> | string | auto | Override SSH port |
--ram <MB> | string | 1024 | RAM in MB |
--cpus <N> | string | 2 | CPU count |
--display <MODE> | string | none | Display: none (headless), spice (SPICE viewer), gtk (native window) |
stop-vm
Stop a running VM daemon. Linux and macOS.
nix run .#stop-vm -- -h web-02
nix run .#stop-vm -- --all
Sends SIGTERM to the QEMU process and removes the pidfile.
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
-h <HOST> | string | – | Host to stop |
--all | bool | – | Stop all running VMs |
clean-vm
Delete VM disk, pidfile, and port file. Linux and macOS.
nix run .#clean-vm -- -h web-02
nix run .#clean-vm -- --all
Stops the VM if running, then removes <HOST>.qcow2, <HOST>.pid, and <HOST>.port from ~/.local/share/nixfleet/vms/.
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
-h <HOST> | string | – | Host to clean |
--all | bool | – | Clean all VMs |
test-vm
Automated VM test cycle: build ISO, boot, install, reboot, verify, cleanup. Linux and macOS.
nix run .#test-vm -- -h web-02
nix run .#test-vm -- -h edge-01 --keep
Steps
- Build custom ISO
- Create ephemeral disk (20G)
- Boot QEMU from ISO (headless, SSH on port 2299)
- Install via nixos-anywhere
- Reboot from disk
- Verify: hostname, multi-user.target, sshd
Cleans up temp directory and disk on exit unless --keep is specified.
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
-h <HOST> | string | – | Host config to install |
--keep | bool | false | Keep temp dir and disk after test |
--ssh-port <N> | string | 2299 | Host port for SSH |
--identity-key <PATH> | string | – | Path to identity key for secrets decryption |
--ram <MB> | string | 4096 | RAM in MB |
--cpus <N> | string | 2 | CPU count |
Note: Provisioning real hardware is done via standard NixOS tooling:
nixos-anywhere --flake .#hostname root@ip. See Standard Tools.
Architecture
NixFleet is a fleet management framework providing a declarative API (mkHost), Rust service crates for orchestration, and NixOS modules for host configuration. Companion repos provide infrastructure scopes (nixfleet-scopes) and compliance controls (nixfleet-compliance).
System Overview
Fleet repo (flake.nix)
|
| calls mkHost { hostName, platform, hostSpec, modules }
v
Framework (core + scopes + service modules)
|
| produces
v
nixosSystem / darwinSystem
|
| deploy via
v
nixos-rebuild / nixos-anywhere (standard)
or
Agent <-> Control Plane <-> CLI (orchestrated)
mkHost is a closure over framework inputs (nixpkgs, home-manager, disko, impermanence, microvm). It returns a nixosSystem or darwinSystem based on the platform argument. For the full module injection order, see mkHost API reference.
Module graph
mkHost closure (binds framework inputs) ->
- hostSpec module (identity-only options)
- disko + impermanence NixOS modules
- core/_nixos.nix or core/_darwin.nix
- scopes/nixfleet/_agent.nix (+ _agent_darwin.nix on Darwin)
- scopes/nixfleet/_control-plane.nix, _cache-server.nix, _cache.nix, _microvm-host.nix
- user-provided modules (roles, fleet profiles, hardware)
Scope self-activation
Scopes are plain NixOS/HM modules. They are always imported but only activate when their corresponding enable option is set:
{ config, lib, ... }:
lib.mkIf config.nixfleet.impermanence.enable {
# persistence paths, btrfs subvolume setup, etc.
}
Every host gets every scope module in its module tree, but inactive scopes produce zero config. Roles (from nixfleet-scopes) set the appropriate enable options. Fleet repos follow the same pattern for their own scopes.
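A fleet-repo scope following the same pattern declares its own enable option and guards all config behind it (the myfleet.motd option path is illustrative, not part of NixFleet):

```nix
{ config, lib, ... }: {
  options.myfleet.motd.enable = lib.mkEnableOption "fleet MOTD banner";

  # Imported into every host's module tree, but contributes
  # zero config until a role or host sets the enable flag.
  config = lib.mkIf config.myfleet.motd.enable {
    users.motd = "Managed by the fleet repo - change via Git only.";
  };
}
```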
Framework inputs via specialArgs
mkHost passes inputs (the framework flake’s inputs) through specialArgs, making them available to all modules as a function argument. Fleet repos that need their own inputs pass them through _module.args or additional specialArgs.
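A hedged sketch of the _module.args route, threading the fleet repo's own inputs down to its modules (the fleetInputs name is illustrative):

```nix
# fleet repo flake.nix (sketch)
{
  outputs = { nixfleet, ... } @ inputs: {
    nixosConfigurations.web-01 = nixfleet.lib.mkHost {
      hostName = "web-01";
      platform = "x86_64-linux";
      modules = [
        # Expose the fleet repo's inputs as `fleetInputs`
        # to any module that takes it as a function argument.
        { _module.args.fleetInputs = inputs; }
        ./hosts/web-01
      ];
    };
  };
}
```

A downstream module then takes { fleetInputs, ... }: and reads, e.g., fleetInputs.nixpkgs.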
Framework Separation
| Repo | Contents |
|---|---|
| nixfleet | mkHost API, core modules (nix, SSH, identity), service modules (agent, control plane, cache, microvm), Rust crates, eval/VM tests |
| nixfleet-scopes | 17 infrastructure scopes, 4 roles (server, workstation, endpoint, microvm-guest), 6 disk templates |
| nixfleet-compliance | 16 compliance controls, 4 regulatory frameworks (NIS2, DORA, ISO 27001, ANSSI), evidence probes |
| Consumer fleet repos | Host definitions via mkHost, opinionated scopes, hardware configs, secrets wiring |
The framework is generic with no org-specific assumptions. Fleet repos provide opinions. Consumers import scopes via roles or individual scope modules from nixfleet-scopes.
Rust Workspace
Four crates in a Cargo workspace at the repo root:
| Crate | Binary | Purpose |
|---|---|---|
agent/ | nixfleet-agent | State machine daemon on each managed host: poll - fetch - apply - verify - report |
control-plane/ | nixfleet-control-plane | Axum HTTP server with mTLS. Machine registry, rollout orchestration, audit log |
cli/ | nixfleet | Operator CLI: deploy, status, rollback, release, rollout, machines, bootstrap, init |
shared/ | (library) | nixfleet-types - shared data types and API contracts |
Agents poll the control plane for a desired generation, fetch closures, apply, run health checks, and report status. The CLI interacts with the control plane for machine registration, lifecycle management, releases, and rollouts.
Both the agent and control plane ship as NixOS service modules, auto-included by mkHost but disabled by default. Standard nixos-rebuild and nixos-anywhere work without them.
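A sketch of opting one host into orchestration. The services.nixfleet-agent option names below (machineId, tags, dryRun) appear elsewhere in these docs, but treat the exact shape as an assumption to check against the module source:

```nix
{
  services.nixfleet-agent = {
    enable = true;                  # disabled by default
    machineId = "web-01";           # identity reported to the control plane
    tags = [ "web" "production" ];  # used for rollout targeting
    dryRun = false;                 # actually apply fetched generations
  };
}
```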
Flake Inputs
| Input | Purpose |
|---|---|
nixpkgs | Package repository (nixos-unstable) |
darwin | nix-darwin macOS system config |
home-manager | User environment management |
flake-parts | Module system for flake outputs |
import-tree | Auto-import directory tree as modules |
disko | Declarative disk partitioning |
impermanence | Ephemeral root filesystem |
nixos-anywhere | Remote NixOS installation via SSH |
nixos-hardware | Hardware-specific optimizations |
lanzaboote | Secure Boot |
treefmt-nix | Multi-language formatting |
microvm | MicroVM support (microvm.nix) |
crane | Rust build system for Cargo workspace |
nixfleet-scopes | Companion: infrastructure scopes, roles, disk templates |
Fleet repos add their own inputs as needed (e.g. agenix or sops-nix for secrets).
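For example, adding agenix as a fleet-repo input (one of the secrets options mentioned above) and pinning it to the framework's nixpkgs so the fleet keeps a single package set:

```nix
{
  inputs = {
    nixfleet.url = "github:arcanesys/nixfleet";
    nixpkgs.follows = "nixfleet/nixpkgs";

    agenix = {
      url = "github:ryantm/agenix";
      inputs.nixpkgs.follows = "nixpkgs"; # reuse the pinned nixpkgs
    };
  };
}
```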
Design Decisions
Key architectural decisions are documented in Architecture Decision Records.
Summary of foundational decisions:
- Dendritic import - every .nix under modules/ is auto-imported via import-tree. No import lists to maintain.
- Plain modules - scopes are plain NixOS/HM modules imported by mkHost. No deferred registration.
- Central fleet definition - all hosts in flake.nix, not scattered across directories.
- Single API - mkHost is the only public constructor. No mkFleet/mkOrg/mkRole layer.
- Scope-aware impermanence - persist paths live alongside their program definitions in scopes.
- Mechanism over policy - the framework provides mkHost; fleets provide opinions.
Testing Overview
NixFleet has four test tiers that together cover configuration, Rust code, Nix module wiring, and full multi-node runtime behaviour. There is exactly one command that runs everything:
nix run .#validate -- --all
That’s it. Use this for CI, for pre-merge, for pre-release, and for “did my change break something far from where I was editing”. When you need a smaller slice for an inner-loop iteration, the flag variants below trade coverage for speed:
| Command | Runs | Typical duration |
|---|---|---|
nix run .#validate | format + nix flake check + eval-* + host builds | ~1 min |
nix run .#validate -- --vm | ^ + every vm-* check (dynamically discovered) | ~20–40 min |
nix run .#validate -- --rust | ^ + cargo test --workspace + cargo clippy --workspace -- -D warnings + nix-build of every Rust package (sandboxed test run) | ~5–8 min |
nix run .#validate -- --all | Everything | ~25–45 min |
What --all actually runs, in order:
- Formatting - nix fmt --fail-on-change
- Flake eval - nix flake check --no-build (every flake output type-checks)
- Eval tests - all eval-* derivations under .#checks
- Host builds - every nixosConfigurations.<host>.config.system.build.toplevel
- VM tests - every vm-* under .#checks, discovered dynamically
- Rust workspace tests - cargo test --workspace in the dev shell
- Rust lints - cargo clippy --workspace --all-targets -- -D warnings
- Rust package builds - nix build .#packages.<system>.{nixfleet-agent,nixfleet-control-plane,nixfleet-cli} (runs cargo test inside the nix sandbox - catches environment-dependent test failures that the dev-shell cargo test misses)
Inner-loop iteration (drilling down when something fails)
When --all surfaces a failure, you can reproduce the failing tier with a
narrower command. Prefer these only after --all has already failed:
# Single VM scenario
nix build .#checks.x86_64-linux.vm-fleet-apply-failure --no-link
# Single Rust test binary
nix develop --command cargo test -p nixfleet-control-plane --test route_coverage
# Single test function
nix develop --command cargo test -p nixfleet-agent --test run_loop_scenarios \
poll_hint_shortens_next_interval
Tier C - Eval tests (fast, ~seconds)
Pure Nix evaluations. No VMs, no Rust builds. Asserts structural properties
of hostSpec, scope modules, and service wiring. See
Eval Tests for the per-check list.
Tier B - Integration tests (medium)
| Check | Purpose |
|---|---|
integration-mock-client | Simulates a consumer flake importing nixfleet.lib.mkHost. Proves the public API is reachable, produces valid nixosConfigurations, and exposes core modules/scopes. |
Tier A - VM tests (slow, minutes per test)
Real NixOS VMs booted under QEMU with Python test scripts driving assertions.
See VM Tests for the full list and per-scenario semantics,
including the fleet scenario subtests under _vm-fleet-scenarios/.
High-level categories:
- Framework-level VMs (vm-core, vm-minimal, vm-firewall, vm-monitoring, vm-backup, vm-backup-restic, vm-secrets, vm-cache-server) - each one boots one or two nodes and exercises a single subsystem in isolation. These prove the framework produces bootable configs even when no fleet is enabled.
- Fleet-level VMs (vm-fleet and the vm-fleet-* scenario subtests under _vm-fleet-scenarios/) - exercise multi-node topologies, mTLS, rollout strategies, failure paths, SSH-direct deploys, and the real fetch → apply → verify pipeline (vm-fleet-agent-rebuild).
Rust tests
Every Rust crate has unit tests in-file, plus integration scenarios in
control-plane/tests/*_scenarios.rs and cli/tests/*_scenarios.rs.
See Rust Tests for the full breakdown.
Finding the right test for a symptom
| Symptom | Where to start |
|---|---|
| “Option X isn’t being set correctly” | Eval test for that option |
| “My consumer flake doesn’t build with mkHost” | integration-mock-client |
| “The agent service won’t start on a real VM” | vm-core, vm-fleet-tag-sync |
| “A scope module (firewall, backup, monitoring) is broken” | vm-firewall, vm-backup, vm-monitoring (per scope) |
| “The fetch→apply pipeline isn’t working” | vm-fleet-agent-rebuild |
| “Rollout state machine is wrong” | vm-fleet + Rust failure_scenarios.rs, deploy_scenarios.rs |
| “mTLS / auth / RBAC is wrong” | vm-fleet-mtls-missing, vm-fleet-mtls-cn-mismatch, Rust auth_scenarios.rs |
| “Release CRUD or release push-hook is wrong” | vm-fleet-release, Rust release_scenarios.rs |
| “Bootstrap / admin-key flow is wrong” | vm-fleet-bootstrap |
| “SSH-direct deploy is broken” | vm-fleet-deploy-ssh, vm-fleet-rollback-ssh |
| “Tag sync from agent config isn’t working” | vm-fleet-tag-sync, Rust machine_scenarios.rs |
| “Health check type X fails” | vm-fleet-apply-failure, agent health::* unit tests |
| “Rollout resume doesn’t resume” | vm-fleet-apply-failure, Rust failure_scenarios.rs |
| “Metrics aren’t being emitted” | Rust metrics_scenarios.rs |
| “Audit log is wrong / CSV injection” | Rust audit_scenarios.rs |
Known coverage gaps
- Real switch-to-configuration: most VM tests run agents with dryRun = true, so the actual apply path is not exercised. The exception is vm-fleet-agent-rebuild, which runs with dryRun = false and exercises the missing-path guard end-to-end. Production bootstraps cover the happy apply path.
- Multi-CP topologies and agenix secret rotation have no tests.
Eval Tests
Eval tests (Tier C in the testing overview) assert configuration properties at Nix evaluation time. They run instantly and catch structural mistakes before anything is built.
For the full test tier map (eval / integration / VM / Rust) see the Testing Overview. This page documents only the eval checks.
How to run
nix flake check --no-build
The --no-build flag skips VM tests so only eval checks execute. Every check is
a pkgs.runCommand that prints PASS: or FAIL: for each assertion and exits
non-zero on the first failure.
Test fleet
Eval tests run against a minimal test fleet defined in modules/fleet.nix. These
hosts exist solely to exercise framework config paths - they are not a real org.
Key hosts used by eval checks:
| Host | Key config | Purpose |
|---|---|---|
web-01 | workstation role, impermanence enabled | Default web server, impermanent root |
web-02 | workstation role, impermanence enabled | SSH hardening tests |
dev-01 | userName = "alice" | Custom user override |
edge-01 | endpoint role | Minimal edge device |
srv-01 | server role | Production server |
agent-test | agent enabled, tags, health checks | Agent module options |
Additional hosts (secrets-test, infra-test, cache-test, microvm-test, backup-restic-test) exercise other subsystems. All hosts share org-level defaults and use isVm = true.
Current checks
| Check | Host | What it asserts |
|---|---|---|
eval-ssh-hardening | web-02 | PermitRootLogin == "prohibit-password", PasswordAuthentication == false, firewall enabled |
eval-hostspec-defaults | web-01 | userName is non-empty, hostName matches "web-01" |
eval-username-override | web-01, dev-01 | web-01 uses the shared default user; dev-01 overrides it to a different value |
eval-locale-timezone | web-01 | timeZone, defaultLocale, console.keyMap are all non-empty |
eval-ssh-authorized | web-01 | Primary user and root both have at least one SSH authorized key |
eval-password-files | web-01 | hostSpec exposes hashedPasswordFile and rootHashedPasswordFile options |
eval-agent-tags-health | agent-test | Agent systemd service has NIXFLEET_TAGS = "web,production", health-checks.json config file exists |
Adding a new eval test
- Pick (or add) a test fleet host in modules/fleet.nix that exercises the config path you want to verify.
- Add a new check in modules/tests/eval.nix following this pattern:
eval-my-check = let
cfg = nixosCfg "web-01";
in
mkEvalCheck "my-check" [
{
check = cfg.some.option == expectedValue;
msg = "web-01 some.option should be expectedValue";
}
];
- Run nix flake check --no-build to verify the new assertion passes.
The mkEvalCheck helper (from modules/tests/_lib/helpers.nix) takes a check
name and a list of { check : bool; msg : string; } assertions. It produces a
runCommand derivation that prints each result and fails on the first false.
VM Tests
VM tests boot real NixOS virtual machines under QEMU and assert runtime state via Python test scripts run by the nixosTest driver. They verify services start, ports listen, multi-node interactions work end-to-end, and rollout state machines behave as documented.
How to run
The canonical entry point is nix run .#validate -- --all (see
Testing Overview). For VM-only iteration:
nix run .#validate -- --vm
All vm-* checks under .#checks.<system> are discovered dynamically by
the validate script, so new scenarios land in --vm / --all
automatically without touching it.
When --vm surfaces a specific VM failure, drill in:
nix build .#checks.x86_64-linux.vm-fleet-apply-failure --no-link
nix log /nix/store/<hash>-vm-test-run-vm-fleet-apply-failure.drv
nix log retrieves the full driver output (systemctl status, journals,
Python traceback) for a failed or past run.
Requirements
- Platform: x86_64-linux only (nixosTest uses QEMU)
- KVM: /dev/kvm for acceptable performance
- Disk space: each VM test builds a NixOS closure; expect several GB per test
- Time: minutes per test (closure build + parallel VM boots + assertions)
Test cycle
Each VM test goes through:
- Build - Nix evaluates the nodes’ config and builds each node’s system closure.
- Boot - QEMU launches one or more VMs in parallel; the shared host /nix/store is mounted read-only over 9p on every VM.
- Assert - a Python test script runs commands via the test driver API (machine.succeed(), machine.fail(), machine.wait_for_unit(), machine.wait_until_succeeds(cmd, timeout=N)).
- Cleanup - VMs shut down, driver reports pass/fail.
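The assert phase uses the standard nixosTest Python driver. A minimal single-node sketch of the shape these tests take (scenario name and assertions illustrative, not one of NixFleet's actual checks):

```nix
{
  name = "vm-example";
  nodes.machine = { ... }: {
    services.openssh.enable = true;
  };
  testScript = ''
    machine.wait_for_unit("multi-user.target")
    machine.wait_for_unit("sshd.service")
    machine.succeed("id -u root")                      # command must exit 0
    machine.fail("systemctl is-active docker.service") # must be inactive
  '';
}
```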
Framework-level VM tests
These test one subsystem in isolation. Most are defined in modules/tests/vm*.nix.
vm-core
Boots a standard framework node (defaultTestSpec, no special flags) and verifies:
- multi-user.target reached
- sshd and NetworkManager running
- Firewall active (nftables input chain exists)
- Test user exists in the wheel group
- Core packages available to the user (zsh, git)
This is the “does everything still boot” smoke test.
vm-minimal
Boots a node with the endpoint role (minimal scope set) and verifies the minimal profile stays minimal:
- multi-user.target reached
- Core tools still present (zsh, git come from core/nixos.nix, not the base scope)
- Graphical/dev tools absent (e.g., niri not installed, Docker not running)
vm-infra
One node, four scopes in one VM for speed:
- Firewall - nftables active, SSH rate limiting rules present (limit rate 5/minute), drop logging enabled.
- Monitoring - node exporter running, port 9100 responds with Prometheus text, node_systemd collector active.
- Backup - systemd timer registered, manual trigger writes status.json with "status": "success".
- Secrets - SSH host key generated at /etc/ssh/ssh_host_ed25519_key with mode 600.
vm-fleet - “Tier A headline test”
4-node fleet: cp + web-01 + web-02 + db-01, with full mTLS
(build-time CA + CP server cert + per-agent client certs, no
allowInsecure).
- CP bootstraps an admin API key.
- All 3 agents register with tags (web × 2, db × 1).
- Canary rollout on tag web (strategy staged, batch sizes ["1","100%"]) - both agents healthy, rollout reaches completed.
- Health-gate failure rollout on tag db (strategy all_at_once) - db-01's health check points at http://localhost:9999/health which nothing listens on; the rollout hits health_timeout and pauses.
- Resume the paused rollout and verify it transitions out of paused.
- Metrics - CP /metrics exposes nixfleet_fleet_size and nixfleet_rollouts_total; agent node exporter on web-01 exposes node_cpu.
Fleet scenario subtests
Every CLI path, failure mode, and rollout branch has its own
independently buildable VM subtest under
modules/tests/_vm-fleet-scenarios/*.nix. The aggregator
modules/tests/vm-fleet-scenarios.nix exposes each one as
.#checks.<system>.vm-fleet-<name>.
vm-fleet-agent-rebuild
The only VM test in the suite that runs with dryRun = false - it is
the proof that the agent’s real fetch → apply → verify pipeline
works end-to-end. CP tells the agent to deploy a fabricated store path
that does NOT exist anywhere, with no cache URL configured; the agent
must log "not found locally and no cache URL configured" and leave
/run/current-system untouched. Indirect fetch-path coverage still
exists (vm-fleet-release for nix copy + harmonia, vm-fleet-bootstrap
for the happy-path report cycle).
vm-fleet-tag-sync
Real agent with tags = ["web" "canary" "eu-west"] in NixOS config. Asserts
tags appear in the CP machine_tags table after the first health report,
that filtering by a declared tag returns the agent, and that undeclared tags
do not leak into the table.
vm-fleet-bootstrap
End-to-end bootstrap flow:
- Start CP with an empty api_keys table.
- Operator runs nixfleet bootstrap --name test-admin - the CLI returns the first admin API key over mTLS.
- Use the returned key to list machines (empty), wait for two real agents (web-01, web-02) to register, list machines again (2 visible).
- Create a release via POST /api/v1/releases pointing at each agent's real /run/current-system toplevel.
- POST a rollout targeting tag=web and wait for status=completed.
- Negative: a second nixfleet bootstrap call must fail (409 Conflict).
vm-fleet-release
Real nixfleet release create --push-to ssh://root@cache exercised against
a harmonia binary cache server:
- Uses the shared nix-shim (modules/tests/_lib/nix-shim.nix) to intercept nix eval and nix build on the builder node - returns a canned store path - while delegating nix copy to the real nix so the binary transfer actually happens.
- Cache node runs services.nixfleet-cache-server (harmonia) with a build-time signing key baked as a /nix/store path (avoids the CREDENTIALS=243 race documented in TODO.md).
- Post-push, assert via the VM-local Nix database (nix-store -q --references) that the path is registered on cache and NOT on cp.
- Agent then fetches from http://cache:5000 and the DB check passes on the agent too.
vm-fleet-deploy-ssh
Real nixfleet deploy --hosts target --ssh --target root@target - no CP
in the topology at all. The CLI calls nix eval (shim) → nix build
(shim) → nix-copy-closure (real) → ssh target switch-to-configuration
(real). A stub switch-to-configuration writes a marker file to /tmp
that the test asserts. Proves --ssh mode truly bypasses the CP.
vm-fleet-apply-failure
Command health check with a sentinel file
(/var/lib/fail-next-health) drives the failure path:
- Sentinel file created before the agent starts → first health report is unhealthy → rollout pauses (F1).
- Assert current_generation is still the agent's original toplevel (RB1 - the agent did not advance to the failing generation).
- Clear the sentinel, wait for health_reports.all_passed = 1, call POST /api/v1/rollouts/{id}/resume, assert the rollout reaches completed.
This test covers two subtle bugs in the resume path: the rollout
executor must not re-mark a batch unhealthy from stale pre-resume
reports, and the agent’s CommandChecker must use an absolute /bin/sh
so it works under a systemd unit PATH. A regression in either would
make the resume → completed transition hang.
vm-fleet-revert
2-agent staged rollout with on_failure = revert:
- Both agents healthy → first batch succeeded.
- Test then arms the sentinel on both agents so the next batch fails.
- Rollout executor walks previous_generations on succeeded batches and restores the per-machine desired generation.
- Indirectly covers C3 (HealthRunner::run_all actually runs post-deploy) - if the health runner were dead code, the failing report would never arrive and the revert path wouldn't fire.
vm-fleet-timeout
The agent is configured but its unit’s wantedBy is forced to [] so the
process never starts. CP records the machine in the release but sees zero
reports from it. The batch sits in pending_count > 0 until
health_timeout elapses, at which point evaluate_batch pushes
pending_count into unhealthy_count and marks the batch failed.
Negative control: the reports table is empty for the machine - the pause
reason really is “timeout”, not “agent reported a failure”.
vm-fleet-poll-retry
Agent starts before the CP. First poll hits a closed port (connection
refused). The agent’s main loop schedules a retry at retryInterval = 5s.
Then the CP starts, and the agent’s next retry succeeds. Asserts the
agent journal contains the retry-scheduling log line, then waits for
registration.
vm-fleet-mtls-missing
Pure transport-layer test. The CP has `tls.clientCa` set. A client with the CA
cert (can verify the server) but no client key pair sends curl requests against
`/health` and `/api/v1/machines/{id}/report`:
- Without `--cert` → handshake failure at the TLS layer (asserted by grepping the curl verbose output for any of a set of TLS markers: `alert`, `handshake`, `certificate required`, `SSL_ERROR`, etc.).
- Positive control with a valid client cert → an HTTP response comes back (any status - what matters is that the handshake completed).
vm-fleet-mtls-cn-mismatch
Application-layer test on top of mTLS. A client with a valid fleet-CA-signed cert (CN = `wrong-agent`) hits another agent’s endpoints (`/api/v1/machines/web-01/...`). The `cn_matches_path_machine_id` middleware rejects with 403 because the cert CN does not match the `{id}` path segment. Closes the impersonation gap: the CA proves fleet membership, the CN proves specific agent identity.
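The CN-vs-path rule can be sketched as a simple comparison. This is a hedged Python model of the check described above - `cn_matches_path` and the exact path grammar are illustrative, not the actual middleware:

```python
import re

def cn_matches_path(cn: str, path: str) -> bool:
    """The client cert CN must equal the {id} segment of
    /api/v1/machines/{id}/... (sketch of the rule described above)."""
    m = re.match(r"^/api/v1/machines/([^/]+)(?:/|$)", path)
    return bool(m) and m.group(1) == cn

assert cn_matches_path("web-01", "/api/v1/machines/web-01/report")
assert not cn_matches_path("wrong-agent", "/api/v1/machines/web-01/report")  # -> 403
```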
vm-fleet-rollback-ssh
Real `nixfleet rollback --host target --ssh --generation <G1>` end-to-end:
- Deploy stub `G2` via `nixfleet deploy --ssh` → target writes the `active=g2` marker file.
- Pre-copy `G1` to the target via `nix-copy-closure` (the rollback handler does NOT copy, it only SSHes and runs `<gen>/bin/switch-to-configuration`).
- Run `nixfleet rollback --host target --ssh --generation <G1>` → target writes the `active=g1` marker.
- Assert both G1 and G2 are still registered in the target’s Nix DB (rollback did not delete the forward generation).
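The rollback invocation described above (SSH in, run the generation’s switch script, no closure copy) can be sketched as command construction. `rollback_cmd` and its arguments are hypothetical names for illustration only:

```python
def rollback_cmd(host: str, generation_path: str) -> list[str]:
    """Build the SSH invocation sketched above: the rollback handler does
    not copy the closure, it only runs the generation's switch script on
    the target (illustrative, not the actual handler code)."""
    return ["ssh", host, f"{generation_path}/bin/switch-to-configuration", "switch"]

cmd = rollback_cmd("target", "/nix/store/abc-nixos-system-g1")
assert cmd[0] == "ssh"
assert cmd[2].endswith("/bin/switch-to-configuration")
```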
Shared VM test helpers
All scenario tests use helpers from `modules/tests/_lib/helpers.nix`
(via `modules/tests/vm-fleet-scenarios.nix`, which pre-binds them):
- `mkCpNode { testCerts, ... }` - a CP node with standard mTLS wiring (CA + server cert, `services.nixfleet-control-plane` with `clientCa`), `sqlite` and `python3` pre-installed.
- `mkAgentNode { testCerts, hostName, tags, healthChecks, ... }` - an agent node with standard TLS, fleet CA trust, and `services.nixfleet-agent` with pre-wired `machineId`/`tags`/`dryRun`. The escape hatch `agentExtraConfig` (merged via `lib.recursiveUpdate` into `services.nixfleet-agent`) handles per-scenario overrides like `retryInterval` or `allowInsecure`.
- `tlsCertsModule { testCerts, certPrefix }` - a NixOS module fragment wiring the fleet CA plus a named client cert under `/etc/nixfleet-tls/`, for operator / builder / cache-style nodes that need TLS certs but aren’t a CP or an agent.
- `testPrelude { certPrefix ? "cp", api ? "https://localhost:8080" }` - returns a Python prelude string with `TEST_KEY`, `KEY_HASH`, `AUTH`, `CURL`, and `API` constants plus a `seed_admin_key(node)` helper. Interpolate it at the top of every `testScript`:

  ```nix
  testScript = ''
    ${testPrelude {}}
    cp.start()
    cp.wait_for_unit("nixfleet-control-plane.service")
    cp.wait_for_open_port(8080)
    seed_admin_key(cp)
    ...
  '';
  ```

- `mkTlsCerts { hostnames }` (from `_lib/helpers.nix`) - builds the fleet CA + per-host cert pairs at Nix eval time. Deterministic, no runtime setup.
- `nix-shim` (from `_lib/nix-shim.nix`) - a `writeShellApplication` that intercepts `nix eval` / `nix build` with canned responses while delegating `nix copy` and other subcommands to the real nix at an immutable `${pkgs.nix}/bin/nix` path. The absolute path is deliberate: installing the shim into `systemPackages` would collide with the real nix at `/run/current-system/sw/bin/nix`, and if the shim won the collision its fall-through branches would infinitely exec themselves. See the nixosTest gotchas section below.
nixosTest gotchas worth knowing
A few behaviours of the nixosTest framework itself that have bitten scenarios in this suite:
- Shared `/nix/store` via 9p: every VM sees the host store read-only via a 9p mount. Any store path referenced anywhere in the test evaluation is visible as a file on every node, regardless of whether it was ever copied there. `test -e <storepath>` assertions are therefore invariant. The workaround is to check the VM-local Nix database (`nix-store -q --references <path>`), which is per-VM.
- systemd `PATH` for services: services like `nixfleet-agent` do not get `/run/current-system/sw/bin` in their `PATH` by default, so `Command::new("sh")` (relative lookup) fails with ENOENT. Use absolute paths like `/bin/sh`.
- nix-shim collisions: adding a shim package named `"nix"` to `environment.systemPackages` causes a silent collision with the real `nix` in `/run/current-system/sw/bin/nix`. The workaround is to keep the shim only on `sessionVariables.PATH` (which still pulls it into the closure via string interpolation) and never in `systemPackages`.
- `wait_for_unit` vs `wait_until_succeeds("systemctl is-active")`: a systemd unit stuck in the `activating` state forever (e.g., due to a `LoadCredential=` failure) blocks `wait_for_unit` with no useful error. `wait_until_succeeds(..., timeout=120)` wrapped in a `try`/`except` that dumps `systemctl status` + the unit journal gives you an informative failure instead of an opaque hang.
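The wait-with-diagnostics pattern generalizes beyond the nixosTest driver. A self-contained sketch (generic Python, not the driver’s own `wait_until_succeeds` - the `on_fail` diagnostic hook is an added illustration):

```python
import subprocess
import time

def wait_until_succeeds(cmd, timeout=120, interval=1, on_fail=None):
    """Retry a shell command until it exits 0; on timeout, dump a
    diagnostic command's output instead of hanging opaquely."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if subprocess.run(cmd, shell=True).returncode == 0:
            return True
        time.sleep(interval)
    if on_fail:
        # e.g. on_fail="systemctl status my-unit; journalctl -u my-unit"
        diag = subprocess.run(on_fail, shell=True, capture_output=True, text=True)
        print(diag.stdout)
    raise TimeoutError(f"command never succeeded: {cmd}")

assert wait_until_succeeds("true", timeout=5, interval=0.1)
```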
Adding a new VM test
- Create `modules/tests/_vm-fleet-scenarios/<name>.nix` following the `vm-fleet-tag-sync.nix` template.
- Accept `mkCpNode`, `mkAgentNode`, `mkTlsCerts`, `testPrelude`, and `tlsCertsModule` via `scenarioArgs` (and `pkgs`, `lib`, etc. as needed with `...`).
- Register the subtest in `modules/tests/vm-fleet-scenarios.nix`.
- Add the check name to the `vm-fleet-*` section in the project README (automatic discovery means no script edit is needed).
For non-fleet VM tests (single-subsystem things like vm-core / vm-infra),
follow the pattern in `modules/tests/vm.nix` - use `mkTestNode` directly.
Shared /nix/store and the assertion classes it forbids (WONTFIX)
Every node in a nixosTest mounts the host’s `/nix/store` read-only via 9p.
This means store-path existence checks (`test -e /nix/store/...`) are
tautologically true on every node, regardless of which node’s closure
references the path. A `nix copy` between nodes appears to succeed even
when it transferred zero bytes, because the receiver could already see
the path via 9p.
The suite uses two workaround patterns instead of the heavy-weight per-VM store-image approach:
| Need | Workaround | Why it works |
|---|---|---|
| Prove a command ran on a specific node | VM-local marker file under `/tmp` | `/tmp` is per-VM, never shared via 9p |
| Prove a path is registered in a node’s Nix DB | `nix-store -q --references <path>` on the target | The Nix DB (`/nix/var/nix/db`) is per-VM; only the store files are shared |
Concrete examples in the suite:
- `vm-fleet-deploy-ssh` uses `nix-store -q --references` to prove `nix-copy-closure --to` actually registered the stub closure in the target’s Nix DB. The 9p-mounted store would make a `test -e` check invariant.
- `vm-fleet-rollback-ssh` uses the same pattern for the per-generation rollback assertion.
- `vm-fleet-apply-failure` uses `/tmp/stub-switch-called` (a regular filesystem path, VM-local) as the load-bearing proof that `switch-to-configuration switch` was invoked.
Why not per-VM store images
The alternative - `virtualisation.useNixStoreImage = true; virtualisation.mountHostNixStore = false;` - was considered and rejected:
every node would rebuild its own store image, multiplying VM build cost
for an assertion class that the workarounds already cover. No scenario
in the current suite needs per-VM store isolation.
If a future scenario genuinely requires it (e.g. asserting on byte-level
transfer through `nix copy` rather than DB registration), revisit this
decision in a follow-up. Do not adopt per-VM store images preemptively -
they cost real wall-clock minutes per CI run.
Rust Tests
The Rust side of nixfleet lives in three crates:
| Crate | Path | Role |
|---|---|---|
| `nixfleet-control-plane` | `control-plane/` | Axum HTTP server, SQLite state, rollout executor, release registry, auth/audit, metrics |
| `nixfleet-agent` | `agent/` | Polling daemon, health check runners, store/TLS |
| `nixfleet-types` | `shared/` | Wire types shared by the CLI, agent, and CP |

Plus the CLI at `cli/` (`nixfleet-cli`), which has its own integration tests.
How to run
The canonical entry point is `nix run .#validate -- --all` (see
Testing Overview). For faster Rust-only iteration:

```
nix run .#validate -- --rust
```

That runs `cargo test --workspace` + `cargo clippy --workspace --all-targets -- -D warnings` + a `nix build` of every Rust package (the
sandboxed test run), in order. Use this over raw `cargo test` so clippy
and the sandbox-build check stay in the loop.
When you need to drill into a specific failure after --rust has
already surfaced it:
```
nix develop --command cargo test -p nixfleet-control-plane --test route_coverage
nix develop --command cargo test -p nixfleet-cli --test subcommand_coverage
nix develop --command cargo test -p nixfleet-agent --test run_loop_scenarios \
  poll_hint_shortens_next_interval
```

The Rust toolchain (`cargo`, `rustc`, `clippy`, `rustfmt`,
`rust-analyzer`) is pinned in the dev shell.
Unit tests (in-file #[cfg(test)] mod tests)
Each Rust module has its own unit tests exercising pure logic without HTTP / DB / filesystem / network.
nixfleet-control-plane
| Module | Tested logic |
|---|---|
| `auth.rs` | API key SHA-256 hashing, role matrix (admin/deploy/readonly), bearer token parsing, role check predicates |
| `db.rs` | Every persistence method: register machine, insert report, generations table, releases + release entries, rollout batches, lifecycle filter, tag join (`machine_tags`), `get_recent_reports` deterministic tiebreaker, migrations idempotency |
| `metrics.rs` | Counter/gauge updates, Prometheus text rendering |
| `state.rs` | `FleetState` hydration from DB on startup, in-memory machine inventory, `poll_hint` propagation |
| `tls.rs` | Server/client cert loading, rustls `ServerConfig` / `ClientConfig` builder |
| `rollout/batch.rs` | Batch building from strategy (`all_at_once`, `canary`, `staged`), `batch_sizes` parsing (absolute N and percent), randomization determinism |
| `rollout/executor.rs` | `parse_threshold` (absolute + percent), `tick_for_tests` doc-hidden shim for deterministic single-tick advancement |
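The absolute-vs-percent threshold parsing noted in the table above can be sketched in a few lines. This is an illustrative Python reduction, not the Rust `parse_threshold` source - the rounding behaviour shown here is an assumption:

```python
def parse_threshold(spec: str, batch_size: int) -> int:
    """Sketch of absolute-vs-percent threshold parsing: "30%" means 30%
    of the batch (rounded down here, by assumption), "4" means 4 machines."""
    if spec.endswith("%"):
        return batch_size * int(spec[:-1]) // 100
    return int(spec)

assert parse_threshold("30%", 10) == 3   # percent of batch size
assert parse_threshold("4", 10) == 4     # absolute machine count
```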
nixfleet-agent
| Module | Tested logic |
|---|---|
| `comms.rs` | Report payload serialization, HTTP client builder with mTLS |
| `config.rs` | Default config (e.g., `dry_run = false`, `tags = []`), CLI arg parsing |
| `nix.rs` | `/run/current-system` symlink resolution, store-path parsing, generation hash extraction |
| `store.rs` | SQLite state DB: get/set `current_generation`, `log_check`, `log_error`, cleanup |
| `tls.rs` | Client cert/key loading, fleet CA trust |
nixfleet-types
| Module | Tested logic |
|---|---|
| `lib.rs` | Serde round-trips for all wire types |
| `health.rs` | `HealthReport` + `HealthCheckResult` serialization |
| `release.rs` | `Release` / `ReleaseEntry` serde |
| `rollout.rs` | `RolloutStatus`, `RolloutStrategy`, `OnFailure` enum serde |
Integration tests (scenario files)
Integration tests live in `control-plane/tests/*_scenarios.rs`,
`control-plane/tests/route_coverage.rs`, and `cli/tests/*_scenarios.rs`.
Every file is an independent test binary - `cargo test` spawns one
binary per file.
Shared harness
Every scenario file imports `control-plane/tests/harness.rs` via a
`#[path = "harness.rs"] mod harness;` sibling include. The harness
provides:
| Helper | Purpose |
|---|---|
| `spawn_cp()` / `spawn_cp_at(path)` | Boot an in-process CP bound to a temp directory with pre-seeded admin / deploy / readonly API keys. Returns a `Cp` handle with `.db`, `.fleet`, `.admin`, `.base`, `.db_path`. |
| `spawn_cp_with_rollout(store_path)` | Canonical “1 machine, 1 release, 1 all-at-once rollout, zero-tolerance, pause-on-failure” fixture. Returns `(cp, release_id, rollout_id)`. |
| `register_machine(cp, id, tags)` | Register a machine directly via DB + fleet state (bypasses HTTP for setup speed). |
| `create_release(cp, entries)` | `POST /api/v1/releases`; returns the release id. |
| `create_rollout_for_tag(cp, release_id, tag, strategy, batch_sizes, threshold, on_failure, health_timeout)` | `POST /api/v1/rollouts`; returns the rollout id. |
| `fake_agent_report(cp, machine_id, generation, success, message, tags)` | `POST /api/v1/machines/{id}/report` as an agent. |
| `agent_reports_health(cp, machine_id, store_path, healthy)` | Paired helper that emits both a `fake_agent_report` and an `insert_health_report` - the executor’s generation gate and batch health gate read different tables, so almost every failure / recovery scenario needs both together. |
| `assert_status(builder, expected)` | One-line replacement for the `let resp = ...; .send().await; assert_eq!(resp.status(), N)` triple used across `route_coverage.rs`. |
| `tick_once(cp)` | Drive a single executor tick deterministically via `executor::test_support::tick_for_tests`. Replaces the production 2s `tokio::time::interval`. |
| `wait_rollout_status(cp, rollout_id, want, within)` | Poll `GET /rollouts/{id}` until the status matches or the deadline elapses. |
Constants: `TEST_API_KEY`, `TEST_DEPLOY_KEY`, and `TEST_READONLY_KEY` are
the three pre-seeded role keys.
Scenario files - control-plane
| File | Covers |
|---|---|
| `release_scenarios.rs` | R3 push-hook invocation, R4 release list pagination, R5 referenced release delete → 409, R6 orphan release delete → 204. |
| `deploy_scenarios.rs` | D2 canary strategy happy path, D3 staged strategy happy path. |
| `failure_scenarios.rs` | Generation gate filters stale-gen reports, `failure_threshold = "30%"` pauses on 4 of 10, resume does not re-flip on a stale pre-resume report, Paused → Cancelled via operator cancel. |
| `hydration_scenarios.rs` | CP restart mid-rollout resumes from DB (ADR 010) - cp1 stages a rollout, cp2 hydrates from the shared SQLite file and drives it to completion, proving `FleetState` is re-queried per tick. |
| `rollback_scenarios.rs` | Rollback via the CP API: redeploy an old release as a forward rollback; the original forward rollout stays Completed (history preserved). |
| `polling_scenarios.rs` | `poll_hint = 5` present when a machine is in an active rollout, absent when idle. |
| `machine_scenarios.rs` | M1 lifecycle filter (decommissioned excluded from rollout targets), M2 tag propagation via health reports, M3 direct desired-gen ↔ report cycle, M4 `success=false` → `system_state=error`, M5 multi-machine desired-gen isolation, M6 Pending → Active auto-transition, M7 Active ↔ Maintenance round trip. |
| `auth_scenarios.rs` | Bootstrap 409 after the first key, anonymous admin route → 401, public `/health` stays open, readonly/deploy role enforcement on `POST /rollouts` and `READ_ONLY` on `GET /releases` + `/rollouts`, bearer-token shape errors (invalid token / missing `Bearer` prefix → 401). |
| `audit_scenarios.rs` | Audit log writes for every mutating route + CSV-injection escaping for untrusted detail fields. |
| `metrics_scenarios.rs` | `/metrics` exposes every CP-side metric after a real rollout cycle, and the HTTP middleware counter increments per normalized path. |
| `cn_validation_scenarios.rs` | mTLS CN validation middleware: no extension / empty extension / matching CN / mismatched CN (defense in depth above the CA boundary). |
| `route_coverage.rs` | Happy + error + auth coverage for every admin route, grouped by family via section headers (machines / rollouts / releases / audit+bootstrap+public). ~50 tests. |
| `migrations_scenarios.rs` | Fresh DB schema shape, `refinery_schema_history` exists, idempotent on second migrate, every expected table is queryable. |
Scenario files - cli
| File | Covers |
|---|---|
| `release_hook_scenarios.rs` | `release create --push-hook "..."` expands `{}` to the store path and runs the hook under `sh -c`. |
| `rollback_cli_scenarios.rs` | `nixfleet rollback --host <h> --generation <g>` constructs the right SSH invocation. |
| `config_scenarios.rs` | CLI/credentials/file precedence + env-var precedence (`NIXFLEET_*` overrides credentials, loses to CLI flags) + `HOSTNAME` fallback path. |
| `subcommand_coverage.rs` | Direct CLI test for every leaf subcommand (`init`, `bootstrap`, `status`, `host add`, `machines list/register`, `rollout list/status/cancel`, `release list/show/diff`). |
| `release_delete_scenarios.rs` | `nixfleet release delete` CLI dispatch (204 → exit 0, 409 → exit 1, 404 → exit 1). |
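The status-to-exit-code mapping exercised by `release_delete_scenarios.rs` can be sketched in one function. `delete_exit_code` is an illustrative helper, not the CLI source:

```python
def delete_exit_code(http_status: int) -> int:
    """Map the CP's HTTP response to a CLI exit code: only a successful
    delete (204) exits 0; 409 (referenced) and 404 (missing) exit 1."""
    return 0 if http_status == 204 else 1

assert delete_exit_code(204) == 0
assert delete_exit_code(409) == 1
assert delete_exit_code(404) == 1
```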
Tests deliberately NOT in Rust
- Everything that needs a real systemd unit (`nixfleet-agent.service`, `harmonia.service`, `sshd`) - those are VM tests.
- Anything that needs a real `/run/current-system` symlink to resolve - the agent’s `nix::current_generation()` returns an `unwrap_or_default()` at the call site, so the path is testable in VMs only.
- End-to-end CLI + real nix builds - those are VM tests (`vm-fleet-release`, `vm-fleet-deploy-ssh`, `vm-fleet-rollback-ssh`).
Known gaps
New gaps surfacing during operation should be added here and tracked
in `TODO.md`.
Coverage measurement
NixFleet measures Rust coverage with `cargo llvm-cov` on demand. We
deliberately do not record a one-shot baseline snapshot - an orphaned
number from a single point in time is theater without a concrete
change to compare against.
The useful measurement is “coverage delta for the code you just touched”, not “total workspace coverage at an arbitrary date.”
When to run
- Before merging a non-trivial Rust change, to confirm the new code is covered by at least one test path.
- Before a release, to spot-check any module whose coverage has drifted.
- When investigating a regression, to see whether the failing path had test coverage prior to the break.
How to run
```
cargo install cargo-llvm-cov   # once per toolchain
cargo llvm-cov --workspace --html
# Open target/llvm-cov/html/index.html for the per-crate breakdown.

# Or on a specific crate / test target:
cargo llvm-cov --package nixfleet-control-plane --html
cargo llvm-cov --package nixfleet-agent --test run_loop_scenarios --html

# Diff the branch under review against main:
cargo llvm-cov --workspace --summary-only > /tmp/branch.txt
git checkout main
cargo llvm-cov --workspace --summary-only > /tmp/main.txt
diff /tmp/main.txt /tmp/branch.txt
```
The HTML output is the primary operator experience. `--summary-only`
produces a text table suitable for piping into diff tools.
What’s not here
There is no persistent coverage percentage in this document - a static snapshot has no downstream consumer. If a future change wants to establish a persistent baseline (e.g. as a CI regression gate), the tooling above is ready.
Adding a new Rust scenario
- Create `control-plane/tests/<domain>_scenarios.rs` or `cli/tests/<domain>_scenarios.rs`.
- Add the harness sibling include at the top:

  ```rust
  #[path = "harness.rs"]
  mod harness;
  use harness::*;
  ```

- Write `#[tokio::test]` functions. Use the `spawn_cp` / `register_machine` / `create_release` / `tick_once` helpers so your scenario doesn’t fight the executor’s wall-clock interval.
- Run `cargo test -p nixfleet-control-plane --test <file>` to iterate.
- If the scenario uncovers a product bug, fix the bug rather than adapting the test around it. See the test-vs-component debugging rule: when a test fails, first determine whether the test or the tested component needs fixing, before choosing a fix. Prefer root-cause fixes.