Hypervisor + VM layer for the home cluster
  • Nix 93.3%
  • HCL 4.6%
  • Makefile 2.1%
Tamjid Rahman 13e3982fe3 template: bake virtio modules into cluster-VM initrd
nixos-generators' "proxmox" format doesn't include virtio_pci /
virtio_blk / virtio_scsi in the initrd's available modules. The VMs
booted yesterday because the running kernel had loaded them after
stage 1 — but after the 2026-06-11 power outage, every cluster VM
came back up unable to find /dev/vda and hung in stage 1 emergency
mode (root account locked, so unrecoverable from the console).

Declaring boot.initrd.availableKernelModules here ensures the modules
are on every future template build and any nixos-rebuild that touches
proxmox-base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 23:12:50 -04:00
template template: bake virtio modules into cluster-VM initrd 2026-06-12 23:12:50 -04:00
terraform bring repo to current cluster state: postgres + observability + TrueNAS 2026-06-06 23:26:28 -04:00
.gitignore gitignore: untrack template/terraform.tfstate 2026-06-06 23:27:19 -04:00
inventory.yaml bring repo to current cluster state: postgres + observability + TrueNAS 2026-06-06 23:26:28 -04:00
README.md bring repo to current cluster state: postgres + observability + TrueNAS 2026-06-06 23:26:28 -04:00

basha_infra

Hypervisor + VM layer for the home cluster. This repo owns the machines; service repos own the workloads. See ~/src/CLAUDE.md for the broader picture.

Layout

basha_infra/
├── inventory.yaml          # source of truth: Proxmox hosts, VMs, roles, host_volumes
├── template/               # NixOS Proxmox template builder
│   ├── flake.nix
│   ├── modules/
│   │   ├── proxmox-base.nix     # bootloader, networking, ssh — used by every VM
│   │   └── cluster-client.nix   # consul + nomad + docker + tailscale (client mode)
│   ├── hosts/
│   │   └── apps.nix             # per-VM nix config (hostname, meta.role)
│   └── Makefile                 # `make template` builds + uploads + registers
└── terraform/              # VM lifecycle
    ├── main.tf                  # bpg/proxmox: clones template per inventory.yaml
    ├── variables.tf
    └── outputs.tf
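
The sketch below shows roughly what template/modules/proxmox-base.nix might contain. The initrd module list comes from the latest commit (13e3982fe3). The guest-agent, DHCP, and SSH settings are assumptions, and the SSH key is a hypothetical placeholder.

```nix
# template/modules/proxmox-base.nix (sketch, not the actual file)
{ ... }:
{
  # From commit 13e3982fe3: without these, stage 1 cannot see the virtio
  # disk (/dev/vda) and the VM hangs in emergency mode.
  boot.initrd.availableKernelModules = [ "virtio_pci" "virtio_blk" "virtio_scsi" ];

  # Assumption: the QEMU guest agent is what reports the LAN IP to Proxmox.
  services.qemuGuest.enable = true;

  # Assumption: DHCP on the LAN, key-only root SSH for nixos-rebuild --target-host.
  networking.useDHCP = true;
  services.openssh = {
    enable = true;
    settings = {
      PasswordAuthentication = false;
      PermitRootLogin = "prohibit-password";
    };
  };
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAA... laptop" # hypothetical placeholder for the laptop key
  ];
}
```

Declaring the modules in this file, rather than relying on the image generator, puts them into both the template image and every nixos-rebuild that imports proxmox-base. That is the point the commit makes.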

The control plane (Consul server, Nomad server, Vault) lives on the home NixOS box (~/src/nixos/), not here. VMs only run the client halves.
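
The following sketch of template/modules/cluster-client.nix makes the "client halves" concrete. It uses the stock NixOS services.consul, services.nomad, virtualisation.docker, and services.tailscale options. The control-plane address ("home-box") is a hypothetical placeholder. Pointing both agents at it directly is also an assumption; the real module may let Nomad find its servers through the local Consul agent instead.

```nix
# template/modules/cluster-client.nix (sketch, client mode only)
{ ... }:
let
  # Hypothetical hostname of the home NixOS box running the servers.
  controlPlane = "home-box";
in
{
  services.consul = {
    enable = true;
    extraConfig = {
      server = false;               # client agent only
      retry_join = [ controlPlane ];
    };
  };

  services.nomad = {
    enable = true;
    settings = {
      client = {
        enabled = true;
        servers = [ controlPlane ];  # Nomad RPC, default port 4647
      };
      server.enabled = false;       # servers live on the home box
    };
  };

  virtualisation.docker.enable = true;  # for Nomad's docker driver
  services.tailscale.enable = true;     # `tailscale up` is still manual
}
```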

How it fits together

   template/             →  qmrestore → Proxmox template VMID 9001
                                              │
                                              │ qm clone
                                              ▼
   terraform/            →  cloned VMs (apps, postgres, …) per inventory.yaml
                                              │
                                              │ nixos-rebuild --target-host
                                              ▼
   template/hosts/<name>.nix lays the per-host config on top of the template

The template has the cluster-client modules baked in, so a freshly cloned VM joins Consul + Nomad on first boot. Per-host overrides (hostname, meta.role, future host_volume mounts) are applied afterwards with nixos-rebuild --target-host.
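
The layering in the diagram can be wired up in template/flake.nix roughly as follows. This sketch makes several assumptions:

  • nixos-generators' nixosGenerate builds the template image.
  • nixpkgs' proxmox-image.nix module is reused in the per-host configs so their filesystems and bootloader match the image.
  • The input URLs, the x86_64-linux system, and the `template` package name are guesses.

```nix
# template/flake.nix (sketch; inputs and output names are assumptions)
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    nixos-generators = {
      url = "github:nix-community/nixos-generators";
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };

  outputs = { self, nixpkgs, nixos-generators, ... }:
    let
      system = "x86_64-linux";
      # Shared by the template image and every per-host config.
      baseModules = [
        ./modules/proxmox-base.nix
        ./modules/cluster-client.nix
      ];
      # Same disk layout / bootloader module the "proxmox" image format uses.
      proxmoxImage = "${nixpkgs}/nixos/modules/virtualisation/proxmox-image.nix";
    in {
      # Image that `make template` would upload and qmrestore as VMID 9001.
      packages.${system}.template = nixos-generators.nixosGenerate {
        inherit system;
        format = "proxmox";
        modules = baseModules;
      };

      # Per-host config layered on top, applied with nixos-rebuild --target-host.
      nixosConfigurations.apps = nixpkgs.lib.nixosSystem {
        inherit system;
        modules = [ proxmoxImage ] ++ baseModules ++ [ ./hosts/apps.nix ];
      };
    };
}
```

Keeping the shared modules in one list means the template and the per-host configs cannot drift apart. The per-host file only adds what differs.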

Bootstrap (first-time on a fresh Proxmox)

Prerequisites (one-off on Proxmox, see pveum/pvesm):

  • terraform@pve user with an API token + Administrator on /
  • local datastore has import and snippets content types enabled
  • Your laptop's SSH key in root@<proxmox>:~/.ssh/authorized_keys

# 1. Build + register the template (VMID 9001)
cd template && make template

# 2. Fill in terraform.tfvars
cd ../terraform
cp terraform.tfvars.example terraform.tfvars
$EDITOR terraform.tfvars

# 3. Apply — clones a VM per inventory.yaml entry
terraform init
terraform apply

# 4. After first boot, SSH in via LAN IP (guest agent reports it),
#    run `tailscale up --authkey=...` once. Then from the laptop:
nixos-rebuild switch --flake ../template#apps --target-host root@apps

Add a new VM

  1. Add an entry under vms: in inventory.yaml.
  2. Add template/hosts/<name>.nix and a nixosConfigurations.<name> entry in template/flake.nix.
  3. cd terraform && terraform apply clones the VM.
  4. SSH in via the LAN IP and run tailscale up --authkey=... once (as in Bootstrap step 4), then nixos-rebuild switch --flake template#<name> --target-host root@<lan-ip>.
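
For step 2, a new host file might look like the sketch below. The name "postgres" is only an example. It also assumes that meta.role is exposed as Nomad client metadata so jobs can constrain on it. The matching flake.nix entry would mirror nixosConfigurations.apps, with ./hosts/postgres.nix in place of ./hosts/apps.nix.

```nix
# template/hosts/postgres.nix (hypothetical example host)
{ ... }:
{
  networking.hostName = "postgres";

  # Assumption: meta.role is Nomad client metadata, so a job can place
  # itself with a constraint on ${meta.role}.
  services.nomad.settings.client.meta.role = "postgres";
}
```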

Update an existing VM

Edit template/modules/cluster-client.nix or template/hosts/<name>.nix, then redeploy without recreating the VM:

nixos-rebuild switch --flake ./template#<name> --target-host root@<name>

If the change should also affect future clones, rebuild the template too: cd template && make template. (Existing VMs are not auto-recreated.)

Remove a VM

  1. nomad node drain -enable -force <node-id> to evacuate jobs (nomad node status lists the node IDs).
  2. Delete the entry from inventory.yaml.
  3. cd terraform && terraform apply — destroys the VM.

TODOs

  • Move terraform state to Garage S3 backend (see ~/src/CLAUDE.md).
  • Add postgres / garage / vault / forgejo hosts as stateful services migrate off the home box.
  • Thread tailscale auth via nixos-rebuild so the one-time manual tailscale up after clone isn't needed.
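
For the last TODO, NixOS's services.tailscale module has an authKeyFile option. When it is set, the module runs tailscale up with that key at boot, which could replace the manual step once a key file exists on the VM. Getting the key onto the VM is the unsolved part of the TODO. The path below is a hypothetical placeholder.

```nix
# Sketch for the tailscale TODO; assumes the key file is provisioned separately.
{ ... }:
{
  services.tailscale = {
    enable = true;
    # Hypothetical path; must hold a reusable or pre-authorized auth key.
    authKeyFile = "/var/lib/tailscale-authkey";
  };
}
```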