Join a node

Enrol a hypervisor node into the cluster. The flow is two-sided: an admin mints a short-lived join token on the control plane, then the operator runs the agent's one-shot bootstrap on the node host to redeem it. Redemption signs the node's mTLS identity (no manual cert distribution), after which otherix-agent serve brings the agent to full runtime and the node appears in otherix node list.

1. Mint a join token

Join tokens are admin-only and managed through the control-plane CLI (otherix node join-token ...), backed by the /v1/nodes/join-tokens API surface.

otherix node join-token create --node-name node-1 --ttl 10m

Token created (id: 9f1c...):

  otx_join_AbC123...

CA fingerprint:

  sha256:1a2b3c...

Save BOTH NOW - server stores only hash, plaintext cannot be retrieved.
Pass to agent via OTHERIX_BOOTSTRAP__TOKEN + OTHERIX_BOOTSTRAP__CA_FINGERPRINT (koanf double-underscore nesting).
Expires: 2026-06-06T12:10:00Z (TTL: 10m0s).
Max uses: 1.
Bound to node-name: node-1 (single-use).

The token plaintext and the cluster CA fingerprint are printed exactly once - the server stores only sha256(token). Save both: the agent needs the token to redeem and the fingerprint to pin the CP's TLS cert (trust-on-first-use).
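
The hash-only storage model can be sketched as below. This is a minimal illustration, not the control plane's actual code: the helper names hashToken and matchesStored are hypothetical, and hex encoding of the digest is an assumption (the page only states that the server keeps sha256(token)).

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashToken returns the hex-encoded SHA-256 digest of a token plaintext.
// Hypothetical helper; hex encoding is an assumption.
func hashToken(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// matchesStored reports whether a presented token hashes to the stored digest.
// The comparison is constant-time so timing does not reveal digest prefixes.
func matchesStored(presented, storedHex string) bool {
	got := hashToken(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHex)) == 1
}

func main() {
	// What a server would persist at mint time: the digest, never the plaintext.
	stored := hashToken("otx_join_example")

	fmt.Println(matchesStored("otx_join_example", stored)) // true
	fmt.Println(matchesStored("otx_join_wrong", stored))   // false
}
```

Because only the digest is kept, a lost plaintext cannot be recovered from the server; the practical remedy is to revoke the token and mint a new one.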

Flags

| Flag | Default | Notes |
| --- | --- | --- |
| `--ttl` | `1h` | Validity, clamped to `[1m, 24h]`. |
| `--max-uses` | `0` | Consumption cap. `0` (or omitted) defaults to single-use (server default of 1); multi-use is an explicit `--max-uses N` opt-in. |
| `--node-name` | (unset) | Bind the token to a specific node identity. Forces single-use (`--max-uses` is set to 1 server-side). |
| `--output` | `text` | `text` or `json`. |

Two minting modes:

Pre-bound, single-use - pass --node-name. The token can only ever provision that one node identity. Defense-in-depth for known hosts.
Fleet bootstrap - omit --node-name. Pass --max-uses N for a strict multi-redemption cap. Omitting --max-uses (or leaving it at 0) defaults to single-use (server default of 1) - a truly unlimited token cannot be minted from the CLI/API; unbounded redemptions survive only on legacy rows. Each agent supplies its own --node-name at bootstrap time.
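
The minting rules above (TTL clamp, single-use default, forced single-use for pre-bound tokens) can be sketched as one normalization step. This is an illustration under stated assumptions: the mintRequest type and normalize function are hypothetical, the CLI default of 1h is assumed to be applied before this step, and negative --max-uses values are out of scope because the page does not describe them.

```go
package main

import (
	"fmt"
	"time"
)

const (
	minTTL = time.Minute    // lower clamp bound from the flag table
	maxTTL = 24 * time.Hour // upper clamp bound from the flag table
)

// mintRequest is a hypothetical shape for the inputs of a create call.
type mintRequest struct {
	TTL      time.Duration
	MaxUses  int
	NodeName string
}

// normalize applies the documented rules: clamp the TTL, default an unset
// cap to single-use, and force single-use when the token is pre-bound.
func normalize(r mintRequest) mintRequest {
	if r.TTL < minTTL {
		r.TTL = minTTL
	}
	if r.TTL > maxTTL {
		r.TTL = maxTTL
	}
	if r.MaxUses == 0 {
		r.MaxUses = 1 // server default of 1
	}
	if r.NodeName != "" {
		r.MaxUses = 1 // pre-bound tokens are always single-use
	}
	return r
}

func main() {
	// Pre-bound: the requested cap of 5 is overridden to 1.
	fmt.Printf("%+v\n", normalize(mintRequest{TTL: 10 * time.Minute, MaxUses: 5, NodeName: "node-1"}))
	// Fleet bootstrap: the cap is kept, the TTL is clamped to 24h.
	fmt.Printf("%+v\n", normalize(mintRequest{TTL: 72 * time.Hour, MaxUses: 3}))
}
```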

Manage tokens

# Cursor-paginated list. Expired tokens are hidden unless --include-expired.
otherix node join-token list
otherix node join-token list --include-expired --limit 50

# Revoke an unconsumed token by id (idempotent; already-expired returns 409).
otherix node join-token revoke <token-id>

# Audit trail: which agents consumed a token, when, from which source IP.
otherix node join-token consumptions <token-id>

2. Bootstrap the agent on the node host

Run otherix-agent bootstrap on the node itself. It executes the join protocol against the CP and writes the issued cert material plus a generated agent.yaml to disk.

otherix-agent bootstrap \
  --token otx_join_AbC123... \
  --ca-fingerprint sha256:1a2b3c... \
  --cp-url https://cp.example:8443 \
  --node-name node-1 \
  --advertised-endpoint https://10.0.0.21:9443 \
  --migration-host 10.0.0.21

The token can come from three mutually exclusive sources (exactly one required):

--token <plaintext> - the literal value.
--token-path <file> - read from a file (whitespace-trimmed).
--token-env <NAME> - read from the named env var at invocation.
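
The exactly-one-source rule can be sketched as follows. The resolveToken function is hypothetical and only mirrors the three flags; whether the real agent trims or rejects an empty environment variable is not stated on this page, so rejecting it here is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// resolveToken returns the join token from exactly one of three sources.
// Parameters mirror --token, --token-path and --token-env.
func resolveToken(literal, path, envName string) (string, error) {
	set := 0
	for _, v := range []string{literal, path, envName} {
		if v != "" {
			set++
		}
	}
	if set != 1 {
		return "", errors.New("exactly one of --token, --token-path, --token-env is required")
	}

	switch {
	case literal != "":
		return literal, nil
	case path != "":
		raw, err := os.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("read token file: %w", err)
		}
		// File contents are whitespace-trimmed, as documented.
		return strings.TrimSpace(string(raw)), nil
	default:
		val, ok := os.LookupEnv(envName)
		if !ok || val == "" {
			// Assumption: an unset or empty variable is treated as an error.
			return "", fmt.Errorf("env var %s is unset or empty", envName)
		}
		return val, nil
	}
}

func main() {
	tok, err := resolveToken("otx_join_example", "", "")
	fmt.Println(tok, err)

	// Two sources at once violates the mutual-exclusion rule.
	_, err = resolveToken("otx_join_example", "/tmp/token", "")
	fmt.Println(err)
}
```

Reading the token from a file or environment variable keeps the plaintext out of the process argument list, which is often visible to other users on the host.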

Flags

| Flag | Default | Notes |
| --- | --- | --- |
| `--token` / `--token-path` / `--token-env` | - | Token source (exactly one). |
| `--ca-fingerprint` | - | Cluster CA `sha256` fingerprint (`sha256:<hex>` or bare hex). Required. |
| `--cp-url` | - | Control-plane base URL (`https://...`). Required. |
| `--node-name` | - | Cluster-unique node name. Required. |
| `--advertised-endpoint` | - | HTTPS URL the CP uses to reach this agent. Required. |
| `--migration-host` | - | Host/IP advertised for live-migration ingress. Required. |
| `--migration-port-range-start` | `49152` | Migration port range lower bound. |
| `--migration-port-range-end` | `49251` | Migration port range upper bound. |
| `--cert-dir` | `/var/lib/otherix/certs` | Destination for `agent.key` / `agent.crt` / `ca.crt`. |
| `--config-path` | `/etc/otherix/agent.yaml` | Destination for the generated config. |
| `--listen` | `0.0.0.0:9443` | Agent HTTPS bind address baked into the generated config. |
| `--heartbeat-interval` | `30s` | Heartbeat cadence baked into the generated config. |
| `--request-timeout` | `30s` | Per-HTTP-request timeout against the CP. |
| `--force` | `false` | Re-issue cert material on an already-bootstrapped host. |

The architecture is auto-detected from the host (runtime.GOARCH); operators do not supply it.

What bootstrap does

1. Fetches /v1/ca with TLS verification skipped, then pins the returned cert against --ca-fingerprint (trust-on-first-use). A mismatch aborts the bootstrap; the usual causes are an operator typo or an active MITM.
2. Generates an ECDSA P-384 keypair and a CSR.
3. Redeems the token at POST /v1/nodes/join. The CP signs the CSR with the cluster CA and returns the leaf cert plus the CA bundle.
4. Re-verifies that the response chains to the pinned CA, then atomically writes agent.key (0600), agent.crt and ca.crt (0644); agent.yaml is written only if absent.
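
The pinning check in the first step can be sketched as below. This is a minimal sketch, not the agent's implementation: parseFingerprint and verifyPin are hypothetical names, and hashing the certificate's DER encoding is an assumption, since the page does not say which bytes the fingerprint covers.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// parseFingerprint accepts "sha256:<hex>" or bare hex and returns the
// 32 raw digest bytes.
func parseFingerprint(s string) ([]byte, error) {
	s = strings.TrimPrefix(strings.ToLower(strings.TrimSpace(s)), "sha256:")
	raw, err := hex.DecodeString(s)
	if err != nil {
		return nil, fmt.Errorf("fingerprint is not valid hex: %w", err)
	}
	if len(raw) != sha256.Size {
		return nil, errors.New("fingerprint must be 32 bytes (64 hex characters)")
	}
	return raw, nil
}

// verifyPin hashes the fetched certificate bytes and compares the digest
// to the operator-supplied pin.
// Assumption: the digest covers the certificate's DER encoding.
func verifyPin(certDER []byte, fingerprint string) error {
	want, err := parseFingerprint(fingerprint)
	if err != nil {
		return err
	}
	got := sha256.Sum256(certDER)
	if subtle.ConstantTimeCompare(got[:], want) != 1 {
		return errors.New("CA fingerprint mismatch: aborting bootstrap")
	}
	return nil
}

func main() {
	// Placeholder bytes standing in for a fetched DER certificate.
	certDER := []byte("placeholder certificate bytes")
	sum := sha256.Sum256(certDER)
	pin := "sha256:" + hex.EncodeToString(sum[:])

	fmt.Println(verifyPin(certDER, pin))            // <nil>
	fmt.Println(verifyPin([]byte("tampered"), pin)) // mismatch error
}
```

Skipping TLS verification on the initial fetch is only safe because nothing from that connection is trusted until the digest matches the pin.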

The token is consumed on redemption

The CP consumes the token once POST /v1/nodes/join returns, even if the agent never observes the response (for example, because the connection dropped). If bootstrap fails after that point, mint a fresh token by repeating step 1.

Idempotency

bootstrap is safe to re-run:

All three cert files present and --force not set: prints "already bootstrapped" and exits 0.
A partial cert state (some files present, not all) without --force: refuses - delete the orphans or re-run with --force.
--force re-issues cert material but never overwrites an existing agent.yaml. Operator-tuned config (logger, storage paths, endpoints) always survives. To regenerate the config, delete it first.
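
The re-run rules above reduce to a small decision table, sketched here. The action type and the decide and shouldWriteConfig functions are hypothetical; treating a host with no cert files as a normal first-time issue is implied by the page rather than stated.

```go
package main

import "fmt"

type action string

const (
	actionIssue  action = "issue"                // run the join protocol and write certs
	actionSkip   action = "already-bootstrapped" // print the notice and exit 0
	actionRefuse action = "refuse-partial-state" // orphaned files need operator attention
)

// decide maps the on-disk cert state and --force to an outcome.
// present is how many of agent.key, agent.crt and ca.crt exist (0 to 3).
func decide(present int, force bool) action {
	switch {
	case force:
		return actionIssue // --force always re-issues cert material
	case present == 3:
		return actionSkip
	case present == 0:
		return actionIssue // fresh host (implied by the page)
	default:
		return actionRefuse
	}
}

// shouldWriteConfig reflects that an existing agent.yaml is never
// overwritten, with or without --force.
func shouldWriteConfig(configExists bool) bool {
	return !configExists
}

func main() {
	fmt.Println(decide(3, false))        // already-bootstrapped
	fmt.Println(decide(2, false))        // refuse-partial-state
	fmt.Println(decide(2, true))         // issue
	fmt.Println(shouldWriteConfig(true)) // false
}
```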

3. Start the agent

otherix-agent serve

serve (also the default when the binary is run with no subcommand) boots in State A: every 5 seconds it checks whether the config and all three cert files exist and parse. Once they do it transitions one-way to State B - the full runtime (HTTPS mTLS server + heartbeat sender). If you ran bootstrap while serve was already running, no restart is needed; the next poll picks up the new files.

The transition is one-way per process lifetime: if cert material is lost mid-run, the agent surfaces it as a heartbeat failure and does not regress to State A. Restart the process to recover. Override the config path with --config <path> (default /etc/otherix/agent.yaml).
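
The State A wait can be pictured as a simple poll loop, sketched below. This is an illustration only: waitUntilReady is a hypothetical name, the ready callback stands in for "config and all three cert files exist and parse", and checking once immediately before the first tick is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// waitUntilReady polls ready() until it reports true and returns the
// number of polls performed. It returns exactly once and never goes back
// to waiting, mirroring the one-way State A to State B transition.
func waitUntilReady(interval time.Duration, ready func() bool) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	polls := 0
	for {
		polls++
		if ready() {
			return polls
		}
		<-ticker.C // wait for the next poll
	}
}

func main() {
	checks := 0
	// Stand-in readiness check that succeeds on the third poll.
	ready := func() bool {
		checks++
		return checks >= 3
	}

	// The documented cadence is 5s; a short interval keeps the demo fast.
	polls := waitUntilReady(10*time.Millisecond, ready)
	fmt.Printf("State B after %d polls\n", polls) // State B after 3 polls
}
```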

4. Verify

The node row arrives in pending and flips to ready after the first heartbeat lands.

otherix node list

NAME    ARCH   STATUS  CORDONED  AGE
node-1  arm64  ready   no        20s

otherix node get node-1

node list and node get are read-only for every authenticated role. admin / operator callers see the full projection (migration capability, hardware inventory from heartbeat); developer / viewer callers see the reduced NodeSummary shape. The positional for node get is a node name - UUID literals are rejected with 400 validation_failed. Both accept --output json; --show-ids surfaces the UUIDs the table hides. node list filters with --architecture and --status.

Day-2: cordon, uncordon, delete

Node creation, cordon / uncordon, and delete are not CLI commands - they are control-plane API operations:

POST /v1/nodes/{id}/cordon and POST /v1/nodes/{id}/uncordon - sync, idempotent. Cordoning excludes a node from VM placement without evicting running VMs.
DELETE /v1/nodes/{id} (admin only) - refuses with 409 conflict when the node hosts VMs or active migrations unless called with ?force=true.

Maintenance is the cordoned state in the node status, not a separate flag.
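
Since these operations have no CLI command, a client has to build the HTTP requests itself. The sketch below only constructs them and sends nothing: nodeRequest is a hypothetical helper, the base URL reuses the example CP address from the bootstrap step as an assumption, NODE_ID is a placeholder for the node's id, and authentication is omitted because this page does not describe it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// nodeRequest builds a request for one of the documented node operations.
// op is "cordon", "uncordon" or "delete"; force only applies to delete.
// Credentials are intentionally left out (not covered by this page).
func nodeRequest(baseURL, nodeID, op string, force bool) (*http.Request, error) {
	target := baseURL + "/v1/nodes/" + url.PathEscape(nodeID)
	switch op {
	case "cordon", "uncordon":
		return http.NewRequest(http.MethodPost, target+"/"+op, nil)
	case "delete":
		if force {
			target += "?force=true" // bypasses the 409 for nodes hosting VMs or migrations
		}
		return http.NewRequest(http.MethodDelete, target, nil)
	default:
		return nil, fmt.Errorf("unknown operation %q", op)
	}
}

func main() {
	for _, op := range []string{"cordon", "uncordon", "delete"} {
		req, err := nodeRequest("https://cp.example:8443", "NODE_ID", op, op == "delete")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(req.Method, req.URL.String())
	}
}
```

A caller that sends the delete request without force should be prepared for 409 conflict and decide explicitly whether to retry with ?force=true.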
