Internal operator runbook for onboarding a brand-new customer EC2 into the
M8trx brain telemetry fleet end-to-end. Single page; the per-phase docs
linked from § References are the source-of-truth for details.
Audience: internal M8trx operator. Assumes AWS console + Terraform
familiarity, brain EC2 access, and Tailscale admin rights. Onboard only —
customer offboard / decommission is out of scope (see § What this runbook
deliberately does not cover).
Before any customer onboard, all of these must already exist. Each
checklist line has a "verify by" hint. If something is missing, set it up
before continuing — don't try to do it lazily during a per-customer onboard.
- Brain server up on Tailscale.
  Verify: curl -s http://<brain-tailscale-ip>:8080/v1/healthz returns
  {"ok":true,...}.
  Source: server/ in this repo;
  design at docs/superpowers/specs/2026-04-29-brain-design.md.
- Tailscale tailnet ACL set up with per-customer isolation.
  The ACL needs the M8trx infrastructure tags
  (tag:m8trx-brain, tag:m8trx-team) plus an ACL rule template
  for adding per-customer tags. Default-deny means
  cross-customer traffic is blocked; intra-customer traffic is
  allowed by per-customer rules. The brain is reachable from
  tag:m8trx-cust-*, and the team can reach all customers.
  Source-of-truth template at
  agent-artifacts/cloud-init/README.md § Tailscale ACL.
  Verify: in the Tailscale admin console, the ACL JSON has the
  tagOwners + acls shape from that template.
- Tailnet Lock enabled (Tailscale → Settings → Tailnet Lock).
  Without this, a stolen customer auth key from SSM lets an
  attacker onboard hostile devices. With it, new devices stay
  offline until an admin signs them.
- Fleet-wide SSM param set in the AWS region you'll launch
  customer EC2s in:
  /m8trx/brain-url (String) = the brain Tailscale URL,
  http://brain.tailnet.ts.net:8080.

  ```bash
  aws ssm put-parameter --name /m8trx/brain-url \
    --type String --value "http://brain.tailnet.ts.net:8080"
  ```
- Fleet IAM applied (one-time per AWS region). Apply the
  m8trx-fleet Terraform module:

  ```hcl
  module "m8trx_fleet" {
    source    = "github.com/M8trxInfra/M8trx-Brain//agent-artifacts/cloud-init/terraform/m8trx-fleet?ref=main"
    brain_url = "http://brain.tailnet.ts.net:8080"
  }
  ```

  Creates the IAM role + policy + instance profile that every
  customer-agent EC2 attaches. Optionally manages the
  /m8trx/brain-url SSM param. Outputs
  iam_instance_profile_name for the per-agent module to consume.
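
The "verify by" hint for the brain server can be scripted. The following is a minimal sketch, not part of the repo: healthz_ok is a hypothetical helper name, and it only looks for "ok":true in the response body rather than parsing the JSON.

```shell
#!/bin/sh
# Hypothetical helper (not from the repo): reads a /v1/healthz response
# body on stdin and succeeds only if it contains "ok":true.
# Whitespace is stripped first so { "ok" : true } is also accepted.
healthz_ok() {
  tr -d ' \t\r\n' | grep -q '"ok":true'
}

# Intended use, with the placeholder from the checklist line:
#   curl -s http://<brain-tailscale-ip>:8080/v1/healthz | healthz_ok \
#     && echo "brain healthy"
```

This is a plain text match, so it is only as strict as the pattern. Where jq is installed, piping the body through jq -e '.ok == true' would be a stricter alternative.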
Run these in order for each new customer. The customer ID must match
brain's mint-key.js validation regex /^cust_[a-z0-9_]+$/ —
e.g. cust_acme, cust_bigco. Once these steps are done, all
future agents launched for this customer auto-connect with zero
additional configuration — the cloud-init bootstrap reads the
customer's tag and resolves all per-customer secrets from SSM by
that ID.
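
Because every per-customer name derives from the customer ID, it can help to validate the ID once and build the other names from it. The sketch below mirrors the regex quoted above and the naming patterns used in this runbook (tag:m8trx-cust-<id_without_cust_> and /m8trx/cust_<id>/...). The function names are hypothetical and nothing here is taken from mint-key.js or bootstrap.sh.

```shell
#!/bin/sh
# Hypothetical helpers (not from the repo).

# Succeeds only if $1 matches ^cust_[a-z0-9_]+$.
# Runs in a subshell with LC_ALL=C so a-z means ASCII lowercase only.
valid_customer_id() (
  LC_ALL=C
  export LC_ALL
  case "$1" in
    cust_?*) rest=${1#cust_} ;;   # needs at least one char after cust_
    *) exit 1 ;;
  esac
  case "$rest" in
    *[!a-z0-9_]*) exit 1 ;;       # reject any char outside a-z, 0-9, _
  esac
  exit 0
)

# Prints the per-customer Tailscale tag: cust_acme -> tag:m8trx-cust-acme
customer_tag() {
  valid_customer_id "$1" || return 1
  printf 'tag:m8trx-cust-%s\n' "${1#cust_}"
}

# Prints a per-customer SSM parameter name:
#   ssm_param_path cust_acme brain-key -> /m8trx/cust_acme/brain-key
ssm_param_path() {
  valid_customer_id "$1" || return 1
  printf '/m8trx/%s/%s\n' "$1" "$2"
}
```

Rejecting a bad ID before any aws ssm put-parameter call avoids writing secrets under a path the bootstrap will never look up.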
1. Add the customer's tag and rule to the Tailscale ACL.
   - tagOwners entry: "tag:m8trx-cust-<id_without_cust_>": ["autogroup:admin"]
     For cust_acme, that's tag:m8trx-cust-acme.
   - acls rule:
     { "action": "accept", "src": ["tag:m8trx-cust-<id>"], "dst": ["tag:m8trx-cust-<id>:*"] }
   - Brain and team access are already covered by the
     tag:m8trx-cust-* wildcards from § Prerequisites.
2. Generate a Tailscale auth key tagged
   tag:m8trx-cust-<id_without_cust_> (the tag you added above).
   Copy the tskey-auth-... value, then store it in SSM:

   ```bash
   aws ssm put-parameter --name /m8trx/cust_<id>/tailscale-auth-key \
     --type SecureString --value "tskey-auth-..."
   ```
3. Mint the customer's brain key:

   ```bash
   KEY=$(docker compose -f /home/ubuntu/brain/server/docker-compose.yml \
     exec -T brain-api node bin/mint-key.js cust_<id> "<Display Name>" \
     2>/dev/null | tail -1)
   echo "$KEY"  # confirm it looks like m8brain_<env>_<32 base32 chars>
   ```

   mint-key.js prints the plaintext key once on stdout; diagnostics
   go to stderr, hence the 2>/dev/null. Capture into $KEY for the
   next step.
4. Store the brain key in SSM:

   ```bash
   aws ssm put-parameter --name /m8trx/cust_<id>/brain-key \
     --type SecureString --value "$KEY"
   ```
5. Launch the agent EC2 with the m8trx-agent Terraform module:

   ```hcl
   module "m8trx_agent_acme_1" {
     source = "github.com/M8trxInfra/M8trx-Brain//agent-artifacts/cloud-init/terraform/m8trx-agent?ref=main"

     customer_id               = "cust_acme"
     iam_instance_profile_name = module.m8trx_fleet.iam_instance_profile_name
     subnet_id                 = aws_subnet.m8trx.id
     vpc_security_group_ids    = [aws_security_group.m8trx_default.id]
   }
   ```

   … bootstrap.sh as user-data, set the … customer_id; they all
   share the customer's … agent_id.

The 3-command success check.

1. The bootstrap log shows
   m8trx-bootstrap: complete for cust_<id>
2. The heartbeat timer … (… OnBootSec + 30s RandomizedDelaySec
   jitter + 5min OnUnitActiveSec).
3. The latest heartbeat has reached the brain:

   ```bash
   docker compose -f /home/ubuntu/brain/server/docker-compose.yml \
     exec -T postgres psql -U brain brain -tAc \
     "select payload->>'hostname' from events where customer_id='cust_<id>' and event_type='heartbeat' order by ts desc limit 1"
   ```

   The query should print the EC2's hostname (ip-10-x-x-x or similar).

If all three check out, the customer EC2 is fully onboarded.
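
Two of the checks above are easy to script. The mint command discards stderr, so a failed mint can leave $KEY empty or wrong, and a shape check before the SSM write catches that. The bootstrap completion line can be matched in whatever log the bootstrap writes to. This is a sketch under assumptions: both function names are hypothetical, <env> is assumed to be lowercase alphanumeric, and "base32" is assumed to mean the RFC 4648 alphabet (A-Z, 2-7) in either case. Check mint-key.js before relying on that pattern.

```shell
#!/bin/sh
# Hypothetical helpers (not from the repo).

# Succeeds if $1 has the shape m8brain_<env>_<32 base32 chars>.
# ASSUMPTIONS: <env> is [a-z0-9]+; base32 is A-Z and 2-7, any case.
looks_like_brain_key() (
  LC_ALL=C
  export LC_ALL
  printf '%s\n' "$1" | grep -Eqx 'm8brain_[a-z0-9]+_[A-Za-z2-7]{32}'
)

# Reads log text on stdin and succeeds if it contains the completion
# line for customer ID $1. The trailing group stops cust_acme from
# matching a line written for cust_acme2.
bootstrap_complete() {
  LC_ALL=C grep -Eq "m8trx-bootstrap: complete for $1([^a-z0-9_]|\$)"
}

# Intended use (the log path is an assumption, not stated in this runbook):
#   looks_like_brain_key "$KEY" || { echo "mint failed?" >&2; exit 1; }
#   bootstrap_complete cust_acme < /var/log/cloud-init-output.log
```

Running the key check between the mint and the put-parameter call keeps a bad value out of SSM, where it would otherwise only surface later as an agent that cannot authenticate to the brain.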
See agent-artifacts/cloud-init/README.md
§ Operator debug recipe for the full failure → fix mapping. Common
failures it covers:
- Customer tag missing or InstanceMetadataTags disabled
- An error occurred (AccessDeniedException) when calling the GetParameter operation
- An error occurred (ParameterNotFound)
- tailscale: failed to authenticate

Don't restate the recipe here; the source-of-truth is one click away.
… aws ssm delete-parameter --name /m8trx/cust_<id>/brain-key, …

Per-phase docs, README first (most operationally relevant), spec for
context. Plans are intentionally omitted (implementation history, not
operational reference).
| Phase | What it does | README | Design spec |
|---|---|---|---|
| B.1 wrapper | session.start/end events from the m8trx-claude-isolate wrapper | (no README; see modified script at agent-artifacts/m8trx-claude-isolate.modified) | 2026-05-03-brain-mvp-ingestion-design.md |
| B.2 hooks | tool_call events from Claude Code PostToolUse hook in the agent-runtime container | agent-artifacts/claude-hooks/README.md | 2026-05-03-brain-claude-hooks-design.md |
| B.3 heartbeat | host-side liveness + system-stats events every 5 min via systemd timer | agent-artifacts/heartbeat/README.md | 2026-05-03-brain-host-heartbeat-design.md |
| B.4 cloud-init | one-shot AWS user-data bash that installs deps, joins Tailscale, fetches SSM secrets, writes brain.env, installs heartbeat | agent-artifacts/cloud-init/README.md | 2026-05-03-brain-cloud-init-bootstrap-design.md |