Date: 2026-05-03
Phase: C (operator-facing onboarding documentation)
Predecessor: B.4 (Tailscale + cloud-init bootstrap)
Successor: none — final MVP phase
Ship docs/runbook-connect-customer-ec2.md: a single-page operator
runbook that synthesizes the four B.x phases (wrapper telemetry,
Claude Code hooks, host heartbeat, cloud-init bootstrap) into an
end-to-end "from zero to telemetry-arriving" sequence for connecting
a brand-new customer EC2 to the M8trx brain.
The runbook is the single entry point an internal operator opens
when a new customer needs onboarding. It covers the order and
prerequisites of what to do; the details (script behaviour, debug
recipes, IAM policy text) live in the per-phase READMEs the runbook
links out to.
All four B.x phases shipped artifacts that work in isolation. The
per-phase READMEs (agent-artifacts/<phase>/README.md) cover their
own slice well. What's missing is the operator-facing "where do I
start" entry point — the doc you open the first time you need to
connect a real customer EC2 and want a single sequenced checklist
instead of stitching four READMEs together yourself.
This is the last MVP phase. Once it's in, "first customer connect"
is operationally documented end-to-end.
Internal M8trx operator only. Assumes:
mint-key.js.
The runbook does not explain what SSM, Tailscale, or Terraform
are. If/when M8trx ever needs a customer-DevOps-facing variant
(for customers running their own AWS accounts), it'll fork from
this internal version. Building both now is YAGNI.
Onboard only.
Out of scope (each handled separately when needed):
docs/superpowers/specs/2026-04-29-brain-design.md).
Single file docs/runbook-connect-customer-ec2.md. Six h2 sections,
in operator-execution order:
What must already exist before any customer onboard. Each item is
a checklist line with a "verify by" hint and a pointer to the
source-of-truth doc:
curl http://<brain-tailscale-ip>:8080/v1/healthz).
tag:m8trx-customer-host ACL role
/m8trx/brain-url (String, the brain Tailscale URL).
/m8trx/tailscale/auth-key (SecureString, reusable+ephemeral
tag:m8trx-customer-host).
agent-artifacts/cloud-init/iam-policy.json (Terraform snippet
agent-artifacts/cloud-init/README.md § Terraform launch
The 4-step sequence for every new customer:
Mint a brain bearer key:
docker compose -f /home/ubuntu/brain/server/docker-compose.yml \
exec -T brain-api node bin/mint-key.js cust_<id> "<Display Name>"
Store it in SSM (with $KEY set to the key minted in step 1):
aws ssm put-parameter --name /m8trx/cust_<id>/brain-key \
--type SecureString --value "$KEY"
Update Tailscale ACL — only if first customer on a fresh tailnet
(usually no-op since the tag:m8trx-customer-host rule is in
prerequisites).
Apply Terraform (or paste user-data into the EC2 console) to
launch the customer EC2 with the right tag, metadata-options,
instance-profile, and user-data. Reference
agent-artifacts/cloud-init/README.md § Terraform launch snippet.
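Steps 1 and 2 share one piece of naming logic: the SSM parameter path derived from the customer id. A minimal sketch of how an operator might wrap that follows. The helper names `brain_key_param` and `store_brain_key` are hypothetical (they do not exist in the repo), and the allowed id characters are an assumption, not something this spec defines.

```shell
# Sketch only: helper names below are hypothetical, not part of the repo.

# Print the SSM parameter name that step 2 writes for a customer id.
# Assumption: ids look like cust_<id> using letters, digits, '_' or '-'.
brain_key_param() {
  case "$1" in
    cust_?*) ;;
    *) echo "invalid customer id: $1" >&2; return 1 ;;
  esac
  case "$1" in
    *[!A-Za-z0-9_-]*) echo "invalid customer id: $1" >&2; return 1 ;;
  esac
  printf '/m8trx/%s/brain-key\n' "$1"
}

# Store an already-minted key (step 2).
# Requires the aws CLI and credentials allowed to write the parameter.
store_brain_key() {
  name="$(brain_key_param "$1")" || return 1
  aws ssm put-parameter --name "$name" --type SecureString --value "$2"
}
```

Usage would look like `store_brain_key cust_acme "$KEY"`, where `$KEY` is the value minted in step 1. Rejecting malformed ids up front avoids writing a key to a path that cloud-init will never read.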
The 3-command success check:
Check EC2 console "Get System Log" → expect
m8trx-bootstrap: complete for cust_<id>.
Wait ~5 min for the first heartbeat to fire.
On brain EC2:
docker compose -f /home/ubuntu/brain/server/docker-compose.yml \
exec -T postgres psql -U brain brain -tAc \
"select payload->>'hostname' from events
where customer_id='cust_<id>'
and event_type='heartbeat'
order by ts desc limit 1"
Expect a real hostname.
Done.
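The wait-then-query steps above could also be expressed as a small polling loop, so the operator does not have to re-run the query by hand. This is a sketch under assumptions: `poll_until_output` and `latest_heartbeat_host` are hypothetical helper names, and the attempt count and interval in the usage line are illustrative rather than values this spec prescribes.

```shell
# Sketch only: helper names below are hypothetical, not part of the repo.

# Run a command repeatedly until it prints something, or give up.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
poll_until_output() {
  attempts="$1"; interval="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A failing command is treated the same as "no output yet".
    out="$("$@" 2>/dev/null || true)"
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1))
    [ "$i" -le "$attempts" ] && sleep "$interval"
  done
  return 1
}

# The verification query from this section, run on the brain EC2.
# $1 is interpolated into the SQL text, so pass only trusted customer ids.
latest_heartbeat_host() {
  docker compose -f /home/ubuntu/brain/server/docker-compose.yml \
    exec -T postgres psql -U brain brain -tAc \
    "select payload->>'hostname' from events
     where customer_id='$1'
       and event_type='heartbeat'
     order by ts desc limit 1"
}
```

For example, `poll_until_output 10 60 latest_heartbeat_host cust_acme` would check once a minute for up to ten attempts and print the hostname as soon as a heartbeat row exists. A non-zero exit means no heartbeat arrived in that window, which is the cue to open the debug recipe.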
Pointer-only section. Don't restate the full failure → fix mapping;
just link to it:
See
agent-artifacts/cloud-init/README.md § Operator debug recipe
for the full failure → fix mapping (Customer tag missing,
AccessDeniedException, ParameterNotFound, tailscale auth fail, no
events arriving).
Restate the Out-of-scope items above so a reader landing on the
runbook knows what to look for elsewhere. Particularly important:
"updating existing customer EC2" → terminate + relaunch.
Per-phase doc layers, README first (most operationally relevant),
spec second:
agent-artifacts/m8trx-claude-isolate.{patch,modified} (the patch + the post-patch script for inspection; no dedicated README). Upstream design context: docs/superpowers/specs/2026-05-03-brain-mvp-ingestion-design.md (the MVP ingestion design that motivated the wrapper telemetry).
agent-artifacts/claude-hooks/README.md. Spec: docs/superpowers/specs/2026-05-03-brain-claude-hooks-design.md.
agent-artifacts/heartbeat/README.md. Spec: docs/superpowers/specs/2026-05-03-brain-host-heartbeat-design.md.
agent-artifacts/cloud-init/README.md. Spec: docs/superpowers/specs/2026-05-03-brain-cloud-init-bootstrap-design.md.
Plans (docs/superpowers/plans/*) are intentionally omitted from
the runbook — they're implementation history, not operational
reference.
In addition to the new runbook file, one small change to
docs/RESUME.md: add a top-level pointer at the start of "What's
running right now" or as a new "Operator runbook" section so a
returning operator/contributor sees:
For onboarding a new customer EC2, read
docs/runbook-connect-customer-ec2.md first.
This is the runbook's discoverability hook from the doc operators
already know to open.
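A quick way to confirm the pointer actually landed in docs/RESUME.md is a fixed-string search for the runbook path. The helper name `has_runbook_pointer` is hypothetical and shown only as a sketch.

```shell
# Sketch only: hypothetical helper, not part of the repo.
# Return 0 if the given file mentions the runbook path, non-zero otherwise.
has_runbook_pointer() {
  grep -qF 'docs/runbook-connect-customer-ec2.md' "$1"
}
```

Run from the repo root as `has_runbook_pointer docs/RESUME.md && echo "pointer present"`.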
The runbook itself is doc-only — there's no test suite that runs
against it. Validation is empirical: the first real-customer
connect IS the runbook's validation. If a reasonably careful
operator follows the runbook end-to-end and gets cust_<id>
heartbeats arriving at brain, the runbook works. If they get stuck,
that's a runbook bug to fix in a follow-up.
To set the runbook up for that test, the spec mandates:
<id> placeholder is filled in by the
agent-artifacts/cloud-init/README.md § Manual smoke
None at design-approval time. All three clarifying questions were
resolved interactively before this spec was written.