Status: Draft (MVP scope)
Date: 2026-04-29
Owner: M8TRX.AI platform team
Goal. Stand up a phone-home server ("the brain") that collects usage data from hundreds of M8TRX.AI agents deployed on customer EC2 instances, so we can monitor adoption and design product iterations from real usage.
Top questions the brain must answer (priorities, in order):
Cost-to-serve and QA-at-scale are explicit secondary goals: captured by the same data, not the design driver.
Non-goals for MVP (additive later, do not block shipping):
[Customer EC2]                          [Brain: single AWS region]
┌──────────────────┐                    ┌────────────────────────────┐
│ Agent process    │  HTTPS POST        │ ALB (TLS terminator)       │
│ (Python /        │  /v1/events        │  │                         │
│ Claude Code /    │ ─────────────────► │  ▼                         │
│ paperclip)       │  Bearer api_key    │ FastAPI on EC2 (uvicorn)   │
│ + brain SDK      │                    │  │ validate, redact, write │
└──────────────────┘                    │  ▼                         │
                                        │ Postgres (RDS)             │
                                        │ events / agents /          │
                                        │ customers / api_keys       │
                                        └────────────┬───────────────┘
                                                     │ read-only role
                                                     ▼
                                             Metabase (queries)
Single-region, AWS-native, intentionally boring. We can horizontally scale FastAPI by adding another EC2 behind the ALB; we can introduce ClickHouse later when Postgres analytics start to hurt.
| Component | Responsibility | Tech |
|---|---|---|
| Brain SDK | Buffer + flush events from inside the agent process; never raise. | Python package m8trx_brain (pip-installable from a private index or git URL). |
| Ingestion API | POST /v1/events, bearer auth, schema validation, redaction, DB write. Stateless. | FastAPI + uvicorn in a Docker container on EC2 (t3.small) behind an ALB. |
| Redaction pipeline | In-process module. Routes raw summaries through Claude Haiku to strip PII based on the customer's privacy tier. | anthropic SDK; Haiku (claude-haiku-4-5). |
| Postgres (RDS) | Primary store. JSONB payload column for schema flexibility during iteration. | RDS Postgres 16, db.t4g.medium, gp3 storage, 7-day automated snapshots. |
| Metabase | Read-only analytics UI for the team. | Open-source Metabase on a small EC2 / ECS task; uses a read-only DB role. |
Out-of-MVP components (additive later, no design blocker): real-time stream consumer, alerting service, customer-facing dashboard.
Every event sent by the SDK has the same outer shape:
{
  "event_id": "uuid-v4",
  "ts": "2026-04-29T10:30:00Z",
  "event_type": "session.end",
  "session_id": "sess_abc123",
  "payload": { ... }
}
agent_id and customer_id are never sent by the client. They are resolved server-side from the bearer token. This means a leaked key can only attribute events to the agent that owns it.
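As a rough illustration of resolving identity from the bearer token, the sketch below issues a key in the m8brain_<env>_<32 base32 chars> format and looks it up by its SHA-256 digest. The function names, the in-memory `key_rows` list standing in for the api_keys table, and the constant-time comparison are assumptions made for the example, not the brain's actual code.

```python
import base64
import hashlib
import secrets
from typing import Optional

KEY_PREFIX = "m8brain_"


def issue_key(env: str) -> tuple:
    """Return (plaintext, sha256 digest); only the digest would be stored."""
    # 20 random bytes encode to exactly 32 base32 characters with no padding.
    body = base64.b32encode(secrets.token_bytes(20)).decode("ascii")
    plaintext = f"{KEY_PREFIX}{env}_{body}"
    return plaintext, hashlib.sha256(plaintext.encode("ascii")).digest()


def resolve_agent(auth_header: Optional[str], key_rows: list) -> Optional[str]:
    """Map an Authorization header to an agent_id, or None (a 401 upstream)."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return None
    token = auth_header[len("Bearer "):].strip()
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Stand-in for: where key_hash = sha256($1) and revoked_at is null
    for row in key_rows:
        if row["revoked_at"] is None and secrets.compare_digest(row["key_hash"], digest):
            return row["agent_id"]
    return None
```

Because the agent is derived from the key row rather than from the request body, a client has no field through which it could claim a different agent_id.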
| event_type | Emitted when | Payload fields |
|---|---|---|
| session.start | Agent picks up a unit of work. | category (free string, agent-tagged, e.g. "email_reply"); source (optional, e.g. "inbox"). |
| session.end | Unit of work completes, fails, or escalates. | status: one of "success", "failed", "escalated"; duration_ms; tool_calls: [{name, ms, ok}]; llm_usage: {model, input_tokens, output_tokens, cost_cents}; summary_raw? (redaction input). |
| error | Uncaught error in the agent. | message, kind, stack_hash (sha256 of stack trace, no body); groupable without leaking code paths. |
| heartbeat | SDK background thread, every 5 min. | agent_version, pid_uptime_s, dropped_events_since_last. |
We deliberately do not model tool.call or llm.call as separate top-level events for MVP. Rolling them into session.end keeps query patterns simple (one row per session). Splitting later is an additive, not breaking, change.
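To make the envelope and the rolled-up session.end payload concrete, here is a minimal validation sketch using only the standard library. The function name `validate_event` and the exact error strings are hypothetical; the real ingestion API may well use a schema library instead.

```python
import uuid
from datetime import datetime

EVENT_TYPES = {"session.start", "session.end", "error", "heartbeat"}
SESSION_STATUSES = {"success", "failed", "escalated"}


def validate_event(event: dict) -> list:
    """Return a list of problems; an empty list means the event is acceptable."""
    problems = []
    try:
        uuid.UUID(str(event.get("event_id")))
    except ValueError:
        problems.append("event_id is not a UUID")
    try:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
        # so normalize it to an explicit offset first.
        datetime.fromisoformat(str(event.get("ts")).replace("Z", "+00:00"))
    except ValueError:
        problems.append("ts is not ISO 8601")
    if event.get("event_type") not in EVENT_TYPES:
        problems.append("unknown event_type")
    payload = event.get("payload")
    if not isinstance(payload, dict):
        problems.append("payload must be an object")
        return problems
    if event.get("event_type") == "session.end":
        if payload.get("status") not in SESSION_STATUSES:
            problems.append("session.end status is invalid")
        # Tool calls ride along inside session.end, one row per session.
        for call in payload.get("tool_calls", []):
            if not isinstance(call, dict) or not {"name", "ms", "ok"} <= set(call):
                problems.append("tool_calls entries need name, ms, ok")
                break
    return problems
```

A non-empty result would map to the 400 response; since tool calls live inside the session.end payload, one validation pass covers the whole session row.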
category on session.start is a free string, agent-tagged. We accept that this means cross-customer comparison will need a normalization step (likely a periodic job that maps free strings → a curated taxonomy). Forcing a fixed enum on day one would either constrain real workflows or be ignored. The cost of free-string is one normalization job; the cost of premature enum is wrong data.
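The normalization step could start out as small as the sketch below. The taxonomy entries and the "uncategorized" fallback are invented for illustration; the real mapping would be curated from the categories agents actually send.

```python
import re

# Hypothetical curated taxonomy: canonical category -> known free-string variants.
TAXONOMY = {
    "email_reply": {"email_reply", "reply_email", "email_response"},
    "booking": {"booking", "book_meeting", "schedule_meeting"},
}
_VARIANT_TO_CANONICAL = {
    variant: canonical
    for canonical, variants in TAXONOMY.items()
    for variant in variants
}


def normalize_category(raw: str) -> str:
    """Map a free-string category to the taxonomy, or 'uncategorized'."""
    # Lowercase, then collapse every run of non-alphanumerics to one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    return _VARIANT_TO_CANONICAL.get(slug, "uncategorized")
```

Running this as a periodic job over stored events keeps the raw string intact, so the taxonomy can be revised and re-applied without losing data.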
import os

import brain

brain.init(api_key=os.environ["BRAIN_API_KEY"])  # endpoint defaults to prod URL

with brain.session(category="email_reply") as s:
    s.tool("read_inbox", duration_ms=150, ok=True)
    s.llm(model="claude-sonnet-4-6", input_tokens=2400,
          output_tokens=350, cost_cents=2.1)
    s.set_summary("Replied to a billing dispute about an unpaid invoice.")
# Context exit emits session.end; status is inferred from a raised exception
# or set explicitly via s.fail("...") / s.escalate().

# Raw escape hatch for events the session helper does not cover.
brain.track("error", {"message": "...", "kind": "TimeoutError",
                      "stack_hash": "sha256:..."})
Behavior:
- On network failure, events are buffered to ~/.m8brain/buffer.ndjson and replay on the next successful flush.
- On buffer overflow, the oldest events are dropped and the count (dropped_events_since_last) is reported on the next heartbeat.

Claude Code integrates through a thin wrapper script, claude-code-brain-hook.py, plus a settings.json snippet that customers add to their Claude Code install. Hooks used:
- SessionStart → session.start
- Stop / session-end equivalent → session.end
- PostToolUse → appended to the in-flight session's tool_calls

The wrapper imports the same Python SDK; no separate code path.
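Both the SDK example and the hook wrapper rely on a session object that emits session.start on entry and a single session.end on exit. A minimal sketch of that behavior follows; the `Session` class and its `emit` callable are assumptions for illustration and not the shipped SDK.

```python
import time
import uuid


class Session:
    """Collects tool calls for one unit of work and emits one session.end."""

    def __init__(self, category, emit):
        self.category = category
        self.session_id = f"sess_{uuid.uuid4().hex[:12]}"
        self.tool_calls = []
        self.status = None  # set explicitly via fail() / escalate()
        self._emit = emit   # hypothetical callable(event_type, session_id, payload)

    def tool(self, name, duration_ms, ok):
        self.tool_calls.append({"name": name, "ms": duration_ms, "ok": ok})

    def fail(self, reason=""):
        self.status = "failed"

    def escalate(self):
        self.status = "escalated"

    def __enter__(self):
        self._started = time.monotonic()
        self._safe_emit("session.start", {"category": self.category})
        return self

    def __exit__(self, exc_type, exc, tb):
        # An explicit status wins; otherwise infer it from the exception, if any.
        status = self.status or ("failed" if exc_type else "success")
        self._safe_emit("session.end", {
            "status": status,
            "duration_ms": int((time.monotonic() - self._started) * 1000),
            "tool_calls": self.tool_calls,
        })
        return False  # never swallow the agent's own exception

    def _safe_emit(self, event_type, payload):
        try:
            self._emit(event_type, self.session_id, payload)
        except Exception:
            pass  # telemetry errors must never reach the agent
```

Returning False from `__exit__` lets the agent's exception propagate unchanged, while `_safe_emit` keeps any SDK failure away from the agent.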
paperclip's runtime is not yet known to the brain team. For MVP, paperclip integrates via the raw HTTP contract (Section 4.1). Once we know its language, we wrap it as a thin adapter over the same wire format. Open question — see §11.
Per-customer config (stored on the customers row) chooses one of:
- Tier A: summary_raw is never stored; if sent, it is dropped at the ingestion boundary before the row is written.
- Tier B: summary_raw is accepted, passed through Haiku, and only summary_redacted is persisted; the raw value never reaches durable storage.
- Tier C: summary_raw is persisted alongside the redacted version. Requires a signed DPA and explicit customer opt-in. Not enabled for any customer at MVP launch.

The Haiku call uses the following user prompt (the system prompt sets the role):
Rewrite the following one-line agent task summary in 120 characters or fewer. Strip every personal name, email address, phone number, postal address, account number, and order/ticket ID. Preserve the business intent (what kind of task it was, what the outcome was). Reply with only the rewritten line.
The call is wrapped in a 5-second timeout. On failure, the event is still stored, summary_redacted is null, and the payload gains summary_failed: true so we can re-run later. For tier B customers the raw summary is never persisted on a redaction failure; it is dropped.
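The tier rules and the timeout behavior can be sketched together as below. The `redact` argument is a hypothetical callable standing in for the Haiku request, and the thread-pool timeout is one possible way to bound the call; neither is confirmed as the brain's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

REDACTION_TIMEOUT_S = 5.0
_pool = ThreadPoolExecutor(max_workers=4)


def apply_privacy_tier(payload, tier, redact, timeout_s=REDACTION_TIMEOUT_S):
    """Return (payload_to_store, summary_redacted) for one event.

    `redact` is a hypothetical callable(str) -> str wrapping the Haiku call.
    """
    stored = dict(payload)
    raw = stored.pop("summary_raw", None)  # raw text leaves the payload by default
    if raw is None or tier == "a":
        return stored, None
    summary_redacted = None
    try:
        # result() raises on timeout; the worker thread keeps running in the
        # background, but its late output is simply ignored.
        summary_redacted = _pool.submit(redact, raw).result(timeout=timeout_s)
    except Exception:
        # Covers both the timeout and any error raised by the redactor.
        stored["summary_failed"] = True
    if tier == "c":
        stored["summary_raw"] = raw  # only the opt-in tier keeps the raw text
    return stored, summary_redacted
```

Popping summary_raw before anything else means tiers A and B cannot persist the raw text by accident, including on the failure path.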
- m8brain_<env>_<32 base32 chars>, e.g. m8brain_prod_AB3F.... The m8brain_ prefix makes leaked keys greppable in code/logs.
- Stored as sha256(key) only. Plaintext shown once at creation.
- Sent in the Authorization: Bearer <key> header. Lookup is where key_hash = sha256($1) and revoked_at is null.
- Revocation sets revoked_at. Rotation is "issue new key, dual-run, revoke old."
- Admin routes (/admin/*) gated by a separate admin token (env var on the server). Not exposed to the public ALB.

create table customers (
  id            text primary key,            -- "cust_<slug>"
  name          text not null,
  privacy_tier  text not null check (privacy_tier in ('a','b','c')),
  created_at    timestamptz not null default now()
);

create table agents (
  id            text primary key,            -- "agent_<slug>"
  customer_id   text not null references customers(id),
  kind          text,                        -- 'booking'|'inbox'|'sales'|'ops'|'other' (advisory)
  version       text,                        -- updated from heartbeats
  last_seen_at  timestamptz,                 -- updated on every event
  created_at    timestamptz not null default now()
);

create table api_keys (
  id            text primary key,            -- "key_<slug>"
  agent_id      text not null references agents(id),
  key_hash      bytea not null unique,
  label         text,
  created_at    timestamptz not null default now(),
  revoked_at    timestamptz
);

create table events (
  id                bigserial primary key,
  ts                timestamptz not null,
  ingested_at       timestamptz not null default now(),
  customer_id       text not null,           -- denormalized for fast filtering
  agent_id          text not null,
  event_type        text not null,
  session_id        text,
  payload           jsonb not null,
  summary_redacted  text                     -- null for tier-A or pre-redaction
);

create index events_customer_ts on events (customer_id, ts desc);
create index events_agent_ts on events (agent_id, ts desc);
create index events_type_ts on events (event_type, ts desc);
create index events_session on events (session_id) where session_id is not null;
Tier-C raw summaries (when in scope) live inside payload->>'summary_raw'; we deliberately do not promote them to a column to avoid an accidental SELECT * leaking them.
Partition events by month once we cross ~10 M rows. Not needed at MVP volume.
Six saved questions, one for each priority signal, ship with the brain:
- now() - max(ts) per agent; agents with no event in > 1 h flagged. (Priority 1.)
- jsonb_array_elements(payload->'tool_calls') grouped by tool name × customer. (Priority 2.)
- sum((payload->'llm_usage'->>'cost_cents')::numeric). (Unit economics, secondary.)
- session.start.category distribution per customer + cross-customer. (Priorities 2 & 3.)
- session.end.status ratios per customer; leading churn indicator. (Priority 1.)

These are not the final analytics; they are the smallest set that demonstrably answers the three priority questions on day one.
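The staleness question (now() - max(ts) per agent, flagged past 1 h) reduces to the logic below. This is only a Python restatement for clarity, with a hypothetical `stale_agents` helper over (agent_id, ts) pairs; the saved question itself would be SQL in Metabase.

```python
from datetime import datetime, timedelta, timezone


def stale_agents(events, now, threshold=timedelta(hours=1)):
    """Return {agent_id: silence} for agents whose newest event is too old.

    `events` is an iterable of (agent_id, timezone-aware ts) pairs.
    """
    last_seen = {}
    for agent_id, ts in events:
        if agent_id not in last_seen or ts > last_seen[agent_id]:
            last_seen[agent_id] = ts
    return {
        agent_id: now - ts
        for agent_id, ts in last_seen.items()
        if now - ts > threshold
    }
```

Note that an agent that has never sent an event does not appear here at all; catching those would require joining against the agents table.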
SDK side. Every internal error is caught. Network failures replay from the on-disk ring buffer. Buffer overflow drops oldest events and bumps the counter on the next heartbeat. The SDK is allowed to be lossy; it is never allowed to crash the agent.
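The drop-oldest and replay behavior might look like the following sketch. It keeps the buffer in memory for brevity, whereas the design calls for an on-disk buffer, and the `EventBuffer` class with its `send` callable is an assumption made for the example.

```python
from collections import deque


class EventBuffer:
    """Bounded buffer that drops the oldest event on overflow."""

    def __init__(self, max_events=1000):
        self._events = deque()
        self._max = max_events
        self.dropped_since_last = 0

    def add(self, event):
        if len(self._events) >= self._max:
            self._events.popleft()        # lossy by design: oldest goes first
            self.dropped_since_last += 1
        self._events.append(event)

    def flush(self, send):
        """Try to send everything; keep the batch for replay if send() fails."""
        batch = list(self._events)
        if not batch:
            return True
        try:
            send(batch)
        except Exception:
            return False                  # events stay buffered; nothing is raised
        for _ in batch:
            self._events.popleft()
        return True

    def heartbeat_payload(self):
        """Report the drop counter once, then reset it."""
        payload = {"dropped_events_since_last": self.dropped_since_last}
        self.dropped_since_last = 0
        return payload
```

Events are removed only after `send` returns, so a failed flush leaves the batch in place for the next attempt.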
Server side.
- 400 on schema validation failure (logged, not retried by SDK).
- 401 on bad / revoked / missing bearer token.
- 429 reserved for future rate limiting (not enforced at MVP).
- 503 on DB or downstream errors (SDK retries with exponential backoff: 1 s, 2 s, 4 s, 8 s, max 60 s).

Backpressure. Not enforced at MVP. If Postgres falls behind, the SDK will see 503s and retry; we will see it in events_rejected_total before customers do.
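The retry rule above can be expressed in a few lines. The sketch assumes the delay keeps doubling past 8 s until it reaches the 60 s cap, and the `attempts` limit is a hypothetical parameter; the source only lists the first four delays and the maximum.

```python
RETRYABLE = {503}


def backoff_delays(base_s=1.0, cap_s=60.0, attempts=8):
    """Yield waits of 1 s, 2 s, 4 s, 8 s, ... doubling until capped at cap_s."""
    delay = base_s
    for _ in range(attempts):
        yield min(delay, cap_s)
        delay *= 2


def should_retry(status_code):
    """Only 503 is retried; a 400 or 401 cannot succeed on a resend."""
    return status_code in RETRYABLE
```

Treating 400 and 401 as terminal avoids resending an event that is malformed or unauthenticated and would fail the same way every time.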
- docker compose pull && docker compose up -d (or aws ecs update-service if/when we adopt ECS).
- BRAIN_DB_URL, ANTHROPIC_API_KEY, BRAIN_ADMIN_TOKEN in AWS SSM Parameter Store, fetched at container start.
- events_ingested_total{event_type, customer_id}, events_rejected_total{reason}, redaction_latency_seconds.
- A CLI (brain admin create-customer, create-agent, issue-key) hits the server's /admin/* routes. Output: API key shown once. No web admin UI in MVP.
- customer_id-scoped queries. If a single large customer's volume becomes disruptive, we partition the events table by customer_id or move them to a dedicated DB. Not needed at MVP.