BrainstormRouter
Architecture

What happens when you send a request.

Every API call flows through a three-stage pipeline (Ingest, Route, Return) with 13 systems operating in parallel. Overhead stays under 5 ms and is measured and returned on every response.

Stage 01 · Ingest & Classify

Before routing happens, four checks fire in sequence.

Three possible outcomes: a cache hit ends here, a guardrail block returns 400, or the request continues to Stage 2.

  1. Auth & rate limit

     API key validation, tenant identification, rate-limit enforcement.

  2. Virtual Key Vault

     AES-256-GCM decryption of provider keys, per-key budget ceiling checks.

  3. Guardrail pre-scan

     Scans for PII, PCI-DSS data, and custom regex patterns. A blocked request returns 400 with a sanitized snippet.

  4. Semantic cache

     pgvector HNSW index plus an in-memory layer. Hits return in ~4 ms, skipping Stages 2 and 3 entirely.
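The four checks and their outcomes can be sketched as a single function. This is a minimal illustration, not BrainstormRouter's implementation: the names `ingest`, `IngestResult`, `SemanticCache`, and `GUARDRAIL_PATTERNS`, the 0.95 similarity threshold, and the regexes are all hypothetical. The 401 path for an unknown key is also an assumption (the page lists only the cache-hit, 400, and continue outcomes). Rate limiting and the key vault are left out, and a plain in-memory list stands in for pgvector.

```python
import math
import re
from dataclasses import dataclass, field

# Hypothetical guardrail patterns; real detectors would be far more thorough.
GUARDRAIL_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

@dataclass
class IngestResult:
    outcome: str   # "rejected", "blocked", "cache_hit", or "continue"
    status: int
    body: str = ""

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

@dataclass
class SemanticCache:
    threshold: float = 0.95                       # assumed similarity cutoff
    entries: list = field(default_factory=list)   # (embedding, response) pairs

    def lookup(self, embedding):
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine(stored, embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

def ingest(api_key, prompt, embedding, valid_keys, cache):
    # 1. Auth (rate limiting omitted). The 401 status is an assumption.
    if api_key not in valid_keys:
        return IngestResult("rejected", 401)
    # 2. Virtual Key Vault: omitted from this sketch.
    # 3. Guardrail pre-scan: block with a redacted snippet.
    for pattern in GUARDRAIL_PATTERNS.values():
        if pattern.search(prompt):
            snippet = pattern.sub("[REDACTED]", prompt)[:80]
            return IngestResult("blocked", 400, snippet)
    # 4. Semantic cache: a hit ends the request here.
    cached = cache.lookup(embedding)
    if cached is not None:
        return IngestResult("cache_hit", 200, cached)
    return IngestResult("continue", 200)
```

Because the checks run in order, a request carrying card data is blocked before the cache is consulted, so a blocked prompt never produces a cache hit.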

Stage 02 · Route & Execute

Model selection & provider dispatch with safety checks.

  1. Thompson sampling

     Maintains a Bayesian posterior over reward for each model. Balances exploration of new models with exploitation of known winners.

  2. CAF identity

     Cryptographic certificate validation; SPIFFE ID + RBAC verification.

  3. ARM budget

     The agent's budget profile is checked. The request auto-downgrades to a cheaper model if the budget is below threshold and is rejected if the budget is exhausted.

  4. Provider dispatch

     A circuit breaker verifies provider health. On failure, the request fails over to the next-best healthy provider; it is never retried against the same endpoint.
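The budget check and the failover rule from the last two steps can be sketched as follows. All names here (`CircuitBreaker`, `dispatch`, `apply_budget`, `BudgetExhausted`) and the threshold and cooldown values are hypothetical; the page does not describe how the breaker decides to open, so the consecutive-failure counter below is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Optional

class BudgetExhausted(Exception):
    """Raised when the agent has no budget left (assumed behaviour)."""

def apply_budget(model, cheaper_model, remaining_usd, downgrade_below_usd):
    """Reject if exhausted, downgrade if below threshold, else keep the model."""
    if remaining_usd <= 0:
        raise BudgetExhausted("agent budget exhausted")
    if remaining_usd < downgrade_below_usd:
        return cheaper_model
    return model

@dataclass
class CircuitBreaker:
    failure_threshold: int = 3   # hypothetical: failures before opening
    cooldown_s: float = 30.0     # hypothetical: time before a trial call
    failures: int = 0
    opened_at: Optional[float] = None

    def state(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return "CLOSED"
        if now - self.opened_at >= self.cooldown_s:
            return "HALF-OPEN"   # cooldown elapsed: allow one trial call
        return "OPEN"

    def record(self, ok, now=None):
        now = time.monotonic() if now is None else now
        if ok:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        # Open on reaching the threshold, or re-open after a failed trial.
        if self.failures >= self.failure_threshold or self.opened_at is not None:
            self.opened_at = now

def dispatch(ranked_providers, breakers, call):
    """Try providers best-first, skipping OPEN breakers.

    Each provider is attempted at most once per request, so a failure
    moves on to the next provider instead of retrying the same endpoint.
    """
    for name in ranked_providers:
        breaker = breakers[name]
        if breaker.state() == "OPEN":
            continue
        try:
            result = call(name)
        except Exception:
            breaker.record(ok=False)
            continue
        breaker.record(ok=True)
        return name, result
    raise RuntimeError("no healthy provider available")
```

In this sketch `ranked_providers` would come from the sampling step, so "next-best" simply means the next name in that list whose breaker is not OPEN.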

Model leaderboard

  claude-sonnet-4-6   0.91   σ 0.03
  gpt-4o              0.87   σ 0.04
  gemini-2.5          0.84   σ 0.05
  deepseek-r1         0.71   σ 0.08

Circuit breaker state

  Anthropic   CLOSED
  OpenAI      CLOSED
  Google      HALF-OPEN
  DeepSeek    CLOSED
Stage 03 · Return & Learn

Response processing, cost tracking, router learning.

  1. Streaming firewall

     A 7-check pipeline runs over a sliding-window buffer. When PII is detected mid-stream, the client stream is severed while the model call continues, the agent is quarantined, and a SIEM alert is raised.

  2. Guardian

     Records cost, latency, token counts, routing metadata. Computes efficiency, tracks budget velocity, flags waste.

  3. RMM store

     Stores relevant context in the Relational Memory Manager (RMM). pgvector similarity search retrieves it in the next session.

  4. Posterior update

     The Thompson posterior updates based on response quality, so the router learns from every request. The response is stored in the cache for the next hit.
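The sliding-window idea behind the streaming firewall can be sketched as a generator that holds back the tail of the stream until it is known to be clean. This is an illustration under assumptions: `firewall`, `StreamSevered`, and the single SSN regex are hypothetical stand-ins for the 7-check pipeline, and the quarantine and SIEM steps are not modelled.

```python
import re
from typing import Iterable, Iterator

# Hypothetical single check standing in for the full pipeline.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

class StreamSevered(Exception):
    """Raised to cut the client stream when a check fires."""

def firewall(chunks: Iterable[str], window: int = 32) -> Iterator[str]:
    """Yield text only once it has left the sliding window unflagged.

    `window` should be at least as long as the longest pattern, so a
    match split across chunk boundaries is still fully inside the buffer
    when its final characters arrive.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if SSN.search(buffer):
            raise StreamSevered("PII detected mid-stream")
        # Release everything except the trailing `window` characters,
        # which could still be the start of a pattern.
        if len(buffer) > window:
            yield buffer[:-window]
            buffer = buffer[-window:]
    if buffer:
        yield buffer
```

The trade-off is a small delay: the client always trails the model by up to `window` characters, which is what allows the stream to be cut before any part of a match has been sent.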

Request timing · typical

  Guardian overhead       3.2 ms
  Provider round-trip     847.0 ms
  Total                   850.2 ms
  BR overhead as share    0.38%
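As a check on these figures, the overhead share is the router's overhead divided by the total request time:

```latex
\[
\frac{3.2\ \text{ms}}{3.2\ \text{ms} + 847.0\ \text{ms}}
  = \frac{3.2}{850.2}
  \approx 0.00376
  \approx 0.38\%
\]
```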
Why Thompson sampling

Bayesian bandits > static rules.

The "arms" are the models you have access to. The "reward" is a composite of response quality, cost, and latency. The posterior is updated on every request.
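A minimal sketch of this loop, assuming a Beta posterior per model and a reward scaled to [0, 1]. The page does not say which distribution or reward weights the router uses, so `Arm`, `composite_reward`, `choose`, the weights, and the cost and latency caps are all hypothetical. The fractional update (`alpha += reward`) is a common heuristic for continuous rewards, not something the source specifies.

```python
import random
from dataclasses import dataclass

@dataclass
class Arm:
    alpha: float = 1.0   # Beta(1, 1) prior: uniform over [0, 1]
    beta: float = 1.0

    def sample(self, rng):
        # Draw one plausible reward rate from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Fractional Beta update for a reward in [0, 1] (assumed heuristic).
        self.alpha += reward
        self.beta += 1.0 - reward

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)

def composite_reward(quality, cost_usd, latency_ms,
                     max_cost_usd=0.05, max_latency_ms=5000.0,
                     weights=(0.6, 0.2, 0.2)):
    """Blend quality, cost, and latency into one score in [0, 1].

    The weights and caps are illustrative assumptions.
    """
    cost_score = 1.0 - min(cost_usd / max_cost_usd, 1.0)
    latency_score = 1.0 - min(latency_ms / max_latency_ms, 1.0)
    w_quality, w_cost, w_latency = weights
    return w_quality * quality + w_cost * cost_score + w_latency * latency_score

def choose(arms, rng):
    """Thompson step: sample every posterior, route to the highest draw."""
    return max(arms, key=lambda name: arms[name].sample(rng))
```

Each request would call `choose(arms, rng)`, send the request to that model, then call `arms[name].update(composite_reward(...))`. A new model starts with a wide posterior, so its samples occasionally come out highest and it gets traffic; as evidence accumulates the posterior narrows and the draws settle near the model's observed reward.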

Explores automatically

New models and workload shifts trigger exploration naturally. No A/B infrastructure to run.

Exploits efficiently

Narrow posteriors route traffic to known-good models for their known task types.

Adapts continuously

Pricing changes, quality drift, deprecations — all tracked without manual intervention.

  Strategy       Adaptation      Learning
  Manual rules   No adaptation   No learning
  Round-robin    No adaptation   No learning
  ε-greedy       Slowly adapts   Partial learning
  Thompson       Continuous      Every request
Ready to route?

See the pipeline in action.

Every response includes full routing metadata. Watch the router learn your workload in real time.