Every API call flows through a three-stage pipeline — Ingest, Route, Return — with 13 systems operating in parallel. Overhead under 5ms, measured and returned on every response.
Stage 1 has three possible outcomes: a cache hit returns immediately, a guardrail block returns HTTP 400, or the request continues to Stage 2.
API key validation, tenant identification, rate-limit enforcement.
AES-256-GCM decryption of provider keys, per-key budget ceiling checks.
PII, PCI-DSS, custom regex. Blocked → 400 with sanitized snippet.
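A minimal sketch of how a regex guardrail of this kind might behave, for illustration only. The pattern set, the `Verdict` type, and the `scan_prompt` helper are all hypothetical names, and the two regexes stand in for whatever rules are actually configured. A Luhn checksum is added here as an assumption to reduce false positives on card-like digit runs.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative rules only (assumption): not the product's actual rule set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(digits: str) -> bool:
    """Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class Verdict:
    status: int                    # 200 = pass, 400 = blocked
    rule: Optional[str] = None     # name of the rule that fired
    snippet: Optional[str] = None  # surrounding text with the match redacted


def scan_prompt(text: str, context: int = 12) -> Verdict:
    """Return a 400 verdict with a sanitized snippet on the first match."""
    for rule, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if rule == "card_number":
                digits = re.sub(r"\D", "", m.group())
                if not luhn_valid(digits):
                    continue  # digit run that fails the checksum
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            snippet = text[start:m.start()] + "[REDACTED]" + text[m.end():end]
            return Verdict(400, rule, snippet)
    return Verdict(200)
```

The snippet keeps a few characters of context around the match but never the matched value itself, which is one plausible reading of "sanitized snippet".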
pgvector HNSW + in-memory. Hits return in ~4ms, skipping Stages 2 & 3 entirely.
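To make the cache lookup concrete, here is a rough in-memory sketch. It uses a brute-force cosine-similarity scan rather than an HNSW index, so it illustrates the hit/miss decision only, not pgvector itself. The `SemanticCache` class, the 0.95 threshold, and the assumption that prompts arrive already embedded as float vectors are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class SemanticCache:
    """Linear-scan stand-in for an approximate nearest-neighbour index."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # assumed similarity cutoff for a hit
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the closest cached response, or None on a miss."""
        best_score, best_response = -1.0, None
        for cached, response in self._entries:
            score = cosine(embedding, cached)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A hit (a non-`None` result) would short-circuit the pipeline; a miss lets the request continue to routing.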
Bayesian posterior over reward. Balances exploration of new models with exploitation of known winners.
Cryptographic certificate validation; SPIFFE ID + RBAC verification.
Agent budget profile checked. Auto-downgrades to cheaper model if below threshold; rejects if exhausted.
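The budget decision above can be sketched as a small function. The `BudgetProfile` fields, the 20% downgrade threshold, and the idea that the caller supplies the cheaper fallback model are assumptions made for illustration; the real profile format is not described here.

```python
from dataclasses import dataclass


class BudgetExhausted(Exception):
    """Raised when an agent has no budget left (request is rejected)."""


@dataclass
class BudgetProfile:
    limit_usd: float
    spent_usd: float
    downgrade_below: float = 0.2  # assumed: downgrade under 20% remaining

    @property
    def remaining_fraction(self) -> float:
        if self.limit_usd <= 0:
            return 0.0
        return max(0.0, (self.limit_usd - self.spent_usd) / self.limit_usd)


def apply_budget(profile: BudgetProfile, requested: str, cheaper: str) -> str:
    """Return the model to use, or raise if the budget is exhausted."""
    remaining = profile.remaining_fraction
    if remaining <= 0.0:
        raise BudgetExhausted("agent budget exhausted")
    if remaining < profile.downgrade_below:
        return cheaper  # below threshold: auto-downgrade
    return requested
```

For example, `apply_budget(BudgetProfile(10.0, 9.0), "large", "small")` returns `"small"` because only 10% of the budget remains.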
Circuit breaker verifies health. Failover to the next-best healthy provider — never a retry to the same endpoint.
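One way to picture the failover behaviour is the sketch below. The `CircuitBreaker` class, its failure threshold and cooldown, and the `call_with_failover` helper are hypothetical. The only property taken from the text is that a failed call moves on to the next-best healthy provider instead of retrying the same endpoint.

```python
import time
from typing import Callable, Dict, Optional, Sequence, Tuple


class CircuitBreaker:
    """Opens after consecutive failures; allows a probe after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows
        # Open: only allow a probe once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_failover(ranked: Sequence[str],
                       breakers: Dict[str, CircuitBreaker],
                       send: Callable[[str], str]) -> Tuple[str, str]:
    """Try providers in ranked order, skipping any with an open breaker."""
    for name in ranked:
        breaker = breakers[name]
        if not breaker.allow():
            continue
        try:
            result = send(name)
        except Exception:
            breaker.record_failure()
            continue  # fail over to the next provider, never the same one
        breaker.record_success()
        return name, result
    raise RuntimeError("no healthy provider available")
```

Each provider is attempted at most once per request, so a failing endpoint costs one call before traffic shifts away from it.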
7-check pipeline, sliding-window buffer. If PII appears mid-stream, the client stream is severed while the model call continues; the agent is quarantined and a SIEM alert fires.
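The sliding-window idea can be sketched as a generator that holds back the tail of the stream so a pattern split across two chunks is still caught before it reaches the client. This covers a single check only; the pattern, the window size, and the `guarded_stream` and `StreamSevered` names are assumptions, and quarantine and alerting are left out.

```python
import re
from typing import Iterable, Iterator, Pattern

# Illustrative pattern (assumption): US SSN-like strings.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class StreamSevered(Exception):
    """Raised when a sensitive pattern is detected mid-stream."""


def guarded_stream(chunks: Iterable[str], pattern: Pattern[str] = SSN,
                   window: int = 16) -> Iterator[str]:
    """Yield text to the client, withholding the last `window` characters.

    `window` must be at least as long as the longest possible match so
    that a match straddling a chunk boundary is still inside the buffer
    when its final characters arrive.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if pattern.search(buffer):
            raise StreamSevered("sensitive pattern detected mid-stream")
        if len(buffer) > window:
            yield buffer[:-window]     # safe prefix goes to the client
            buffer = buffer[-window:]  # tail is held back for the next scan
    if buffer:
        yield buffer  # stream ended cleanly; flush the held-back tail
```

The cost of the buffer is a small delay: the client always trails the model by up to `window` characters.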
Records cost, latency, token counts, routing metadata. Computes efficiency, tracks budget velocity, flags waste.
Stores relevant context in the Relational Memory Manager. pgvector similarity retrieves it next session.
Thompson posterior updates based on response quality. The router learned something. Response cached for the next hit.
The "arms" are the models you have access to. The "reward" is a composite of response quality, cost, and latency. The posterior updates on every request.
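As a rough illustration of the arms, reward, and posterior described above, the sketch below uses a Beta posterior per model with fractional reward updates. The choice of a Beta distribution, the reward weights, and the cost and latency caps are assumptions; the actual posterior family and reward formula are not specified here.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def composite_reward(quality: float, cost_usd: float, latency_ms: float,
                     max_cost_usd: float = 0.05, max_latency_ms: float = 5000.0,
                     weights: Tuple[float, float, float] = (0.6, 0.2, 0.2)) -> float:
    """Blend quality, cost, and latency into one reward in [0, 1].

    Weights and caps are hypothetical; cheaper and faster score higher.
    """
    wq, wc, wl = weights
    cost_score = 1.0 - min(cost_usd / max_cost_usd, 1.0)
    latency_score = 1.0 - min(latency_ms / max_latency_ms, 1.0)
    quality = min(1.0, max(0.0, quality))
    return wq * quality + wc * cost_score + wl * latency_score


class ThompsonRouter:
    """Thompson sampling over models, with a Beta(alpha, beta) per arm."""

    def __init__(self, models: Sequence[str], seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior: new models start fully uncertain.
        self.posterior: Dict[str, List[float]] = {m: [1.0, 1.0] for m in models}

    def choose(self) -> str:
        """Sample once from each posterior and pick the highest draw."""
        draws = {m: self.rng.betavariate(a, b)
                 for m, (a, b) in self.posterior.items()}
        return max(draws, key=draws.get)

    def update(self, model: str, reward: float) -> None:
        """Fold a reward in [0, 1] into the chosen model's posterior."""
        reward = min(1.0, max(0.0, reward))
        self.posterior[model][0] += reward
        self.posterior[model][1] += 1.0 - reward
```

A wide posterior produces occasional high draws, so an untested model still gets traffic; as evidence accumulates the posterior narrows and the router settles on whichever model earns the highest reward.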
New models and workload shifts trigger exploration naturally. No A/B infrastructure to run.
Narrow posteriors route traffic to known-good models for their known task types.
Pricing changes, quality drift, deprecations — all tracked without manual intervention.
Every response includes full routing metadata. Watch the router learn your workload in real time.