Research Project This is a free AI research project. No warranties, SLAs, or company associations. Learn more
Routing

Thompson Sampling Router

Learns the optimal model for each task type without manual configuration. UCB1 exploration with Gaussian posterior over quality scores. Automatically finds the cheapest model above your quality floor.

The router maintains a Bayesian posterior over model quality for each task category. New task types are explored broadly; established patterns exploit the cheapest model that meets your quality threshold. Routing decisions adapt continuously as your workload shifts.

Model variants give you control over the trade-off:

  • :floor — cheapest model above quality threshold
  • :fast — lowest latency model
  • :best — highest quality regardless of cost
  • auto — Thompson Sampling decides
THOMPSON SAMPLING UCB1 · Gaussian posterior · 321 models quality score → claude-sonnet-4 gpt-4o gemini-flash SAMPLED claude-sonnet-4 confidence: 0.87 cost: $0.003 auto TS picks :floor cheapest :fast lowest p50 :best highest q 14,291 observations · 321 models · 12 task categories · posterior updated per-request
Observability

Guardian Intelligence

The cost seatbelt. Predicts spend before the request executes. Tracks cost at request granularity with <5ms overhead. Every response includes cost, efficiency, and routing metadata in headers.

Guardian identifies waste patterns: tasks where cheaper models produce equivalent quality, burst spending from retry storms, and cost anomalies from unexpected input lengths. Budget velocity alerts warn before you exceed daily targets.

All data feeds into the Insights API for programmatic access to daily spend, waste analysis, and optimization recommendations.

GUARDIAN INTELLIGENCE <5ms overhead · per-request DAILY SPEND $247.82 budget: $500 Mon Wed Fri Today ROUTING SAVINGS $112.56 CACHE SAVINGS $89.40 EFFICIENCY 0.91 RESPONSE HEADERS (every request) X-BR-Guardian-Status: on X-BR-Estimated-Cost: $0.0034 X-BR-Actual-Cost: $0.0031 X-BR-Efficiency: 0.91 X-BR-Guardian-Overhead-Ms: 1.2 X-BR-Selected-Model: claude-haiku-4.5 14,291 requests · $247.82 total · $201.96 saved · 1.2ms avg overhead
Memory

Relational Memory Manager

4-block architecture: human, system, project, and general memory. pgvector-powered similarity search retrieves relevant context automatically. Session-isolated with cross-session retrieval for long-running workflows.

Memory is injected into the system prompt transparently — no code changes. Nightly synthesis compacts memories into durable knowledge. Every memory operation exports to your observability pipeline for compliance auditing.

RELATIONAL MEMORY MANAGER pgvector · session-isolated H HUMAN user preferences · corrections 847 entries S SYSTEM config · constraints · rules 156 entities P PROJECT goals · decisions · context 42 patterns G GENERAL shared knowledge · facts active VECTOR SEARCH "What did we discuss about the API redesign?" 3 matches sim > 0.80 · 1.8ms NIGHTLY SYNTHESIS 847 turns → 23 durable entries CROSS-SESSION auto-injected · zero code changes 2,847 memories · 142 sessions · 1.8ms avg retrieval
Identity

Cryptographic Agency Firewall

SPIFFE-compatible agent identities with 5-minute ephemeral certificates. Every agent gets a cryptographically signed identity that self-destructs before lateral movement can begin.

The kill switch: revoke an agent's JWT, freeze its memory, and emit a SIEM alert — all in one API call. Behavioral profiling detects anomalous tool calls and triggers automatic quarantine.

Internal CA issues RSA-signed certificates. Mutual TLS verification on every request. ALB passthrough mode for AWS deployments.

CRYPTOGRAPHIC IDENTITY SPIFFE · mTLS · RSA-2048 AGENT requests identity CSR RSA-2048 · PKCS#10 CA sign · RSA · SHA-256 CERT TTL: 5min · auto-renew subject: spiffe://brainstormrouter.com/agent/sales-bot claims: role=sales · tid=acme-corp · tools=[search,read,crm] fingerprint: 847e06d1902a84c9 TTL 3m 42s KILL SWITCH 1. Revoke JWT instantly 2. Freeze agent memory 3. SIEM audit + quarantine BEHAVIORAL PROFILE tool_calls/min normal data_access normal peer_comms elevated mTLS: mutual verification · every request · 5min cert lifecycle
Security

Virtual Key Vault

AES-256-GCM encrypted storage for provider API keys. Each key carries its own budget ceiling. Rotate keys without downtime — the new key activates while the old one drains in-flight requests gracefully.

BYOK support: bring your own AWS KMS, GCP Cloud KMS, or Azure Key Vault key for envelope encryption. Per-key encryption metadata means rotation never breaks existing keys. Zero-downtime by design.

VIRTUAL KEY VAULT AES-256-GCM · BYOK A Anthropic sk-ant-****...7f2a $35 / $50/day ACTIVE O OpenAI sk-proj-****...9e1b $82 / $100/day ACTIVE G Google AIza****...xQ4w $12 / $40/day ACTIVE ZERO-DOWNTIME ROTATION old: draining new: active ENCRYPTION cipher: AES-256-GCM kms: AWS KMS (BYOK) TENANT ISOLATION acme · 3 initech · 2 globex · 4 9 keys · 3 tenants · 0 unencrypted · last rotation: 2h ago
Security

Streaming Tool Firewall

7-check security pipeline with 3-layer guardrails. The StreamingGuardrailEvaluator intercepts model output token-by-token. PII detected mid-stream is severed before your application sees it — not logged after delivery.

Layer 1: input guardrails (PII, PCI-DSS, custom regex). Layer 2: streaming output interception (sliding window buffer). Layer 3: tool call governance (RBAC on tool_calls arrays before execution).

Every guardrail verdict exports to your SIEM as structured CEF or ECS JSON events. Blocked requests return structured error responses with sanitized snippets.

STREAMING FIREWALL 7 checks · 3 layers · token-level RBAC PASS DENY PASS ARGS PASS INJECT PASS PII SEVER STREAM SEVERED detected: "John Smith, SSN 482-██-████" action: truncate · SIEM event emitted · agent quarantined TOOL CALL GOVERNANCE tool_call: sql_query("DROP TABLE users") verdict: blocked · reason: destructive_operation · agent: quarantined 847 inspected · 3 blocked · 12 warned · 0.8ms avg overhead
Governance

Agent Resource Manager

Virtual corporate cards for AI agents. Every agent gets a profile with budget limits, quality floors, and lifecycle state management. When budgets run low, ARM auto-downgrades to cheaper models instead of failing.

5-state lifecycle: provisioned → active → quarantined → suspended → terminated. The agent leaderboard ranks agents by cost-efficiency, quality scores, and throughput — identify your best and worst performers at a glance.

AGENT RESOURCE MANAGER profiles · budgets · lifecycle AGENT PROFILE S sales-bot-prod ACTIVE since 4h ago budget: $3.50 / $5.00 quality: 0.94 model: sonnet-4 downgrade at 80% → haiku-4.5 LIFECYCLE PROV ACTIVE QUAR LEADERBOARD 1. sales-bot 0.94 2. research 0.87 3. coding 0.82 AUTO-DOWNGRADE THRESHOLDS 80% → downgrade 95% → quarantine 14 active agents · 1 quarantined · total spend: $42.80 / $100.00
Integration

MCP Gateway

Model Context Protocol gateway with tool registry, RBAC-based tool permissions, and full audit trail. Agents discover and invoke tools through a governed interface — every tool call is authorized, logged, and rate-limited.

Drop-in OpenAI compatibility. Change your base URL and API key. Your existing code, tools, and frameworks work immediately — LangChain, Vercel AI SDK, CrewAI, LlamaIndex.

MCP GATEWAY 65 tools · RBAC · audit trail YOUR APP OpenAI SDK LangChain Vercel AI BRAINSTORMROUTER AUTH RBAC AUDIT route → model selection guard → PII / tool RBAC log → every decision bill → per-request cost A Anthropic O OpenAI G Google DROP-IN COMPATIBLE — CHANGE ONE LINE base_url = https://api.openai.com/v1 base_url = https://api.brainstormrouter.com/v1 OpenAI SDK · LangChain · Vercel AI · CrewAI · LlamaIndex — all compatible
Reliability

Sentinel Health System

Continuous health probes across all provider endpoints. When a provider goes down, Sentinel detects it, opens the circuit breaker, and re-probes at 15s and 60s intervals to catch recovery. Self-healing circuit breakers recovered in under 30 seconds during the Prometheus stress test.

Combined with the cascade system, failures automatically escalate to healthy providers without any client-side retry logic.

SENTINEL HEALTH SYSTEM 7 providers · continuous probes t=0 FAILURE circuit opens PROBE +15s PROBE +60s RECOVERED circuit closes Prometheus result: <30s recovery under sustained provider failure Re-probe schedule: 15s + 60s · eliminates 5min cold-start blind spot

Agents

Agent Delegation & Graduated Trust

Agents can provision sub-agents with scoped permissions, budget limits, and model restrictions. Trust levels graduate from minimal through standard to elevated based on operational history. M2M authentication via Agent JWT or mTLS certificates.

5
Auth Methods
API key, Supabase JWT,
Agent JWT, mTLS, SCIM
3
Trust Levels
minimal, standard,
elevated

AI Discovery

Machine-Native Interface

7-layer discovery stack: llms.txt, agents.json, RFC 8631 Link headers, /v1/discovery, and /v1/self — AI agents bootstrap themselves without human help.

Learn More
Tools

MCP Server — 65 Tools

One SSE connection gives agents governed access to routing, memory, budget, security, and admin tools. RBAC per tool, full audit trail, secretless access to upstream providers.

Learn More

Ready?

Start routing in 30 seconds

Change your base URL. Add your provider keys. All 8 systems activate automatically.

$ pip install openai && export OPENAI_BASE_URL=https://api.brainstormrouter.com/v1