A Fortune 500 security team reduced AI spending by $180K/year while improving incident response time by 40%.
Security operations centers are drowning in alerts. This particular SOC was ingesting 12,000 alerts daily across cloud infrastructure, endpoints, and network traffic. They'd started using Claude and GPT-4 to triage everything — but the bill was $35K/month, and they were burning through rate limits on routine tasks.
The problem wasn't intelligence; it was waste. A firewall rule change triggered 4,000 identical alerts. A misconfigured health check spawned another 2,000. Running each through a $20/million-token model was like hiring a neurosurgeon to check blood pressure.
The team's existing pipeline sent every alert through a single prompt template to Claude Sonnet. No prioritization, no caching, no cost awareness. The prompt analyzed the alert, classified severity, and recommended actions.
```python
# Before: every alert hits the expensive model
for alert in alerts:
    response = client.chat.completions.create(
        model="claude-sonnet-4-0",  # $20/M tokens
        messages=[{"role": "user", "content": template(alert)}],
    )
```
With BrainstormRouter, the only code change was pointing the client at the router's base URL and setting the model to `auto`. The router handled the rest:
```python
# After: BrainstormRouter decides the model
for alert in alerts:
    response = client.chat.completions.create(
        model="auto",  # router decides
        messages=[{"role": "user", "content": template(alert)}],
    )

# Response headers:
# X-BR-Routed-Model: anthropic/claude-haiku-4-5
# X-BR-Cache: hit (for 4,000 duplicate firewall alerts)
# X-BR-Actual-Cost: $0.0002 (vs $0.0034 before)
```
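Those diagnostic headers can feed dashboards and cost reports directly. A minimal sketch of parsing them, using the header names from the example above (the `routing_info` helper itself is hypothetical, not part of any official SDK):

```python
def routing_info(headers: dict) -> dict:
    """Pull BrainstormRouter's X-BR-* diagnostics out of a response's headers.

    Header names are taken from the example above; the dollar-sign parsing
    is an assumption about the header's wire format.
    """
    cost = headers.get("X-BR-Actual-Cost", "$0")
    return {
        "model": headers.get("X-BR-Routed-Model"),
        "cache_hit": headers.get("X-BR-Cache", "").startswith("hit"),
        # Keep only the leading "$0.0002" token, drop any annotation after it
        "cost_usd": float(cost.split()[0].lstrip("$")),
    }
```

Feeding each response's headers through a helper like this is enough to aggregate per-category spend without any extra instrumentation.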
Thompson Sampling learned the alert landscape within 48 hours. Simple alerts — duplicate detections, low-severity pattern matches, known false positives — went to Claude Haiku at 10x lower cost. The model performed identically on these predictable patterns.
Only critical incidents hit Claude Sonnet: ransomware signatures, data exfiltration patterns, anomalous privilege escalation. These required deeper reasoning and multi-step analysis that justified the premium pricing.
The semantic cache compounded the savings. Those 4,000 identical firewall alerts? The first one ran through Haiku. The remaining 3,999 returned cached responses in under 5ms — zero API calls, zero cost.
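The mechanics can be illustrated with a toy version. This sketch normalizes away volatile fields (timestamps, hex IDs) and keys on a hash; a production semantic cache would typically use embeddings, but exact-match-after-normalization already captures templated alerts like the duplicate firewall events. All class and field names here are illustrative, not BrainstormRouter's API:

```python
import hashlib
import re

class AlertCache:
    """Toy semantic cache for templated security alerts."""

    # Strip ISO-8601 timestamps and long hex identifiers before hashing
    VOLATILE = re.compile(
        r"\d{4}-\d{2}-\d{2}T[\d:.]+Z?|\b[0-9a-f]{8,}\b", re.IGNORECASE
    )

    def __init__(self):
        self._store = {}

    def _key(self, alert: str) -> str:
        normalized = self.VOLATILE.sub("<VAR>", alert).lower()
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, alert: str):
        return self._store.get(self._key(alert))

    def put(self, alert: str, response: str):
        self._store[self._key(alert)] = response
```

With this scheme, the first firewall alert misses and pays for one model call; the 3,999 near-duplicates that differ only in timestamp hash to the same key and return the stored response.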
| Alert Category | Volume/Day | Model | Cost/Alert |
|---|---|---|---|
| Duplicate/noise | 6,200 | Cache hit | $0.00 |
| Low-severity known | 3,400 | Claude Haiku | $0.0003 |
| Medium-severity | 1,800 | Claude Haiku | $0.0008 |
| High-severity novel | 480 | Claude Sonnet | $0.0034 |
| Critical incidents | 120 | Claude Sonnet | $0.0052 |
Within 30 days, costs dropped from $35K/month to $20.3K/month, a 42% reduction. Annualized, that's roughly $180K in savings, with no code changes beyond switching the base URL and the model name.
Alert triage time fell 40% on duplicate alerts due to cache hits. The SOC team found more real threats per analyst because they weren't paying for — or waiting on — bulk processing of noise.
Critical incident classification accuracy remained at 100%. The router never sent a critical alert to a cheaper model — Thompson Sampling's exploration phase confirmed that Sonnet outperformed Haiku on these categories, and the routing policy stabilized within the first week.
Semantic caching on structured alerts. Security alerts follow templates. A firewall rule change produces thousands of structurally identical alerts. The semantic cache recognized these as the same query and served cached responses for 52% of daily volume.
Thompson Sampling over rules. The team didn't write routing rules. They didn't classify alert types manually. The router learned the optimal model for each alert pattern by measuring response quality against cost. When new alert types appeared, the router explored models broadly before settling on the cheapest one that met quality thresholds.
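The idea can be sketched as a Beta-Bernoulli bandit per (alert category, model) pair: sample a plausible quality for each model from its posterior, then take the cheapest model whose sample clears a quality threshold. This is a minimal illustration of the technique, not BrainstormRouter's actual policy; the 0.9 threshold and the cost table are invented for the example:

```python
import random

class ThompsonRouter:
    """Toy Thompson Sampling router over a fixed set of models."""

    QUALITY_THRESHOLD = 0.9  # assumed acceptance bar, per category

    def __init__(self, models):
        self.models = models            # name -> cost per call
        self.ab = {}                    # (category, model) -> [alpha, beta]

    def choose(self, category):
        # Sample a quality estimate for each model from its Beta posterior
        samples = {}
        for m in self.models:
            a, b = self.ab.setdefault((category, m), [1, 1])
            samples[m] = random.betavariate(a, b)
        # Cheapest model whose sample clears the bar; else explore the best
        ok = [m for m, q in samples.items() if q >= self.QUALITY_THRESHOLD]
        if ok:
            return min(ok, key=lambda m: self.models[m])
        return max(samples, key=samples.get)

    def update(self, category, model, passed):
        # Bernoulli feedback: did the triage pass the quality check?
        counts = self.ab.setdefault((category, model), [1, 1])
        counts[0 if passed else 1] += 1
```

After a burn-in where both models succeed on a predictable category, the sampled qualities concentrate near 1 and the cheaper model wins almost every draw, which mirrors the 48-hour stabilization described above.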
Guardian Intelligence for visibility. Every request logged cost and routing decisions. The security team could audit which model handled which alert category and verify that critical incidents always hit Sonnet — a compliance requirement for their SOC2 audit.
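An audit over those logs reduces to a one-pass filter. A hedged sketch, assuming each log record carries a severity and the routed model name (field names are hypothetical, not Guardian Intelligence's actual schema):

```python
def audit_critical_routing(log_records):
    """Return critical alerts that were NOT handled by a Sonnet-class model.

    An empty result is the evidence the compliance check wants to see.
    """
    return [
        r for r in log_records
        if r["severity"] == "critical" and "sonnet" not in r["routed_model"]
    ]
```

Running this against each day's request log turns the SOC2 requirement into a checkable assertion rather than a manual review.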