Security

Cut security AI costs 42% without losing accuracy

A Fortune 500 security team reduced AI spending by $180K/year while improving incident response time by 40%.

$180K
Annual cost reduction (42%)
40×
Faster on duplicate alerts via caching
100%
Critical incident accuracy maintained

The Problem

Security operations centers are drowning in alerts. This particular SOC was ingesting 12,000 alerts daily across cloud infrastructure, endpoints, and network traffic. They'd started using Claude and GPT-4 to triage everything — but the bill was $35K/month, and they were burning through rate limits on routine tasks.

The problem wasn't intelligence; it was waste. A firewall rule change triggered 4,000 identical alerts. A misconfigured health check spawned another 2,000. Running each through a $20/million-token model was like hiring a neurosurgeon to check blood pressure.

The Architecture

The team's existing pipeline sent every alert through a single prompt template to Claude Sonnet. No prioritization, no caching, no cost awareness. The prompt analyzed the alert, classified severity, and recommended actions.

# Before: every alert hits the expensive model
for alert in alerts:
    response = client.chat.completions.create(
        model="claude-sonnet-4-0",  # $20/M tokens
        messages=[{"role": "user", "content": template(alert)}],
    )

With BrainstormRouter, the pipeline changed to a single API endpoint. The router handled the rest:

# After: BrainstormRouter decides the model
for alert in alerts:
    response = client.chat.completions.create(
        model="auto",  # router decides
        messages=[{"role": "user", "content": template(alert)}],
    )
    # Response headers:
    # X-BR-Routed-Model: anthropic/claude-haiku-4-5
    # X-BR-Cache: hit (for 4,000 duplicate firewall alerts)
    # X-BR-Actual-Cost: $0.0002 (vs $0.0034 before)

How Routing Worked

Thompson Sampling learned the alert landscape within 48 hours. Simple alerts — duplicate detections, low-severity pattern matches, known false positives — went to Claude Haiku at 10x lower cost. The model performed identically on these predictable patterns.
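As a rough illustration of the mechanism, here is a minimal Beta-Bernoulli Thompson Sampling sketch: one arm per (alert category, model) pair, rewarded when the model's answer meets the quality bar. The arm names, categories, and structure are illustrative assumptions, not BrainstormRouter's actual implementation.

```python
import random

# Hypothetical arms: one Beta posterior per (alert category, model) pair.
# alpha counts quality successes, beta counts failures.
arms = {
    ("low_severity", "claude-haiku"):  {"alpha": 1, "beta": 1},
    ("low_severity", "claude-sonnet"): {"alpha": 1, "beta": 1},
}

def pick_model(category):
    """Thompson Sampling: draw from each arm's posterior, pick the best draw."""
    best_model, best_draw = None, -1.0
    for (cat, model), arm in arms.items():
        if cat != category:
            continue
        draw = random.betavariate(arm["alpha"], arm["beta"])
        if draw > best_draw:
            best_model, best_draw = model, draw
    return best_model

def update(category, model, success):
    """After grading the response, update the chosen arm's posterior."""
    arm = arms[(category, model)]
    if success:
        arm["alpha"] += 1
    else:
        arm["beta"] += 1
```

With uniform priors the router explores both models; as evidence accumulates that the cheap model matches the expensive one on a category, its posterior concentrates and it wins nearly every draw. A cost-aware router would additionally break near-ties toward the cheaper arm, which this sketch omits.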

Only critical incidents hit Claude Sonnet: ransomware signatures, data exfiltration patterns, anomalous privilege escalation. These required deeper reasoning and multi-step analysis that justified the premium pricing.

The semantic cache compounded the savings. Those 4,000 identical firewall alerts? The first one ran through Haiku. The remaining 3,999 returned cached responses in under 5ms — zero API calls, zero cost.

Alert Routing Distribution (30-Day Average)

Alert Category        Volume/Day   Model           Cost/Alert
Duplicate/noise            6,200   Cache hit            $0.00
Low-severity known         3,400   Claude Haiku       $0.0003
Medium-severity            1,800   Claude Haiku       $0.0008
High-severity novel          480   Claude Sonnet      $0.0034
Critical incidents           120   Claude Sonnet      $0.0052

Results

Within 30 days, costs dropped from $35K/month to $20.3K/month — a 42% reduction. Annualized, that's roughly $180K in savings, with no pipeline changes beyond pointing the client at the router's base URL and setting the model to "auto".
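The headline figures are internally consistent and easy to check:

```python
# Sanity check on the case study's numbers.
before, after = 35_000, 20_300        # monthly spend, from the case study
monthly_savings = before - after      # 14,700
annual_savings = monthly_savings * 12 # 176,400 ≈ $180K/year
reduction = monthly_savings / before  # 0.42 → 42%

print(monthly_savings, annual_savings, round(reduction, 2))
```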

Alert triage time fell 40% on duplicate alerts due to cache hits. The SOC team found more real threats per analyst because they weren't paying for — or waiting on — bulk processing of noise.

Critical incident classification accuracy remained at 100%. The router never sent a critical alert to a cheaper model — Thompson Sampling's exploration phase confirmed that Sonnet outperformed Haiku on these categories, and the routing policy stabilized within the first week.

What Made It Work

Semantic caching on structured alerts. Security alerts follow templates. A firewall rule change produces thousands of structurally identical alerts. The semantic cache recognized these as the same query and served cached responses for 52% of daily volume.

Thompson Sampling over rules. The team didn't write routing rules. They didn't classify alert types manually. The router learned the optimal model for each alert pattern by measuring response quality against cost. When new alert types appeared, the router explored models broadly before settling on the cheapest one that met quality thresholds.

Guardian Intelligence for visibility. Every request logged cost and routing decisions. The security team could audit which model handled which alert category and verify that critical incidents always hit Sonnet — a compliance requirement for their SOC2 audit.
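An audit check like the one described could be built directly on the X-BR-* response headers shown earlier. The following is a hypothetical sketch — the row shape, category names, and compliance rule are assumptions for illustration, not Guardian Intelligence's actual API.

```python
# Categories that must always be served by Sonnet (illustrative names).
REQUIRED_SONNET_CATEGORIES = {"critical", "high_severity_novel"}

def audit_row(category: str, headers: dict) -> dict:
    """Build one audit-log row from the router's response headers."""
    return {
        "category": category,
        "model": headers.get("X-BR-Routed-Model", "unknown"),
        "cache": headers.get("X-BR-Cache", "miss"),
        "cost": headers.get("X-BR-Actual-Cost", ""),
    }

def verify_compliance(rows) -> bool:
    """SOC2 requirement from the case study: critical alerts must
    never be routed to a cheaper model."""
    return all(
        "sonnet" in r["model"]
        for r in rows
        if r["category"] in REQUIRED_SONNET_CATEGORIES
    )
```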
