All features

Everything you need to master AI costs.

From a single-line proxy swap to enterprise governance and distributed tracing — ModelSpend grows with your team.

Core routing

OpenAI-compatible proxy

Point your existing OpenAI or Anthropic SDK at api.modelspend.best/proxy/v1. Existing SDK calls, including streaming, tools, vision, and function calling, work unchanged. The model parameter becomes a routing hint rather than a fixed model choice.
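The hint table itself isn't published here. As a minimal sketch, assuming it maps glob-style model-name patterns to tiers (every pattern and the tier names `economy`, `standard`, and `premium` below are hypothetical), a first-match lookup could look like this:

```python
from fnmatch import fnmatchcase

# Hypothetical hint table: (glob pattern, tier). First match wins, so more
# specific patterns (gpt-4o-mini*) must come before broader ones (gpt-4o*).
TIER_HINTS: list[tuple[str, str]] = [
    ("gpt-4o-mini*", "economy"),
    ("gpt-4o*", "premium"),
    ("claude-*-haiku*", "economy"),
    ("claude-*-sonnet*", "standard"),
    ("claude-*-opus*", "premium"),
    ("gemini-*-flash*", "economy"),
]

DEFAULT_TIER = "standard"  # hypothetical fallback for unrecognized model names


def tier_hint(model: str) -> str:
    """Return the routing tier hinted by the requested model name."""
    name = model.lower()
    for pattern, tier in TIER_HINTS:
        if fnmatchcase(name, pattern):
            return tier
    return DEFAULT_TIER
```

Under these assumptions, a request for `gpt-4o-2024-08-06` is read as a hint toward the premium tier, not a hard pin to that exact model snapshot.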

  • ✓ Streaming SSE with OpenAI delta format
  • ✓ Function calling and tool use pass-through
  • ✓ Cost metadata in x-modelspend-* headers
  • ✓ Model-to-tier hint table (50+ model patterns)
# Configure with environment variables only
export OPENAI_BASE_URL=https://api.modelspend.best/proxy/v1
export OPENAI_API_KEY=msp_live_...

# Then use the OpenAI SDK normally (Python)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment
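The proxy reports cost metadata in `x-modelspend-*` response headers, but the individual header names aren't listed, so this sketch simply collects every header carrying that prefix. HTTP header names are case-insensitive, so the comparison lowercases them. The helper name `modelspend_headers` is hypothetical.

```python
from collections.abc import Mapping

PREFIX = "x-modelspend-"


def modelspend_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Collect x-modelspend-* headers, keyed by the lowercased part after the prefix."""
    collected: dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith(PREFIX):
            collected[lowered[len(PREFIX):]] = value
    return collected
```

With the OpenAI Python SDK (v1 and later), `client.chat.completions.with_raw_response.create(...)` returns a raw response. You can pass its `.headers` to a helper like this one, and its `.parse()` method returns the usual completion object.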
New · v3.1

Evaluation framework

Before you switch routing configurations, verify that quality holds. Upload a dataset of representative prompts, run them against several models in parallel, and score the outputs with an LLM judge.

Scores are tracked over time so you can see quality trends as providers update their models.
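The judge's reply format isn't specified. The following is a sketch that assumes the judge model is asked to return a JSON object with a `score` between 0.0 and 1.0 and a `reasoning` string (both field names are hypothetical). Validating each reply before recording it means a malformed or out-of-range judgment gets rejected instead of silently skewing the quality trend.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    score: float    # 0.0 (fails the rubric) to 1.0 (fully meets it)
    reasoning: str  # the judge's explanation, kept for later review


def parse_judgment(raw: str) -> Judgment:
    """Parse a judge reply such as '{"score": 0.9, "reasoning": "..."}'."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed JSON
    score = float(data["score"])
    if not 0.0 <= score <= 1.0:  # also rejects NaN, since NaN fails both comparisons
        raise ValueError(f"score out of range: {score}")
    return Judgment(score=score, reasoning=str(data.get("reasoning", "")))
```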

  • ✓ CSV or API dataset upload (up to 1,000 items)
  • ✓ Run against up to 6 models in parallel
  • ✓ LLM-as-judge scoring (0.0–1.0) with reasoning
  • ✓ Exact-match mode for deterministic tasks
  • ✓ Per-model quality × cost × latency comparison
  • ✓ Link eval runs to prompt versions
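For deterministic tasks, exact-match mode replaces the judge with a direct comparison. Here is a minimal sketch that assumes answers are compared after trimming whitespace and ignoring case (this normalization rule is an assumption, not documented behavior) and that a model's score is its mean over the dataset:

```python
from statistics import mean


def exact_match(expected: str, actual: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return 1.0 if expected.strip().casefold() == actual.strip().casefold() else 0.0


def score_model(expected: list[str], outputs: list[str]) -> float:
    """Mean exact-match score of one model's outputs over a dataset."""
    if len(expected) != len(outputs):
        raise ValueError("each dataset item needs exactly one output")
    return mean(exact_match(e, o) for e, o in zip(expected, outputs))
```

If you run `score_model` over each model's outputs, you get one comparable quality number per model. That number sits next to cost and latency in the comparison.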
Sample eval result
Model           Score   Cost
gpt-4o-mini     91      $0.0003
claude-haiku    88      $0.0004
gemini-flash    83      $0.0002
llama-4-scout   79      $0.0001
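Before you change routing, one way to read a comparison like this is to pick the cheapest model that clears a quality bar. The sketch below hard-codes the sample figures above. The threshold values and the helper name `cheapest_passing` are illustrative assumptions.

```python
from typing import Optional

# Sample eval result from above: model -> (score, cost in USD).
RESULTS: dict[str, tuple[int, float]] = {
    "gpt-4o-mini": (91, 0.0003),
    "claude-haiku": (88, 0.0004),
    "gemini-flash": (83, 0.0002),
    "llama-4-scout": (79, 0.0001),
}


def cheapest_passing(results: dict[str, tuple[int, float]], min_score: int) -> Optional[str]:
    """Return the lowest-cost model whose score meets min_score, or None."""
    passing = [(cost, model) for model, (score, cost) in results.items() if score >= min_score]
    return min(passing)[1] if passing else None
```

With a bar of 85, this picks gpt-4o-mini. At 80, it picks gemini-flash, and at 95 no model qualifies.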
New · v3.1
customer-support · v1.4.0
1.4.0         production   Today
1.3.0         archived     3 days ago
1.2.1         archived     1 week ago
1.2.0-draft   draft        Now editing

Prompt registry

System prompts are code. Treat them like it. Semantic versioning, diff views, a staging workflow, and rollback — the same controls you have on your application code.
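A line-level diff between two prompt versions is the same operation as a code diff. This sketch uses Python's standard `difflib`, which may not be what the registry uses internally. The two prompt texts are hypothetical.

```python
import difflib


def prompt_diff(old: str, new: str, old_label: str, new_label: str) -> str:
    """Unified, line-level diff between two prompt versions."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )


# Hypothetical prompt text for two registry versions.
v130 = "You are a support agent.\nAnswer briefly.\n"
v140 = "You are a support agent.\nAnswer briefly and cite the help center.\n"
print(prompt_diff(v130, v140, "customer-support@1.3.0", "customer-support@1.4.0"))
```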

  • ✓ Semantic versioning (major.minor.patch)
  • ✓ draft → staging → production promotion
  • ✓ Line-level diff between any two versions
  • ✓ One-click rollback to any previous version
  • ✓ Link to eval runs for quality validation
  • ✓ Token count tracking per version
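You can model the draft → staging → production flow and rollback as a small state machine. This minimal in-memory sketch makes several assumptions:

- Versions are plain `major.minor.patch` strings, so pre-release suffixes such as `-draft` are rejected.
- Only one version is in production at a time.
- The outgoing production version is archived, as in the version list above.

`PromptRegistry` and its methods are hypothetical names, not the registry's API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Each stage may only advance one step toward production.
NEXT_STAGE = {"draft": "staging", "staging": "production"}


def parse_semver(version: str) -> tuple[int, int, int]:
    """'1.4.0' -> (1, 4, 0). Tuples compare in semantic-version order."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


@dataclass
class PromptRegistry:
    """Hypothetical in-memory registry mapping version string -> stage."""
    stages: dict[str, str] = field(default_factory=dict)

    def add_draft(self, version: str) -> None:
        parse_semver(version)  # raises ValueError unless major.minor.patch
        self.stages[version] = "draft"

    def production(self) -> Optional[str]:
        return next((v for v, s in self.stages.items() if s == "production"), None)

    def promote(self, version: str) -> None:
        current = self.stages[version]
        target = NEXT_STAGE.get(current)
        if target is None:
            raise ValueError(f"{version} cannot be promoted from {current}")
        if target == "production":
            self._archive_current()
        self.stages[version] = target

    def rollback(self, version: str) -> None:
        """Return a previously archived version to production."""
        if self.stages.get(version) != "archived":
            raise ValueError(f"{version} is not an archived version")
        self._archive_current()
        self.stages[version] = "production"

    def _archive_current(self) -> None:
        current = self.production()
        if current is not None:
            self.stages[current] = "archived"
```

In this sketch, rollback is a single state change because archived versions are kept rather than deleted.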
New · v3.1

OpenTelemetry traces

Every execute call emits a distributed trace with child spans for each stage of the pipeline. Export to your existing observability stack via OTLP. Debug exactly why a specific request was expensive, slow, or blocked.

  • ✓ Root span per request + child spans per stage
  • ✓ Attributes: cost, tokens, tier, provider, model
  • ✓ OTLP HTTP export to any collector
  • ✓ Native integrations: Jaeger, Tempo, Datadog, New Relic, Honeycomb
  • ✓ 30-day rolling retention in ModelSpend
  • ✓ In-dashboard trace viewer with waterfall
modelspend.execute                  1247ms
  modelspend.governance                3ms
  modelspend.dlp.scan                  8ms
  modelspend.routing.decision         12ms
  modelspend.budget.check              5ms
  modelspend.bridge.execute         1219ms
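You can answer "why was this request slow?" by reading the example waterfall, and you can also compute it from span data. This sketch hard-codes the durations shown above; a real exporter would supply them. It reports each child stage's share of the root span. The helper name `stage_shares` is illustrative.

```python
# Durations in milliseconds from the example trace above.
ROOT = ("modelspend.execute", 1247)
CHILDREN = {
    "modelspend.governance": 3,
    "modelspend.dlp.scan": 8,
    "modelspend.routing.decision": 12,
    "modelspend.budget.check": 5,
    "modelspend.bridge.execute": 1219,
}


def stage_shares(root_ms: int, children: dict[str, int]) -> list[tuple[str, float]]:
    """Each child span's share of the root duration, largest first."""
    shares = [(name, ms / root_ms) for name, ms in children.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


for name, share in stage_shares(ROOT[1], CHILDREN):
    print(f"{name:<30} {share:6.1%}")
```

In this trace, the child spans sum to exactly 1247 ms. modelspend.bridge.execute accounts for about 98% of that time, and the governance, DLP, routing, and budget stages together add 28 ms.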

Ready to reduce your AI bill?
