Point your existing OpenAI or Anthropic SDK at api.modelspend.best/proxy/v1. Every existing SDK call — streaming, tools, vision, function calling — works identically. The model parameter becomes a routing hint.
Before you switch routing configurations, prove quality is maintained. Upload a dataset of representative prompts, run them against multiple models simultaneously, and score outputs with LLM-as-judge scoring.
Scores are tracked over time so you can see quality trends as providers update their models.
System prompts are code. Treat them like it. Semantic versioning, diff views, a staging workflow, and rollback — the same controls you have on your application code.
Every execute call emits a distributed trace with child spans for each stage of the pipeline. Export to your existing observability stack via OTLP. Debug exactly why a specific request was expensive, slow, or blocked.