Switchyard Documentation¶
Switchyard is a typed control plane for LLM traffic. It sits between client applications and model backends, translates OpenAI Chat / Anthropic Messages / OpenAI Responses formats, and routes each request through profile-backed chains.
Use Switchyard when you want coding agents, SDK clients, or internal services to keep their native API shape while traffic is served by a different provider, split across model tiers, or selected by routing policy.
Project Overview¶
| Area | What Switchyard provides |
|---|---|
| Client ingress | OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses compatible endpoints. |
| Agent launchers | One-command local proxies for Claude Code, Codex, and OpenClaw. |
| Format translation | Request and response translation between supported wire formats. |
| Routing policies | Random splits, LLM classifier routing with optional session affinity, signal-driven cascade routing, and YAML route bundles. |
| Operations | Request/token statistics and context-window fallback behavior. |
| Deployment options | Local coding-agent proxy, shared HTTP service, or embedded Python runtime. |
At a high level, Switchyard keeps client integrations separate from model providers and routing policy:
For system context and request lifecycle diagrams, see Architecture.
First Run¶
For source installs, non-interactive configuration, and a curl sanity check, use Getting Started.
Main Workflows¶
-
Run coding agents
Launch Claude Code, Codex, or OpenClaw through a local Switchyard proxy.
-
Configure routing
Pick between fixed splits, classifier routing, and cascade routing, with optional session affinity for classifier-driven conversations.
-
Understand the system
See how clients, routing policy, model backends, and operations fit together.
-
Operate the proxy
Understand context-window overflow handling and fallback behavior.
Configuration Model¶
Standalone deployments start with a profile config that separates provider connectivity, upstream targets, and client-facing profiles:
endpoints:
openrouter:
api_key: ${OPENROUTER_API_KEY}
base_url: https://openrouter.ai/api/v1
targets:
strong:
endpoint: openrouter
model: openai/gpt-4o
format: openai
weak:
endpoint: openrouter
model: openai/gpt-4o-mini
format: openai
profiles:
smart:
type: random-routing
strong: strong
weak: weak
strong_probability: 0.3
Run it as a long-lived proxy. Profile and target ids appear as models on
GET /v1/models, and clients select one with the request's model field:
The deprecated --routing-profiles flag is retained only for launcher-owned
legacy bundles and saved bundle paths:
switchyard --routing-profiles routes.yaml -- launch claude
switchyard --routing-profiles routes.yaml -- configure
Profile ids, direct targets, legacy launcher compatibility, and persistence are covered in Routing Overview.
Routing Reference¶
| Need | Read |
|---|---|
| Fixed strong/weak traffic split for baselines or A/B tests | Random Routing |
| Per-request strong/weak decisions from a classifier model | LLM Classifier Routing |
| Signal-driven weak/strong escalation with optional classifier fallback | Cascade Routing |
| Conversation-level affinity for cache reuse | Sticky Routing |
Operations and Reference¶
| Topic | Read |
|---|---|
| Known limitations and workarounds for 0.1.0 | Known Issues |
| CLI syntax, flags, resolution rules, and environment variables | CLI Reference |
| Context-window overflow retry and fallback behavior | Context-Window Handling |