Switchyard Architecture¶
Switchyard is an LLM traffic proxy that sits between clients and model backends. It keeps the client-facing API stable while applying routing policy, translating provider formats, and handling configured fallbacks.
System Context¶
flowchart TB
clients["Clients<br/>Coding agents, SDKs, applications"]
switchyard["Switchyard<br/>Local proxy, shared service, or embedded runtime"]
backends["Model backends<br/>Hosted providers, private endpoints, local models"]
clients <-->|"OpenAI and Anthropic API formats"| switchyard
switchyard <-->|"Provider-compatible requests and responses"| backends
Clients connect using supported OpenAI or Anthropic API formats. Switchyard can route a request to a backend with a different native format and still return the response shape expected by the client.
Request Lifecycle¶
flowchart TB
request["1. Receive<br/>Accept the client API format"]
normalize["2. Normalize<br/>Prepare a provider-independent request"]
route["3. Route<br/>Apply policy and select a model endpoint"]
execute["4. Execute<br/>Call the backend and apply configured fallback"]
respond["5. Return<br/>Translate the response or stream for the client"]
request --> normalize --> route --> execute --> respond
Routing policy determines which model or endpoint receives a request. Depending on the selected strategy, that decision can use fixed weights, a classifier, request signals, or conversation affinity. See the Routing Overview for the available strategies.
Backend Wire Format¶
BackendFormat controls the upstream endpoint Switchyard calls. Explicit
formats select an endpoint directly and do not run capability probes.
| Format | Upstream behavior | Use when |
|---|---|---|
ANTHROPIC |
Always sends to /v1/messages. No probe. |
You know the upstream is Anthropic-native (Anthropic API, NIM Claude routes). |
RESPONSES |
Always sends to /v1/responses. No probe. |
You know the upstream supports the OpenAI Responses API. Fails on NIM / non-OpenAI upstreams. |
OPENAI |
Always sends to /v1/chat/completions. No probe. |
You know the upstream is OpenAI-compatible (NIM, OpenRouter, etc). Safe universal choice. |
AUTO |
Probes at startup, picks best format (see below). | Upstream is unknown or varies across deployments. Used by Claude Code and Codex launchers. OpenClaw is intentionally pinned to OPENAI. |
| (omitted) | Defaults to OPENAI — no probe, no fast-path. Silently uses Chat Completions, which is wrong for Anthropic/Bedrock models. Always set format: explicitly. |
|
Claude Code and Codex launchers use AUTO for their single-model targets. |
||
OpenClaw is intentionally pinned to OPENAI for its equivalent target. |
AUTO Decision Tree¶
flowchart TB
auto["BackendFormat.AUTO"]
messages{"/v1/messages works?"}
anthropic["ANTHROPIC<br/>/v1/messages"]
responses{"/v1/responses works?"}
responses_format["RESPONSES<br/>/v1/responses"]
openai["OPENAI<br/>/v1/chat/completions fallback"]
auto -->|"Probe /v1/messages"| messages
messages -->|"Yes"| anthropic
messages -->|"No: probe /v1/responses"| responses
responses -->|"Yes"| responses_format
responses -->|"No"| openai
Supported inbound and response formats are handled automatically.
TranslationEngine converts the client's request to the resolved backend
format and translates the backend response back to the client's expected
format. When a cross-format conversion is required, both directions decode to
and re-encode from the neutral conversation IR. This lets Claude Code, Codex,
OpenClaw, and SDK clients use their native wire format with any supported
upstream format.
Prefer an explicit format for controlled deployments. It skips capability probes and makes the upstream contract clear. Use
AUTOwhen provider capabilities are unknown or vary across deployments.
Related Documentation¶
- Getting Started: install Switchyard and run a first request
- Agent Launchers: run coding agents through a local proxy
- Routing Overview: choose and configure a routing strategy
- CLI Reference: configure and operate Switchyard from the command line