Switchyard Documentation¶

Switchyard is a typed control plane for LLM traffic. It sits between client applications and model backends, translates OpenAI Chat / Anthropic Messages / OpenAI Responses formats, and routes each request through profile-backed chains.

Use Switchyard when you want coding agents, SDK clients, or internal services to keep their native API shape while traffic is served by a different provider, split across model tiers, or selected by routing policy.

Project Overview¶

Area	What Switchyard provides
Client ingress	OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses compatible endpoints.
Agent launchers	One-command local proxies for Claude Code, Codex, and OpenClaw.
Format translation	Request and response translation between supported wire formats.
Routing policies	Random splits, LLM classifier routing with optional session affinity, signal-driven cascade routing, and YAML route bundles.
Operations	Request/token statistics and context-window fallback behavior.
Deployment options	Local coding-agent proxy, shared HTTP service, or embedded Python runtime.

At a high level, Switchyard keeps client integrations separate from model providers and routing policy:

clients -> compatible API surface -> routing and resilience -> model backends

For system context and request lifecycle diagrams, see Architecture.

First Run¶

pip install "nemo-switchyard[cli,server]"
switchyard configure
switchyard launch claude

For source installs, non-interactive configuration, and a curl sanity check, use Getting Started.

Main Workflows¶

Run coding agents

Launch Claude Code, Codex, or OpenClaw through a local Switchyard proxy.

Agent Launchers
Configure routing

Pick between fixed splits, classifier routing, and cascade routing, with optional session affinity for classifier-driven conversations.

Routing Overview
Understand the system

See how clients, routing policy, model backends, and operations fit together.

Architecture
Operate the proxy

Understand context-window overflow handling and fallback behavior.

Context-Window Handling

Configuration Model¶

Standalone deployments start with a profile config that separates provider connectivity, upstream targets, and client-facing profiles:

endpoints:
  openrouter:
    api_key: ${OPENROUTER_API_KEY}
    base_url: https://openrouter.ai/api/v1

targets:
  strong:
    endpoint: openrouter
    model: openai/gpt-4o
    format: openai
  weak:
    endpoint: openrouter
    model: openai/gpt-4o-mini
    format: openai

profiles:
  smart:
    type: random-routing
    strong: strong
    weak: weak
    strong_probability: 0.3

Run it as a long-lived proxy. Profile and target ids appear as models on GET /v1/models, and clients select one with the request's model field:

switchyard serve --config profiles.yaml --port 4000

The deprecated --routing-profiles flag is retained only for launcher-owned legacy bundles and saved bundle paths:

switchyard --routing-profiles routes.yaml -- launch claude
switchyard --routing-profiles routes.yaml -- configure

Profile ids, direct targets, legacy launcher compatibility, and persistence are covered in Routing Overview.

Routing Reference¶

Need	Read
Fixed strong/weak traffic split for baselines or A/B tests	Random Routing
Per-request strong/weak decisions from a classifier model	LLM Classifier Routing
Signal-driven weak/strong escalation with optional classifier fallback	Cascade Routing
Conversation-level affinity for cache reuse	Sticky Routing

Operations and Reference¶

Topic	Read
Known limitations and workarounds for 0.1.0	Known Issues
CLI syntax, flags, resolution rules, and environment variables	CLI Reference
Context-window overflow retry and fallback behavior	Context-Window Handling