LLM Classifier Routing

LLM classifier routing asks a classifier model to evaluate each request, then sends the request to a weak or strong backend. Use it when routing should depend on request content, tool use, context needs, or risk level instead of a fixed traffic split.

The classifier runs before the selected backend. Low-confidence and abstained results use the configured default tier, and classifier errors do the same when classifier_fail_open is enabled (the default). In the built-in two-tier policies, the default tier is strong.
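
The fallback rules above can be sketched roughly as follows. This is an illustrative model, not Switchyard's implementation: the `Verdict` type, the `choose_tier` function, and the strict less-than comparison against the confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    """Hypothetical classifier result; field names are illustrative only."""
    tier: Optional[str]  # "weak" or "strong"; None models an abstained result
    confidence: float


class ClassifierError(Exception):
    """Stands in for any failure of the classifier call."""


def choose_tier(verdict, error=None, *, min_confidence=0.6,
                fail_open=True, default_tier="strong"):
    """Return the backend tier for one request under the rules described above."""
    if error is not None:
        if fail_open:
            return default_tier  # fail open: route to the default tier
        raise error  # fail closed: surface the classifier error
    if verdict is None or verdict.tier is None:
        return default_tier  # abstained
    if verdict.confidence < min_confidence:  # assumed: strictly below the threshold
        return default_tier  # low confidence
    return verdict.tier
```

With these assumptions, `choose_tier(Verdict("weak", 0.3))` falls back to the default tier, while `choose_tier(Verdict("weak", 0.9))` keeps the classifier's choice.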

Choose a policy

Set profile_name for the traffic you expect:

| profile_name | Use for | Default tier mapping |
| --- | --- | --- |
| general | Mixed chat or API traffic | simple uses weak; all higher tiers use strong. |
| coding_agent | Claude Code, Codex, Cursor-style agents | simple and medium use weak; complex and reasoning use strong. Tool-planning turns can escalate. |
| openclaw | OpenClaw personal-assistant traffic | simple and medium use weak; complex and reasoning use strong. Tool orchestration and high-risk external actions can escalate. |

For coding-agent traffic, start with profile_name: coding_agent.
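
To make the default mappings concrete, the sketch below restates them as a lookup table. The dictionary and the `backend_for` helper are hypothetical names used only for illustration, and the escalation behavior (tool-planning turns, high-risk external actions) is deliberately not modeled.

```python
# The four classifier policy tiers, ordered from lowest to highest.
TIERS = ("simple", "medium", "complex", "reasoning")

# Default policy-tier -> backend mapping per profile_name, as described above.
DEFAULT_TIER_MAPPING = {
    "general": {
        "simple": "weak", "medium": "strong",
        "complex": "strong", "reasoning": "strong",
    },
    "coding_agent": {
        "simple": "weak", "medium": "weak",
        "complex": "strong", "reasoning": "strong",
    },
    "openclaw": {
        "simple": "weak", "medium": "weak",
        "complex": "strong", "reasoning": "strong",
    },
}


def backend_for(profile_name, policy_tier):
    """Look up the backend a policy tier maps to (escalation is not modeled)."""
    return DEFAULT_TIER_MAPPING[profile_name][policy_tier]
```

The only difference between the profiles at this level is where medium lands: general sends it to strong, while coding_agent and openclaw keep it on weak.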

Configure a classifier profile

Define the strong, weak, and classifier models as targets, then reference those target IDs from an llm-routing profile:

endpoints:
  openrouter:
    api_key: ${OPENROUTER_API_KEY}
    base_url: https://openrouter.ai/api/v1

targets:
  strong:
    endpoint: openrouter
    model: openai/gpt-4o
    format: openai
  weak:
    endpoint: openrouter
    model: openai/gpt-4o-mini
    format: openai
  classifier:
    endpoint: openrouter
    model: openai/gpt-4o-mini
    format: openai

profiles:
  smart:
    type: llm-routing
    profile_name: coding_agent
    strong: strong
    weak: weak
    classifier: classifier
    fallback_target_on_evict: strong
    classifier_min_confidence: 0.6
    classifier_fail_open: true
    classifier_recent_turn_window: 4

The classifier target must use format: openai. Start the profile server with:

switchyard serve --config profiles.yaml --port 4000

The profile ID (smart) is the model ID clients select for classifier-based routing. The target IDs remain directly selectable when a client needs to bypass the classifier.

Try the profile with representative requests:

# Coding task: expected to use the strong tier.
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer dummy" -H "Content-Type: application/json" \
  -d '{"model":"smart","messages":[{"role":"user","content":"Plan and implement a multi-file API change."}],"max_tokens":200}'

# Simple question: expected to use the weak tier.
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer dummy" -H "Content-Type: application/json" \
  -d '{"model":"smart","messages":[{"role":"user","content":"What is 2+2? Reply with just the number."}],"max_tokens":50}'

Treat these as smoke checks, not fixed test vectors: the classifier model and prompt determine the verdict.
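
If you prefer to script the smoke checks, the same requests can be built with the Python standard library. This is a minimal sketch that assumes the server from the command above is listening on localhost:4000; `build_request` and `send` are illustrative helper names, not part of Switchyard.

```python
import json
import urllib.request

# Same URL as the curl examples above.
BASE_URL = "http://localhost:4000/v1/chat/completions"


def build_request(prompt, model="smart", max_tokens=200):
    """Build a chat-completions request that selects the given profile or target ID."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer dummy",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `send(build_request("What is 2+2? Reply with just the number.", max_tokens=50))` repeats the second check. Passing a target ID such as `model="strong"` instead of the profile ID bypasses the classifier.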

Useful options

| Option | Use it when |
| --- | --- |
| classifier_min_confidence | Low-confidence results should use default_tier instead of the classifier policy. |
| classifier_fail_open | Classifier errors should use default_tier rather than fail the client request. |
| classifier_recent_turn_window | The classifier needs more or less recent conversation and tool context. |
| classifier_max_tokens | You need to cap the classifier tool-call response. |
| alignment_min_confidence | A classifier recommendation should raise the policy tier only when its confidence is above this threshold. |
| default_tier | Abstain, low-confidence, and fail-open decisions should use a tier other than the default strong. |
| tier_mapping | The four classifier policy tiers need a custom mapping to weak or strong. |

For a self-hosted strong, weak, or classifier target, configure it like any other OpenAI-compatible endpoint. See Self-hosted targets.

Session affinity

LLM classifier routing supports optional session affinity through DeterministicRoutingConfig. Set session_affinity: true to share one affinity store between the classifier and tier selector. After any configured affinity_warmup_turns, the first confident verdict pins the tier. Later turns reuse that tier before classification, so they skip the classifier call; abstain, low-confidence, missing-signal, and fail-open decisions do not pin.
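
The pinning behavior described above can be modeled with a small in-memory sketch. The `AffinityRouter` class, its per-session turn counter, and the `classify` callable returning a `(tier, confidence)` pair are assumptions for illustration; the real affinity store and its warmup accounting may differ.

```python
class AffinityRouter:
    """Toy model of session affinity: pin a tier on the first confident verdict."""

    def __init__(self, classify, warmup_turns=0, min_confidence=0.6,
                 default_tier="strong"):
        self.classify = classify  # hypothetical: request -> (tier or None, confidence)
        self.warmup_turns = warmup_turns
        self.min_confidence = min_confidence
        self.default_tier = default_tier
        self.pinned = {}  # session_id -> pinned tier
        self.turns = {}   # session_id -> number of classified turns so far

    def route(self, session_id, request):
        # A pinned session reuses its tier and skips the classifier call.
        if session_id in self.pinned:
            return self.pinned[session_id]

        turn = self.turns.get(session_id, 0)
        self.turns[session_id] = turn + 1

        tier, confidence = self.classify(request)
        # Abstain and low-confidence results use the default tier and do not pin.
        if tier is None or confidence < self.min_confidence:
            return self.default_tier

        # Confident verdicts pin only after the warmup turns have passed.
        if turn >= self.warmup_turns:
            self.pinned[session_id] = tier
        return tier
```

In this model, missing-signal and fail-open decisions would take the same non-pinning path as abstentions. Once a session is pinned, `classify` is no longer called for it, which is where the saved classifier cost comes from.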

The CLI currently exposes these fields on a type: deterministic entry in a routes: bundle loaded with --routing-profiles. The Rust llm-routing profile loaded by switchyard serve --config does not yet expose them. For the YAML, see Session Affinity; for the interaction with routing decisions, see How session affinity composes.

If the per-request classifier cost is too high, use Cascade Routing, which can route many turns from tool and agent-progress signals without an extra classifier call.