LLM Classifier Routing

LLM classifier routing asks a classifier model to evaluate each request, then sends the request to a weak or strong backend. Use it when routing should depend on request content, tool use, context needs, or risk level instead of a fixed traffic split.

The classifier runs first, and its verdict selects the backend. Low-confidence and abstained verdicts fall back to the configured default tier, and so do classifier errors when classifier_fail_open is enabled (the default). In the built-in two-tier policies, the default tier is strong.
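The fallback rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not the router's actual implementation: `Verdict`, `ClassifierError`, and `resolve_backend` are hypothetical names, and two details are assumptions (a confidence exactly equal to the threshold counts as confident, and an unknown tier name falls back to the default tier).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    """Hypothetical classifier result; tier is None when the classifier abstains."""
    tier: Optional[str]
    confidence: float


class ClassifierError(Exception):
    """Hypothetical error raised when the classifier call fails."""


def resolve_backend(classify, request, *, policy, default_tier="strong",
                    min_confidence=0.6, fail_open=True):
    """Return "weak" or "strong" for one request.

    `classify` is any callable that returns a Verdict.
    `policy` maps a classifier tier name to "weak" or "strong".
    """
    try:
        verdict = classify(request)
    except ClassifierError:
        if fail_open:
            return default_tier  # fail open: keep serving the client
        raise  # fail closed: surface the classifier error
    if verdict.tier is None or verdict.confidence < min_confidence:
        return default_tier  # abstained or low-confidence verdict
    # Assumption: a tier the policy does not know also uses the default tier.
    return policy.get(verdict.tier, default_tier)
```

With the defaults shown, a failing or unsure classifier always lands on the strong backend, which trades cost for answer quality when the router has no reliable signal.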

Choose a policy

Set profile_name for the traffic you expect:

| `profile_name` | Use for | Default tier mapping |
| --- | --- | --- |
| `general` | Mixed chat or API traffic | `simple` uses `weak`; all higher tiers use `strong`. |
| `coding_agent` | Claude Code, Codex, Cursor-style agents | `simple` and `medium` use `weak`; `complex` and `reasoning` use `strong`. Tool-planning turns can escalate. |
| `openclaw` | OpenClaw personal-assistant traffic | `simple` and `medium` use `weak`; `complex` and `reasoning` use `strong`. Tool orchestration and high-risk external actions can escalate. |

For coding-agent traffic, start with profile_name: coding_agent.
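To compare the three policies at a glance, the default mappings from the table can be written out as data. This is a reading aid only: `DEFAULT_TIER_MAPPING` and `backend_for` are hypothetical names, the sketch assumes the "higher tiers" for `general` are `medium`, `complex`, and `reasoning`, and it does not model the escalation rules for tool-planning or high-risk turns.

```python
# The four classifier tiers named in the table, lowest to highest.
TIERS = ("simple", "medium", "complex", "reasoning")

# Default tier -> backend mapping per profile_name, transcribed from the table.
DEFAULT_TIER_MAPPING = {
    "general": {
        "simple": "weak",
        "medium": "strong",
        "complex": "strong",
        "reasoning": "strong",
    },
    "coding_agent": {
        "simple": "weak",
        "medium": "weak",
        "complex": "strong",
        "reasoning": "strong",
    },
    "openclaw": {
        "simple": "weak",
        "medium": "weak",
        "complex": "strong",
        "reasoning": "strong",
    },
}


def backend_for(profile_name, tier):
    """Look up the default backend for a classifier tier under a profile."""
    return DEFAULT_TIER_MAPPING[profile_name][tier]
```

The only difference in the defaults is the `medium` tier: `general` sends it to the strong backend, while the two agent-oriented profiles keep it on the weak backend.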

Configure a classifier profile

Define the strong, weak, and classifier models as targets, then reference those target IDs from an llm-routing profile:

endpoints:
  openrouter:
    api_key: ${OPENROUTER_API_KEY}
    base_url: https://openrouter.ai/api/v1

targets:
  strong:
    endpoint: openrouter
    model: openai/gpt-4o
    format: openai
  weak:
    endpoint: openrouter
    model: openai/gpt-4o-mini
    format: openai
  classifier:
    endpoint: openrouter
    model: openai/gpt-4o-mini
    format: openai

profiles:
  smart:
    type: llm-routing
    profile_name: coding_agent
    strong: strong
    weak: weak
    classifier: classifier
    fallback_target_on_evict: strong
    classifier_min_confidence: 0.6
    classifier_fail_open: true
    classifier_recent_turn_window: 4

The classifier target must use format: openai. Start the profile server with:

switchyard serve --config profiles.yaml --port 4000

The profile ID (smart) is the model ID clients select for classifier-based routing. The target IDs remain directly selectable when a client needs to bypass the classifier.

Try the profile with representative requests:

# Coding task: expected to use the strong tier.
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer dummy" -H "Content-Type: application/json" \
  -d '{"model":"smart","messages":[{"role":"user","content":"Plan and implement a multi-file API change."}],"max_tokens":200}'

# Simple question: expected to use the weak tier.
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer dummy" -H "Content-Type: application/json" \
  -d '{"model":"smart","messages":[{"role":"user","content":"What is 2+2? Reply with just the number."}],"max_tokens":50}'

Treat these as smoke checks, not fixed test vectors: the classifier model and prompt determine the verdict.
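The same smoke checks can be scripted. The sketch below uses only the Python standard library and mirrors the curl calls above (same URL, headers, and body); `build_request` and `send` are hypothetical helper names, and nothing here is a Switchyard client API.

```python
import json
import urllib.request

# Same endpoint as the curl examples; adjust the port if you changed --port.
BASE_URL = "http://localhost:4000/v1/chat/completions"


def build_request(prompt, model="smart", max_tokens=200):
    """Build a POST request that selects the classifier profile by its ID."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer dummy",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `send(build_request("What is 2+2? Reply with just the number.", max_tokens=50))` returns the parsed response. Passing `model="strong"` or `model="weak"` instead selects a target directly and bypasses the classifier.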

Useful options

| Option | Use it when |
| --- | --- |
| `classifier_min_confidence` | Low-confidence results should use `default_tier` instead of the classifier policy. |
| `classifier_fail_open` | Classifier errors should use `default_tier` rather than fail the client request. |
| `classifier_recent_turn_window` | The classifier needs more or less recent conversation and tool context. |
| `classifier_max_tokens` | You need to cap the length of the classifier's tool-call response. |
| `alignment_min_confidence` | A classifier recommendation should raise the policy tier only when its confidence is above this threshold. |
| `default_tier` | Abstain, low-confidence, and fail-open decisions should use a tier other than the default `strong`. |
| `tier_mapping` | The four classifier policy tiers need a custom mapping to `weak` or `strong`. |

For a self-hosted strong, weak, or classifier target, configure it like any other OpenAI-compatible endpoint. See Self-hosted targets.

Session affinity

LLM classifier routing supports optional session affinity through DeterministicRoutingConfig. Set session_affinity: true to share one affinity store between the classifier and tier selector. After any configured affinity_warmup_turns, the first confident verdict pins the tier. Later turns reuse that tier before classification, so they skip the classifier call; abstain, low-confidence, missing-signal, and fail-open decisions do not pin.
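The pinning behavior described above can be sketched as a small state machine. This is a simplified illustration, not the router's code: `AffinityRouter` is a hypothetical class, `classify` is assumed to return an already-mapped backend (or `None` for abstain) plus a confidence, warmup turns are assumed to be classified but never pinned, and missing-signal decisions are not modeled.

```python
class AffinityRouter:
    """Pin a session to a backend after its first confident verdict."""

    def __init__(self, classify, *, min_confidence=0.6, warmup_turns=0,
                 default_tier="strong"):
        self.classify = classify          # callable: request -> (backend or None, confidence)
        self.min_confidence = min_confidence
        self.warmup_turns = warmup_turns  # assumption: these turns never pin
        self.default_tier = default_tier
        self.pinned = {}                  # session_id -> pinned backend
        self.turns = {}                   # session_id -> classified turns seen

    def route(self, session_id, request):
        # Pinned sessions reuse their backend and skip the classifier call.
        if session_id in self.pinned:
            return self.pinned[session_id]
        turn = self.turns.get(session_id, 0) + 1
        self.turns[session_id] = turn
        try:
            backend, confidence = self.classify(request)
        except Exception:
            return self.default_tier      # fail-open decision: does not pin
        if backend is None or confidence < self.min_confidence:
            return self.default_tier      # abstain or low confidence: does not pin
        if turn > self.warmup_turns:
            self.pinned[session_id] = backend
        return backend
```

Because fallback decisions never pin, a session that starts with an unsure classifier keeps being classified until a confident verdict arrives, and only then stops paying the per-turn classifier cost.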

The CLI currently exposes these fields only on a type: deterministic entry in a routes: bundle loaded with --routing-profiles. The Rust llm-routing profile loaded by switchyard serve --config does not yet expose them. See Session Affinity for the YAML, and How session affinity composes for how affinity interacts with routing decisions.

If the per-request classifier cost is too high, use Cascade Routing, which can route many turns from tool and agent-progress signals without an extra classifier call.