Routing requests by cost, tool use, or token count with Ringlet

A profile binds an agent to one provider. That covers most cases. But sometimes you want one agent to use different providers for different requests inside the same conversation:

Small completions go to Groq (fast, cheap).
Tool-use turns go to Anthropic (best at structured tool calls).
Anything over 100K tokens goes to a model with the right context window.

For that, Ringlet ships an optional routing proxy called ultrallm. Attach it to a profile and rules decide where each request goes.

When you need routing (and when you don’t)

You probably don’t need routing if:

You only use one agent and one provider.
Cost differences between providers don’t matter at your scale.
Latency variance from cross-provider routing would annoy you more than it would save.

You probably do want routing if:

You’re cost-sensitive and the agent makes a lot of small completions.
You’ve split a workload across two providers manually with shell wrappers and would rather it be declarative.
You want to A/B test “Claude Sonnet for the hard turns, Haiku for everything else” without rewriting the agent.

How to attach the proxy to a profile

ringlet profiles set work --proxy ultrallm

That tells Ringlet to launch ultrallm as a sidecar when the profile runs, with the agent pointed at 127.0.0.1:<port> instead of the provider directly. The proxy reads ~/.ringlet/profiles/work/routing.toml for its rules.

A first routing config

# ~/.ringlet/profiles/work/routing.toml

[[rule]]
match = { tool-use = true }
to    = "anthropic"
model = "claude-sonnet-4-5"

[[rule]]
match = { input-tokens-lt = 2000 }
to    = "groq"
model = "llama-3.3-70b-versatile"

[[rule]]
match = { input-tokens-gt = 100000 }
to    = "anthropic"
model = "claude-sonnet-4-5"   # long context

[default]
to    = "anthropic"
model = "claude-haiku-4-5"

Rules are evaluated top to bottom. First match wins. [default] runs if nothing else matches.

Now when the agent makes a request, ultrallm inspects the payload, finds the matching rule, swaps the base URL and model name, and forwards the request. The agent thinks it talked to one provider; in reality, the routing layer picked the right one per-request.

Available match conditions

Condition	Meaning
`tool-use = true`	Request includes one or more tools.
`input-tokens-lt = N`	Estimated input token count < N.
`input-tokens-gt = N`	Estimated input token count > N.
`messages-gt = N`	Conversation has more than N turns (proxy for “long context”).
`has-system-prompt = true`	Request includes a system prompt.
`model-equals = "..."`	The original request asked for this model.

The match conditions are designed to be cheap — Ringlet doesn’t deserialize the full message body, just enough to evaluate the rule.

Observability

ringlet usage already shows you per-profile spend. With routing, you also get per-rule spend:

$ ringlet usage --profile work --by-rule

RULE                        REQUESTS    TOKENS         COST
tool-use → anthropic        342         1.1M / 280K   $7.81
input-tokens-lt → groq      1284        980K / 220K   $0.18
default → anthropic         128         180K / 45K    $1.10
─────────────────────────────────────────────────────────
TOTAL                                                  $9.09

If a rule fires 0 times, that’s a signal it’s not useful. If 90% of your traffic hits the default rule, that’s a signal your rules aren’t aggressive enough.

Caveats and gotchas

Latency. Cross-provider routing adds 5–20ms in the local hop. Not a problem for most workflows; if you’re running a latency-sensitive automation, measure.
Tool calling shape varies. Anthropic and OpenAI both have tool calling, but the wire format is slightly different. ultrallm handles the translation for Anthropic-shaped agents; for OpenAI-shaped agents, route only between OpenAI-compatible providers.
Streaming. All ultrallm-supported providers stream. If you route to a provider that doesn’t stream, the agent may stall.
Caching. Anthropic’s prompt caching has provider-specific semantics. Routing the same conversation between Anthropic and another provider invalidates the cache.

When to use this vs LiteLLM

LiteLLM is a fuller-featured routing proxy with budgets, retries, fallbacks, and a multi-tenant control plane. ultrallm is a stripped-down version optimised for “one profile, a handful of rules, no separate service to run.”

If you’re already running LiteLLM, point Ringlet at LiteLLM and let LiteLLM do the routing. If you don’t want another service in the stack, ultrallm is enough for most single-developer setups.

Honesty: this is the part of Ringlet that’s least mature

Profiles, isolation, and cost tracking are 0.1.0-stable. Routing is shipping but recent — expect rougher edges, smaller default rule sets, and changes to the TOML schema between minor versions. We document it in the docs but treat it as a power-user feature rather than a default.

If you want bullet-proof routing with audit and budgets, run LiteLLM. If you want declarative per-profile rules without operating a separate service, ultrallm.