Skip to main content

Command Palette

Search for a command to run...

What's New in AgentGateway 1.3: An LLM-First UI, Real Cost Tracking, and Virtual Models

The biggest LLM-consumption release yet: a purpose-built UI, per-token cost attribution, and virtual models that reroute traffic without touching a single client.

Updated
8 min readView as Markdown
What's New in AgentGateway 1.3: An LLM-First UI, Real Cost Tracking, and Virtual Models
M
I am a software architect with over a decade of experience in architecting and building software solutions.

In the earlier posts we did things the manual way: grab the Windows binary, run it, and hand-write config.yaml line by line so you could see every bind, listener, route, and backend. That groundwork still matters. But AgentGateway 1.3 changes how much of it you'd actually do by hand, especially on the LLM side, so this post steps off the roadmap for one release special to catch up.

If you're new here, you'll walk away knowing what an AI-native gateway's UI gives you and why "cost" and "routing" now live in the gateway instead of your code. If you already route LLM traffic through something, this is the post where you see which of your custom bits just became built-in.

I'm on AgentGateway v1.3.1 (June 2026). One honest note up front: the headline features below all landed in v1.3.0. v1.3.1 is a small patch on top (a few bug fixes), so I'm treating 1.3 as the feature line and pinning 1.3.1 as the exact build.

The new UI, rebuilt around how you consume LLMs

Before 1.3, the UI was mostly a window into Gateway-API style config. The 1.3 UI reorganizes everything into three native views, and the split tells you a lot about how the project sees the world now:

  • LLM holds models, providers, policies, guardrails, costs, and virtual API keys.

  • MCP holds servers, tools, resources, and auth.

  • Traffic is the classic HTTP/gRPC Gateway-API experience you already know.

On first launch you pick the capabilities you want and start there, instead of being dropped into a generic config tree. Onboarding a model is now "point an incoming model match at a provider and save," and the same screen lets you add provider-backed models or virtual models. Policies like CORS, API keys, JWT, OIDC, external authz, and rate limiting are configurable per model and visible at a glance, and guardrails (built-in detectors, regex, webhooks, OpenAI moderation, Bedrock Guardrails, Google Model Armor) attach from the same panel.

If you're a pro: if you've ever run a separate admin console for your proxy and a second tool for your model config, this is those two things collapsed into one layer.

Old UI:

New UI:

AI Cost & Analysis: every token and dollar, attributed

This is the feature I'd lead with if you only read one section. AgentGateway 1.3 turns every request into a measured, attributed, exportable data point. You configure cost rates per model or import the official provider tables, and from then on each request carries its exact input/output/total token counts and a dollar figure, surfaced in logs, traces, metrics, the UI, and agctl.

The real unlock is attribution. The Analytics view slices usage by tokens or by dollars across the dimensions that actually come up in a budget meeting: per user, per team, per model, per provider, and per coding tool or agent (so Claude Code, Cursor, GitHub Copilot, and your own agents show up as separate line items). "How much did the support team spend on Claude through Cursor this week?" becomes a two-filter question instead of a data-engineering project. Because LLM, MCP, and A2A traffic all flow through the same gateway, the same view covers your whole agentic stack, not just chat completions.

For the newcomer: this is the thing finance always asks for and nobody can ever produce, generated automatically because the gateway sees every call.

For the pro: if you've ever scraped provider dashboards into a spreadsheet to do chargeback, this is that job, done at the proxy, with the attribution already attached. And since cost data lives next to your auth and rate-limit policies, you can act on it: set budgets, alert on spend, or route a cost-sensitive caller to a cheaper model with the same policy engine.

Virtual models: route smarter without client changes

Real routing is rarely "send everything to one model." You want to A/B a new release against the incumbent, fall back when your primary is throttled, or send long-context requests somewhere with a bigger window. That logic used to live in every client, or in a custom proxy nobody wanted to maintain.

A virtual model is a synthetic model: it has a normal-looking name, but instead of pointing at one backend it applies a routing strategy across several real models. The client just sends the virtual name in the request body, and the gateway decides where each request goes.

Three strategies ship in 1.3. Weighted splits traffic by percentage, so you can send most requests to production and a slice to a candidate and compare quality or cost before committing. Failover orders models by preference and retries the next one when the primary errors or gets rate-limited, so a provider hiccup degrades gracefully. Conditional branches on the request itself using CEL, routing by user tier, prompt length, headers, or anything the gateway can see.

The payoff is decoupling. Routing policy lives in one place owned by the platform team, and flipping an A/B split or adding a fallback takes effect everywhere at once, with no client changes.

Pro bridge: if you've baked retry-and-fallback logic into your SDK wrapper, this is that, lifted out of your code and into config.

Reuse, and a provider explosion

A few changes that make managing many models less painful. Providers and guardrails are now reusable: define one once and reference it across many models, which matters the moment you're juggling dozens of OpenAI-compatible endpoints. Guardrails can be declared once as a shared top-level resource instead of repeated on every route, and they now apply to streaming responses too, with a failureMode on webhook guards so you choose fail-open or fail-closed. There's a proper custom-provider path for anything without first-class support, which replaces the old "OpenAI provider plus a custom base URL" hack. And 1.3 adds 13 new first-class providers: Mistral, Hugging Face, Cohere, Groq, Fireworks, DeepSeek, xAI, Together AI, OpenRouter, Cerebras, DeepInfra, Baseten, and Ollama.

The stuff that actually bit me

A few sharp edges that come specifically from being on 1.3.1 rather than reading the 1.3.0 announcement:

  • The agctl CLI got reorganized, and it's a breaking change. Inspection and tracing commands now live under a proxy parent. agctl config all ... is now agctl proxy config all ..., and agctl trace ... is now agctl proxy trace .... If you have scripts or docs from an older build, they break silently with a "command not found" style error until you update them. There are new agctl version and agctl proxy log / agctl controller log commands too.

  • "Share one port for MCP and LLM" is in the 1.3 release notes, but it was reverted in 1.3.1. If you read the feature list and try to serve both on a single listener port on 1.3.1, it won't work. Run them on separate ports, or check the GitHub releases for the version where it lands again. This is exactly why pinning the patch version matters.

  • The cost numbers depend on a catalog you have to supply. Dollar figures aren't magic. You either configure rates per model or import the provider tables (agctl costs import can generate one). Skip that and your token counts will be right but your dollars will be empty or wrong.

  • New providers use baseUrl, not the old host/path override fields. If you copy a pre-1.3 snippet for a custom or OpenAI-compatible endpoint, expect it to not validate until you switch to baseUrl.

Where this leaves you

You now know what the 1.3 line actually moved into the gateway: a UI organized around LLM, MCP, and Traffic; real per-token and per-dollar cost with attribution you can hand to finance; virtual models that let you A/B, fail over, and branch on requests without client changes; and reusable providers and guardrails across a much bigger provider list. Even on a "no build" read like this one, the mental model is the thing to keep: caller sends a model name, the gateway decides the backend, and every concern (cost, auth, guardrails) attaches at that layer.

Next post we get back on the roadmap and go hands-on: multiplexing several real MCP servers behind one URL, now with the new MCP view to watch the federation happen.


Setup I used: AgentGateway v1.3.1 (binary), Windows 11 with WSL2 (Ubuntu), Node LTS for npx-based MCP servers.

Handy links