What Is AgentGateway? The AI-Native Gateway, Explained for Newbies and Pros
Spend a week building with AI agents and you hit the same wall I did. The moment there's more than one agent, model, or tool in play, nothing is actually in charge of the traffic moving between them.
Two protocols took a swing at the chaos. MCP (Model Context Protocol) standardizes how an agent talks to tools; think of it as a USB-C port for tools. A2A (Agent-to-Agent) standardizes how agents hand work off to each other. Both are genuinely useful. The problem is that they only describe how the messages are shaped. They say nothing about who's allowed to call what, what it costs you, or how you'd debug any of it in production. No authentication, no authorization, no spend control, no audit trail.
AgentGateway is the layer that fills that gap. This is Post 1 of the series, so I'll keep it grounded: what AgentGateway actually is, the four modes it runs in, and the two bits of jargon you'll trip over everywhere else, data plane and control plane.
The short version
AgentGateway is an open-source, Rust-based proxy that sits between your apps and everything they talk to (LLMs, tools, and other agents) and adds the security, governance, and observability the raw protocols leave out.
A few things worth knowing before we go deeper:
It's open source under Apache 2.0, and it's written in Rust, which matters here because a lot of agent traffic rides on long-lived connections that a slower proxy would choke on.
It's hosted by the Linux Foundation and recently joined the Agentic AI Foundation (AAIF). Microsoft, T-Mobile, Dell, CoreWeave, and Akamai are among the backers.
It speaks MCP and A2A natively, but it also handles ordinary HTTP and gRPC. So it isn't a bolt-on "AI sidebar" living next to your real infrastructure.
That last point is really the whole pitch. One gateway for your AI traffic and your normal API traffic, instead of running and securing two separate things.
The four modes
I find the easiest way to hold AgentGateway in your head is "three agentic patterns, plus the boring traditional one." But before the picture of it working, look at the picture of life without it.
With no gateway, every app wires straight to every backend. Each one carries its own API keys, its own auth, its own retry logic, its own logging, all of it duplicated, and nothing anywhere has a full view of what's calling what. With N apps and M backends, you've built yourself an N Ă— M tangle:
flowchart LR
A1["App / Agent 1"]
A2["App / Agent 2"]
A3["App / Agent 3"]
L["LLM providers"]
T["MCP tool servers"]
AG["Other agents"]
M["REST APIs"]
A1 --> L
A1 --> T
A1 --> AG
A1 --> M
A2 --> L
A2 --> T
A2 --> AG
A2 --> M
A3 --> L
A3 --> T
A3 --> AG
A3 --> M
AgentGateway collapses that mesh down to a single governed front door. Same proxy, same policies, four kinds of backend behind it:
flowchart LR
A["Apps & Agents"] --> GW{{"AgentGateway"}}
GW -->|"LLM mode"| L["LLM providers<br/>OpenAI · Anthropic · Gemini · Bedrock"]
GW -->|"MCP mode"| T["MCP tool servers"]
GW -->|"A2A mode"| AG["Other agents"]
GW -->|"HTTP/gRPC mode"| M["REST APIs & microservices"]
1. LLM mode (agent to LLM)
This is the mode most teams reach for first. You point your apps at one OpenAI-compatible endpoint, and AgentGateway deals with the mess behind it: spreading requests across OpenAI, Anthropic, Gemini, Bedrock, or whatever model you're self-hosting, and failing over when one of them falls down.
The load balancing is smarter than round-robin. It uses Power of Two Choices (P2C): it samples two providers and sends the request to whichever looks healthier on latency and pending load, so a struggling provider quietly sheds traffic on its own. On top of that you get token-based rate limits per user, team, or key. That's your guard against what folks have started calling "denial-of-wallet," a runaway agent loop chewing through your budget while you sleep.
Here's a minimal two-provider config, OpenAI and Gemini behind a single route:
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
binds:
- port: 3000
listeners:
- routes:
- backends:
- ai:
groups:
- providers:
- name: openai
provider:
openAI:
model: gpt-3.5-turbo # optional; overrides the model in requests
backendAuth:
key: "$OPENAI_API_KEY"
- name: gemini
provider:
gemini:
model: gemini-1.5-flash-latest
backendAuth:
key: "$GEMINI_API_KEY"
Your app code never changes. It keeps calling a normal /chat/completions endpoint, and the failover, key handling, and cost attribution all happen at the gateway.
2. MCP mode (agent to tool)
Picture an agent that needs GitHub, a database, and a search API. The obvious move is to wire it directly to all three MCP servers. AgentGateway lets it talk to one MCP endpoint instead, and federates the servers behind that.
The security payoff is per-session tool filtering. A given client only ever sees the tools it's allowed to use, which is where RBAC and defenses against tool-poisoning attacks come into play.
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
binds:
- port: 3000
listeners:
- routes:
- policies:
cors:
allowOrigins: ["*"]
allowHeaders: ["*"]
exposeHeaders:
- "Mcp-Session-Id" # required so sessions persist across calls
backends:
- mcp:
targets:
- name: mcp
mcp:
host: http://localhost:3005/mcp/
Worth a note for anyone who's run gateways before, because this is exactly where the older ones fall over. MCP is stateful. A client and server hold a long-lived session, messages travel both directions, and one "list tools" call can fan out to several backends and come back aggregated. AgentGateway pins each session to a backend and packs the session state into the Mcp-Session-Id (encrypted with AES-256-GCM), so every follow-up call lands back on the same server. A gateway designed around stateless request/response simply can't model that, which is why so many of them buckle the first time you point real MCP traffic at them.
3. A2A mode (agent to agent)
Agent A hands a long-running task to Agent B. A2A defines the shape of that handoff, but not the governance around it: who's allowed to delegate to whom, and how you trace the thing once it's running. Put AgentGateway in the middle and you get authentication, authorization, and end-to-end tracing across that boundary. Those happen to be the parts A2A leaves entirely up to you.
4. HTTP/gRPC mode (the unified part)
This is the mode people forget, and it's arguably the reason adopting AgentGateway pays off at all. Your plain REST and microservice traffic runs through the same gateway, with the same load balancing, timeouts, retries, TLS, and authorization you've configured everywhere else. You can even take an existing REST API and expose it as an MCP tool. No standing up a separate "AI gateway" beside your "real" one. It's one proxy and one policy set covering both.
Data plane vs. control plane
These two terms turn up in every gateway and service-mesh doc going, and they reliably trip people up. Here's the version that finally stuck for me.
Picture a restaurant:
The data plane is the floor staff and the kitchen. They take every order and carry every plate, handling each table (each request) in real time.
The control plane is the manager in the back office. They never serve a table themselves. They write the menu, set the rules, decide which section each server covers, update the specials, then push all that out to the floor.
flowchart LR
A["Apps & Agents"] --> GW["AgentGateway"]
GW --> L["LLM mode"]
L --> L_providers["LLM providers"]
L_providers --> L_providers_text["OpenAI · Anthropic · G"]
GW --> T["MCP mode"]
T --> T_servers["MCP tool servers"]
GW --> AG["A2A mode"]
AG --> AG_agents["Other agents"]
GW --> M["HTTP/gRPC mode"]
M --> M_apis["REST APIs & microservices"]
Mapped onto the product: AgentGateway itself is the data plane, the Rust proxy doing the real-time work on every request, from routing to auth to rate limiting to tracing. The control plane is whatever configures those proxies. AgentGateway accepts config dynamically over an xDS interface with no downtime, and it conforms to the Kubernetes Gateway API. On Kubernetes, the recommended control plane for spinning up and managing AgentGateway proxies is kgateway.
If you remember one sentence, make it this: the data plane handles and secures every request in real time, and the control plane decides the rules and pushes them down. You can run AgentGateway standalone with a static config.yaml (the snippets above are exactly that) or let a control plane drive it inside Kubernetes. Same proxy underneath, either way.
Key terms, in one place
| Term | What it means here |
|---|---|
| MCP | Model Context Protocol. How agents call tools and data. Stateful, JSON-RPC. |
| A2A | Agent-to-Agent. How agents delegate tasks to each other. |
| Data plane | The proxy that moves and inspects every request. AgentGateway itself. |
| Control plane | The component that configures the proxies, e.g. kgateway on Kubernetes. |
| xDS | The interface for pushing config to the proxy live, with no restart. |
| P2C | Power of Two Choices. The load-balancing strategy across LLM providers. |
| Tool federation | Exposing many MCP servers through one endpoint, filtered per client. |
| Denial-of-wallet | A cost attack via runaway token usage. Stopped with token rate limits. |
Where this is going
So that's the idea. AgentGateway gives your agents, models, tools, and ordinary APIs one governed front door, running in four modes (LLM, MCP, A2A, HTTP/gRPC), with a clean split between the proxy that does the work and the control plane that sets the rules.
Next time I'll get hands-on: install the binary, run the LLM config from above, and actually watch failover and spend limits kick in. Bring a couple of API keys.
Further reading
AgentGateway docs, Introduction: https://agentgateway.dev/docs/standalone/main/about/introduction/
Multiple LLM providers (P2C load balancing): https://agentgateway.dev/docs/standalone/main/llm/providers/multiple-llms/
Connect to MCP servers (Streamable HTTP): https://agentgateway.dev/docs/standalone/main/mcp/connect/http/
Linux Foundation announcement: https://www.linuxfoundation.org/press/linux-foundation-welcomes-agentgateway-project-to-accelerate-ai-agent-adoption-while-maintaining-security-observability-and-governance