The AI engineering stack we built internally — on the platform we ship

Blizine Admin

2026-04-20 · Ayush Thakur, Scott Roe-Meschke, Rajesh Bhatia · 14 min read

This post is also available in Japanese and Korean.

In the last 30 days, 93% of Cloudflare’s R&D organization used AI coding tools powered by infrastructure we built on our own platform.

Eleven months ago, we undertook a major project: to truly integrate AI into our engineering stack. We needed to build the internal MCP servers, access layer, and AI tooling necessary for agents to be useful at Cloudflare. We pulled together engineers from across the company to form a tiger team called iMARS (Internal MCP Agent/Server Rollout Squad). The sustained work landed with the Dev Productivity team, who also own much of our internal tooling, including CI/CD, build systems, and automation.

Here are some numbers that capture our own agentic AI use over the last 30 days:

- 3,683 internal users actively using AI coding tools (60% company-wide, 93% across R&D), out of approximately 6,100 total employees
- 47.95 million AI requests
- 295 teams currently using agentic AI tools and coding assistants
- 20.18 million AI Gateway requests per month
- 241.37 billion tokens routed through AI Gateway
- 51.83 billion tokens processed on Workers AI

The impact on developer velocity internally is clear: we’ve never seen a quarter-to-quarter increase in merge requests to this degree. As AI tooling adoption has grown, the 4-week rolling average of merge requests has climbed from ~5,600/week to over 8,700/week. The week of March 23 hit 10,952, nearly double the Q4 baseline.

MCP servers were the starting point, but the team quickly realized we needed to go further: rethink how standards are codified, how code gets reviewed, how engineers onboard, and how changes propagate across thousands of repos.

This post dives deep into what that looked like over the past eleven months and where we ended up.
We're publishing now, to close out Agents Week, because the AI engineering stack we built internally runs on the same products we’re shipping and enhancing this week.

The architecture at a glance

The engineer-facing tools layer (OpenCode, Windsurf, and other MCP-compatible clients) includes both open-source and third-party coding assistants. Each layer maps to a Cloudflare product or tool we use:

| What we built | Built with |
|---|---|
| Zero Trust authentication | Cloudflare Access |
| Centralized LLM routing, cost tracking, BYOK, and Zero Data Retention controls | AI Gateway |
| On-platform inference with open-weight models | Workers AI |
| MCP Server Portal with single OAuth | Workers + Access |
| AI Code Reviewer CI integration | Workers + AI Gateway |
| Sandboxed execution for agent-generated code (Code Mode) | Dynamic Workers |
| Stateful, long-running agent sessions | Agents SDK (McpAgent, Durable Objects) |
| Isolated environments for cloning, building, and testing | Sandbox SDK (GA as of Agents Week) |
| Durable multi-step workflows | Workflows (scaled 10x during Agents Week) |
| 16K+ entity knowledge graph | Backstage (OSS) |

Apart from Backstage, none of this is internal-only infrastructure: everything listed above is a shipping product, and many of these products got substantial updates during Agents Week.

We’ll walk through this in three acts:

- The platform layer: how authentication, routing, and inference work (AI Gateway, Workers AI, MCP Portal, Code Mode)
- The knowledge layer: how agents understand our systems (Backstage, AGENTS.md)
- The enforcement layer: how we keep quality high at scale (AI Code Reviewer, Engineering Codex)

Act 1: The platform layer

How AI Gateway helped us stay secure and improve the developer experience

When you have more than 3,600 internal users using AI coding tools daily, you need to solve for access and visibility across many clients, use cases, and roles.

Everything starts with Cloudflare Access, which handles all authentication and Zero Trust policy enforcement. Once authenticated, every LLM request routes through AI Gateway.
This gives us a single place to manage provider keys, cost tracking, and data retention policies.

The OpenCode AI Gateway overview: 688.46k requests per day, 10.57B tokens per day, routing to four providers through one endpoint.

AI Gateway analytics show how monthly usage is distributed across model providers. Over the last month, internal request volume broke down as follows:

| Provider | Requests/month | Share |
|---|---|---|
| Frontier labs (OpenAI, Anthropic, Google) | 13.38M | 91.16% |
| Workers AI | 1.3M | 8.84% |

Frontier models handle the bulk of complex agentic coding work for now, but Workers AI is already a significant part of the mix and handles a growing share of our agentic engineering workloads.

How we increasingly leverage Workers AI

Workers AI is Cloudflare's serverless AI inference platform, which runs open-source models on GPUs across our global network. Beyond large cost savings compared with frontier models, a key advantage is that inference stays on the same network as your Workers, Durable Objects, and storage. There are no cross-cloud hops, which would add latency, network flakiness, and extra networking configuration to manage.

Workers AI usage in the last month: 51.47B input tokens, 361.12M output tokens.

Kimi K2.5, launched on Workers AI in March 2026, is a frontier-scale open-source model with a 256k context window, tool calling, and structured outputs. As we described in our Kimi K2.5 launch post, we have a security agent that processes over 7 billion tokens per day on Kimi. On a mid-tier proprietary model, that workload would cost an estimated $2.4M per year; on Workers AI, it costs 77% less.

Beyond security, we use Workers AI for documentation review in our CI pipeline, for generating AGENTS.md context files across thousands of repositories, and for lightweight inference tasks where same-network latency matters more than peak model capability.

As open-source models continue to improve, we expect Workers AI to handle a growing share of our internal workloads.
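The Kimi K2.5 cost comparison can be sanity-checked with a little arithmetic. This TypeScript sketch uses only the figures quoted in the text (over 7 billion tokens per day, an estimated $2.4M per year on a mid-tier proprietary model, and a 77% saving); the per-million-token price it derives is an implied blended estimate, not a published rate for any model.

```typescript
// Back-of-envelope check of the Kimi K2.5 cost figures quoted in the text.
// All inputs come from the post; the derived values are estimates only.
interface CostEstimate {
  tokensPerYear: number;
  impliedUsdPerMillionTokens: number; // blended price implied by the proprietary estimate
  workersAiUsdPerYear: number; // proprietary estimate minus the quoted savings
}

function estimateAnnualCost(
  tokensPerDay: number,
  proprietaryUsdPerYear: number,
  savingsFraction: number,
): CostEstimate {
  const tokensPerYear = tokensPerDay * 365;
  return {
    tokensPerYear,
    impliedUsdPerMillionTokens: proprietaryUsdPerYear / (tokensPerYear / 1e6),
    workersAiUsdPerYear: proprietaryUsdPerYear * (1 - savingsFraction),
  };
}

// "Over 7 billion tokens per day", "$2.4M per year", "77% cheaper".
const kimi = estimateAnnualCost(7e9, 2.4e6, 0.77);
console.log(`Tokens per year: ${kimi.tokensPerYear.toExponential(3)}`);
console.log(`Implied proprietary price: $${kimi.impliedUsdPerMillionTokens.toFixed(2)} per million tokens`);
console.log(`Estimated Workers AI cost: $${Math.round(kimi.workersAiUsdPerYear)} per year`);
```

With these inputs the sketch works out to roughly 2.56 trillion tokens a year, an implied price of about $0.94 per million tokens on the proprietary model, and about $552K a year on Workers AI.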
One thing we got right early: routing through a single proxy Worker from day one. We could have had clients connect directly to AI Gateway, which would have been simpler to set up initially. But centralizing through a Worker meant we could add per-user attribution, model catalog management, and permission enforcement later without touching any client configs. Every feature described in the bootstrap section below exists because we had that single choke point. The proxy pattern gives you a control plane that direct connections don't, and if we plug in additional coding assistant tools later, the same Worker and discovery endpoint will handle them.

How it works: one URL to configure everything

The entire setup starts with one command:

    opencode auth login https://opencode.internal.domain

That command triggers a chain that configures providers, models, MCP servers, agents, commands, and permissions without the user touching a config file.

Step 1: Discover auth requirements. OpenCode fetches config from a URL like https://opencode.internal.domain/.well-known/opencode. This discovery endpoint is served by a Worker, and its response has an auth block telling OpenCode how to authenticate, along with a config block containing providers, MCP servers, agents, commands, and default permissions:

    {
      "auth": {
        "command": ["cloudflared", "access", "login", "..."],
        "env": "TOKEN"
      },
      "config": {
        "provider": { "..." },
        "mcp": { "..." },
        "agent": { "..." },
        "command": { "..." },
        "permission": { "..." }
      }
    }
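A minimal sketch of the discovery endpoint, written against the standard Fetch API types (Request, Response, URL) rather than Hono so it runs anywhere those exist. It also performs the {baseURL} placeholder substitution that the proxy Worker section describes. The SHARED_CONFIG contents, including the provider entry's options.baseURL/apiKey shape, are assumptions about OpenCode's config format; the post elides the real blocks, and the auth block keeps the post's own elided arguments.

```typescript
// Sketch of the /.well-known/opencode discovery endpoint.
// Assumption: this provider entry shape; the real config blocks are elided in the post.
const SHARED_CONFIG = {
  provider: {
    anthropic: { options: { baseURL: "{baseURL}/anthropic", apiKey: "" } },
  },
  mcp: {},
  agent: {},
  command: {},
  permission: {},
};

// Auth block as shown in the post; the remaining cloudflared arguments are elided there too.
const AUTH = { command: ["cloudflared", "access", "login", "..."], env: "TOKEN" };

// Replace every {baseURL} placeholder with the origin the request arrived on.
// A URL origin contains no characters that need JSON escaping, so plain string
// substitution on the serialized config is safe here.
function renderConfig(origin: string): unknown {
  return JSON.parse(JSON.stringify(SHARED_CONFIG).split("{baseURL}").join(origin));
}

function handleDiscovery(request: Request): Response {
  const url = new URL(request.url);
  if (url.pathname !== "/.well-known/opencode") {
    return new Response("Not found", { status: 404 });
  }
  const body = { auth: AUTH, config: renderConfig(url.origin) };
  return new Response(JSON.stringify(body), {
    headers: { "content-type": "application/json" },
  });
}

// In a module Worker this could be wired up as:
//   export default { fetch: (request: Request) => handleDiscovery(request) };
```

Deriving the base URL from the incoming request keeps one compiled config valid for every hostname the Worker is deployed on.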

Step 2: Authenticate via Cloudflare Access. OpenCode runs the auth command, and the user authenticates through the same SSO they use for everything else at Cloudflare. cloudflared returns a signed JWT, which OpenCode stores locally and automatically attaches to every subsequent provider request.

Step 3: Config is merged into OpenCode. The served config holds shared defaults for the entire organization, but local configs always take priority. Users can override the default model, add their own agents, or adjust project- and user-scoped permissions without affecting anyone else.

Inside the proxy Worker

The Worker is a simple Hono app that does three things:

Serves the shared config. The config is compiled at deploy time from structured source files and contains placeholder values like {baseURL} for the Worker's origin. At request time, the Worker replaces these placeholders, so all provider requests route through the Worker rather than directly to model providers. Each provider gets a path prefix (/anthropic, /openai, /google-ai-studio/v1beta, /compat for Workers AI) that the Worker forwards to the corresponding AI Gateway route.

Proxies requests to AI Gateway. When OpenCode sends a request like POST /anthropic/v1/messages, the Worker validates the Cloudflare Access JWT, then rewrites headers before forwarding:

Stripped:
  authorization, cf-access-token, host
Added:
  cf-aig-authorization: Bearer
  cf-aig-metadata: {"userId": ""}
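The proxy step can be sketched as a single function, again on the standard Fetch API rather than Hono. It rests on several assumptions: the client sends its Access JWT in the cf-access-token header; JWT verification and the email-to-anonymous-ID mapping are injected as hypothetical callbacks (verifyAccessJwt, userIdFor) because the post does not show them; gatewayBase and gatewayToken are placeholder names for the AI Gateway URL and the credential sent in cf-aig-authorization; and the four path prefixes are assumed to match AI Gateway's provider routes, so the path is appended to the gateway URL unchanged.

```typescript
// Sketch of the proxy step: validate the Access JWT, rewrite headers, forward to AI Gateway.
// Path prefixes from the post; assumed to match AI Gateway's provider routes one-to-one.
const PROVIDER_PREFIXES = ["/anthropic/", "/openai/", "/google-ai-studio/v1beta/", "/compat/"];

interface ProxyDeps {
  gatewayBase: string; // hypothetical, e.g. "https://gateway.ai.cloudflare.com/v1/<account>/<gateway>"
  gatewayToken: string; // hypothetical credential sent as cf-aig-authorization
  // Hypothetical: must verify the JWT signature with your Access team's keys; returns email or null.
  verifyAccessJwt: (jwt: string) => Promise<string | null>;
  userIdFor: (email: string) => Promise<string>; // hypothetical email -> anonymous ID mapping
  fetchImpl: (request: Request) => Promise<Response>; // usually the global fetch
}

async function proxyToGateway(request: Request, deps: ProxyDeps): Promise<Response> {
  const url = new URL(request.url);
  if (!PROVIDER_PREFIXES.some((prefix) => url.pathname.startsWith(prefix))) {
    return new Response("Unknown provider route", { status: 404 });
  }

  // Assumption: OpenCode attaches the Access JWT as cf-access-token.
  const jwt = request.headers.get("cf-access-token");
  const email = jwt ? await deps.verifyAccessJwt(jwt) : null;
  if (!email) {
    return new Response("Unauthorized", { status: 401 });
  }

  // Strip client credentials and host, then add the gateway credential and per-user metadata.
  const headers = new Headers(request.headers);
  for (const name of ["authorization", "cf-access-token", "host"]) {
    headers.delete(name);
  }
  headers.set("cf-aig-authorization", `Bearer ${deps.gatewayToken}`);
  headers.set("cf-aig-metadata", JSON.stringify({ userId: await deps.userIdFor(email) }));

  // Stream the request body through; Node's fetch requires duplex: "half" for stream bodies.
  const init: RequestInit & { duplex?: "half" } = {
    method: request.method,
    headers,
    body: request.body,
    duplex: "half",
  };
  const upstream = new Request(deps.gatewayBase + url.pathname + url.search, init);

  // Return the upstream response as-is, so its body streams back without buffering.
  return deps.fetchImpl(upstream);
}
```

Injecting verification, ID mapping, and fetch keeps the header-rewrite logic testable on its own; in a real Worker they would be backed by Access key validation, a persistent store, and the global fetch.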

The request goes to AI Gateway, which routes it to the appropriate provider. The response passes straight through with zero buffering. The apiKey field in the client config is empty because the Worker injects the real key server-side. No API keys exist on user machines.

Keeps the model catalog fresh. An hourly cron trigger fetches the current OpenAI model list from models.dev, caches it in Workers KV, and injects store: false on every model for Zero Data Retention. New models get ZDR automatically without a config redeploy.

Anonymous user tracking. After JWT validation, the Worker maps the user's email to a
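The model catalog refresh described above can be sketched as a small scheduled job. This sketch assumes models.dev's public api.json feed is keyed by provider ID with a models map per provider, assumes OpenCode reads per-model options, uses a minimal KV-style interface so it runs outside Workers, and stores the result under a hypothetical key (catalog:openai). The Zero Data Retention step itself is a merge that forces options.store to false on every model, including ones that appear after deploy.

```typescript
// Sketch of the hourly model catalog refresh.
// Assumptions: models.dev's api.json is keyed by provider ID with a `models` map,
// and OpenCode reads per-model `options`. The KV key name is hypothetical.
const MODELS_DEV_URL = "https://models.dev/api.json";
const CATALOG_KEY = "catalog:openai"; // hypothetical KV key

// The subset of the Workers KV API this sketch needs.
interface KvLike {
  put(key: string, value: string): Promise<void>;
}

type ModelEntry = Record<string, unknown> & { options?: Record<string, unknown> };

// Force store: false on every model so new models inherit Zero Data Retention.
function withZeroDataRetention(models: Record<string, ModelEntry>): Record<string, ModelEntry> {
  const patched: Record<string, ModelEntry> = {};
  for (const [id, model] of Object.entries(models)) {
    patched[id] = { ...model, options: { ...(model.options ?? {}), store: false } };
  }
  return patched;
}

// Fetch the catalog, keep only OpenAI models, patch them, and cache the result.
async function refreshModelCatalog(
  kv: KvLike,
  fetchImpl: (url: string) => Promise<Response>,
): Promise<number> {
  const res = await fetchImpl(MODELS_DEV_URL);
  if (!res.ok) throw new Error(`models.dev returned ${res.status}`);
  const catalog = (await res.json()) as Record<string, { models?: Record<string, ModelEntry> }>;
  const patched = withZeroDataRetention(catalog["openai"]?.models ?? {});
  await kv.put(CATALOG_KEY, JSON.stringify(patched));
  return Object.keys(patched).length;
}

// In a Worker, an hourly cron trigger ("0 * * * *") could call this from the scheduled
// handler, with MODELS_KV as a hypothetical KV binding:
//   async scheduled(_controller, env, ctx) { ctx.waitUntil(refreshModelCatalog(env.MODELS_KV, fetch)); }
```

Keeping the ZDR transform as a pure function means the store: false guarantee can be tested without the fetch and KV plumbing.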

Originally published at blog.cloudflare.com