At KubeCon Amsterdam last week, the CNCF announced that its AI Conformance Program now includes formal validation for agentic workloads. The number of certified platforms has grown from 18 to 31 since November, and the new requirements - called KARs - mandate support for stable in-place pod resizing, so inference workloads can adjust resources without restarting, and workload-aware scheduling to avoid resource deadlocks during distributed training.
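In-place resizing is not hypothetical: upstream Kubernetes ships it as the InPlacePodVerticalScaling feature (beta and on by default as of v1.33). As a minimal sketch - pod name, container name, and image are illustrative - a spec opts into restart-free resizing with a `resizePolicy`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-server        # illustrative name
spec:
  containers:
  - name: model
    image: registry.example.com/inference:latest   # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
    # Allow the kubelet to apply CPU/memory changes to the running
    # container instead of restarting it.
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: NotRequired
```

A resize is then applied against the pod's `resize` subresource, e.g. `kubectl patch pod inference-server --subresource resize --patch '{"spec":{"containers":[{"name":"model","resources":{"requests":{"cpu":"4"}}}]}}'`.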

This is the infrastructure layer catching up to where AI is actually heading. The inference-to-training ratio is expected to flip by the end of 2026 - from two-thirds training and one-third inference to the reverse. Agents don't just run once. They run continuously, at scale, generating operational load that looks nothing like a typical web application. The Kubernetes scheduler was not designed for this. Now there's a standard being written for it.
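"Workload-aware scheduling" for distributed training usually means gang scheduling: either every pod in a job is placed or none are, so a half-scheduled job can't hold GPUs while waiting for peers that will never fit. One existing implementation of this idea is the Volcano scheduler's PodGroup; a sketch with illustrative names and sizes:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: trainer-group           # illustrative name
spec:
  # Admit the job only when all 8 workers can be scheduled together;
  # otherwise keep them all pending rather than partially allocating GPUs.
  minMember: 8
  minResources:
    nvidia.com/gpu: "8"
```

Member pods reference the group and use Volcano as their scheduler; the point here is just that partial allocation - the deadlock the KARs target - is prevented at admission time rather than discovered at 2 AM.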

For SREs and platform engineers, this changes the support surface significantly. More AI agents in production means more failure modes that don't fit the patterns you've seen before - more unpredictable resource consumption, more inter-agent dependencies, more alert volume from workloads that are fundamentally harder to reason about.

Investigation tooling has to evolve alongside the workload. At OpsWorker, we already investigate Kubernetes alerts autonomously - no human trigger, root cause to Slack in under 2 minutes. Agentic workloads will raise the bar for what "investigate" actually means.

The infrastructure conformance work CNCF is doing is necessary. But conformance alone doesn't reduce your MTTR when an agent deadlocks at 2 AM.

Source: "CNCF Nearly Doubles Certified Kubernetes AI Platforms," CNCF.io, March 24, 2026.