how-to · 7 min · updated 2026-06-13

how to secure ai agents in kubernetes production

the short answer

to secure ai agents in kubernetes production, give each agent its own least-privilege serviceaccount and rbac role, route every cluster-mutating call through an in-line approval proxy, and require a human to confirm destructive operations (delete, drain, scale-to-zero, patch on production namespaces) before they reach the api server. the agent keeps its speed on read-only calls and safe writes; the dangerous 1% gets caught.

67% of respondents delayed or slowed a deployment over a security concern (Red Hat State of Kubernetes Security 2024)

ai agents are increasingly trusted to operate clusters: triaging crashlooping pods, scaling deployments, rolling restarts, cleaning up stale resources. that's genuinely useful. the problem is that a single hallucinated command, such as kubectl delete namespace payments or kubectl scale --replicas=0 on the wrong deployment, is indistinguishable from a legitimate one until it has already executed. red hat's state of kubernetes security 2024 found that 67% of respondents had delayed or slowed a deployment because of a security concern, which tells you how nervous teams already are about what touches the cluster.

1. scope rbac to the agent, not to a human

start from zero. create a dedicated serviceaccount per agent and bind it to a role that grants only the verbs and resources it actually needs. an agent that reads pod logs and restarts deployments does not need delete on secrets, namespaces, or persistentvolumes. avoid cluster-admin entirely. if the agent works across namespaces, prefer several narrow rolebindings over one broad clusterrolebinding. this is the same access-control discipline we cover in our guide to ai agent access control for devops and sre teams — kubernetes just makes the blast radius bigger.
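as a minimal sketch of that scoping, the python below builds a serviceaccount, a namespaced role, and the rolebinding that joins them, then prints them as one json document that kubectl apply -f can read. the agent name, the namespace, and the exact verb list are assumptions made for illustration; adjust them to what your agent actually does.

```python
import json

# Hypothetical names for illustration; none of these come from the article.
AGENT_NAME = "triage-agent"
NAMESPACE = "payments"


def agent_rbac_manifests(agent: str, namespace: str) -> list:
    """Build a ServiceAccount, a namespaced Role, and the RoleBinding joining them."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": agent, "namespace": namespace},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": f"{agent}-role", "namespace": namespace},
        "rules": [
            # Read pods and their logs for triage.
            {"apiGroups": [""], "resources": ["pods", "pods/log"],
             "verbs": ["get", "list", "watch"]},
            # Read deployments and patch them; a rollout restart is a patch
            # of the pod template annotations. No delete, no secrets.
            {"apiGroups": ["apps"], "resources": ["deployments"],
             "verbs": ["get", "list", "watch", "patch"]},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{agent}-binding", "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": agent, "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role",
                    "name": f"{agent}-role"},
    }
    return [service_account, role, binding]


def as_kubernetes_list(items: list) -> str:
    """Wrap the manifests in a v1 List so they serialize as a single JSON document."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(as_kubernetes_list(agent_rbac_manifests(AGENT_NAME, NAMESPACE)))
```

for an agent that works in several namespaces, calling agent_rbac_manifests once per namespace yields the several narrow rolebindings described above rather than one clusterrolebinding.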

2. intercept destructive verbs in-line

rbac decides what is possible; it can't decide what is wise in this specific moment. that's where an interception layer earns its place. point the agent's kube client (or the tool that wraps kubectl) at an agent.shield proxy instead of the raw api server. safe, read-only traffic forwards untouched. anything matching a destructive policy is held and surfaced for review.

delete on deployments, statefulsets, namespaces, pvcs, and secrets
scale to zero or drastic replica reductions on production namespaces
drain or cordon on nodes
patch or apply that changes resource limits or image tags in prod
anything touching a namespace you've labelled protected
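the rules above can be sketched as a single hold-or-forward function. this is an illustration under stated assumptions, not agent.shield's actual policy format: the ClusterRequest shape, the namespace sets, and the decide function are all hypothetical, and "drastic replica reductions" are left out because detecting them needs the current replica count.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Resource set mirrors the list above; the namespace sets are hypothetical.
GUARDED_DELETE = {"deployments", "statefulsets", "namespaces",
                  "persistentvolumeclaims", "secrets"}
PRODUCTION_NAMESPACES = {"prod"}
PROTECTED_NAMESPACES = {"payments"}  # namespaces you have labelled protected


@dataclass
class ClusterRequest:
    verb: str                        # API verb: get, list, create, patch, delete, ...
    resource: str                    # e.g. "deployments" or "deployments/scale"
    namespace: Optional[str] = None  # None for cluster-scoped resources
    body: dict = field(default_factory=dict)


def _touches_image_or_limits(body: dict) -> bool:
    """True if a patch body sets an image or a resources block on any container."""
    containers = (body.get("spec", {}).get("template", {})
                      .get("spec", {}).get("containers", []))
    return any("image" in c or "resources" in c for c in containers)


def decide(req: ClusterRequest) -> Tuple[str, str]:
    """Return ("forward" or "hold", reason) for one request."""
    if req.verb in {"get", "list", "watch"}:
        return "forward", "read-only"
    if req.namespace in PROTECTED_NAMESPACES:
        return "hold", "write in a protected namespace"
    if req.verb in {"delete", "deletecollection"} and req.resource in GUARDED_DELETE:
        return "hold", f"delete on {req.resource}"
    # Cordon sets spec.unschedulable on the node; drain also evicts pods.
    if req.resource == "nodes" and "unschedulable" in req.body.get("spec", {}):
        return "hold", "cordon or drain on a node"
    if req.resource == "pods/eviction":
        return "hold", "pod eviction"
    if req.namespace in PRODUCTION_NAMESPACES:
        if req.body.get("spec", {}).get("replicas") == 0:
            return "hold", "scale to zero in production"
        if req.verb in {"patch", "update"} and _touches_image_or_limits(req.body):
            return "hold", "image or resource-limit change in production"
    return "forward", "no destructive policy matched"
```

a rollout restart in prod only patches the pod template annotations, so it forwards untouched, while the same agent trying to scale that deployment to zero is held:

```python
restart = {"spec": {"template": {"metadata": {"annotations": {"restartedAt": "now"}}}}}
print(decide(ClusterRequest("patch", "deployments", "prod", restart)))
print(decide(ClusterRequest("patch", "deployments/scale", "prod", {"spec": {"replicas": 0}})))
```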

3. keep a human in the loop for the irreversible 1%

the goal is not to slow the agent down on everything; it's to put a person on the one decision that can't be undone. when a held request lands in the review queue, the reviewer sees the exact verb, resource, namespace, and payload, then either approves it, which forwards it to the real api server, or denies it, which stops it cold. this is the same pattern we describe in human-in-the-loop security for ai operations, applied to the cluster.
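one way to picture the review step is a small in-memory queue. the sketch below is an assumption-laden illustration (HeldRequest, ReviewQueue, and the forward callback are hypothetical names, not agent.shield's api): a held request is forwarded only when a named reviewer approves it, and each ticket can be resolved exactly once.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class HeldRequest:
    verb: str
    resource: str
    namespace: str
    payload: dict
    status: str = "pending"         # pending -> approved | denied
    reviewer: Optional[str] = None  # who made the decision


class ReviewQueue:
    def __init__(self, forward: Callable[[HeldRequest], None]):
        self._forward = forward     # sends an approved request to the real API server
        self._held: Dict[int, HeldRequest] = {}
        self._ids = itertools.count(1)

    def hold(self, req: HeldRequest) -> int:
        """Park a request and return the ticket a reviewer will act on."""
        ticket = next(self._ids)
        self._held[ticket] = req
        return ticket

    def resolve(self, ticket: int, reviewer: str, approve: bool) -> HeldRequest:
        """Approve or deny a held request; raises KeyError if already resolved."""
        req = self._held.pop(ticket)
        req.reviewer = reviewer
        req.status = "approved" if approve else "denied"
        if approve:
            self._forward(req)      # a denied request never reaches the cluster
        return req


if __name__ == "__main__":
    forwarded = []
    queue = ReviewQueue(forward=forwarded.append)
    ticket = queue.hold(HeldRequest("delete", "namespaces", "payments", {}))
    print(queue.resolve(ticket, reviewer="alice", approve=False).status)
    print(len(forwarded))
```

popping the ticket on resolve is a deliberate choice here: a second approval of the same request fails loudly instead of forwarding the destructive call twice.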

rbac says what an agent could do. an approval proxy says what it may do, right now, with a name attached to the decision.

4. log every cluster action

kubernetes audit logs are good but noisy. pair them with an agent-level record that ties each intercepted call to the policy it matched and the human who approved or denied it. when an incident review asks how a namespace disappeared, you want one timeline, not a grep across api-server logs. see logging and auditing ai agent actions in production for how to structure that trail.
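a minimal sketch of such an agent-level record, assuming one json object per line. the field names, the audit_record helper, and the timeline helper are illustrative choices rather than a prescribed schema.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def audit_record(agent: str, verb: str, resource: str, namespace: str,
                 matched_policy: Optional[str], decision: str,
                 reviewer: Optional[str] = None,
                 now: Optional[datetime] = None) -> str:
    """Serialize one intercepted call as a JSON line."""
    stamp = now or datetime.now(timezone.utc)
    record = {
        # Fixed-width UTC timestamps sort correctly as plain strings.
        "timestamp": stamp.isoformat(timespec="milliseconds"),
        "agent": agent,
        "verb": verb,
        "resource": resource,
        "namespace": namespace,
        "matched_policy": matched_policy,  # None when nothing matched
        "decision": decision,              # forwarded | approved | denied
        "reviewer": reviewer,              # None for calls nobody reviewed
    }
    return json.dumps(record, sort_keys=True)


def timeline(lines: Iterable[str], namespace: str) -> List[dict]:
    """Return every record for one namespace in chronological order."""
    records = (json.loads(line) for line in lines)
    return sorted((r for r in records if r["namespace"] == namespace),
                  key=lambda r: r["timestamp"])
```

with records like these, the incident-review question becomes a filter on one file:

```python
log = [
    audit_record("triage-agent", "delete", "namespaces", "payments",
                 "delete-guarded-resource", "approved", reviewer="alice"),
    audit_record("triage-agent", "get", "pods", "staging", None, "forwarded"),
]
for entry in timeline(log, "payments"):
    print(entry["timestamp"], entry["verb"], entry["decision"], entry["reviewer"])
```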

frequently asked questions

does this require changing my agent's code?

no. you point the agent's kubernetes client or kubectl wrapper at the proxy url instead of the api server. auth, payloads, and verbs pass through unchanged — only destructive calls are held for review.

won't an approval step slow down incident response?

only on destructive actions. read-only triage, log pulls, and safe restarts forward instantly. the approval gate applies to the handful of irreversible operations (delete, drain, scale-to-zero) where a five-second pause is cheaper than an outage.

is rbac alone not enough?

rbac is necessary but static. it grants a capability for as long as the binding exists; it can't judge whether deleting this namespace right now is correct. an in-line approval layer adds that just-in-time judgment on top of rbac.

what about agents using the kubernetes api directly instead of kubectl?

same approach. the proxy sits in front of the api server endpoint the agent calls. it inspects method, path, and body, so a raw DELETE to /apis/apps/v1/namespaces/prod/deployments/x is caught just like a kubectl delete.
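to make the inspection step concrete, here is a rough sketch of how a proxy might turn a raw http method and path into a kubernetes verb, namespace, and resource. it follows the standard api path layout (/api/v1/... for the core group, /apis/{group}/{version}/... for named groups); parse_request and ParsedRequest are hypothetical names, and watch requests and non-resource paths are not handled.

```python
from typing import NamedTuple, Optional


class ParsedRequest(NamedTuple):
    verb: str
    group: str                 # "" for the core API group
    version: str
    namespace: Optional[str]   # None for cluster-scoped resources
    resource: str              # includes a subresource, e.g. "deployments/scale"
    name: Optional[str]


def parse_request(method: str, path: str) -> ParsedRequest:
    parts = [p for p in path.split("?", 1)[0].split("/") if p]
    if parts[:1] == ["api"] and len(parts) >= 3:
        group, version, rest = "", parts[1], parts[2:]
    elif parts[:1] == ["apis"] and len(parts) >= 4:
        group, version, rest = parts[1], parts[2], parts[3:]
    else:
        raise ValueError(f"not a resource path: {path}")

    namespace = None
    # /namespaces/{ns}/{resource}/... is a namespaced resource, while
    # /namespaces/{name} (optionally with /status or /finalize) is the
    # Namespace object itself.
    if rest[0] == "namespaces" and len(rest) >= 3 and rest[2] not in ("status", "finalize"):
        namespace, rest = rest[1], rest[2:]

    resource = rest[0]
    name = rest[1] if len(rest) > 1 else None
    if len(rest) > 2:
        resource = f"{resource}/{rest[2]}"

    method = method.upper()
    if method == "GET":
        verb = "get" if name else "list"
    elif method == "DELETE":
        verb = "delete" if name else "deletecollection"
    else:
        verb = {"POST": "create", "PUT": "update", "PATCH": "patch"}.get(method, "")
        if not verb:
            raise ValueError(f"unsupported method: {method}")
    return ParsedRequest(verb, group, version, namespace, resource, name)


if __name__ == "__main__":
    print(parse_request("DELETE", "/apis/apps/v1/namespaces/prod/deployments/x"))
```

because the result carries the verb, namespace, and resource rather than the command that produced it, the same policy check applies whether the call came from kubectl or from a raw http client.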

get started with agent.shield

put a human back in the loop for the actions that can't be undone. no agent rewrite — just a url your agent already knows how to call.
