Zero-Trust Networking for AI Agents: Protecting Your Internal VPC
Preventing SSRF and lateral movement by dynamically provisioning Kubernetes Network Policies for untrusted workloads.

Series: Secure Workload Orchestration
- Part 1: Building a Polymorphic Workload Orchestrator with Go and Kubernetes
- Part 2: Hardening the Control Plane: From Pods to MicroVMs
- Part 3: Zero-Trust Networking for AI Agents: Protecting Your Internal VPC
- Part 4: Enterprise Audit Logging and Monitoring: The Missing Piece of the Control Plane
- Part 5: Control Plane Resilience: Handling Cascading Failures in Distributed Orchestration
In Hardening the Control Plane: From Pods to MicroVMs, we solved the compute isolation problem by wrapping untrusted agents in Firecracker MicroVMs.
But computation is only half the battle. AI agents are useful because they connect to things. They need to crawl websites, call external APIs, and fetch data.
However, giving an untrusted agent unrestricted network access is a massive security risk. Server-Side Request Forgery (SSRF) is the #1 threat here: if an agent can curl your cloud metadata service (169.254.169.254) or reach your internal database, your isolation is broken.
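To make the threat concrete: from a pod with open egress, reading the metadata index is a one-liner. The snippet below is purely illustrative; the IMDSv1-style path shown is the standard AWS layout.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// From a pod with unrestricted egress, the cloud metadata service is one hop away.
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://169.254.169.254/latest/meta-data/")
	if err != nil {
		// With the egress policy we build below, this timeout is exactly what we want to see.
		fmt.Println("metadata service unreachable:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// Without network controls this prints the instance metadata index,
	// the first step toward stealing node IAM credentials.
	fmt.Println(string(body))
}
```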
In this post, we'll upgrade our Go Control Plane to automatically provision Zero-Trust Network Policies.
The Goal: "Default Deny"
We want a networking model where:
- Ingress is Blocked: No one can call the agent (except maybe Prometheus; see the sketch after this list).
- Internal Egress is Blocked: The agent cannot talk to other Pods, Services, or the Node IP.
- Public Egress is Allowed: The agent can reach the public internet (Google, OpenAI, etc.).
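On the first point: the policy we generate below declares the Ingress policy type with no ingress rules, which denies all inbound traffic. If you later want the Prometheus exception, an ingress rule along these lines could be appended to the policy. This is only a sketch; the monitoring namespace label, the Prometheus pod label, and port 9090 are assumptions about your cluster, not something the orchestrator requires.

```go
package kubernetes

import (
	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// prometheusIngressRule sketches an ingress exception for metrics scraping.
// Assumptions: Prometheus runs in a namespace carrying the default
// kubernetes.io/metadata.name=monitoring label and scrapes the agent on port 9090.
// Append the result to Spec.Ingress of the policy built below.
func prometheusIngressRule() networkingv1.NetworkPolicyIngressRule {
	metricsPort := intstr.FromInt(9090)
	tcp := corev1.ProtocolTCP
	return networkingv1.NetworkPolicyIngressRule{
		From: []networkingv1.NetworkPolicyPeer{{
			NamespaceSelector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"kubernetes.io/metadata.name": "monitoring"},
			},
			PodSelector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app.kubernetes.io/name": "prometheus"},
			},
		}},
		Ports: []networkingv1.NetworkPolicyPort{
			{Port: &metricsPort, Protocol: &tcp},
		},
	}
}
```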
Default Deny Policy
We don't want to rely on manual configuration. We'll update our Go FirecrackerRuntime to create a NetworkPolicy alongside every Pod.
```go
package kubernetes

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func (f *FirecrackerRuntime) createNetworkPolicy(ctx context.Context, id string) error {
	udp := corev1.ProtocolUDP
	tcp := corev1.ProtocolTCP
	dnsPort := intstr.FromInt(53)

	policy := &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{
			Name: "isolate-" + id,
			Labels: map[string]string{
				"managed-by": "orchestrator",
			},
		},
		Spec: networkingv1.NetworkPolicySpec{
			PodSelector: metav1.LabelSelector{
				MatchLabels: map[string]string{
					"job-id": id, // Target the specific agent
				},
			},
			// Declaring both types (with no Ingress rules) denies all inbound
			// traffic and restricts outbound traffic to the rules below.
			PolicyTypes: []networkingv1.PolicyType{
				networkingv1.PolicyTypeIngress,
				networkingv1.PolicyTypeEgress,
			},
			Egress: []networkingv1.NetworkPolicyEgressRule{
				{
					// RULE 1: Allow DNS (essential!) to kube-dns in any namespace.
					To: []networkingv1.NetworkPolicyPeer{
						{
							NamespaceSelector: &metav1.LabelSelector{},
							PodSelector: &metav1.LabelSelector{
								MatchLabels: map[string]string{"k8s-app": "kube-dns"},
							},
						},
					},
					Ports: []networkingv1.NetworkPolicyPort{
						{Port: &dnsPort, Protocol: &udp},
						{Port: &dnsPort, Protocol: &tcp},
					},
				},
				{
					// RULE 2: Allow the public internet while blocking private ranges.
					// Standard NetworkPolicies are allow-lists with no explicit "deny",
					// so "allow public, block private" is expressed as an IPBlock for
					// 0.0.0.0/0 with the private CIDRs carved out via Except.
					To: []networkingv1.NetworkPolicyPeer{
						{
							IPBlock: &networkingv1.IPBlock{
								CIDR: "0.0.0.0/0",
								Except: []string{
									"10.0.0.0/8",         // RFC 1918 private range
									"172.16.0.0/12",      // RFC 1918 private range
									"192.168.0.0/16",     // RFC 1918 private range
									"169.254.169.254/32", // AWS/cloud metadata service
								},
							},
						},
					},
				},
			},
		},
	}

	_, err := f.clientset.NetworkingV1().NetworkPolicies(f.namespace).Create(
		ctx,
		policy,
		metav1.CreateOptions{},
	)
	return err
}
```

Note: not all Kubernetes CNIs enforce IPBlock except clauses correctly. We highly recommend Cilium or Calico for enforcement. The AWS VPC CNI supports NetworkPolicies, but only with the separate Network Policy Agent installed.
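Because enforcement ultimately depends on the CNI, it is worth verifying the policy from inside a running agent pod rather than trusting the manifest. Below is a rough smoke test you could bake into a debug image; the target addresses (and the assumption that 10.96.0.1 is your cluster's API service IP) are illustrative.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Run inside an agent pod: private targets should be blocked, public ones reachable.
	targets := []struct {
		name        string
		addr        string
		wantBlocked bool
	}{
		{"cloud metadata service", "169.254.169.254:80", true},
		{"kubernetes API (default service CIDR)", "10.96.0.1:443", true},
		{"public internet", "api.openai.com:443", false},
	}

	for _, t := range targets {
		conn, err := net.DialTimeout("tcp", t.addr, 3*time.Second)
		blocked := err != nil
		if conn != nil {
			conn.Close()
		}
		verdict := "OK"
		if blocked != t.wantBlocked {
			verdict = "POLICY VIOLATION"
		}
		fmt.Printf("%-40s blocked=%-5v want=%-5v %s\n", t.name, blocked, t.wantBlocked, verdict)
	}
}
```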
Integrating with Provision()
We simply add this call to our existing Provision method.
```go
func (f *FirecrackerRuntime) Provision(ctx context.Context, id string, spec runtime.Spec) error {
	// 1. Create the Pod (as seen in Part 2).
	// ... pod creation logic ...

	// 2. Lock down the network.
	if err := f.createNetworkPolicy(ctx, id); err != nil {
		// If the policy fails, DELETE the pod to fail safe.
		// Do not leave an un-isolated pod running.
		f.Teardown(ctx, id)
		return fmt.Errorf("failed to secure network: %w", err)
	}
	return nil
}
```
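The fail-safe above covers Provision, but the policy also needs the same lifecycle as the pod on the way out. A minimal sketch of the matching cleanup (the cleanupNetworkPolicy helper is ours to invent; call it from the Teardown method introduced in Part 2) might look like this:

```go
package kubernetes

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// cleanupNetworkPolicy deletes the per-agent policy created by createNetworkPolicy,
// so policies don't accumulate after agents exit.
func (f *FirecrackerRuntime) cleanupNetworkPolicy(ctx context.Context, id string) error {
	err := f.clientset.NetworkingV1().NetworkPolicies(f.namespace).Delete(
		ctx,
		"isolate-"+id,
		metav1.DeleteOptions{},
	)
	if apierrors.IsNotFound(err) {
		// Already gone (or never created because Provision failed early): nothing to do.
		return nil
	}
	return err
}
```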
Advanced: Egress Gateways

Sometimes you need all agent traffic to leave from a static IP (e.g., so a partner can allowlist your agents on their API).
In standard Kubernetes, egress traffic is masqueraded to the IP of whichever node the pod happens to run on. To fix this, we can use a Cilium Egress Gateway by creating a CiliumEgressGatewayPolicy CRD.
manifests/egress-policy.yaml
```yaml
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: agent-static-exit
spec:
  selectors:
    - podSelector:
        matchLabels:
          managed-by: orchestrator
          security: high
  destinationCIDRs:
    - "0.0.0.0/0"
  egressGateway:
    nodeSelector:
      matchLabels:
        egress-gateway: "true"
    egressIP: "203.0.113.50" # Your static Elastic IP
```

The policy selects our secure agent pods by label and routes their external traffic through a dedicated gateway node with a stable IP. This IP must be pre-configured on the gateway node's network interface.
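If you would rather have the control plane own this manifest too, the CRD can be created programmatically with client-go's dynamic client. The sketch below assumes the orchestrator's ServiceAccount has RBAC to create ciliumegressgatewaypolicies; the field values mirror the YAML above, and the function name is ours.

```go
package kubernetes

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// Cilium's CRDs aren't part of the typed clientset, so we go through the dynamic client.
var egressGatewayGVR = schema.GroupVersionResource{
	Group:    "cilium.io",
	Version:  "v2",
	Resource: "ciliumegressgatewaypolicies",
}

func createEgressGatewayPolicy(ctx context.Context, dyn dynamic.Interface, egressIP string) error {
	policy := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "cilium.io/v2",
		"kind":       "CiliumEgressGatewayPolicy",
		"metadata":   map[string]interface{}{"name": "agent-static-exit"},
		"spec": map[string]interface{}{
			"selectors": []interface{}{
				map[string]interface{}{
					"podSelector": map[string]interface{}{
						"matchLabels": map[string]interface{}{
							"managed-by": "orchestrator",
							"security":   "high",
						},
					},
				},
			},
			"destinationCIDRs": []interface{}{"0.0.0.0/0"},
			"egressGateway": map[string]interface{}{
				"nodeSelector": map[string]interface{}{
					"matchLabels": map[string]interface{}{"egress-gateway": "true"},
				},
				"egressIP": egressIP,
			},
		},
	}}

	// CiliumEgressGatewayPolicy is cluster-scoped, so no namespace on the resource client.
	_, err := dyn.Resource(egressGatewayGVR).Create(ctx, policy, metav1.CreateOptions{})
	return err
}
```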
Conclusion
We have now built a "Fort Knox" for AI agents:
- Compute: Hardware isolation via Firecracker (Part 2).
- Network: Protocol-level isolation via K8s NetworkPolicies (Part 3).
- Control: Orchestrated via a unified Go interface (Part 1).
This architecture allows you to sleep at night while thousands of user-defined agents crawl the web and execute code on your platform.