
Zero-Trust Networking for AI Agents: Protecting Your Internal VPC

Preventing SSRF and lateral movement by dynamically provisioning Kubernetes Network Policies for untrusted workloads.

By Team Astraq

In Hardening the Control Plane: From Pods to MicroVMs, we solved the compute isolation problem by wrapping untrusted agents in Firecracker MicroVMs.

But computation is only half the battle. AI agents are useful because they connect to things. They need to crawl websites, call external APIs, and fetch data.

However, giving an untrusted agent unrestricted network access is a massive security risk. Server-Side Request Forgery (SSRF) is the #1 threat here. If an agent can curl your cloud provider's Metadata Service (169.254.169.254) or ping your internal database, your isolation is broken.
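
To see why this matters, here is the kind of probe a compromised agent will try within seconds of landing in your cluster. A minimal illustrative sketch (the URL is the standard AWS IMDSv1 metadata route); this is the request our network policy must make impossible, not code we ship:

go

package main

import (
    "fmt"
    "io"
    "net/http"
    "time"
)

func main() {
    client := &http.Client{Timeout: 2 * time.Second}

    // Classic SSRF target: the cloud metadata service, which can leak
    // IAM credentials for the node the agent is running on.
    resp, err := client.Get("http://169.254.169.254/latest/meta-data/")
    if err != nil {
        fmt.Println("blocked (good):", err)
        return
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println("reachable (bad):", string(body))
}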

In this post, we'll upgrade our Go Control Plane to automatically provision Zero-Trust Network Policies.

The Goal: "Default Deny"

We want a networking model where:

  1. Ingress is Blocked: No one can call the agent (except maybe Prometheus).
  2. Internal Egress is Blocked: The agent cannot talk to other Pods, Services, or the Node IP.
  3. Public Egress is Allowed: The agent can reach the public internet (Google, OpenAI, etc.).
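
Once the policy from the next section is applied, you can smoke-test all three properties from inside a running agent pod. A minimal sketch; the internal address is a placeholder for a real ClusterIP in your cluster:

go

package main

import (
    "fmt"
    "net"
    "time"
)

// checkDial reports whether a TCP connection to addr succeeds within 2s.
func checkDial(label, addr string) {
    conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
    if err != nil {
        fmt.Printf("%-10s %s -> BLOCKED (%v)\n", label, addr, err)
        return
    }
    conn.Close()
    fmt.Printf("%-10s %s -> ALLOWED\n", label, addr)
}

func main() {
    checkDial("internal", "10.0.12.34:5432")    // placeholder ClusterIP: expect BLOCKED
    checkDial("metadata", "169.254.169.254:80") // cloud metadata: expect BLOCKED
    checkDial("public", "1.1.1.1:443")          // well-known public IP: expect ALLOWED

    // Goal #3 also implies DNS must resolve (see RULE 1 below).
    if _, err := net.LookupHost("example.com"); err != nil {
        fmt.Println("DNS -> BLOCKED:", err)
    } else {
        fmt.Println("DNS -> ALLOWED")
    }
}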

Default Deny Policy

We don't want to rely on manual configuration. We'll update our Go FirecrackerRuntime to create a NetworkPolicy alongside every Pod.

go

package kubernetes

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    networkingv1 "k8s.io/api/networking/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

func (f *FirecrackerRuntime) createNetworkPolicy(ctx context.Context, id string) error {
    udp := corev1.ProtocolUDP
    tcp := corev1.ProtocolTCP
    dnsPort := intstr.FromInt(53)

    policy := &networkingv1.NetworkPolicy{
        ObjectMeta: metav1.ObjectMeta{
            Name: "isolate-" + id,
            Labels: map[string]string{
                "managed-by": "orchestrator",
            },
        },
        Spec: networkingv1.NetworkPolicySpec{
            PodSelector: metav1.LabelSelector{
                MatchLabels: map[string]string{
                    "job-id": id, // Target the specific agent
                },
            },
            // Listing Ingress with no Ingress rules blocks all inbound
            // traffic (Goal #1). Egress is constrained by the rules below.
            PolicyTypes: []networkingv1.PolicyType{
                networkingv1.PolicyTypeIngress,
                networkingv1.PolicyTypeEgress,
            },
            Egress: []networkingv1.NetworkPolicyEgressRule{
                {
                    // RULE 1: Allow DNS (essential! Without it the agent
                    // cannot resolve any public hostname).
                    To: []networkingv1.NetworkPolicyPeer{
                        {
                            NamespaceSelector: &metav1.LabelSelector{},
                            PodSelector: &metav1.LabelSelector{
                                MatchLabels: map[string]string{"k8s-app": "kube-dns"},
                            },
                        },
                    },
                    Ports: []networkingv1.NetworkPolicyPort{
                        {Port: &dnsPort, Protocol: &udp},
                        {Port: &dnsPort, Protocol: &tcp},
                    },
                },
                {
                    // RULE 2: Allow public egress, block private ranges.
                    // Standard K8s NetworkPolicies are allow-lists: there is
                    // no "Deny" rule. To implement "Allow Public, Block
                    // Private", we allow 0.0.0.0/0 and carve out the private
                    // ranges with IPBlock "Except" clauses.
                    To: []networkingv1.NetworkPolicyPeer{
                        {
                            IPBlock: &networkingv1.IPBlock{
                                CIDR: "0.0.0.0/0",
                                Except: []string{
                                    "10.0.0.0/8",         // RFC 1918 private (Class A)
                                    "172.16.0.0/12",      // RFC 1918 private (Class B)
                                    "192.168.0.0/16",     // RFC 1918 private (Class C)
                                    "169.254.169.254/32", // AWS/Cloud Metadata
                                },
                            },
                        },
                    },
                },
            },
        },
    }

    _, err := f.clientset.NetworkingV1().NetworkPolicies(f.namespace).Create(
        ctx,
        policy,
        metav1.CreateOptions{},
    )
    return err
}
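
To keep this behavior from regressing, we can pin it down with a unit test against client-go's fake clientset. A sketch, assuming the runtime's clientset field is typed as kubernetes.Interface (which the fake satisfies) and the test lives in the same package:

go

package kubernetes

import (
    "context"
    "testing"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes/fake"
)

func TestCreateNetworkPolicy(t *testing.T) {
    // Assumes clientset and namespace are the only fields this path touches.
    f := &FirecrackerRuntime{
        clientset: fake.NewSimpleClientset(),
        namespace: "agents",
    }

    if err := f.createNetworkPolicy(context.Background(), "job-123"); err != nil {
        t.Fatalf("createNetworkPolicy: %v", err)
    }

    np, err := f.clientset.NetworkingV1().NetworkPolicies("agents").
        Get(context.Background(), "isolate-job-123", metav1.GetOptions{})
    if err != nil {
        t.Fatalf("policy not created: %v", err)
    }
    if np.Spec.PodSelector.MatchLabels["job-id"] != "job-123" {
        t.Errorf("policy selects wrong pod: %v", np.Spec.PodSelector.MatchLabels)
    }
}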

Integrating with Provision()

We simply add this call to our existing Provision method.

go

func (f *FirecrackerRuntime) Provision(ctx context.Context, id string, spec runtime.Spec) error {
    // 1. Create the Pod (as seen in Part 2)
    // ... pod creation logic ...

    // 2. Lock down the network
    if err := f.createNetworkPolicy(ctx, id); err != nil {
         // If policy fails, DELETE the pod to fail safe.
         // Do not leave an un-isolated pod running.
         f.Teardown(ctx, id)
         return fmt.Errorf("failed to secure network: %w", err)
    }

    return nil
}
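
The mirror-image step belongs in Teardown: delete the policy along with the pod. A sketch of a hypothetical deleteNetworkPolicy helper (the name is ours, not part of the runtime from Part 2); treating NotFound as success keeps teardown idempotent:

go

package kubernetes

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// deleteNetworkPolicy removes the per-job policy. Hypothetical helper:
// call it from Teardown alongside the pod deletion.
func (f *FirecrackerRuntime) deleteNetworkPolicy(ctx context.Context, id string) error {
    err := f.clientset.NetworkingV1().NetworkPolicies(f.namespace).Delete(
        ctx,
        "isolate-"+id,
        metav1.DeleteOptions{},
    )
    // Treat "already gone" as success so Teardown stays idempotent.
    if apierrors.IsNotFound(err) {
        return nil
    }
    return err
}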

Advanced: Egress Gateways

Sometimes, you need all agent traffic to leave from a Static IP (e.g., to whitelist your agents on a partner's API).

In standard K8s, all pods masquerade as the Node IP. To fix this, we can use a Cilium Egress Gateway by creating a CiliumEgressGatewayPolicy CRD.

manifests/egress-policy.yaml

apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: agent-static-exit
spec:
  selectors:
    - podSelector:
        matchLabels:
          managed-by: orchestrator
          security: high
  destinationCIDRs:
    - "0.0.0.0/0"
  egressGateway:
    nodeSelector:
      matchLabels:
        egress-gateway: "true"
    egressIP: "203.0.113.50" # Your static Elastic IP

The policy selects our secure agent pods by label and routes their external traffic through a dedicated gateway node with a stable IP. This IP must be pre-configured on the gateway node's network interface.
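
If you want the Go control plane to own this CRD too, the dynamic client is the natural fit, since Cilium's types aren't in the standard clientset. A sketch (the applyEgressPolicy name is ours, and we assume a dynamic.Interface is already wired up); the policy is cluster-scoped, so there's no namespace:

go

package kubernetes

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
)

// egressGVR identifies the Cilium CRD; the resource is the lowercase
// plural of the kind.
var egressGVR = schema.GroupVersionResource{
    Group:    "cilium.io",
    Version:  "v2",
    Resource: "ciliumegressgatewaypolicies",
}

func applyEgressPolicy(ctx context.Context, dyn dynamic.Interface) error {
    policy := &unstructured.Unstructured{Object: map[string]interface{}{
        "apiVersion": "cilium.io/v2",
        "kind":       "CiliumEgressGatewayPolicy",
        "metadata":   map[string]interface{}{"name": "agent-static-exit"},
        "spec": map[string]interface{}{
            "selectors": []interface{}{
                map[string]interface{}{
                    "podSelector": map[string]interface{}{
                        "matchLabels": map[string]interface{}{
                            "managed-by": "orchestrator",
                            "security":   "high",
                        },
                    },
                },
            },
            "destinationCIDRs": []interface{}{"0.0.0.0/0"},
            "egressGateway": map[string]interface{}{
                "nodeSelector": map[string]interface{}{
                    "matchLabels": map[string]interface{}{"egress-gateway": "true"},
                },
                "egressIP": "203.0.113.50",
            },
        },
    }}

    // Cluster-scoped resource: no .Namespace(...) call before Create.
    _, err := dyn.Resource(egressGVR).Create(ctx, policy, metav1.CreateOptions{})
    return err
}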

Conclusion

We have now built a "Fort Knox" for AI agents:

  1. Compute: Hardware isolation via Firecracker (Part 2).
  2. Network: L3/L4 network isolation via K8s NetworkPolicies (Part 3).
  3. Control: Orchestrated via a unified Go interface (Part 1).

This architecture allows you to sleep at night while thousands of user-defined agents crawl the web and execute code on your platform.