Orchestrating a Swarm of AI Agents on Kubernetes

In the previous post, I set up a single shared OpenClaw instance for the team — one AI assistant everyone talks to through Slack, with readonly access to our Elasticsearch data. It works well. But it has a limitation: everyone shares the same agent.

That means shared conversation context, shared skills, shared cron jobs. If one person configures an alerting rule, it shows up in everyone’s environment. The per-channel-peer session isolation helps, but it’s still one process, one config, one set of customizations.

The natural next step is one agent per person — isolated state, isolated skills, isolated cron jobs. The part I found interesting wasn’t the AI so much as how to run that on Kubernetes: a single orchestrator routing traffic to a swarm of identical worker pods, each with its own persistent volume. OpenClaw happens to be the workload; the pattern is generic.

I never got around to implementing it. We already had a Kubernetes cluster running, which made the shape of the solution fairly obvious — so I wrote up the architecture anyway. This post is that design, not a postmortem.

The architecture

Slack Workspace
      |
      v
+------------------------+
| Orchestrator Bot       |  single Slack app, always running
| (K8s Deployment)       |
| - Routes messages      |
| - Provisions agents    |
| - Manages lifecycle    |
+------------------------+
      |
      +---> Pod: agent-alice
      +---> Pod: agent-bob
      +---> Pod: agent-carol

One Slack app. One orchestrator service that handles all incoming messages. Per-user agent pods that do the actual AI work.

When Alice DMs the bot, the orchestrator would look up her Slack user ID, find her agent pod, and forward the message. Alice’s agent processes it, generates a response, and the orchestrator sends it back through Slack. Alice wouldn’t need to know there’s a dedicated pod running for her — she’d just be talking to “the bot.”

Why Kubernetes

This is a classic controller-worker pattern: one orchestrator Deployment manages lifecycle and routing; N worker pods do the actual work. Kubernetes is a good fit for that swarm model because it already handles scheduling, restarts, resource isolation, network policy, and storage.

If I didn’t have a K8s cluster, I’d be managing EC2 instances per user — dealing with AMIs, auto-stop logic, startup latency, instance lifecycle. All problems K8s has already solved.

The base agent image

Every agent pod would run the same container image, with OpenClaw pre-installed and secure defaults baked in:

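FROM node:22-slim

# Pin a specific version instead of @latest if you want reproducible builds
RUN npm install -g openclaw@latest

COPY openclaw-base-config.json /etc/openclaw/base-config.json
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

# Run as the image's built-in non-root user so $HOME is /home/node,
# matching the volume mount path used in the pod template
USER node

ENTRYPOINT ["/entrypoint.sh"]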

The entrypoint would handle first-run setup vs. existing user data:

#!/bin/bash
set -e

OPENCLAW_DIR="$HOME/.openclaw"

# First run — copy base config
if [ ! -f "$OPENCLAW_DIR/openclaw.json" ]; then
    mkdir -p "$OPENCLAW_DIR"
    cp /etc/openclaw/base-config.json "$OPENCLAW_DIR/openclaw.json"
fi

# Start the agent
exec openclaw gateway start

On first boot, it copies the base config. On subsequent restarts, it picks up the user’s existing config from the PVC — their customized skills, cron jobs, and preferences persist.

The base config

The base config bakes in security defaults and readonly data access. Every agent would start with the same foundation:

{
  "model": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514"
  },
  "tools": {
    "elevated": { "enabled": false },
    "fs": { "workspaceOnly": true },
    "deny": ["gateway", "sessions_spawn"]
  },
  "skills": {
    "elasticsearch": {
      "enabled": true,
      "config": {
        "readOnly": true
      }
    }
  }
}

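Elevated commands disabled. Filesystem restricted. Dangerous tools denied. Elasticsearch access enabled but readonly. Users could add skills and cron jobs on top of this, but the intent is that they can’t weaken the security baseline. Because the entrypoint copies the config onto the user’s own volume, something would still have to enforce that on each start.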
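One way to keep the baseline from drifting, sketched here as an assumption rather than anything OpenClaw provides: re-apply the locked sections of the base config over the user's config every time the pod starts. The function name and the choice of "tools" as the locked section are mine.

```python
import copy
import json

# Hypothetical choice: which top-level sections of the base config are locked.
LOCKED_SECTIONS = ("tools",)


def enforce_baseline(base: dict, user: dict, locked=LOCKED_SECTIONS) -> dict:
    """Return a copy of the user's config with locked sections reset to base."""
    merged = copy.deepcopy(user)
    for section in locked:
        if section in base:
            merged[section] = copy.deepcopy(base[section])
    return merged


base = {
    "tools": {"elevated": {"enabled": False}, "fs": {"workspaceOnly": True},
              "deny": ["gateway", "sessions_spawn"]},
    "skills": {"elasticsearch": {"enabled": True, "config": {"readOnly": True}}},
}
# A user config that added a skill but also loosened the tool settings.
user = {
    "tools": {"elevated": {"enabled": True}, "deny": []},
    "skills": {"elasticsearch": {"enabled": True, "config": {"readOnly": True}},
               "weekly-report": {"enabled": True}},
}

result = enforce_baseline(base, user)
print(json.dumps(result["tools"], indent=2))
```

The user's extra skill survives and the tool restrictions snap back to the base values. The Elasticsearch skill is left unlocked here on purpose, since the read-only credential is the real guard for data access.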

Kubernetes manifests

Agent pod template

The orchestrator would create one of these per user:

apiVersion: v1
kind: Pod
metadata:
  name: agent-${USER_ID}
  namespace: ai-agents
  labels:
    app: openclaw-agent
    user: ${USER_ID}
spec:
  securityContext:
    fsGroup: 1000  # gid of the node user in the official image, so the mounted volume is writable
  containers:
    - name: openclaw
      image: your-registry/openclaw-agent:latest
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
      env:
        - name: OPENCLAW_USER_ID
          value: "${USER_ID}"
      envFrom:
        - secretRef:
            name: openclaw-shared-secrets
      volumeMounts:
        - name: user-data
          mountPath: /home/node/.openclaw
      ports:
        - containerPort: 18789
  volumes:
    - name: user-data
      persistentVolumeClaim:
        claimName: openclaw-${USER_ID}

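Each pod requests 256 MiB of RAM with a 512 MiB limit. That should be plenty for a single-user agent. On a node with 16GB RAM, you could fit ~30 agents by memory limits, but the 250m CPU request means 30 pods also need about 7.5 vCPUs of allocatable CPU. On a smaller node, CPU becomes the constraint unless you lower the request.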
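To sanity-check the packing, here's the arithmetic as a small sketch. The scheduler places pods by requests, not limits, so both the memory and CPU requests matter. The node shapes below are hypothetical and the reserved amounts default to zero for simplicity.

```python
def agents_per_node(node_mem_mib: int, node_cpu_m: int,
                    req_mem_mib: int = 256, req_cpu_m: int = 250,
                    reserved_mem_mib: int = 0, reserved_cpu_m: int = 0) -> int:
    """How many agent pods fit on one node, counting requests only."""
    by_memory = (node_mem_mib - reserved_mem_mib) // req_mem_mib
    by_cpu = (node_cpu_m - reserved_cpu_m) // req_cpu_m
    return max(0, min(by_memory, by_cpu))


# 16 GiB node with 4 vCPUs: CPU requests are the bottleneck.
print(agents_per_node(16 * 1024, 4000))  # 16
# Same memory with 8 vCPUs.
print(agents_per_node(16 * 1024, 8000))  # 32
```

In practice the kubelet and system pods reserve some of each resource, so real numbers come out a little lower.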

Shared secrets

API keys and credentials shared across all agents:

apiVersion: v1
kind: Secret
metadata:
  name: openclaw-shared-secrets
  namespace: ai-agents
type: Opaque
data:
  ANTHROPIC_API_KEY: <base64>
  ES_API_KEY: <base64>
  ES_HOST: <base64>

The Elasticsearch API key would have read-only privileges on specific indexes. Every agent gets the same credential. Even if someone prompt-engineers their way into running arbitrary ES queries, the credential physically can’t write or delete anything.
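The <base64> placeholders in the Secret are just the plain values base64-encoded. A minimal sketch of producing them, with made-up values and a helper name of my own:

```python
import base64


def encode_secret_data(values: dict[str, str]) -> dict[str, str]:
    """Base64-encode plain values for the `data` field of a Secret."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


# Placeholder values only; real keys should never be committed.
encoded = encode_secret_data({
    "ES_HOST": "https://es.example.internal:9200",
    "ES_API_KEY": "example-key",
})
print(encoded["ES_API_KEY"])
```

Base64 is an encoding, not encryption. Alternatively, the Secret's stringData field accepts plain strings and lets Kubernetes do the encoding.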

Network policy

Lock down what agent pods can talk to:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: openclaw-agent
  policyTypes:
    - Egress
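  egress:
    # Elasticsearch cluster: pods labelled app=elasticsearch in the data namespace.
    # Both selectors sit in one list item, so a destination must match both.
    # Assumes the namespace carries a name=data label.
    - to:
        - namespaceSelector:
            matchLabels:
              name: data
          podSelector:
            matchLabels:
              app: elasticsearch
      ports:
        - port: 9200
          protocol: TCP
    # Model API (Anthropic)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
          protocol: TCP
    # DNS (UDP, plus TCP for large responses)
    - to: []
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP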

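Agent pods can reach Elasticsearch on port 9200, DNS, and any HTTPS endpoint on port 443 (for the model API). That’s it. No SSH and no other ports. One caveat: 0.0.0.0/0 also matches in-cluster addresses, so internal services listening on 443 stay reachable unless the ipBlock carves out the cluster’s private ranges with an except list.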
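To make the port 443 rule concrete, here's a small model of how an ipBlock decides whether a destination is allowed: inside the cidr, and outside every except range. The private ranges listed are the usual RFC 1918 blocks, used here as an assumption about what a cluster might need to exclude.

```python
import ipaddress


def ipblock_allows(dest_ip: str, cidr: str, except_cidrs=()) -> bool:
    """Mirror ipBlock semantics: inside `cidr`, outside every `except` range."""
    ip = ipaddress.ip_address(dest_ip)
    if ip not in ipaddress.ip_network(cidr):
        return False
    return not any(ip in ipaddress.ip_network(block) for block in except_cidrs)


# Hypothetical private ranges to carve out of the 443 rule.
PRIVATE = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")

print(ipblock_allows("10.42.0.17", "0.0.0.0/0"))              # True
print(ipblock_allows("10.42.0.17", "0.0.0.0/0", PRIVATE))     # False
print(ipblock_allows("93.184.216.34", "0.0.0.0/0", PRIVATE))  # True
```

With no exceptions, a pod IP in 10.0.0.0/8 passes the rule. With the private ranges excluded, only public addresses do.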

The orchestrator bot

This is the piece that ties it together — and the part I haven’t built. It’s a single Slack app that would:

  1. Receive all DMs and mentions
  2. Look up which agent pod belongs to the sender
  3. Forward the message to that pod
  4. Return the response to Slack

And handle provisioning:

  1. User sends /setup-agent (or whatever trigger you choose)
  2. Orchestrator creates a PVC and pod for that user
  3. Waits for the pod to be ready
  4. Registers the mapping: slack_user_id -> pod_service
  5. Responds: “Your agent is ready. Send me a message to start.”
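Step 3 is where provisioning could hang, so the wait probably wants a timeout. A minimal sketch, assuming the readiness check is passed in as a callable (in a real orchestrator it would ask the Kubernetes API about the pod's status). The default timeout and interval are arbitrary.

```python
import time
from typing import Callable


def wait_for_ready(is_ready: Callable[[], bool], timeout_s: float = 120.0,
                   interval_s: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `is_ready` until it returns True or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stand-in for a real readiness check: reports ready on the third poll.
polls = iter([False, False, True])
print(wait_for_ready(lambda: next(polls), timeout_s=5, interval_s=0))  # True
```

Returning False instead of raising lets the orchestrator reply with a "still starting, try again shortly" message rather than failing the command.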

The orchestrator itself would be a small service — a Deployment with one replica. It doesn’t do AI work. It just routes messages and manages pod lifecycle. The user-to-pod mapping could live in a ConfigMap, a small database, or even in-memory (rebuilt on startup by listing pods with the openclaw-agent label).
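The rebuild-on-startup option could look roughly like this. It's written against plain dicts in the shape of the Pod API object so it runs without a cluster. In a real orchestrator the list would come from the Kubernetes API, filtered by the openclaw-agent label. Routing straight to pod IPs is my simplification.

```python
def rebuild_mapping(pods: list[dict]) -> dict[str, str]:
    """Map the `user` label of each running agent pod to its address."""
    mapping = {}
    for pod in pods:
        labels = pod.get("metadata", {}).get("labels", {})
        status = pod.get("status", {})
        if labels.get("app") != "openclaw-agent" or "user" not in labels:
            continue
        if status.get("phase") != "Running" or not status.get("podIP"):
            continue  # not routable yet
        mapping[labels["user"]] = f"{status['podIP']}:18789"
    return mapping


# Hand-written sample data; the values are made up.
pods = [
    {"metadata": {"labels": {"app": "openclaw-agent", "user": "alice"}},
     "status": {"phase": "Running", "podIP": "10.42.0.17"}},
    {"metadata": {"labels": {"app": "openclaw-agent", "user": "bob"}},
     "status": {"phase": "Pending"}},
]
print(rebuild_mapping(pods))  # {'alice': '10.42.0.17:18789'}
```

Pod IPs change on restart, so an in-memory map like this would need refreshing. A per-user Service with a stable name would avoid that at the cost of one more object per user.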

Open questions I haven’t worked through: how the orchestrator authenticates to agent pods, whether to use raw Pods or Deployments per user, and who gets to trigger provisioning.

The routing logic, at a high level:

on_message(slack_user_id, text):
    pod = lookup_agent_pod(slack_user_id)
    if pod is None:
        reply("No agent found. Send /setup-agent to create one.")
        return
    response = forward_to_pod(pod, text)
    reply(response)

on_command("/setup-agent", slack_user_id):
    if agent_exists(slack_user_id):
        reply("You already have an agent running.")
        return
    create_pvc(slack_user_id)
    create_pod(slack_user_id)
    wait_for_ready(slack_user_id)
    reply("Your agent is ready.")
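The same flow as a runnable sketch, with the Kubernetes and HTTP calls injected as plain functions so it can be exercised without a cluster or a Slack app. The class and parameter names are mine, and the in-memory dict stands in for whatever store the mapping ends up in.

```python
from typing import Callable, Optional


class Router:
    """In-memory version of the routing and provisioning flow."""

    def __init__(self, provision: Callable[[str], str],
                 forward: Callable[[str, str], str]):
        self._agents: dict[str, str] = {}  # slack_user_id -> agent address
        self._provision = provision        # creates PVC + pod, returns address
        self._forward = forward            # sends text to an agent, returns reply

    def lookup(self, slack_user_id: str) -> Optional[str]:
        return self._agents.get(slack_user_id)

    def on_message(self, slack_user_id: str, text: str) -> str:
        address = self.lookup(slack_user_id)
        if address is None:
            return "No agent found. Send /setup-agent to create one."
        return self._forward(address, text)

    def on_setup(self, slack_user_id: str) -> str:
        if self.lookup(slack_user_id) is not None:
            return "You already have an agent running."
        self._agents[slack_user_id] = self._provision(slack_user_id)
        return "Your agent is ready."


# Fakes standing in for the Kubernetes and HTTP calls.
router = Router(provision=lambda uid: f"agent-{uid.lower()}:18789",
                forward=lambda addr, text: f"[{addr}] echo: {text}")
print(router.on_message("U1", "hi"))
print(router.on_setup("U1"))
print(router.on_message("U1", "hi"))
```

Each handler returns the reply text instead of calling Slack directly, which keeps the routing logic testable on its own.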

What each agent would enable

If this worked as designed, each person would get a personal AI assistant with its own isolated state, skills, and cron jobs on top of the shared readonly data access.

The key insight: everyone starts with the same base (readonly data access, security defaults), but each person’s agent diverges over time based on how they use it. The data analyst’s agent becomes good at reporting. The engineer’s agent becomes good at debugging. The PM’s agent becomes good at summarizing.

Estimated cost

Rough back-of-envelope numbers — I haven’t run this in production:

             EC2 per user                      K8s pod per user
Compute      ~$120/month (t3.xlarge)           ~$5-10/month (shared node)
Storage      EBS volume                        PVC (1-5 GB)
Management   AMI updates, instance lifecycle   Image updates, pod restarts
Scaling      Manual or ASG                     Built-in

The main variable cost is still the model API. But that’s per-token, not per-agent — you’re paying for actual usage regardless of how many agents exist.

For a team of 30, I’d expect maybe $150-300/month in additional cluster resources (a couple extra nodes) plus model API costs. Compare that to $600-900/month for individual AI subscriptions that can’t access your data.
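The team-level numbers are just the per-user figures scaled by headcount. Spelled out as a sketch, where the $20-30 per-seat subscription price is what the $600-900 figure implies for 30 people rather than a quoted price:

```python
def monthly_range(team_size: int, low_per_user: float, high_per_user: float):
    """Scale a per-user monthly cost range to the whole team."""
    return team_size * low_per_user, team_size * high_per_user


TEAM = 30
pods = monthly_range(TEAM, 5, 10)            # K8s pod per user, from the table
subscriptions = monthly_range(TEAM, 20, 30)  # per-seat price implied by $600-900
ec2 = monthly_range(TEAM, 120, 120)          # one t3.xlarge per user

print(pods)           # (150, 300)
print(subscriptions)  # (600, 900)
print(ec2)            # (3600, 3600)
```

None of these include model API usage, which is the same whichever way the agents are hosted.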

Open questions

Things I’d need to figure out before building this:

Takeaways

The pattern is really a Kubernetes swarm: one orchestrator, many isolated worker pods, shared secrets for readonly data access, per-user PVCs for persistent state.

That’s the part worth stealing. OpenClaw is just one possible workload — the same layout works for any per-user service you’d want to provision on demand through a single front door (Slack, HTTP gateway, whatever).

Kubernetes handles the hard parts: scheduling, restarts, resource isolation, network policy, storage. The orchestrator is the only custom piece, and it should be small. Everything else is standard K8s primitives: Deployments, PVCs, Secrets, NetworkPolicies.

If I do build this, each worker pod gives one person an isolated agent with readonly company data access and room to customize on top. But the architecture is the main artifact here — I’d start with the orchestrator and one test pod before rolling anything out broadly.

