Orchestrating a Swarm of AI Agents on Kubernetes

March 5, 2026

In the previous post, I set up a single shared OpenClaw instance for the team — one AI assistant everyone talks to through Slack, with readonly access to our Elasticsearch data. It works well. But it has a limitation: everyone shares the same agent.

That means shared conversation context, shared skills, shared cron jobs. If one person configures an alerting rule, it lands in the same environment everyone else uses. The per-channel-peer session isolation helps, but it’s still one process, one config, one set of customizations.

The natural next step is one agent per person — isolated state, isolated skills, isolated cron jobs. The part I found interesting wasn’t the AI so much as how to run that on Kubernetes: a single orchestrator routing traffic to a swarm of identical worker pods, each with its own persistent volume. OpenClaw happens to be the workload; the pattern is generic.

I never got around to implementing it. We already had a Kubernetes cluster running, which made the shape of the solution fairly obvious — so I wrote up the architecture anyway. This post is that design, not a postmortem.

The architecture

Slack Workspace
      |
      v
+------------------------+
| Orchestrator Bot       |  single Slack app, always running
| (K8s Deployment)       |
| - Routes messages      |
| - Provisions agents    |
| - Manages lifecycle    |
+------------------------+
      |
      +---> Pod: agent-alice
      +---> Pod: agent-bob
      +---> Pod: agent-carol

One Slack app. One orchestrator service that handles all incoming messages. Per-user agent pods that do the actual AI work.

When Alice DMs the bot, the orchestrator would look up her Slack user ID, find her agent pod, and forward the message. Alice’s agent processes it, generates a response, and the orchestrator sends it back through Slack. Alice wouldn’t need to know there’s a dedicated pod running for her — she’d just be talking to “the bot.”

Why Kubernetes

This is a classic controller-worker pattern: one orchestrator Deployment manages lifecycle and routing; N worker pods do the actual work. Kubernetes is a good fit for that swarm model:

Resource limits: each agent pod gets a CPU/memory budget. One chatty user can’t starve the others.
Auto-restart: if an agent’s container crashes, the kubelet restarts it without intervention. A bare Pod is not rescheduled when its node dies, though, so surviving node failures would take a controller such as a Deployment or StatefulSet per user.
Persistent volumes: each user’s ~/.openclaw directory (skills, cron, session history) lives on a PVC. It survives pod restarts and redeployments.
Network policies: restrict what agent pods can reach. They should talk to Elasticsearch (readonly) and the model API. Nothing else.
No idle cost problem: an OpenClaw agent serving one user should be lightweight, maybe 256 MB of RAM and 0.25 CPU. You could fit dozens on a single node you’re already paying for.

If I didn’t have a K8s cluster, I’d be managing EC2 instances per user — dealing with AMIs, auto-stop logic, startup latency, instance lifecycle. All problems K8s has already solved.

The base agent image

Every agent pod would run the same container image, with OpenClaw pre-installed and secure defaults baked in:

FROM node:22-slim

RUN npm install -g openclaw@latest

COPY openclaw-base-config.json /etc/openclaw/base-config.json
COPY entrypoint.sh /entrypoint.sh
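RUN chmod +x /entrypoint.sh

# Run as the image's built-in non-root "node" user so that $HOME is /home/node,
# which is where the pod spec mounts the per-user volume. As root, $HOME would
# be /root and the config would never land on the PVC.
USER node

ENTRYPOINT ["/entrypoint.sh"]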

The entrypoint would handle first-run setup vs. existing user data:

#!/bin/bash
set -e

OPENCLAW_DIR="$HOME/.openclaw"

# First run — copy base config
if [ ! -f "$OPENCLAW_DIR/openclaw.json" ]; then
    mkdir -p "$OPENCLAW_DIR"
    cp /etc/openclaw/base-config.json "$OPENCLAW_DIR/openclaw.json"
fi

# Start the agent
exec openclaw gateway start

On first boot, it copies the base config. On subsequent restarts, it picks up the user’s existing config from the PVC — their customized skills, cron jobs, and preferences persist.

The base config

The base config bakes in security defaults and readonly data access. Every agent would start with the same foundation:

{
  "model": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514"
  },
  "tools": {
    "elevated": { "enabled": false },
    "fs": { "workspaceOnly": true },
    "deny": ["gateway", "sessions_spawn"]
  },
  "skills": {
    "elasticsearch": {
      "enabled": true,
      "config": {
        "readOnly": true
      }
    }
  }
}

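Elevated commands disabled. Filesystem restricted. Dangerous tools denied. Elasticsearch access enabled but readonly. Users could add skills and cron jobs on top of this. The intent is that they can’t weaken the security baseline, but the entrypoint as written doesn’t enforce that: the config is copied to the PVC once and stays writable afterwards, so the locked settings would have to be re-applied at every start or supplied from a read-only mount.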
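
One way to keep the baseline from drifting is to re-apply the locked keys every time the pod starts, before the agent launches. The sketch below treats both configs as plain JSON dictionaries and makes no claim about how OpenClaw itself loads its config. The `LOCKED_PATHS` list and the `enforce_baseline` name are my own assumptions for illustration.

```python
import copy

# Paths in the base config that users should not be able to change.
# Hypothetical list, derived from the base config shown above.
LOCKED_PATHS = [
    ("tools", "elevated"),
    ("tools", "fs"),
    ("tools", "deny"),
    ("skills", "elasticsearch", "config", "readOnly"),
]


def enforce_baseline(user_config: dict, base_config: dict) -> dict:
    """Return a copy of user_config with every locked path reset to the base value."""
    merged = copy.deepcopy(user_config)
    for path in LOCKED_PATHS:
        # Read the locked value from the base config.
        value = base_config
        for key in path:
            value = value[key]
        # Walk the same path in the user's config, creating dicts where needed.
        node = merged
        for key in path[:-1]:
            if not isinstance(node.get(key), dict):
                node[key] = {}
            node = node[key]
        node[path[-1]] = copy.deepcopy(value)
    return merged
```

Everything outside the locked paths, such as user-added skills, passes through untouched. The entrypoint could run a step like this against the file on the PVC before starting the agent.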

Kubernetes manifests

Agent pod template

The orchestrator would create one of these per user:

apiVersion: v1
kind: Pod
metadata:
  name: agent-${USER_ID}
  namespace: ai-agents
  labels:
    app: openclaw-agent
    user: ${USER_ID}
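spec:
  securityContext:
    fsGroup: 1000  # gid of the "node" user in the official Node image, so the PVC is writable by it
  containers: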
    - name: openclaw
      image: your-registry/openclaw-agent:latest
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
      env:
        - name: OPENCLAW_USER_ID
          value: "${USER_ID}"
      envFrom:
        - secretRef:
            name: openclaw-shared-secrets
      volumeMounts:
        - name: user-data
          mountPath: /home/node/.openclaw
      ports:
        - containerPort: 18789
  volumes:
    - name: user-data
      persistentVolumeClaim:
        claimName: openclaw-${USER_ID}
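
One detail worth handling when filling in this template: Kubernetes object names must be lowercase, while Slack user IDs are typically uppercase (something like `U012AB3CD`, a made-up example). Python's `string.Template` happens to use the same `${USER_ID}` placeholder syntax as the manifest, so a rendering step might look like the sketch below. The helper names and the 50-character cap are assumptions for illustration.

```python
import re
from string import Template

# Trimmed copy of the manifest above; only the templated metadata is kept.
POD_TEMPLATE = Template("""\
apiVersion: v1
kind: Pod
metadata:
  name: agent-${USER_ID}
  namespace: ai-agents
  labels:
    app: openclaw-agent
    user: ${USER_ID}
""")

# RFC 1123 label rules: lowercase alphanumerics and '-', alphanumeric at both ends.
# Stricter than pod names require, but safe for names and label values alike.
_DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def k8s_user_id(slack_user_id: str) -> str:
    """Lowercase a Slack user ID and check it can be embedded in a resource name."""
    candidate = slack_user_id.strip().lower()
    # 50 chars leaves room for the "agent-" and "openclaw-" prefixes under 63.
    if len(candidate) > 50 or not _DNS_LABEL.fullmatch(candidate):
        raise ValueError(f"cannot derive a Kubernetes name from {slack_user_id!r}")
    return candidate


def render_pod(slack_user_id: str) -> str:
    """Fill the pod template with a Kubernetes-safe version of the user ID."""
    return POD_TEMPLATE.substitute(USER_ID=k8s_user_id(slack_user_id))
```

The orchestrator would keep the original Slack ID in its own mapping and use the lowercased form only for Kubernetes names and labels.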

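Each pod requests 256 MiB of RAM with a 512 MiB limit. That should be plenty for a single-user agent. On a node with 16 GB of RAM, about 30 agents fit even if every one of them reaches its memory limit. CPU is the tighter constraint: 30 agents requesting 250m each add up to 7.5 cores, so the node needs at least 8 vCPUs to schedule them all.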
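
The packing estimate is easy to sanity-check. The scheduler places pods by their requests, and a node is full as soon as either memory or CPU runs out. The sketch below ignores what the kubelet and system pods reserve, and the vCPU counts are assumptions, since the post only specifies the node's RAM.

```python
def max_agents(node_mem_mib: int, node_cpu_millicores: int,
               pod_mem_mib: int, pod_cpu_millicores: int) -> int:
    """Number of pods that fit, taking the tighter of the memory and CPU bounds."""
    by_memory = node_mem_mib // pod_mem_mib
    by_cpu = node_cpu_millicores // pod_cpu_millicores
    return min(by_memory, by_cpu)


NODE_MEM_MIB = 16 * 1024  # 16 GiB node, before system reservations

# Requests from the pod spec above: 256Mi memory, 250m CPU.
print(max_agents(NODE_MEM_MIB, 8000, 256, 250))  # 8 vCPU node -> 32
print(max_agents(NODE_MEM_MIB, 4000, 256, 250))  # 4 vCPU node -> 16
```

With these requests the CPU figure is the binding one, so the node's core count matters more than its RAM.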

Shared secrets

API keys and credentials shared across all agents:

apiVersion: v1
kind: Secret
metadata:
  name: openclaw-shared-secrets
  namespace: ai-agents
type: Opaque
data:
  ANTHROPIC_API_KEY: <base64>
  ES_API_KEY: <base64>
  ES_HOST: <base64>

The Elasticsearch API key would have read-only privileges on specific indexes. Every agent gets the same credential. Even if someone prompt-engineers their way into running arbitrary ES queries, the credential itself isn’t authorized to write or delete anything.

Network policy

Lock down what agent pods can talk to:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: openclaw-agent
  policyTypes:
    - Egress
  egress:
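    # Elasticsearch cluster: pods labelled app=elasticsearch in the "data" namespace.
    # Both selectors sit in ONE list item so that both must match; as two
    # separate items they would be ORed.
    - to:
        - namespaceSelector:
            matchLabels:
              name: data
          podSelector:
            matchLabels:
              app: elasticsearch
      ports:
        - port: 9200
    # Model API (Anthropic). This allows port 443 on any address.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
    # DNS (UDP, plus TCP for responses too large for UDP)
    - to: []
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP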

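Agent pods can reach Elasticsearch on port 9200, DNS on port 53, and HTTPS endpoints on port 443 (for the model API). No SSH and no other ports. One caveat: the 0.0.0.0/0 rule can also match in-cluster services listening on 443, so fully ruling out lateral movement would mean excluding the cluster’s pod and service CIDRs with an except clause, or sending model traffic through an egress proxy. None of this is enforced unless the cluster’s network plugin supports NetworkPolicy.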

The orchestrator bot

This is the piece that ties it together — and the part I haven’t built. It’s a single Slack app that would:

Receive all DMs and mentions
Look up which agent pod belongs to the sender
Forward the message to that pod
Return the response to Slack

And handle provisioning:

User sends /setup-agent (or whatever trigger you choose)
Orchestrator creates a PVC and pod for that user
Waits for the pod to be ready
Registers the mapping: slack_user_id -> pod_service
Responds: “Your agent is ready. Send me a message to start.”

The orchestrator itself would be a small service — a Deployment with one replica. It doesn’t do AI work. It just routes messages and manages pod lifecycle. The user-to-pod mapping could live in a ConfigMap, a small database, or even in-memory (rebuilt on startup by listing pods with the openclaw-agent label).
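
The in-memory option could be as small as the sketch below. It takes the JSON that `kubectl get pods -n ai-agents -o json` returns (or the equivalent API response) and keeps only running agent pods. The function name is hypothetical, and mapping to the pod IP is an assumption: pod IPs change on restart, which is one reason to rebuild the map rather than persist it, or to put a Service in front of each pod instead.

```python
def build_user_map(pod_list: dict) -> dict[str, str]:
    """Map each agent pod's `user` label to its pod IP, for running pods only."""
    mapping: dict[str, str] = {}
    for pod in pod_list.get("items", []):
        labels = pod.get("metadata", {}).get("labels", {})
        status = pod.get("status", {})
        # Skip anything that is not one of our agent pods.
        if labels.get("app") != "openclaw-agent":
            continue
        user = labels.get("user")
        pod_ip = status.get("podIP")
        # Pending or failed pods have no usable address yet.
        if user and pod_ip and status.get("phase") == "Running":
            mapping[user] = pod_ip
    return mapping
```

A user whose pod is still starting simply doesn't appear in the map, so the router can treat a missing entry as "not ready yet" and retry.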

Open questions I haven’t worked through: how the orchestrator authenticates to agent pods, whether to use raw Pods or Deployments per user, and who gets to trigger provisioning.

The routing logic, at a high level:

on_message(slack_user_id, text):
    pod = lookup_agent_pod(slack_user_id)
    if pod is None:
        reply("No agent found. Send /setup-agent to create one.")
        return
    response = forward_to_pod(pod, text)
    reply(response)

on_command("/setup-agent", slack_user_id):
    if agent_exists(slack_user_id):
        reply("You already have an agent running.")
        return
    create_pvc(slack_user_id)
    create_pod(slack_user_id)
    wait_for_ready(slack_user_id)
    reply("Your agent is ready.")
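
The `wait_for_ready` step hides the part most likely to go wrong: a pod that never becomes ready. A minimal polling sketch with a timeout is below. Unlike the pseudocode, it takes an `is_ready` callable instead of a user ID. That callable would wrap whatever check the orchestrator uses for pod status and is an assumption here, as are the default timings.

```python
import time
from typing import Callable


def wait_for_ready(is_ready: Callable[[], bool],
                   timeout_s: float = 120.0,
                   interval_s: float = 2.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready until it returns True or timeout_s has elapsed.

    Returns True if the pod became ready, False on timeout.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

On a `False` result the orchestrator could tell the user that provisioning failed and clean up the PVC and pod, rather than leaving a half-created agent behind.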

What each agent would enable

If this worked as designed, each person would get a personal AI assistant that:

Answers questions about company data. “What’s the status of pipeline X?” translates to an ES query, runs it, returns formatted results. No one needs to know the query DSL.
Sets up personal alerts. “Tell me every morning if there are any anomalies from yesterday.” The user configures this through chat. Their agent runs the cron job, checks the data, DMs them if something triggers. Other users’ alerts are completely separate.
Builds custom skills. A data analyst might create a skill that generates weekly summary reports in a specific format. An engineer might create one that checks deployment status. These are per-user — they don’t affect anyone else.
Maintains conversation context. Each agent remembers past conversations. “Remember that query I ran last week about sensor data?” — it can recall that because the session history is on their PVC.

The key insight: everyone starts with the same base (readonly data access, security defaults), but each person’s agent diverges over time based on how they use it. The data analyst’s agent becomes good at reporting. The engineer’s agent becomes good at debugging. The PM’s agent becomes good at summarizing.

Estimated cost

Rough back-of-envelope numbers — I haven’t run this in production:

              EC2 per user                       K8s pod per user
Compute       ~$120/month (t3.xlarge)            ~$5-10/month (shared node)
Storage       EBS volume                         PVC (1-5 GB)
Management    AMI updates, instance lifecycle    Image updates, pod restarts
Scaling       Manual or ASG                      Built-in

The main variable cost is still the model API. But that’s per-token, not per-agent — you’re paying for actual usage regardless of how many agents exist.

For a team of 30, I’d expect maybe $150-300/month in additional cluster resources (a couple extra nodes) plus model API costs. Compare that to $600-900/month for individual AI subscriptions that can’t access your data.
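
Broken down per person, those figures line up with the table. The inputs below are the estimates from this post, not measurements.

```python
TEAM_SIZE = 30  # team size used in the estimate above


def per_user(monthly_low: float, monthly_high: float,
             users: int = TEAM_SIZE) -> tuple[float, float]:
    """Split a monthly cost range evenly across users."""
    return (monthly_low / users, monthly_high / users)


extra_cluster = per_user(150, 300)   # extra cluster resources
subscriptions = per_user(600, 900)   # individual AI subscriptions
ec2_fleet_total = 120 * TEAM_SIZE    # one t3.xlarge per user, from the table

print(extra_cluster)    # (5.0, 10.0)
print(subscriptions)    # (20.0, 30.0)
print(ec2_fleet_total)  # 3600
```

So the cluster estimate works out to $5-10 per user against $20-30 per seat for subscriptions, with model API usage billed on top in the cluster case.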

Open questions

Things I’d need to figure out before building this:

The orchestrator. Routing and provisioning logic is straightforward on paper; the hard part is making it reliable — pod readiness, retries, stale mappings, upgrades.
Usage dashboards. Per-user token consumption, query volume, cron job frequency. Helps identify power users and cost outliers.
Agent templates. Instead of everyone starting from the same base, offer role-based templates — “data analyst,” “engineer,” “PM” — with pre-configured skills relevant to that role.
Graceful deprovisioning. When someone leaves the company, their pod and PVC should be cleaned up. Probably triggered by an HR system webhook or a manual admin command.
Shared skills library. A way for users to publish skills they’ve built so others can install them. Customizations would otherwise stay siloed.

Takeaways

The pattern is really a Kubernetes swarm: one orchestrator, many isolated worker pods, shared secrets for readonly data access, per-user PVCs for persistent state.

That’s the part worth stealing. OpenClaw is just one possible workload — the same layout works for any per-user service you’d want to provision on demand through a single front door (Slack, HTTP gateway, whatever).

Kubernetes handles the hard parts: scheduling, restarts, resource isolation, network policy, storage. The orchestrator is the only custom piece, and it should be small. Everything else is standard K8s primitives: Deployments, PVCs, Secrets, NetworkPolicies.

If I do build this, each worker pod gives one person an isolated agent with readonly company data access and room to customize on top. But the architecture is the main artifact here — I’d start with the orchestrator and one test pod before rolling anything out broadly.