Orchestrating a Swarm of AI Agents on Kubernetes
In the previous post, I set up a single shared OpenClaw instance for the team — one AI assistant everyone talks to through Slack, with readonly access to our Elasticsearch data. It works well. But it has a limitation: everyone shares the same agent.
That means shared conversation context, shared skills, shared cron jobs. If one person configures an alerting rule, everyone’s in the same environment. The per-channel-peer session isolation helps, but it’s still one process, one config, one set of customizations.
The natural next step is one agent per person — isolated state, isolated skills, isolated cron jobs. The part I found interesting wasn’t the AI so much as how to run that on Kubernetes: a single orchestrator routing traffic to a swarm of identical worker pods, each with its own persistent volume. OpenClaw happens to be the workload; the pattern is generic.
I never got around to implementing it. We already had a Kubernetes cluster running, which made the shape of the solution fairly obvious — so I wrote up the architecture anyway. This post is that design, not a postmortem.
The architecture
Slack Workspace
      |
      v
+------------------------+
| Orchestrator Bot       |  single Slack app, always running
| (K8s Deployment)       |
| - Routes messages      |
| - Provisions agents    |
| - Manages lifecycle    |
+------------------------+
      |
      +---> Pod: agent-alice
      +---> Pod: agent-bob
      +---> Pod: agent-carol
One Slack app. One orchestrator service that handles all incoming messages. Per-user agent pods that do the actual AI work.
When Alice DMs the bot, the orchestrator would look up her Slack user ID, find her agent pod, and forward the message. Alice’s agent processes it, generates a response, and the orchestrator sends it back through Slack. Alice wouldn’t need to know there’s a dedicated pod running for her — she’d just be talking to “the bot.”
Why Kubernetes
This is a classic controller-worker pattern: one orchestrator Deployment manages lifecycle and routing; N worker pods do the actual work. Kubernetes is a good fit for that swarm model:
- Resource limits — each agent pod gets a CPU/memory budget. One chatty user can’t starve the others.
- Auto-restart — if an agent crashes, K8s brings it back without intervention.
- Persistent volumes — each user’s ~/.openclaw directory (skills, cron, session history) lives on a PVC. Survives pod restarts and redeployments.
- Network policies — restrict what agent pods can reach. They should talk to Elasticsearch (readonly) and the model API. Nothing else.
- No idle cost problem — an OpenClaw agent serving one user is lightweight. Maybe 256MB RAM, 0.25 CPU. You could fit dozens on a single node you’re already paying for.
If I didn’t have a K8s cluster, I’d be managing EC2 instances per user — dealing with AMIs, auto-stop logic, startup latency, instance lifecycle. All problems K8s has already solved.
The base agent image
Every agent pod would run the same container image, with OpenClaw pre-installed and secure defaults baked in:
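FROM node:22-slim
RUN npm install -g openclaw@latest
COPY openclaw-base-config.json /etc/openclaw/base-config.json
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
# Run as the image's built-in non-root "node" user so that $HOME is /home/node,
# which is where the pod spec mounts the per-user volume (/home/node/.openclaw)
USER node
ENTRYPOINT ["/entrypoint.sh"]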
The entrypoint would handle first-run setup vs. existing user data:
#!/bin/bash
set -e

OPENCLAW_DIR="$HOME/.openclaw"

# First run — copy base config
if [ ! -f "$OPENCLAW_DIR/openclaw.json" ]; then
  mkdir -p "$OPENCLAW_DIR"
  cp /etc/openclaw/base-config.json "$OPENCLAW_DIR/openclaw.json"
fi

# Start the agent
exec openclaw gateway start
On first boot, it copies the base config. On subsequent restarts, it picks up the user’s existing config from the PVC — their customized skills, cron jobs, and preferences persist.
The base config
The base config bakes in security defaults and readonly data access. Every agent would start with the same foundation:
{
  "model": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514"
  },
  "tools": {
    "elevated": { "enabled": false },
    "fs": { "workspaceOnly": true },
    "deny": ["gateway", "sessions_spawn"]
  },
  "skills": {
    "elasticsearch": {
      "enabled": true,
      "config": {
        "readOnly": true
      }
    }
  }
}
Elevated commands disabled. Filesystem restricted. Dangerous tools denied. Elasticsearch access enabled but readonly. Users could add skills and cron jobs on top of this, but they wouldn’t be able to weaken the security baseline.
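One caveat with the entrypoint as written: it copies the base config only on first run, so nothing re-applies the baseline if the file on the PVC changes later. Below is a minimal sketch of one way to close that gap, assuming a small script runs on every start and resets the security-relevant keys to their base values. `LOCKED_PATHS` and `enforce_baseline` are hypothetical names for illustration, not OpenClaw features.

```python
import copy

# Hypothetical list of config paths that users must not override.
LOCKED_PATHS = [
    ("tools", "elevated"),
    ("tools", "fs"),
    ("tools", "deny"),
    ("skills", "elasticsearch", "config", "readOnly"),
]


def enforce_baseline(user_cfg: dict, base_cfg: dict) -> dict:
    """Return a copy of user_cfg with every locked path reset to the base value."""
    merged = copy.deepcopy(user_cfg)
    for path in LOCKED_PATHS:
        # Walk the base config to find the value that must win.
        base_value = base_cfg
        for key in path:
            base_value = base_value[key]
        # Walk the same path in the user's config, creating dicts where needed.
        node = merged
        for key in path[:-1]:
            if not isinstance(node.get(key), dict):
                node[key] = {}
            node = node[key]
        node[path[-1]] = copy.deepcopy(base_value)
    return merged
```

Everything outside the locked paths (extra skills, cron jobs, preferences) passes through untouched, so users keep their customizations while the baseline is restored on each start.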
Kubernetes manifests
Agent pod template
The orchestrator would create one of these per user:
apiVersion: v1
kind: Pod
metadata:
  name: agent-${USER_ID}
  namespace: ai-agents
  labels:
    app: openclaw-agent
    user: ${USER_ID}
spec:
  # Assumption: the container runs as the non-root "node" user (UID/GID 1000),
  # so the mounted volume needs to be group-writable for that GID.
  securityContext:
    fsGroup: 1000
  containers:
    - name: openclaw
      image: your-registry/openclaw-agent:latest
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
      env:
        - name: OPENCLAW_USER_ID
          value: "${USER_ID}"
      envFrom:
        - secretRef:
            name: openclaw-shared-secrets
      volumeMounts:
        - name: user-data
          mountPath: /home/node/.openclaw
      ports:
        - containerPort: 18789
  volumes:
    - name: user-data
      persistentVolumeClaim:
        claimName: openclaw-${USER_ID}
Each pod requests 256MB of RAM with a 512MB limit. That should be plenty for a single-user agent. Even if every agent ran at its limit, a node with 16GB RAM could hold about 30 of them (16GB / 512MB = 32, minus some headroom for system pods).
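One detail the template glosses over: Slack user IDs are uppercase (for example `U012AB3CD`), while Kubernetes object names must be lowercase. Here is a minimal sketch of how the orchestrator might fill in a trimmed version of the template, assuming it builds the manifest as a plain dict before submitting it to the API. The function names `k8s_name` and `build_agent_pod` are hypothetical.

```python
import re


def k8s_name(slack_user_id: str) -> str:
    """Turn a Slack user ID into a fragment that is valid in a Kubernetes object name."""
    # Lowercase, then replace anything outside [a-z0-9-] with a dash.
    name = re.sub(r"[^a-z0-9-]", "-", slack_user_id.lower()).strip("-")
    if not name:
        raise ValueError(f"cannot derive a name from {slack_user_id!r}")
    return name


def build_agent_pod(slack_user_id: str,
                    image: str = "your-registry/openclaw-agent:latest") -> dict:
    """Build the per-user Pod manifest as a dict (same fields as the template)."""
    suffix = k8s_name(slack_user_id)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"agent-{suffix}",
            "namespace": "ai-agents",
            # Label values may contain uppercase, so keep the original ID for lookups.
            "labels": {"app": "openclaw-agent", "user": slack_user_id},
        },
        "spec": {
            "containers": [{
                "name": "openclaw",
                "image": image,
                "resources": {
                    "requests": {"memory": "256Mi", "cpu": "250m"},
                    "limits": {"memory": "512Mi", "cpu": "500m"},
                },
                "env": [{"name": "OPENCLAW_USER_ID", "value": slack_user_id}],
                "envFrom": [{"secretRef": {"name": "openclaw-shared-secrets"}}],
                "volumeMounts": [
                    {"name": "user-data", "mountPath": "/home/node/.openclaw"}
                ],
                "ports": [{"containerPort": 18789}],
            }],
            "volumes": [{
                "name": "user-data",
                "persistentVolumeClaim": {"claimName": f"openclaw-{suffix}"},
            }],
        },
    }
```

Keeping the original ID in the `user` label means the orchestrator can still select pods by Slack ID even though the pod and PVC names are lowercased.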
Shared secrets
API keys and credentials shared across all agents:
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-shared-secrets
  namespace: ai-agents
type: Opaque
data:
  ANTHROPIC_API_KEY: <base64>
  ES_API_KEY: <base64>
  ES_HOST: <base64>
The Elasticsearch API key would have read-only privileges on specific indexes. Every agent gets the same credential. Even if someone prompt-engineers their way into running arbitrary ES queries, the credential physically can’t write or delete anything.
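The `<base64>` placeholders matter: values under `data` must be base64-encoded, which is an encoding and not encryption. Here is a small sketch of generating the manifest from plain values, assuming the real values are read from somewhere safe at deploy time. The function name `build_shared_secret` is made up for illustration.

```python
import base64


def build_shared_secret(values: dict[str, str]) -> dict:
    """Build the shared Secret manifest, base64-encoding each plain-text value."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "openclaw-shared-secrets", "namespace": "ai-agents"},
        "type": "Opaque",
        "data": encoded,
    }
```

Kubernetes also accepts a `stringData` field with unencoded values, which avoids the manual encoding step if you write the manifest by hand.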
Network policy
Lock down what agent pods can talk to:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: openclaw-agent
  policyTypes:
    - Egress
  egress:
    # Elasticsearch cluster: pods labeled app=elasticsearch in the "data" namespace.
    # Both selectors sit in one list item, so a destination must match both.
    - to:
        - namespaceSelector:
            matchLabels:
              name: data
          podSelector:
            matchLabels:
              app: elasticsearch
      ports:
        - port: 9200
    # Model API (Anthropic)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
    # DNS (UDP, plus TCP for large responses); no "to" means any destination
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
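Agent pods can reach Elasticsearch on port 9200, DNS, and HTTPS endpoints on port 443 (for the model API). That’s it: no SSH, no arbitrary internal ports. One caveat is that the 0.0.0.0/0 rule allows any HTTPS destination, not just the model API. Restricting egress to specific hostnames would take an egress proxy or a CNI plugin that supports FQDN-based policies.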
The orchestrator bot
This is the piece that ties it together — and the part I haven’t built. It’s a single Slack app that would:
- Receive all DMs and mentions
- Look up which agent pod belongs to the sender
- Forward the message to that pod
- Return the response to Slack
And handle provisioning:
- User sends /setup-agent (or whatever trigger you choose)
- Orchestrator creates a PVC and pod for that user
- Waits for the pod to be ready
- Registers the mapping: slack_user_id -> pod_service
- Responds: “Your agent is ready. Send me a message to start.”
The orchestrator itself would be a small service — a Deployment with one replica. It doesn’t do AI work. It just routes messages and manages pod lifecycle. The user-to-pod mapping could live in a ConfigMap, a small database, or even in-memory (rebuilt on startup by listing pods with the openclaw-agent label).
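The in-memory option could look roughly like this. The sketch assumes the pod list has already been fetched (for example with `kubectl get pods -n ai-agents -l app=openclaw-agent -o json`) and parsed into a dict, and that each pod carries the `user` label from the template. Using `status.podIP` as the forwarding address is my assumption; a per-pod Service would work too. `rebuild_mapping` is a hypothetical name.

```python
def rebuild_mapping(pod_list: dict) -> dict[str, str]:
    """Map Slack user ID -> pod IP for every running agent pod in a parsed PodList."""
    mapping: dict[str, str] = {}
    for pod in pod_list.get("items", []):
        labels = pod.get("metadata", {}).get("labels", {})
        status = pod.get("status", {})
        user = labels.get("user")
        # Skip anything that is not an agent pod or has no user label.
        if labels.get("app") != "openclaw-agent" or not user:
            continue
        # Skip pods that are still starting, failed, or have no IP yet.
        if status.get("phase") != "Running" or not status.get("podIP"):
            continue
        mapping[user] = status["podIP"]
    return mapping
```

Because the mapping is derived entirely from cluster state, an orchestrator restart loses nothing; the trade-off is that pods still starting up are invisible until the next rebuild.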
Open questions I haven’t worked through: how the orchestrator authenticates to agent pods, whether to use raw Pods or Deployments per user, and who gets to trigger provisioning.
The routing logic, at a high level:
on_message(slack_user_id, text):
    pod = lookup_agent_pod(slack_user_id)
    if pod is None:
        reply("No agent found. Send /setup-agent to create one.")
        return
    response = forward_to_pod(pod, text)
    reply(response)

on_command("/setup-agent", slack_user_id):
    if agent_exists(slack_user_id):
        reply("You already have an agent running.")
        return
    create_pvc(slack_user_id)
    create_pod(slack_user_id)
    wait_for_ready(slack_user_id)
    reply("Your agent is ready.")
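The pseudocode translates fairly directly into Python. This is a sketch with the Slack and Kubernetes calls injected as plain callables (`provision` and `forward` are hypothetical names), so the routing and provisioning logic can be exercised without either system. It returns the reply text instead of calling Slack directly.

```python
from typing import Callable


class Orchestrator:
    """Routes messages to per-user agents and provisions agents on request."""

    def __init__(self,
                 provision: Callable[[str], str],
                 forward: Callable[[str, str], str]) -> None:
        # provision(user_id): create PVC and pod, wait for readiness, return pod address.
        # forward(address, text): send text to the agent and return its reply.
        self._provision = provision
        self._forward = forward
        self._agents: dict[str, str] = {}  # slack_user_id -> pod address

    def on_message(self, slack_user_id: str, text: str) -> str:
        address = self._agents.get(slack_user_id)
        if address is None:
            return "No agent found. Send /setup-agent to create one."
        return self._forward(address, text)

    def on_setup_agent(self, slack_user_id: str) -> str:
        if slack_user_id in self._agents:
            return "You already have an agent running."
        # Only register the mapping once provisioning has returned successfully.
        self._agents[slack_user_id] = self._provision(slack_user_id)
        return "Your agent is ready."
```

Injecting the two callables keeps the class testable with fakes; a real version would still need timeouts and error handling around both calls.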
What each agent would enable
If this worked as designed, each person would get a personal AI assistant that:
- Answers questions about company data. “What’s the status of pipeline X?” translates to an ES query, runs it, returns formatted results. No one needs to know the query DSL.
- Sets up personal alerts. “Tell me every morning if there are any anomalies from yesterday.” The user configures this through chat. Their agent runs the cron job, checks the data, DMs them if something triggers. Other users’ alerts are completely separate.
- Builds custom skills. A data analyst might create a skill that generates weekly summary reports in a specific format. An engineer might create one that checks deployment status. These are per-user — they don’t affect anyone else.
- Maintains conversation context. Each agent remembers past conversations. “Remember that query I ran last week about sensor data?” — it can recall that because the session history is on their PVC.
The key insight: everyone starts with the same base (readonly data access, security defaults), but each person’s agent diverges over time based on how they use it. The data analyst’s agent becomes good at reporting. The engineer’s agent becomes good at debugging. The PM’s agent becomes good at summarizing.
Estimated cost
Rough back-of-envelope numbers — I haven’t run this in production:
| | EC2 per user | K8s pod per user |
|---|---|---|
| Compute | ~$120/month (t3.xlarge) | ~$5-10/month (shared node) |
| Storage | EBS volume | PVC (1-5 GB) |
| Management | AMI updates, instance lifecycle | Image updates, pod restarts |
| Scaling | Manual or ASG | Built-in |
The main variable cost is still the model API. But that’s per-token, not per-agent — you’re paying for actual usage regardless of how many agents exist.
For a team of 30, I’d expect maybe $150-300/month in additional cluster resources (a couple extra nodes) plus model API costs. Compare that to $600-900/month for individual AI subscriptions that can’t access your data.
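The arithmetic behind those ranges is below, using the per-user figures from the table. The $20-30 per seat for an individual subscription is my reading of the $600-900 figure, not a quoted price.

```python
TEAM_SIZE = 30


def monthly_range(per_user_low: float, per_user_high: float,
                  users: int = TEAM_SIZE) -> tuple[float, float]:
    """Scale a per-user monthly cost range to the whole team."""
    return per_user_low * users, per_user_high * users


pods = monthly_range(5, 10)            # shared-node pods, from the table
subscriptions = monthly_range(20, 30)  # assumed per-seat subscription price
ec2 = monthly_range(120, 120)          # one t3.xlarge per user, from the table

print(f"K8s pods:      ${pods[0]:.0f}-{pods[1]:.0f}/month")
print(f"Subscriptions: ${subscriptions[0]:.0f}-{subscriptions[1]:.0f}/month")
print(f"EC2 per user:  ${ec2[0]:.0f}/month")
```

None of these figures include model API usage, which is billed per token on top.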
Open questions
Things I’d need to figure out before building this:
- The orchestrator. Routing and provisioning logic is straightforward on paper; the hard part is making it reliable — pod readiness, retries, stale mappings, upgrades.
- Usage dashboards. Per-user token consumption, query volume, cron job frequency. Helps identify power users and cost outliers.
- Agent templates. Instead of everyone starting from the same base, offer role-based templates — “data analyst,” “engineer,” “PM” — with pre-configured skills relevant to that role.
- Graceful deprovisioning. When someone leaves the company, their pod and PVC should be cleaned up. Probably triggered by an HR system webhook or a manual admin command.
- Shared skills library. A way for users to publish skills they’ve built so others can install them. Customizations would otherwise stay siloed.
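For the deprovisioning item, the core check is a set difference between users who have an agent and users who are still active. Here is a sketch, assuming both lists are available as plain Slack user IDs; where the active list comes from (HR system, Slack itself) is left open, and `find_orphaned_agents` is a hypothetical name.

```python
from typing import Iterable


def find_orphaned_agents(agent_user_ids: Iterable[str],
                         active_user_ids: Iterable[str]) -> list[str]:
    """Return user IDs that still have an agent but are no longer active."""
    # Sorted so that repeated runs produce a stable, reviewable list.
    return sorted(set(agent_user_ids) - set(active_user_ids))
```

Having an admin review this list before deleting any PVCs would be a cheap safeguard, since the volume holds the only copy of that user’s skills and history.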
Takeaways
The pattern is really a Kubernetes swarm: one orchestrator, many isolated worker pods, shared secrets for readonly data access, per-user PVCs for persistent state.
That’s the part worth stealing. OpenClaw is just one possible workload — the same layout works for any per-user service you’d want to provision on demand through a single front door (Slack, HTTP gateway, whatever).
Kubernetes handles the hard parts: scheduling, restarts, resource isolation, network policy, storage. The orchestrator is the only custom piece, and it should be small. Everything else is standard K8s primitives: Deployments, PVCs, Secrets, NetworkPolicies.
If I do build this, each worker pod gives one person an isolated agent with readonly company data access and room to customize on top. But the architecture is the main artifact here — I’d start with the orchestrator and one test pod before rolling anything out broadly.