
**Author:** Brenner Axiom, #B4mad Industries
**Date:** 2026-02-23
**Bead:** nanoclaw-k8s-r1

---

## Abstract

This paper investigates architectural approaches for deploying NanoClaw containers on Kubernetes and OpenShift platforms. NanoClaw currently uses Docker as its container runtime to execute Claude Agent SDK instances in isolated environments. We analyze the existing Docker-based architecture, propose three distinct Kubernetes deployment patterns, and provide detailed trade-off analysis for each approach. We recommend a **Job-based architecture with PersistentVolumeClaims** for initial implementation due to minimal code disruption, OpenShift compatibility, and clear evolution paths. This paper targets technical readers familiar with container orchestration and Kubernetes primitives.

---

## 1. Context: Why Kubernetes for NanoClaw?

NanoClaw is a lightweight personal AI assistant framework that runs Claude Code in isolated Linux containers. Each agent session spawns an ephemeral Docker container with filesystem isolation, supporting:

- **Multi-group isolation** — Each WhatsApp/Telegram group gets its own container sandbox
- **Concurrent execution** — Up to 5 containers running simultaneously (configurable)
- **Filesystem-based IPC** — Host controller communicates with containers via polling
- **Security by isolation** — Bind mounts for workspace access, secrets via stdin

### Current Limitations

The Docker-based architecture works well for single-host deployments but lacks:

1. **Multi-node scaling** — Cannot distribute workload across multiple machines
2. **Resource orchestration** — No native quotas, limits, or priority scheduling
3. **High availability** — Single point of failure (Docker daemon on one host)
4. **Enterprise security** — OpenShift Security Context Constraints (SCC) not enforceable

Migrating to Kubernetes/OpenShift enables cloud-native deployment patterns while preserving NanoClaw's simplicity and security model.

---

## 2. Current Architecture Analysis

### 2.1 Container Lifecycle

**File:** `/workspace/project/src/container-runner.ts`

Each agent session follows this lifecycle:

1. **Spawn** — `docker run` with bind mounts for workspace, IPC, sessions
2. **Stream** — Parse stdout for structured results (sentinel markers)
3. **Idle** — Container stays alive 30min after completion (handles follow-ups)
4. **Cleanup** — Graceful `docker stop` or force kill after timeout

**Key characteristics:**
- Ephemeral containers (`--rm` flag, no persistent state)
- Short-lived (30min max per session)
- Named pattern: `nanoclaw-{groupFolder}-{timestamp}`
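
The spawn step above can be sketched as a `docker run` argument list built in TypeScript. This is a hedged illustration: the mounts mirror the table in section 2.2 and the name pattern comes from this section, but the helper name, flag order, and paths are assumptions, not the actual `container-runner.ts` code.

```typescript
// Illustrative docker run arguments for one agent session. Mount paths
// mirror section 2.2; the helper itself is an assumption for this paper.
function buildDockerArgs(
  groupFolder: string,
  timestamp: number,
  projectRoot: string
): string[] {
  return [
    'run',
    '--rm',                                           // ephemeral: no persistent state
    '--interactive',                                  // secrets arrive via stdin
    '--name', `nanoclaw-${groupFolder}-${timestamp}`, // naming pattern from 2.1
    '-v', `${projectRoot}:/workspace/project:ro`,     // read-only project root
    '-v', `groups/${groupFolder}:/workspace/group`,   // read-write group workspace
    '-v', `data/ipc/${groupFolder}:/workspace/ipc`,   // filesystem IPC channel
    'nanoclaw-agent:latest',
  ];
}

const args = buildDockerArgs('main', 1708712345, '/srv/nanoclaw');
console.log(args.join(' '));
```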

### 2.2 Volume Mount Strategy

**File:** `/workspace/project/src/container-runner.ts` (lines 53-179)

NanoClaw uses Docker bind mounts to provide filesystem isolation:

```
/workspace/project    → {projectRoot}              (read-only)
/workspace/group      → groups/{folder}/           (read-write)
/home/node/.claude    → data/sessions/{folder}     (read-write)
/workspace/ipc        → data/ipc/{folder}/         (read-write)
/workspace/extra/*    → {additionalMounts}         (validated)
```

**Security boundaries:**
- Main group gets read-only access to project root (prevents code tampering)
- Non-main groups forced read-only for extra mounts (security boundary)
- Mount allowlist stored outside project (`~/.config/nanoclaw/mount-allowlist.json`)

### 2.3 IPC Mechanism

**File:** `/workspace/project/container/agent-runner/src/index.ts`

Communication between host controller and container uses **filesystem polling**:

**Host → Container:**
- Write JSON files to `/workspace/ipc/input/{timestamp}.json`
- Write sentinel `_close` to signal shutdown

**Container → Host:**
- Write structured output to stdout (parsed by host)
- Wrap results in `---NANOCLAW_OUTPUT_START---` markers

**Why filesystem?**
- Simple, reliable, no network dependencies
- Works across container runtimes (Docker, Apple Container, Kubernetes)
- No port conflicts or service discovery
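
Host-side parsing of the sentinel-wrapped stdout can be sketched as follows. The start marker is taken from the text above; the closing marker name `---NANOCLAW_OUTPUT_END---` and the function shape are assumptions for illustration.

```typescript
const START = '---NANOCLAW_OUTPUT_START---';
const END = '---NANOCLAW_OUTPUT_END---'; // assumed closing marker, not from the source

// Extract the JSON payload between sentinel markers from raw stdout.
// Everything outside the markers (logs, progress output) is ignored.
function extractOutput(stdout: string): unknown | null {
  const start = stdout.indexOf(START);
  if (start === -1) return null;
  const end = stdout.indexOf(END, start + START.length);
  const body = end === -1
    ? stdout.slice(start + START.length)
    : stdout.slice(start + START.length, end);
  return JSON.parse(body.trim());
}

const raw = `log line\n${START}\n{"status":"ok"}\n${END}\n`;
console.log(extractOutput(raw)); // { status: 'ok' }
```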

### 2.4 Concurrency Model

**File:** `/workspace/project/src/group-queue.ts`

A **GroupQueue** manages concurrent container execution:

- **Global limit:** 5 containers (configurable via `MAX_CONCURRENT_CONTAINERS`)
- **Per-group state:** Active process, idle flag, pending messages/tasks
- **Queue behavior:** FIFO processing when slots become available
- **Preemption:** Idle containers can be killed for pending high-priority tasks
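
The slot-limited FIFO behavior above can be sketched as a minimal queue. This is illustrative only: the real GroupQueue also tracks per-group state, idle flags, and preemption, none of which are modeled here.

```typescript
// Minimal slot-limited FIFO: at most `limit` tasks run at once; the rest
// queue in arrival order. A sketch of the global-limit behavior only.
class SlotQueue {
  private active = 0;
  private waiting: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Wait in FIFO order until a slot frees up.
    while (this.active >= this.limit) {
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiting.shift()?.(); // wake the next queued task
    }
  }
}

const queue = new SlotQueue(5); // MAX_CONCURRENT_CONTAINERS default
```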

### 2.5 Security Model

**Secrets** — Never written to disk:
- Read from `.env` only where needed
- Passed to container via stdin
- Stripped from Bash subprocess environment

**User isolation** — UID/GID mapping:
- Container runs as host user (not root)
- Ensures bind-mounted files have correct permissions
- Skipped for root (uid 0) or container default (uid 1000)

**Mount security** — Allowlist validation:
- Blocked patterns: `.ssh`, `.aws`, `.kube`, `.env`, private keys
- Enforced on host before container creation (tamper-proof)
- Non-main groups forced read-only for extra mounts
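
The mount guard can be sketched as a host-side check. The blocked patterns mirror the bullets above; the function name, return shape, and substring-matching rule are assumptions, not the actual allowlist implementation.

```typescript
// Host-side mount validation sketch. Pattern list mirrors section 2.5;
// matching by substring and the private-key names are assumptions.
const BLOCKED_PATTERNS = ['.ssh', '.aws', '.kube', '.env', 'id_rsa', 'id_ed25519'];

function validateMount(
  hostPath: string,
  isMainGroup: boolean
): { allowed: boolean; readOnly: boolean } {
  const lower = hostPath.toLowerCase();
  if (BLOCKED_PATTERNS.some((p) => lower.includes(p))) {
    return { allowed: false, readOnly: true };
  }
  // Non-main groups are forced read-only for extra mounts.
  return { allowed: true, readOnly: !isMainGroup };
}
```

Because this runs on the host before container creation, a compromised agent inside the container cannot relax its own mount rules.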

---

## 3. Kubernetes Deployment Approaches

We propose three architectures, each with different trade-offs for complexity, performance, and multi-node support.

### 3.1 Approach 1: Job-Based with Persistent Volumes

#### Overview

Each agent session spawns a **Kubernetes Job** → one Pod → auto-cleanup after completion. State persists via **PersistentVolumeClaims (PVC)**.

#### Architecture Diagram

```
┌─────────────────────────────────────────────────┐
│  Host Controller (Deployment)                   │
│  ┌─────────────────────────────────────────┐   │
│  │ GroupQueue                               │   │
│  │ - Queue pending messages/tasks           │   │
│  │ - Create Job when slot available         │   │
│  │ - Poll Job status for completion         │   │
│  └─────────────────────────────────────────┘   │
│                                                  │
│  Mounted PVCs:                                  │
│  - /data/ipc/{groupFolder}/  (IPC polling)     │
│  - /data/sessions/{groupFolder}/               │
└─────────────────────────────────────────────────┘
                    │
                    │ Creates Job
                    ▼
┌─────────────────────────────────────────────────┐
│  Kubernetes Job: nanoclaw-main-1708712345       │
│  ┌─────────────────────────────────────────┐   │
│  │ Pod (ephemeral)                          │   │
│  │                                           │   │
│  │ Volumes:                                  │   │
│  │ - PVC: nanoclaw-group-main → /workspace/group │
│  │ - PVC: nanoclaw-ipc-main → /workspace/ipc    │
│  │ - PVC: nanoclaw-sessions-main → /.claude     │
│  │ - PVC: nanoclaw-project-ro → /workspace/project │
│  │                                           │   │
│  │ securityContext:                          │   │
│  │   runAsUser: 1000                         │   │
│  │   fsGroup: 1000                           │   │
│  └─────────────────────────────────────────┘   │
│                                                  │
│  activeDeadlineSeconds: 1800  (30min timeout)  │
│  ttlSecondsAfterFinished: 300  (5min cleanup)  │
└─────────────────────────────────────────────────┘
```

#### Volume Strategy

**PVC per resource type:**

```yaml
# Group workspace (read-write)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nanoclaw-group-main
spec:
  accessModes:
    - ReadWriteMany  # Multi-node requires RWX
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs  # Or cephfs, efs, etc.

---
# IPC directory (read-write)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nanoclaw-ipc-main
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

---
# Project root (read-only)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nanoclaw-project-ro
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 5Gi
```
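
Per-group PVCs like these can be generated programmatically, which Phase 2 of the migration plan (dynamic provisioning) will need. A hedged sketch: sizes and the `nfs` storage class are carried over from the manifests above, the `sessions` size and the helper itself are assumptions.

```typescript
// Build a PVC manifest object for one group resource. Mirrors the YAML
// above; the 1Gi sessions size is an assumption not in those manifests.
type PvcKind = 'group' | 'ipc' | 'sessions';

function buildGroupPvc(groupFolder: string, kind: PvcKind) {
  const sizes: Record<PvcKind, string> = { group: '10Gi', ipc: '1Gi', sessions: '1Gi' };
  return {
    apiVersion: 'v1',
    kind: 'PersistentVolumeClaim',
    metadata: { name: `nanoclaw-${kind}-${groupFolder}` },
    spec: {
      accessModes: ['ReadWriteMany'], // RWX for multi-node scheduling
      resources: { requests: { storage: sizes[kind] } },
      storageClassName: 'nfs',
    },
  };
}
```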

**Job manifest template:**

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nanoclaw-main-{{timestamp}}
spec:
  activeDeadlineSeconds: 1800
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      containers:
      - name: agent
        image: nanoclaw-agent:latest
        stdin: true
        stdinOnce: true
        volumeMounts:
        - name: group-workspace
          mountPath: /workspace/group
        - name: ipc
          mountPath: /workspace/ipc
        - name: sessions
          mountPath: /home/node/.claude
        - name: project
          mountPath: /workspace/project
          readOnly: true
      volumes:
      - name: group-workspace
        persistentVolumeClaim:
          claimName: nanoclaw-group-main
      - name: ipc
        persistentVolumeClaim:
          claimName: nanoclaw-ipc-main
      - name: sessions
        persistentVolumeClaim:
          claimName: nanoclaw-sessions-main
      - name: project
        persistentVolumeClaim:
          claimName: nanoclaw-project-ro
```
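
The `buildJobManifest` helper used by the runtime code below can be sketched as a plain object mirroring this template. Field values are carried over from the YAML; the helper's signature is an assumption (simplified here), and the sessions volume is trimmed for brevity.

```typescript
// Sketch of buildJobManifest: a Job object mirroring the YAML template
// above. Signature simplified; the real helper is an assumption.
function buildJobManifest(jobName: string, groupFolder: string) {
  const pvcVolume = (name: string, claimName: string) => ({
    name,
    persistentVolumeClaim: { claimName },
  });
  return {
    apiVersion: 'batch/v1',
    kind: 'Job',
    metadata: { name: jobName },
    spec: {
      activeDeadlineSeconds: 1800,  // 30min session timeout
      ttlSecondsAfterFinished: 300, // auto-cleanup 5min after completion
      template: {
        spec: {
          restartPolicy: 'Never',
          securityContext: { runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000 },
          containers: [{
            name: 'agent',
            image: 'nanoclaw-agent:latest',
            stdin: true,
            stdinOnce: true,
            volumeMounts: [
              { name: 'group-workspace', mountPath: '/workspace/group' },
              { name: 'ipc', mountPath: '/workspace/ipc' },
              { name: 'project', mountPath: '/workspace/project', readOnly: true },
            ],
          }],
          volumes: [
            pvcVolume('group-workspace', `nanoclaw-group-${groupFolder}`),
            pvcVolume('ipc', `nanoclaw-ipc-${groupFolder}`),
            pvcVolume('project', 'nanoclaw-project-ro'),
          ],
        },
      },
    },
  };
}
```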

#### Implementation Changes

**New file: `/workspace/project/src/k8s-runtime.ts`**

```typescript
import * as k8s from '@kubernetes/client-node';

export async function createAgentJob(
  groupFolder: string,
  timestamp: number,
  volumeMounts: VolumeMount[]
): Promise<string> {
  const kc = new k8s.KubeConfig();
  kc.loadFromDefault();

  const batchV1 = kc.makeApiClient(k8s.BatchV1Api);

  const jobName = `nanoclaw-${groupFolder}-${timestamp}`;
  const job = buildJobManifest(jobName, groupFolder, volumeMounts);

  await batchV1.createNamespacedJob('default', job);
  return jobName;
}

export async function pollJobStatus(
  jobName: string
): Promise<JobStatus> {
  const kc = new k8s.KubeConfig();
  kc.loadFromDefault();
  const batchV1 = kc.makeApiClient(k8s.BatchV1Api);

  // Poll Job.status.conditions until a terminal condition appears
  for (;;) {
    const { body } = await batchV1.readNamespacedJob(jobName, 'default');
    const done = (body.status?.conditions ?? []).find(
      (c) => c.status === 'True' &&
        (c.type === 'Complete' || c.type === 'Failed')
    );
    if (done) {
      return { succeeded: done.type === 'Complete', message: done.message };
    }
    await new Promise((r) => setTimeout(r, 2000));
  }
}
```

**Modified: `/workspace/project/src/container-runtime.ts`**

```typescript
export const CONTAINER_RUNTIME_TYPE =
  process.env.CONTAINER_RUNTIME || 'docker';  // 'docker' | 'kubernetes'

export function getRuntime(): ContainerRuntime {
  if (CONTAINER_RUNTIME_TYPE === 'kubernetes') {
    return new K8sRuntime();
  }
  return new DockerRuntime();
}
```
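
The `ContainerRuntime` interface referenced above isn't shown in the source. A plausible minimal shape, inferred from how the two runtimes are used (method names and types are assumptions):

```typescript
// Minimal runtime contract inferred from usage; names are illustrative.
interface AgentResult {
  exitCode: number;
  output: string; // sentinel-wrapped stdout payload
}

interface ContainerRuntime {
  /** Start an agent session; resolves to an opaque handle (container id or Job name). */
  start(groupFolder: string, timestamp: number): Promise<string>;
  /** Wait for the session to finish and return its parsed result. */
  wait(handle: string): Promise<AgentResult>;
  /** Graceful stop, force kill after timeout. */
  stop(handle: string): Promise<void>;
}

// Dummy in-memory implementation showing the contract.
class NullRuntime implements ContainerRuntime {
  async start(groupFolder: string, timestamp: number) {
    return `nanoclaw-${groupFolder}-${timestamp}`;
  }
  async wait(_handle: string): Promise<AgentResult> {
    return { exitCode: 0, output: '' };
  }
  async stop(_handle: string) {}
}
```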

**Modified: `/workspace/project/src/container-runner.ts`**

```typescript
const runtime = getRuntime();

if (runtime instanceof K8sRuntime) {
  const jobName = await runtime.createAgentJob(groupFolder, timestamp, mounts);
  const result = await runtime.pollJobStatus(jobName);
  // Parse result same as Docker output
} else {
  // Existing Docker spawn() logic
}
```

#### Pros & Cons

| Aspect | Assessment |
|--------|------------|
| **Code changes** | ✅ Low (abstraction layer only) |
| **IPC mechanism** | ✅ Unchanged (filesystem polling works) |
| **OpenShift compatible** | ✅ Yes (PVC + SCC friendly) |
| **Latency** | ⚠️ Medium (Job creation ~2-5s vs Docker <1s) |
| **Multi-node** | ⚠️ Requires ReadWriteMany PVCs (NFS, CephFS) |
| **Resource usage** | ✅ Low (ephemeral Pods, auto-cleanup) |
| **Complexity** | ✅ Low (native K8s primitives) |
| **Rollback** | ✅ Easy (just switch runtime back to Docker) |

---

### 3.2 Approach 2: StatefulSet with Sidecar Pattern

#### Overview

Replace ephemeral Jobs with **long-lived Pods** (one per group) that stay idle between sessions. Host controller sends work via IPC (unchanged).

#### Architecture Diagram

```
┌─────────────────────────────────────────────────┐
│  Host Controller (Deployment)                   │
│  - Sends IPC messages to wake idle Pods         │
│  - Scales StatefulSet to 0 after idle timeout   │
└─────────────────────────────────────────────────┘
                    │
                    │ IPC via PVC
                    ▼
┌─────────────────────────────────────────────────┐
│  StatefulSet: nanoclaw-main (1 replica)         │
│  ┌─────────────────────────────────────────┐   │
│  │ Pod: nanoclaw-main-0 (always running)    │   │
│  │                                           │   │
│  │ Container loops forever:                  │   │
│  │ 1. Poll /workspace/ipc/input/             │   │
│  │ 2. Process message if present             │   │
│  │ 3. Write output                            │   │
│  │ 4. Sleep 500ms, repeat                     │   │
│  │                                           │   │
│  │ Idle timeout: 30min → graceful shutdown   │   │
│  └─────────────────────────────────────────┘   │
│                                                  │
│  volumeClaimTemplate:                           │
│  - workspace (10Gi RWX)                         │
└─────────────────────────────────────────────────┘
```

#### Volume Strategy

StatefulSet automatically provisions PVCs via `volumeClaimTemplates`:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nanoclaw-main
spec:
  serviceName: nanoclaw
  replicas: 1
  selector:
    matchLabels:
      app: nanoclaw
      group: main
  template:
    metadata:
      labels:
        app: nanoclaw
        group: main  # Must match spec.selector
    spec:
      containers:
      - name: agent
        image: nanoclaw-agent:latest
        command: ["/app/entrypoint-loop.sh"]  # Modified entrypoint
        volumeMounts:
        - name: workspace
          mountPath: /workspace
  volumeClaimTemplates:
  - metadata:
      name: workspace
    spec:
      accessModes: [ "ReadWriteMany" ]  # RWX so the controller can share the IPC path
      resources:
        requests:
          storage: 10Gi
```
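
The controller's scale-to-zero decision ("Scales StatefulSet to 0 after idle timeout" in the diagram above) reduces to a pure function over the group's last-activity timestamp. A sketch, with names assumed:

```typescript
// Desired replica count for a group's StatefulSet: 1 while work is recent
// or pending, 0 once the idle timeout elapses. Names are illustrative.
const IDLE_TIMEOUT_MS = 30 * 60 * 1000; // 30min, matching the session model

function desiredReplicas(
  lastActivityMs: number,
  nowMs: number,
  hasPendingWork: boolean
): 0 | 1 {
  if (hasPendingWork) return 1;
  return nowMs - lastActivityMs > IDLE_TIMEOUT_MS ? 0 : 1;
}
```

The controller would then patch `spec.replicas` through the Kubernetes API whenever the computed value changes.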

#### Implementation Changes

**Modified: `/workspace/project/container/agent-runner/src/index.ts`**

```typescript
// Replace single-shot execution with an infinite poll loop
let lastActivity = Date.now();

while (true) {
  const message = await pollIpcInput();
  if (message === '_close') {
    console.log('Shutdown signal received');
    break;
  }
  if (message) {
    await processQuery(message);
    lastActivity = Date.now();  // reset idle timer on real work
  }
  await sleep(500);

  // Idle timeout: shut down if no work has arrived for IDLE_TIMEOUT ms
  if (Date.now() - lastActivity > IDLE_TIMEOUT) {
    console.log('Idle timeout, shutting down');
    break;
  }
}
```

**Modified: `/workspace/project/src/group-queue.ts`**

```typescript
// Instead of spawning new container, ensure StatefulSet exists
async ensureStatefulSet(groupFolder: string) {
  if (!await k8s.statefulSetExists(groupFolder)) {
    await k8s.createStatefulSet(groupFolder);
  }
  await k8s.waitForPodReady(groupFolder);
}

// Send IPC message to wake idle Pod
async enqueueMessageCheck(groupFolder: string, message: Message) {
  await this.ensureStatefulSet(groupFolder);
  await writeIpcMessage(groupFolder, message);
}
```

#### Pros & Cons

| Aspect | Assessment |
|--------|------------|
| **Code changes** | ⚠️ Medium (queue + agent-runner modifications) |
| **Latency** | ✅ Low (Pod already running, no Job creation) |
| **Resource usage** | ❌ High (idle Pods consume memory/CPU) |
| **IPC mechanism** | ✅ Unchanged |
| **OpenShift compatible** | ✅ Yes |
| **Session reuse** | ✅ Claude SDK stays warm (faster startup) |
| **Complexity** | ⚠️ Medium (StatefulSet lifecycle, idle timeout logic) |
| **Multi-node** | ⚠️ Requires RWX PVCs |

---

### 3.3 Approach 3: DaemonSet Controller + Job Workers

#### Overview

Host controller runs as a **DaemonSet**, one Pod per K8s node. Jobs are pinned via node affinity to the same node as their group's data. Optimized for multi-node clusters with **hostPath volumes** (local-disk speed).

#### Architecture Diagram

```
┌────────────────────────────────────────────────────────┐
│  Kubernetes Cluster (3 nodes)                          │
│                                                         │
│  Node 1                Node 2               Node 3     │
│  ┌─────────────┐      ┌─────────────┐     ┌──────┐   │
│  │ nanoclaw-   │      │ nanoclaw-   │     │ ... │   │
│  │ controller  │      │ controller  │     └──────┘   │
│  │ DaemonSet   │      │ DaemonSet   │                 │
│  │ Pod         │      │ Pod         │                 │
│  │             │      │             │                 │
│  │ Manages:    │      │ Manages:    │                 │
│  │ - group-a   │      │ - group-c   │                 │
│  │ - group-b   │      │ - group-d   │                 │
│  └─────────────┘      └─────────────┘                 │
│         │                     │                        │
│         │ Creates Job         │ Creates Job            │
│         │ with nodeSelector   │ with nodeSelector      │
│         ▼                     ▼                        │
│  ┌─────────────┐      ┌─────────────┐                │
│  │ Job: group-a│      │ Job: group-c│                │
│  │ (Node 1)    │      │ (Node 2)    │                │
│  │             │      │             │                │
│  │ hostPath:   │      │ hostPath:   │                │
│  │ /var/       │      │ /var/       │                │
│  │ nanoclaw/   │      │ nanoclaw/   │                │
│  │ group-a/    │      │ group-c/    │                │
│  └─────────────┘      └─────────────┘                │
└────────────────────────────────────────────────────────┘
```

#### Group → Node Assignment

Use a **stable hash** of the group name to assign groups to nodes:

```typescript
import { createHash } from 'crypto';

// Note: plain modulo hashing, not true consistent hashing — adding or
// removing a node remaps most groups. The ConfigMap below pins existing
// assignments so only new groups are affected.
function getNodeForGroup(groupFolder: string, nodes: Node[]): string {
  const hash = createHash('sha256')
    .update(groupFolder)
    .digest('hex');
  const index = parseInt(hash.slice(0, 8), 16) % nodes.length;
  return nodes[index].metadata.name;
}
```

Store mapping in ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nanoclaw-group-assignments
data:
  group-main: "node-1"
  group-family: "node-2"
  group-work: "node-1"
```

#### Volume Strategy

**hostPath volumes** for zero network latency:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nanoclaw-main-{{timestamp}}
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-1  # Pinned to same node as controller
      containers:
      - name: agent
        volumeMounts:
        - name: ipc
          mountPath: /workspace/ipc
        - name: group
          mountPath: /workspace/group
      volumes:
      - name: ipc
        hostPath:
          path: /var/nanoclaw/ipc/main
          type: Directory
      - name: group
        hostPath:
          path: /var/nanoclaw/groups/main
          type: Directory
```

#### Implementation Changes

**New file: `/workspace/project/src/k8s-daemonset.ts`**

```typescript
export async function assignGroupToNode(groupFolder: string): Promise<string> {
  const nodes = await k8s.listNodes();
  const nodeName = getNodeForGroup(groupFolder, nodes);

  // Store in ConfigMap
  await k8s.updateConfigMap('nanoclaw-group-assignments', {
    [groupFolder]: nodeName
  });

  return nodeName;
}

export async function createJobWithAffinity(
  groupFolder: string,
  nodeName: string
): Promise<string> {
  const job = buildJobManifest(groupFolder, {
    nodeSelector: {
      'kubernetes.io/hostname': nodeName
    },
    volumes: buildHostPathVolumes(groupFolder)
  });
  await k8s.createJob(job);
  return job.metadata.name;
}
```

#### Pros & Cons

| Aspect | Assessment |
|--------|------------|
| **Performance** | ✅ Best (local disk I/O, no network mounts) |
| **Multi-node** | ✅ Native (DaemonSet per node) |
| **Resource usage** | ⚠️ Medium (one controller per node) |
| **Code changes** | ❌ High (distributed state, node affinity logic) |
| **Security** | ❌ Poor (hostPath requires privileged access) |
| **OpenShift compatible** | ❌ No (hostPath blocked by restricted SCC) |
| **Complexity** | ❌ High (node assignment, rebalancing, failure handling) |

---

## 4. Comparison Matrix

| Criterion | Approach 1: Job+PVC | Approach 2: StatefulSet | Approach 3: DaemonSet |
|-----------|---------------------|------------------------|----------------------|
| **Code complexity** | ✅ Low | ⚠️ Medium | ❌ High |
| **Job/Pod latency** | ⚠️ 2-5s | ✅ <500ms | ✅ <500ms |
| **Resource idle cost** | ✅ Low | ❌ High | ⚠️ Medium |
| **Multi-node support** | ⚠️ Requires RWX | ⚠️ Requires RWX | ✅ Native |
| **Volume I/O performance** | ⚠️ Network (NFS) | ⚠️ Network (NFS) | ✅ Local disk |
| **OpenShift SCC** | ✅ Compatible | ✅ Compatible | ❌ Blocked |
| **IPC mechanism** | ✅ Unchanged | ✅ Unchanged | ✅ Unchanged |
| **Rollback ease** | ✅ Easy | ⚠️ Medium | ❌ Hard |
| **Production readiness** | ✅ Good | ✅ Good | ⚠️ Experimental |
| **Recommended for** | POC, single-node | Production, <50 groups | High-scale, >100 groups |

---

## 5. Recommended Approach

**Approach 1: Job-Based with PersistentVolumeClaims**

### Rationale

1. **Minimal disruption** — Abstraction layer only, IPC unchanged
2. **OpenShift compatible** — No hostPath, SCC-friendly
3. **Easy rollback** — Runtime flag toggles Docker/K8s
4. **Natural evolution** — Can upgrade to StatefulSet later if needed

### Migration Path

**Phase 1: Single-Node Kubernetes (Week 1-2)**
- Implement `k8s-runtime.ts` with Job API client
- Create PVCs for main group (group, IPC, sessions, project)
- Test Job creation, status polling, output parsing
- Validate IPC mechanism works across PVCs

**Phase 2: Multi-Group Support (Week 3-4)**
- Dynamic PVC provisioning per group
- Test concurrent Job execution (5 simultaneous groups)
- Performance benchmarking (Job creation latency, PVC I/O)

**Phase 3: Multi-Node Deployment (Week 5-6)**
- Evaluate RWX PVC backends (NFS vs CephFS vs AWS EFS)
- Test cross-node scheduling (Pod on Node 2, PVC on Node 1)
- If latency unacceptable: pilot Approach 3 (DaemonSet + hostPath)

**Phase 4: Production Hardening (Week 7-8)**
- OpenShift SCC validation
- Security audit (PVC isolation, secrets handling)
- Resource limits and quotas
- Monitoring and alerting (Job failures, PVC capacity)

### Risk Mitigation

**High Risk: PVC Performance**
- **Symptom**: Slow I/O on NFS-backed PVCs
- **Mitigation**: Benchmark early (Phase 2), pivot to DaemonSet if needed
- **Fallback**: Use ReadWriteOnce + node affinity (pseudo-hostPath)

**Medium Risk: Job Creation Latency**
- **Symptom**: 5-10s delay for Job → Running
- **Mitigation**: Pre-warm Pod pool (StatefulSet with scale=0, scale up on demand)
- **Fallback**: Accept latency or switch to StatefulSet (Approach 2)

**Low Risk: OpenShift SCC**
- **Symptom**: PVC mount permissions fail
- **Mitigation**: Use `fsGroup` in securityContext, request `anyuid` SCC if needed
- **Fallback**: Manual PVC permission fixing via initContainer

---

## 6. Implementation Checklist

### Prerequisites

- [ ] Kubernetes cluster (1.24+) or OpenShift (4.12+)
- [ ] StorageClass with ReadWriteMany support (NFS, CephFS, EFS)
- [ ] Container registry for nanoclaw-agent image
- [ ] RBAC permissions (create Jobs, PVCs, read Pods)

### Code Changes

- [ ] Create `/workspace/project/src/k8s-runtime.ts` (Job API client)
- [ ] Modify `/workspace/project/src/container-runtime.ts` (runtime detection)
- [ ] Modify `/workspace/project/src/container-runner.ts` (Job dispatcher)
- [ ] Add `/workspace/project/src/config.ts` (`CONTAINER_RUNTIME`, `K8S_NAMESPACE`)
- [ ] Add `/workspace/project/k8s/pvc-templates.yaml` (PVC manifests)
- [ ] Add tests for K8s runtime abstraction

### Deployment

- [ ] Build and push nanoclaw-agent image to registry
- [ ] Create namespace: `kubectl create namespace nanoclaw`
- [ ] Apply PVC templates: `kubectl apply -f k8s/pvc-templates.yaml`
- [ ] Deploy host controller (Deployment with PVC mounts)
- [ ] Set `CONTAINER_RUNTIME=kubernetes` env var
- [ ] Verify Job creation: `kubectl get jobs -n nanoclaw`

### Testing

- [ ] Single-group test (main group)
- [ ] Concurrent execution test (5 groups simultaneously)
- [ ] IPC round-trip test (follow-up messages work)
- [ ] Idle timeout test (Pod cleans up after 30min)
- [ ] Failure recovery test (Job fails, retry logic works)
- [ ] Performance test (Job latency, PVC throughput)

---

## 7. Future Work

### Short-Term (1-3 months)

- **Performance optimization**: Pre-warm Pod pool to reduce Job creation latency
- **Dynamic PVC provisioning**: Auto-create PVCs for new groups
- **Multi-cluster support**: Federate Jobs across multiple K8s clusters

### Long-Term (6-12 months)

- **Native K8s IPC**: Replace filesystem polling with HTTP (Pod → Service)
- **Serverless integration**: Knative for auto-scaling (scale to zero when idle)
- **Operator pattern**: Custom Resource Definitions (CRD) for NanoClaw groups

---

## 8. Conclusion

Deploying NanoClaw on Kubernetes/OpenShift unlocks multi-node scaling, resource orchestration, and enterprise security without sacrificing simplicity. The **Job-based architecture with PersistentVolumeClaims** provides the best balance of low complexity, OpenShift compatibility, and clear evolution paths. Implementation requires minimal code changes (~500 LOC) and preserves the existing IPC mechanism.

For organizations running NanoClaw at scale (>10 groups, multi-node), this migration enables cloud-native deployment patterns while maintaining the framework's core philosophy: **secure by isolation, simple by design**.

---

## References

- NanoClaw source code: https://github.com/qwibitai/nanoclaw
- Kubernetes Jobs documentation: https://kubernetes.io/docs/concepts/workloads/controllers/job/
- OpenShift Security Context Constraints: https://docs.openshift.com/container-platform/4.12/authentication/managing-security-context-constraints.html
- PersistentVolumes with ReadWriteMany: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

