A single-replica agent pod is a single point of failure. Node evictions, image pulls, rolling restarts, and unplanned hardware failures can all take your agent offline without warning. This guide shows you how to combine multiple replicas, resource boundaries, health probes, and pod scheduling rules into an Agent spec that stays available through the disruptions that are normal in any production Kubernetes cluster.
Never use the :latest image tag in production. It prevents deterministic rollbacks, makes it impossible to audit exactly what code is running, and can cause unexpected behavior when the upstream image changes between pod restarts. Pin every image to a specific version tag.
Multiple replicas
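The simplest HA change is setting replicas to three or more. The Flokoa operator propagates this value to the underlying Deployment, and Kubernetes keeps that many pods running, replacing any that fail. The scheduler usually spreads the pods across available nodes, but it does not guarantee this unless you add the anti-affinity rules described later in this guide.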
spec:
  runtime:
    type: standard
    spec:
      replicas: 3
      container:
        name: agent
        image: ghcr.io/example/my-agent:v2.0.0
        ports:
          - containerPort: 8080
            name: http
With multiple replicas, the ClusterIP Service created by the operator load-balances requests across all ready pods. If one pod fails, the remaining replicas keep serving traffic.
Resource requests and limits
Always set both requests and limits for CPU and memory. Without requests, the Kubernetes scheduler cannot make informed placement decisions. Without limits, a misbehaving agent can exhaust node resources and destabilize other workloads.
spec:
  runtime:
    spec:
      container:
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
Start with the values above for a typical agent workload and tune them using kubectl top pods (which requires the Metrics Server to be installed in the cluster) once you have production traffic data.
Health probes
A liveness probe restarts a pod that has become deadlocked or unresponsive. A readiness probe keeps a pod out of the Service’s endpoint list until it is fully initialised and ready to handle requests, and removes it again if it later stops passing the probe. Both are essential for safe rolling updates.
spec:
  runtime:
    spec:
      container:
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
Your agent container must expose a /health endpoint that returns a 2xx status code when the process is alive, and a /ready endpoint that returns a 2xx status code only after the agent has loaded its model configuration and is prepared to accept requests.
Pod anti-affinity
By default, Kubernetes may schedule all three replicas on the same node. Pod anti-affinity rules ask the scheduler to spread them out. The preferred form below adds a strong preference without making it a hard requirement, which avoids scheduling deadlocks on smaller clusters.
spec:
  runtime:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    flokoa.ai/agent: my-agent
                topologyKey: kubernetes.io/hostname
To spread across availability zones instead of individual nodes, change topologyKey to topology.kubernetes.io/zone. You can stack both rules with different weights if your cluster spans multiple zones with multiple nodes per zone.
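As a sketch of the stacked form described above, the fragment below lists two preferred terms. The weights (100 for zones, 50 for nodes) are illustrative assumptions, not values recommended by the Flokoa documentation. A higher weight makes the scheduler favour that spread more strongly when it scores candidate nodes.

```yaml
spec:
  runtime:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            # Stronger preference: place replicas in different zones.
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    flokoa.ai/agent: my-agent
                topologyKey: topology.kubernetes.io/zone
            # Weaker preference: within a zone, avoid sharing a node.
            - weight: 50
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    flokoa.ai/agent: my-agent
                topologyKey: kubernetes.io/hostname
```

Because both terms are preferences, the scheduler can still place pods when a zone or node is unavailable.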
Rolling updates
When you change any field in the Agent spec — including the container image, environment variables, or resource limits — the Flokoa operator updates the underlying Deployment. Kubernetes then performs a rolling update automatically: it starts a new pod, waits for it to pass its readiness probe, and only then terminates one of the old pods. This process repeats until all replicas are running the new configuration.
Because the rollout relies on readiness probes to gate each step, configuring those probes correctly (as shown above) is what makes rolling updates zero-downtime in practice.
To trigger an update, patch the image field directly:
kubectl patch agent my-agent --type='json' \
  -p='[{"op": "replace", "path": "/spec/runtime/spec/container/image", "value": "ghcr.io/example/my-agent:v2.1.0"}]'
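Watch the rollout progress by watching the Agent resource until the updated replicas report as available:
kubectl get agents -w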
Secrets and ConfigMaps
Reference sensitive values from Kubernetes Secrets using secretKeyRef rather than placing them directly in the Agent spec. This applies to API keys, database passwords, and any other credential your agent needs at runtime.
spec:
  runtime:
    spec:
      container:
        env:
          - name: API_KEY
            valueFrom:
              secretKeyRef:
                name: agent-secrets
                key: api-key
          - name: LOG_LEVEL
            valueFrom:
              configMapKeyRef:
                name: agent-config
                key: log-level
Non-sensitive configuration such as log levels, feature flags, and environment identifiers belongs in a ConfigMap, not a Secret. Keeping the two separate makes it easier to manage RBAC policies that restrict Secret access.
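The env references above only resolve if the named objects exist in the same namespace as the agent pods. A minimal sketch of the two objects follows. The names and keys match the snippet above, while the values are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: agent-secrets
type: Opaque
stringData:
  # Placeholder only: supply the real key at deploy time.
  api-key: "replace-with-your-api-key"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-config
data:
  # Assumed value; use whatever log levels your agent understands.
  log-level: "info"
```

In practice, avoid committing a Secret manifest that contains a real credential to version control; create it through your secret management tooling instead.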
Use separate namespaces for dev, staging, and production. This lets you apply namespace-level ResourceQuotas, NetworkPolicies, and RBAC rules to each environment independently, without any risk of a development agent accidentally talking to a production model provider or tool.
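As one illustration of a namespace-level control, the sketch below pairs a Namespace with a ResourceQuota. The namespace name, quota name, and limits are all hypothetical. The figures are sized for four pods at the requests and limits shown earlier on this page, on the assumption that a rolling update briefly runs one extra pod alongside three replicas.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: agents-production      # hypothetical namespace name
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-quota            # hypothetical quota name
  namespace: agents-production
spec:
  hard:
    requests.cpu: "2"          # 4 pods x 500m
    requests.memory: 2Gi       # 4 pods x 512Mi
    limits.cpu: "8"            # 4 pods x 2000m
    limits.memory: 8Gi         # 4 pods x 2Gi
    pods: "4"
```

Once a quota covers CPU and memory, Kubernetes rejects pods in that namespace that omit the corresponding requests or limits, which reinforces the resource settings recommended earlier.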
Complete HA example
The following manifest combines the replica, resource, probe, Secret, and anti-affinity recommendations on this page into a single production-ready Agent spec:
apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: production-agent
spec:
  framework: pydantic-ai
  model:
    name: gpt-4o-model
  runtime:
    type: standard
    spec:
      replicas: 3
      container:
        name: agent
        image: ghcr.io/example/my-agent:v2.0.0
        ports:
          - containerPort: 8080
            name: http
        env:
          - name: ENVIRONMENT
            value: "production"
          - name: API_KEY
            valueFrom:
              secretKeyRef:
                name: agent-secrets
                key: api-key
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    flokoa.ai/agent: production-agent
                topologyKey: kubernetes.io/hostname
Apply this manifest and verify all three replicas become available:
kubectl apply -f production-agent.yaml
kubectl get agents -w
Once AVAILABLE REPLICAS reaches three, your agent is running in a highly available configuration.