Run Flokoa agents with production-grade high availability

A single-replica agent pod is a single point of failure. Node evictions, failed image pulls, rolling restarts, and unplanned hardware failures can each take your agent offline without warning. This guide shows you how to combine multiple replicas, resource boundaries, health probes, and pod scheduling rules into an Agent spec that stays available through the disruptions that are normal in any production Kubernetes cluster.

Never use the :latest image tag in production. It prevents deterministic rollbacks, makes it impossible to audit exactly what code is running, and can cause unexpected behavior when the upstream image changes between pod restarts. Pin every image to a specific version tag.

Multiple replicas

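The simplest HA change is setting replicas to three or more. The Flokoa operator propagates this value to the underlying Deployment, and the Kubernetes scheduler places the resulting pods on available nodes. The scheduler does not guarantee that the pods land on different nodes unless you add the rules described under Pod anti-affinity.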

spec:
  runtime:
    type: standard
    spec:
      replicas: 3
      container:
        name: agent
        image: ghcr.io/example/my-agent:v2.0.0
        ports:
        - containerPort: 8080
          name: http

With multiple replicas, the ClusterIP Service created by the operator load-balances requests across all healthy pods. When a single pod fails, the remaining pods keep serving traffic, although requests that were in flight on the failed pod can still be lost.

Resource requests and limits

Always set both requests and limits for CPU and memory. Without requests, the Kubernetes scheduler cannot make informed placement decisions. Without limits, a misbehaving agent can exhaust node resources and destabilize other workloads.

spec:
  runtime:
    spec:
      container:
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"

Start with the values above for a typical agent workload and tune them using kubectl top pods (which requires the Kubernetes Metrics Server) once you have production traffic data.

Health probes

A liveness probe restarts a container that has become deadlocked or unresponsive. A readiness probe keeps a pod out of the Service’s endpoint list until it is fully initialised and ready to handle requests, and removes it again if it later stops reporting ready. Both are essential for safe rolling updates.

spec:
  runtime:
    spec:
      container:
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3

Your agent container must expose a /health endpoint that returns a 2xx status code when the process is alive, and a /ready endpoint that returns a 2xx status code only after the agent has loaded its model configuration and is prepared to accept requests.

Pod anti-affinity

By default, Kubernetes may schedule all three replicas on the same node. Pod anti-affinity rules ask the scheduler to spread them out. The preferred form below adds a strong preference without making it a hard requirement, which avoids scheduling deadlocks on smaller clusters.

spec:
  runtime:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  flokoa.ai/agent: my-agent
              topologyKey: kubernetes.io/hostname

To spread across availability zones instead of individual nodes, change topologyKey to topology.kubernetes.io/zone. You can stack both rules with different weights if your cluster spans multiple zones with multiple nodes per zone.
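
As a sketch of the stacked form, the fragment below combines a zone-level preference with a node-level one. The weights 100 and 50 are illustrative assumptions rather than recommended values; the label and field layout are copied from the example above.

```yaml
spec:
  runtime:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          # Prefer spreading replicas across zones first (assumed weight).
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  flokoa.ai/agent: my-agent
              topologyKey: topology.kubernetes.io/zone
          # Within a zone, prefer distinct nodes (lower assumed weight).
          - weight: 50
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  flokoa.ai/agent: my-agent
              topologyKey: kubernetes.io/hostname
```

Giving the zone rule the higher weight means that when the scheduler cannot satisfy both preferences, it favours zone spread over node spread.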

Rolling updates

When you change a field in the Agent spec that affects the pod template, such as the container image, environment variables, or resource limits, the Flokoa operator updates the underlying Deployment. Kubernetes then performs a rolling update automatically: it starts a new pod, waits for it to pass its readiness probe, and only then terminates one of the old pods. This process repeats until all replicas are running the new configuration.

Because the rollout relies on readiness probes to gate each step, configuring those probes correctly (as shown above) is what makes rolling updates zero-downtime in practice. To trigger an update, patch the image field directly:

kubectl patch agent my-agent --type='json' \
  -p='[{"op": "replace", "path": "/spec/runtime/spec/container/image", "value": "ghcr.io/example/my-agent:v2.1.0"}]'

Watch the rollout progress:

kubectl get agents -w
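
For background, a Kubernetes Deployment that does not set an update strategy falls back to the defaults sketched below. This is a fragment of a Deployment spec, not of the Agent spec, and it assumes the operator leaves the strategy unset; whether Flokoa exposes or overrides these fields is not covered on this page.

```yaml
# Deployment defaults applied by Kubernetes when no strategy is specified.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # rounded up: 1 extra pod for 3 replicas
      maxUnavailable: 25%  # rounded down: 0 unavailable pods for 3 replicas
```

With three replicas these defaults produce the one-new-pod-at-a-time behaviour described above: a fourth pod is created, and an old pod is removed only after the new one reports ready.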

Secrets and ConfigMaps

Reference sensitive values from Kubernetes Secrets using secretKeyRef rather than placing them directly in the Agent spec. This applies to API keys, database passwords, and any other credential your agent needs at runtime.

spec:
  runtime:
    spec:
      container:
        env:
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: agent-secrets
              key: api-key
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: agent-config
              key: log-level

Non-sensitive configuration such as log levels, feature flags, and environment identifiers belongs in a ConfigMap, not a Secret. Keeping the two separate makes it easier to manage RBAC policies that restrict Secret access.
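
The env entries above expect a Secret named agent-secrets and a ConfigMap named agent-config to exist in the same namespace as the agent. A minimal sketch of both objects follows; the values shown are placeholders chosen for illustration, and in practice the Secret would usually be created by your secret-management tooling rather than committed to version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: agent-secrets
type: Opaque
stringData:
  # Placeholder only; supply the real credential out of band.
  api-key: "replace-me"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-config
data:
  # Assumed example value; use whatever levels your agent understands.
  log-level: "info"
```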

Use separate namespaces for dev, staging, and production. This lets you apply namespace-level ResourceQuotas, NetworkPolicies, and RBAC rules to each environment independently, which greatly reduces the risk of a development agent accidentally talking to a production model provider or tool.
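
As an illustration of a namespace-level boundary, the sketch below creates a production namespace with a ResourceQuota. The names agents-production and agent-quota and all quota values are hypothetical; size them to your own workloads.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: agents-production   # hypothetical namespace name
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-quota          # hypothetical quota name
  namespace: agents-production
spec:
  hard:
    # Assumed ceilings with headroom for three replicas plus one surge pod.
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "16"
    limits.memory: 16Gi
    pods: "10"
```

Once a quota covers CPU and memory, Kubernetes rejects pods in that namespace that omit the corresponding requests or limits, which is another reason to set both on every agent.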

Complete HA example

The following manifest combines the replica, resource, probe, anti-affinity, and Secret settings from this page into a single production-ready Agent spec:

apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: production-agent
spec:
  framework: pydantic-ai

  model:
    name: gpt-4o-model

  runtime:
    type: standard
    spec:
      replicas: 3

      container:
        name: agent
        image: ghcr.io/example/my-agent:v2.0.0

        ports:
        - containerPort: 8080
          name: http

        env:
        - name: ENVIRONMENT
          value: "production"
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: agent-secrets
              key: api-key

        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  flokoa.ai/agent: production-agent
              topologyKey: kubernetes.io/hostname

Apply this manifest and verify all three replicas become available:

kubectl apply -f production-agent.yaml
kubectl get agents -w

Once AVAILABLE REPLICAS reaches three, your agent is running in a highly available configuration.
