Agent: deploy and manage AI agents on Kubernetes

The Agent CRD is the core building block of Flokoa. It represents a fully deployable AI agent inside your Kubernetes cluster, combining container runtime configuration with LLM model references, tool bindings, and A2A (Agent-to-Agent) protocol metadata. When you apply an Agent manifest, the Flokoa operator reconciles it into a Kubernetes Deployment, creates a Service, and continuously reports the agent’s lifecycle state back through the resource’s status fields.

API reference

apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent

Minimal configuration

Every Agent needs a card block (its A2A metadata) and a runtime block with a type. In standard mode the runtime must also define a container. The example below is the smallest valid manifest you can apply:

apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: minimal-agent
spec:
  card:
    name: "Minimal Agent"
    description: "A basic agent"
    version: "1.0.0"
    defaultInputModes:
      - "application/json"
    defaultOutputModes:
      - "application/json"
    capabilities:
      streaming: false
    skills: []
  runtime:
    type: standard
    spec:
      container:
        name: agent
        image: ghcr.io/example/agent:latest
        ports:
          - containerPort: 8080
            name: http

Runtime modes

Flokoa supports two runtime modes. Use standard when you are bringing your own agent image, and template when you want the operator to manage the runtime for you.

In standard mode you provide your own container image. The operator wraps it in a Deployment and Service, but all application logic lives in your image.

spec:
  runtime:
    type: standard
    spec:
      replicas: 2
      container:
        name: agent
        image: ghcr.io/example/my-agent:v1.2.0
        ports:
          - containerPort: 8080
            name: http
        env:
          - name: LOG_LEVEL
            value: "info"
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5

In template mode the operator runs a generic runtime image managed by Flokoa. You define the agent’s behaviour entirely through spec.instruction and an optional output schema — no custom image required.

spec:
  instruction:
    template: "You are a helpful customer support agent."
  runtime:
    type: template
    spec:
      replicas: 1
      config:
        outputSchema:
          name: "SupportResponse"
          description: "Structured response from the support agent"
          jsonSchema:
            type: object
            properties:
              answer:
                type: string
              confidence:
                type: number
      resources:
        requests:
          cpu: "200m"
          memory: "256Mi"
      env:
        - name: EXTRA_VAR
          value: "value"

The template runtime requires a spec.instruction entry — either an inline template string or an instructionRef pointing to an existing Instruction CR.
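For example, a template-mode agent can take its system prompt from a shared Instruction instead of an inline string. This sketch assumes an Instruction CR named customer-support-prompt already exists in shared-resources (the names used in the instruction section). It leaves out the optional outputSchema, and omits the required card block for brevity.

```yaml
# Template-mode agent whose system prompt comes from a shared Instruction CR.
# card omitted for brevity; a real manifest still needs it.
spec:
  instruction:
    instructionRef:
      name: customer-support-prompt   # assumed to exist already
      namespace: shared-resources     # optional
  runtime:
    type: template
    spec:
      replicas: 1
```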

Spec reference

card — Agent metadata (A2A protocol)

The card block exposes your agent via the A2A (Agent-to-Agent) protocol and is the primary way other agents and orchestrators discover your agent’s capabilities.

Field	Type	Required	Description
`name`	string	✅	Human-readable agent name
`description`	string	✅	What the agent does
`version`	string	✅	Semantic version of the agent
`defaultInputModes`	string[]	✅	Accepted input MIME types (e.g. `application/json`)
`defaultOutputModes`	string[]	✅	Produced output MIME types
`capabilities.streaming`	bool	—	Whether the agent supports streaming responses
`capabilities.pushNotifications`	bool	—	Whether the agent supports push notifications
`capabilities.stateTransitionHistory`	bool	—	Whether the agent exposes task state history
`skills`	object[]	✅	List of skills (see below)

Each skill in skills has:

Field	Description
`id`	Unique skill identifier
`name`	Human-readable skill name
`description`	What the skill does
`tags`	Categorization keywords
`examples`	Sample prompts demonstrating the skill
`inputModes`	Per-skill input MIME override
`outputModes`	Per-skill output MIME override

spec:
  card:
    name: "Customer Support Agent"
    description: "Handles customer inquiries and support requests"
    version: "1.0.0"
    defaultInputModes:
      - "application/json"
    defaultOutputModes:
      - "application/json"
    capabilities:
      streaming: true
      pushNotifications: false
      stateTransitionHistory: true
    skills:
      - id: "answer-questions"
        name: "Answer Customer Questions"
        description: "Provide answers to common customer questions"
        tags:
          - "support"
          - "faq"
        examples:
          - "What are your business hours?"
          - "How do I reset my password?"
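The card example above does not use the per-skill overrides. As a sketch, a skill that accepts JSON but answers in plain text could override the card defaults with inputModes and outputModes. The skill shown is hypothetical, and the other required card fields are omitted.

```yaml
spec:
  card:
    # name, description, version, default modes and capabilities omitted;
    # they are still required in a real manifest
    skills:
      - id: "summarize-ticket"             # hypothetical skill
        name: "Summarize Ticket"
        description: "Summarize a support ticket in plain text"
        tags:
          - "support"
          - "summary"
        examples:
          - "Summarize ticket 4521"
        inputModes:
          - "application/json"             # same as the card default
        outputModes:
          - "text/plain"                   # overrides defaultOutputModes for this skill only
```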

runtime — Deployment configuration

The runtime block controls how the operator deploys your agent. Fields marked "standard mode" or "template mode" apply only to that runtime type.

Field	Description
`type`	`standard` (your image) or `template` (operator-managed)
`spec.replicas`	Number of pod replicas (default: `1`)
`spec.container`	Full Kubernetes container spec (standard mode)
`spec.volumes`	Pod volumes (standard mode)
`spec.serviceAccountName`	ServiceAccount for the pod
`spec.nodeSelector`	Schedule pods on matching nodes
`spec.tolerations`	Allow scheduling on tainted nodes
`spec.affinity`	Advanced scheduling rules
`spec.imagePullSecrets`	Secrets for private registries
`spec.securityContext`	Pod-level security attributes
`spec.config`	Agent config with output schema (template mode)
`spec.env`	Extra environment variables (template mode)
`spec.resources`	CPU/memory requests and limits (template mode)
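The sketch below illustrates the pod-level fields. It runs two replicas under a dedicated ServiceAccount on tainted nodes. It assumes these fields accept the same values as the Kubernetes PodSpec fields of the same name. The ServiceAccount name agent-runner and the dedicated=ai-agents taint are hypothetical, and the required card block is omitted.

```yaml
# Standard-mode runtime with pod-level scheduling and security settings.
# Assumption: these fields mirror the identically named Kubernetes PodSpec fields.
spec:
  runtime:
    type: standard
    spec:
      replicas: 2
      serviceAccountName: agent-runner      # hypothetical ServiceAccount
      nodeSelector:
        kubernetes.io/os: linux             # well-known node label
      tolerations:
        - key: "dedicated"                  # hypothetical taint on agent nodes
          operator: "Equal"
          value: "ai-agents"
          effect: "NoSchedule"
      securityContext:                      # pod level; per-container settings go in container.securityContext
        runAsNonRoot: true
        fsGroup: 1000
      container:
        name: agent
        image: ghcr.io/example/my-agent:v1.2.0
        ports:
          - containerPort: 8080
            name: http
```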

model — LLM reference

Attach a Model CR so your agent has access to an LLM at runtime. The operator injects the necessary connection details as environment variables.

Field	Description
`model.name`	Name of the `Model` CR
`model.namespace`	Namespace of the `Model` CR (defaults to agent’s namespace)

spec:
  model:
    name: gpt-4o-model
    namespace: shared-models   # optional

instruction — System prompt

Attach a system prompt either inline or by referencing an existing Instruction CR. Supported in both standard and template runtime modes.

# Inline — operator creates a child Instruction CR automatically
spec:
  instruction:
    template: "You are a helpful assistant that answers questions concisely."

# Reference — reuse a shared Instruction across multiple agents
spec:
  instruction:
    instructionRef:
      name: customer-support-prompt
      namespace: shared-resources   # optional

tools — Tool bindings

Tools give your agent access to external APIs. You can reference an existing AgentTool CR or define a tool inline.

spec:
  tools:
    # Reference a shared tool
    - toolRef:
        name: weather-api
        namespace: shared-tools   # optional

    # Inline tool definition
    - name: product-search
      template:
        type: openapi
        description: "Search the product catalogue"
        openApi:
          url: "https://api.example.com"
          openApiSchema:
            endpointPath: "/openapi.json"

framework — Observability hint

Explicitly declaring the AI framework lets Flokoa and your observability stack identify the agent’s type in logs and metrics.

spec:
  framework: pydantic-ai

Supported values: pydantic-ai, langchain, google-adk, crewai, marvin, autogen, a2a

Status fields

The operator writes the following fields to status after reconciling an Agent:

status:
  phase: Running           # Pending | Running | Failed
  backend: standard        # Runtime backend in use
  url: http://my-agent.default.svc.cluster.local:8080
  replicas: 2
  availableReplicas: 2
  detectedFramework: pydantic-ai
  lastToolSync: "2026-01-15T10:30:00Z"
  observedGeneration: 3
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2026-01-15T10:30:00Z"
      reason: DeploymentAvailable
      message: "Agent is running and available"

Field	Description
`phase`	Lifecycle phase: `Pending`, `Running`, or `Failed`
`backend`	Active runtime backend
`url`	In-cluster endpoint for calling the agent
`replicas`	Current number of pod replicas
`availableReplicas`	Number of replicas that are ready
`detectedFramework`	Framework detected from the container image
`lastToolSync`	Timestamp of the last tool synchronisation
`conditions`	Standard Kubernetes condition array
`observedGeneration`	Last spec generation reconciled by the operator

Production example

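The following manifest deploys a highly available agent. It includes health probes, resource limits, a locked-down container security context, a Secret-backed environment variable, and pod anti-affinity that prefers to spread replicas across zones: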

apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: production-agent
  namespace: production
spec:
  framework: pydantic-ai

  card:
    name: "Production Support Agent"
    description: "Customer-facing support agent with HA configuration"
    version: "2.0.0"
    defaultInputModes:
      - "application/json"
    defaultOutputModes:
      - "application/json"
    capabilities:
      streaming: true
    skills:
      - id: "support"
        name: "Customer Support"
        description: "Handle customer inquiries"
        tags: ["support"]

  model:
    name: gpt-4o-model

  tools:
    - toolRef:
        name: knowledge-base
    - toolRef:
        name: ticket-system

  runtime:
    type: standard
    spec:
      replicas: 3

      container:
        name: agent
        image: ghcr.io/example/support-agent:v2.0.0

        ports:
          - containerPort: 8080
            name: http
          - containerPort: 9090
            name: metrics

        env:
          - name: ENVIRONMENT
            value: "production"
          - name: API_KEY
            valueFrom:
              secretKeyRef:
                name: agent-secrets
                key: api-key

        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3

        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          capabilities:
            drop:
              - ALL

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    flokoa.ai/agent: production-agent
                topologyKey: topology.kubernetes.io/zone

kubectl operations

# List all agents and their phases
kubectl get agents

# Watch live status updates
kubectl get agents -w

# Inspect a specific agent in detail
kubectl describe agent my-agent

# Get the agent's endpoint URL
kubectl get agent my-agent -o jsonpath='{.status.url}'

# Scale replicas
kubectl patch agent my-agent --type='json' \
  -p='[{"op": "replace", "path": "/spec/runtime/spec/replicas", "value": 5}]'

# Update the container image
kubectl patch agent my-agent --type='json' \
  -p='[{"op": "replace", "path": "/spec/runtime/spec/container/image", "value": "new-image:v2.0.0"}]'

# Get pods belonging to an agent
kubectl get pods -l flokoa.ai/agent=my-agent

# Stream logs from all replicas
kubectl logs -l flokoa.ai/agent=my-agent --all-containers=true -f

# Delete an agent
kubectl delete agent my-agent
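The two JSON patches above can also be combined into a single merge patch kept in a file. This is a sketch; the file name agent-patch.yaml is hypothetical.

```yaml
# agent-patch.yaml: a JSON merge patch (RFC 7386) that changes the replica
# count and the container image in one request. Fields not listed here are
# left untouched; any list included in the patch replaces the existing list.
spec:
  runtime:
    spec:
      replicas: 5
      container:
        image: new-image:v2.0.0
```

Apply it with kubectl patch agent my-agent --type=merge --patch-file=agent-patch.yaml. A merge patch is used because custom resources do not support strategic merge patches. As a result, lists such as ports or env are replaced wholesale rather than merged, so leave them out unless you intend to rewrite them.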

Best practices

Always set resource requests and limits to prevent agents from starving or monopolising cluster nodes.
Add liveness and readiness probes so Kubernetes can route traffic only to healthy replicas and self-heal on crashes.
Run at least two replicas in production and combine with pod anti-affinity to spread them across zones.
Declare the framework explicitly — spec.framework improves observability and future tooling integration.
Never put secrets in the Agent spec — use secretKeyRef in env or a mounted Kubernetes Secret volume.
Set container security contexts — run as non-root with readOnlyRootFilesystem: true and drop all capabilities.
Pin image tags — avoid latest in production so rollbacks are predictable.
Use standard mode for custom logic, template mode for prompt-driven agents — pick the mode that matches your workload.
Share Model and AgentTool CRs across agents using cross-namespace references to reduce duplication.
Start minimal and iterate — validate a one-replica, no-probe configuration before adding production hardening.
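Some agents read secrets as files rather than environment variables. For that case, the following sketch of the Secret volume approach combines spec.volumes with a volumeMounts entry on the container. It reuses the agent-secrets Secret from the production example. The volume name and the mount path /etc/agent/secrets are hypothetical. Each key in the Secret appears as a file under the mount path.

```yaml
spec:
  runtime:
    type: standard
    spec:
      volumes:
        - name: agent-credentials          # hypothetical volume name
          secret:
            secretName: agent-secrets      # Secret must exist in the agent's namespace
      container:
        name: agent
        image: ghcr.io/example/support-agent:v2.0.0
        volumeMounts:
          - name: agent-credentials
            mountPath: /etc/agent/secrets  # hypothetical path; each Secret key becomes a file here
            readOnly: true
```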

Troubleshooting

Agent is stuck in Pending phase

The most common causes are an inaccessible container image or insufficient cluster resources.

# Check events on the agent pods
kubectl describe pods -l flokoa.ai/agent=my-agent

# Show container states (look for ImagePullBackOff or ErrImagePull)
kubectl get pods -l flokoa.ai/agent=my-agent -o jsonpath='{.items[*].status.containerStatuses[*].state}'

Confirm the image tag exists in the registry.
If using a private registry, ensure imagePullSecrets is set.
Check that nodes have sufficient CPU and memory with kubectl describe nodes.
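The sketch below shows the imagePullSecrets setting. It assumes the field takes the same list-of-names form as the Kubernetes PodSpec field. The Secret name regcred and the registry host are hypothetical. You can create such a Secret in the agent's namespace with kubectl create secret docker-registry.

```yaml
spec:
  runtime:
    type: standard
    spec:
      imagePullSecrets:
        - name: regcred                    # hypothetical docker-registry Secret in the agent's namespace
      container:
        name: agent
        image: registry.example.com/team/my-agent:v1.2.0   # hypothetical private image
        ports:
          - containerPort: 8080
            name: http
```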

Agent pods are crash-looping

# Read container logs for the error
kubectl logs -l flokoa.ai/agent=my-agent --previous

# Inspect the full pod spec that was generated
kubectl get pod <pod-name> -o yaml

Check that all secretKeyRef secrets exist in the correct namespace.
Verify health probe paths (/health, /ready) are implemented in your image.
Ensure resource limits are not too low: containers killed for exceeding their memory limit report reason: OOMKilled under status.containerStatuses[*].lastState.terminated.

Performance is degraded or responses are slow

# Check current resource consumption
kubectl top pods -l flokoa.ai/agent=my-agent

# Review recent events for objects named my-agent (pod events are recorded under the generated pod names)
kubectl get events --field-selector involvedObject.name=my-agent

Increase CPU/memory requests and limits if the pod is being throttled.
Scale out replicas if all pods are consistently high on CPU.
Check tool call latency — slow external APIs directly impact agent response time.

Networking or service connectivity issues

# Confirm the Service was created
kubectl get svc -l flokoa.ai/agent=my-agent

# Test reachability from inside the cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl http://my-agent.<namespace>.svc.cluster.local:8080/health

If the Service is missing, check operator logs: kubectl logs -n flokoa-system deploy/flokoa-operator.
Review NetworkPolicies that may be blocking traffic to or from the agent pods.
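The following sketch applies if your cluster's network plugin enforces NetworkPolicies. It admits HTTP traffic to the agent pods from any pod in the same namespace, selecting them by the flokoa.ai/agent label used throughout this page. The policy name and the default namespace are assumptions. Once any policy selects the agent pods for ingress, traffic that no policy allows is dropped. Callers in other namespaces would therefore also need a namespaceSelector rule.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-my-agent-http     # hypothetical name
  namespace: default            # assumed: the agent's namespace
spec:
  podSelector:
    matchLabels:
      flokoa.ai/agent: my-agent # pods created for the Agent named my-agent
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}       # any pod in this namespace
      ports:
        - protocol: TCP
          port: 8080            # the agent's http containerPort
```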
