Flokoa agents run as standard Kubernetes Deployments, so every observability tool that works in your cluster works with them too. You can tail logs with kubectl, scrape Prometheus metrics from a dedicated port, instrument your agent code with the OpenTelemetry SDK for end-to-end distributed tracing, and interrogate real-time health through the status.conditions array on every Agent resource. This page covers each approach in turn.

Logs

Flokoa labels every pod it creates with flokoa.ai/agent=<name>, so you can target logs by agent name regardless of how many replicas are running.
# Print logs from the agent's pods (most recent output)
kubectl logs -l flokoa.ai/agent=my-agent

# Follow logs in real time across all replicas and containers
kubectl logs -l flokoa.ai/agent=my-agent -f --all-containers
Declaring the framework field on your Agent spec (for example, spec.framework: pydantic-ai) enables richer log correlation. The operator injects framework-specific labels and annotations that log aggregation tools such as Grafana Loki or Datadog can use to group and filter log entries automatically.
Operator logs reveal reconciliation activity, including why an agent moved to the Failed phase or why a model reference could not be resolved:
# Print recent operator logs
kubectl logs -n flokoa-system -l control-plane=controller-manager

# Follow operator logs in real time
kubectl logs -n flokoa-system -l control-plane=controller-manager -f

Metrics

Expose a dedicated metrics port from your agent container so that Prometheus (or any compatible scraper) can collect custom metrics alongside standard Kubernetes resource metrics. Declare the metrics port in your Agent spec alongside the main HTTP port:
spec:
  runtime:
    spec:
      container:
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 9090
          name: metrics
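What the agent process serves on that port is up to your code. The following is a minimal sketch using only the Python standard library, assuming the scraper requests the conventional /metrics path and expects the Prometheus text exposition format. The counter name agent_tasks_total and the helper names are hypothetical; in practice a client library such as prometheus_client would normally replace the hand-written rendering.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real agent would increment these as it works.
COUNTERS = {"agent_tasks_total": 0}


def render_metrics(counters: dict) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumption: the scraper uses the conventional /metrics path.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve_metrics(port: int = 9090) -> None:
    """Serve metrics on the given port. Blocks forever, so run it in a background thread."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Starting serve_metrics in a daemon thread (threading.Thread(target=serve_metrics, daemon=True).start()) keeps the metrics endpoint separate from the main HTTP port on 8080.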
The Flokoa operator itself also exposes metrics on port 8443 (HTTPS) in the flokoa-system namespace. You can enable a Prometheus ServiceMonitor for the operator by setting controller.metrics.serviceMonitor.enabled=true in the Helm values.
Monitor live CPU and memory consumption for all replicas of an agent:
kubectl top pods -l flokoa.ai/agent=my-agent

Distributed Tracing with OpenTelemetry

Flokoa integrates with the OpenTelemetry ecosystem to give you end-to-end trace visibility across A2A request spans and framework-level spans such as LLM calls and tool invocations.
1. Install the tracing extra

Add the tracing optional dependency to your agent’s Python environment. This pulls in the OpenTelemetry SDK and the pydantic-ai instrumentation package.
pip install "flokoa[tracing]"
2. Configure the OTEL exporter via environment variables

Set the standard OpenTelemetry environment variables in your Agent spec. The flokoa run CLI automatically initialises the tracer provider at startup when the tracing extra is installed — no code changes are required.
spec:
  runtime:
    spec:
      container:
        env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"
        - name: OTEL_SERVICE_NAME
          value: "my-agent"
        - name: OTEL_TRACES_EXPORTER
          value: "otlp"
3. Deploy an OpenTelemetry Collector (if needed)

Point OTEL_EXPORTER_OTLP_ENDPOINT at an existing collector in your cluster, or deploy the OpenTelemetry Operator to manage collectors as CRDs. The collector can forward traces to any backend — Jaeger, Tempo, Honeycomb, Datadog, and others.
When the tracing extra is installed, traces automatically include:
  • A2A request spans — one span per incoming agent task, including input and output metadata
  • Framework-level spans — for pydantic-ai, each LLM call and tool invocation is captured as a child span
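If no spans reach your backend, a useful first check is whether the endpoint configured in OTEL_EXPORTER_OTLP_ENDPOINT is reachable from inside the agent pod. The sketch below is a hypothetical troubleshooting helper, not part of Flokoa or the OpenTelemetry SDK. It assumes a URL-style endpoint and falls back to port 4317 when the URL carries no port.

```python
import os
import socket
from typing import Tuple
from urllib.parse import urlparse


def otlp_host_port(endpoint: str) -> Tuple[str, int]:
    """Split an OTLP endpoint URL into (host, port)."""
    parsed = urlparse(endpoint)
    if not parsed.hostname:
        raise ValueError(f"not a URL-style endpoint: {endpoint!r}")
    # Assumption: fall back to 4317 when no port is given.
    return parsed.hostname, parsed.port or 4317


def collector_reachable(endpoint: str, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to the collector can be opened."""
    host, port = otlp_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    if not endpoint:
        print("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    else:
        print(endpoint, "reachable:", collector_reachable(endpoint))
```

A successful TCP connection only shows that something is listening. It does not confirm that the collector accepts or forwards the spans.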

Status Conditions

The operator maintains a status.conditions array on every Agent resource that provides a machine-readable summary of the agent’s health. Each condition has the following fields:
| Field | Description |
| --- | --- |
| type | The condition name, e.g. Ready or ModelResolved |
| status | "True", "False", or "Unknown" |
| reason | A short CamelCase code, e.g. DeploymentAvailable |
| message | A human-readable explanation of the current state |
| lastTransitionTime | ISO 8601 timestamp of the most recent status change |
Common condition types:
  • Ready: True when the agent’s Deployment has the desired number of available replicas and the Service is reachable.
  • ModelResolved: True when the referenced Model and its ModelProvider have been found, validated, and injected into the agent’s environment.
Retrieve the full conditions array for an agent at any time:
kubectl get agent my-agent -o jsonpath='{.status.conditions}'
To check a specific condition in a script:
kubectl get agent my-agent \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
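For scripts that need more than a single jsonpath lookup, the same check can be made against the full JSON object. The following is a minimal sketch, assuming the conditions follow the field layout described above. The helper names condition_status and fetch_agent and the sample payload are illustrative, not part of Flokoa.

```python
import json
import subprocess
from typing import Optional


def condition_status(agent: dict, cond_type: str) -> Optional[str]:
    """Return the status string of the named condition, or None if it is absent."""
    conditions = agent.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def fetch_agent(name: str) -> dict:
    """Fetch the Agent resource as a dict by shelling out to kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "agent", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(out.stdout)


# Hypothetical payload, trimmed to the fields this sketch reads.
SAMPLE_AGENT = {
    "status": {
        "conditions": [
            {"type": "ModelResolved", "status": "True"},
            {"type": "Ready", "status": "False"},
        ]
    }
}

if __name__ == "__main__":
    print(condition_status(SAMPLE_AGENT, "Ready"))
```

Against a live cluster, condition_status(fetch_agent("my-agent"), "Ready") performs the same lookup as the jsonpath expression above.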

Agent Status Phase

The status.phase field provides a coarse-grained summary of the agent’s state. The three possible values are:

Pending

The operator has created the Deployment but pods are still being scheduled or the container image is being pulled.

Running

At least one pod is running and the Service endpoint is available. The agent is ready to receive traffic.

Failed

The Deployment failed to reach a healthy state. Inspect status.conditions and pod events for the root cause.
Watch the phase transition in real time after applying a new Agent manifest:
kubectl get agents -w
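In automation such as a CI job, it can be more convenient to block until the agent leaves Pending than to watch interactively. The sketch below polls status.phase until it reaches Running or Failed. The function names, timeout, and polling interval are assumptions chosen for illustration.

```python
import subprocess
import time
from typing import Callable


def kubectl_phase(name: str) -> str:
    """Read status.phase for one Agent via kubectl (empty string if not yet set)."""
    out = subprocess.run(
        ["kubectl", "get", "agent", name, "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def wait_for_terminal_phase(get_phase: Callable[[], str],
                            timeout_s: float = 120.0,
                            interval_s: float = 2.0) -> str:
    """Poll get_phase until it returns Running or Failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        phase = get_phase()
        if phase in ("Running", "Failed"):
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"agent still in phase {phase!r} after {timeout_s}s")
        time.sleep(interval_s)
```

A typical call would be wait_for_terminal_phase(lambda: kubectl_phase("my-agent")). Passing the phase reader in as a callable keeps the polling loop testable without a cluster.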