Troubleshoot common Flokoa issues

Most problems with Flokoa fall into one of four categories: agents that won’t start or keep crashing, model or provider configuration issues, tool connectivity failures, and operator-level problems such as missing CRDs. Work through the sections below that match your symptom. Each one starts with the fastest diagnostic command and then walks you through the likely root causes in order of frequency.

Agent issues

Agent stuck in Pending

An agent stays in Pending when Kubernetes cannot start the pods. Begin by finding the pods and reading their events.

Check which pods exist (or don’t):

kubectl get pods -l flokoa.ai/agent=<agent-name>

Read the events recorded against the agent and its pods for scheduling or image-pull failures:

kubectl get events --field-selector involvedObject.name=<agent-name>
kubectl describe pod <pod-name>

Common causes and fixes:

Image not found or inaccessible — verify the image name and tag exist in your registry. Check imagePullPolicy in the spec; Always forces a fresh pull on every start, which can surface auth errors.
Missing image pull secret — for private registries, create a pull secret and reference it in the Agent spec under spec.runtime.spec.imagePullSecrets.
Insufficient cluster resources — the scheduler may be unable to place the pod if no node has enough free CPU or memory. Run kubectl describe nodes to see allocatable capacity versus current requests.
Unresolvable model or tool reference — if spec.model.name references a Model that does not exist or is in the wrong namespace, the operator holds the agent in Pending. Run kubectl describe agent <name> and read the status.conditions array for a ModelResolved: False condition.

# Confirm the agent's current conditions
kubectl describe agent <agent-name>
kubectl get agent <agent-name> -o jsonpath='{.status.conditions}'
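
The raw conditions array can be hard to scan. As a rough sketch, the helper below prints only the conditions that are not True. It assumes each condition carries type, status, and message fields (the usual Kubernetes convention, not confirmed for Flokoa), and the function name agent_failing_conditions is made up for this example.

```shell
# Print every status condition on an Agent whose status is not "True".
# Assumption: conditions expose .type, .status and .message, as most
# Kubernetes resources do. Adjust the jsonpath if Flokoa's differ.
agent_failing_conditions() {
  kubectl get agent "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}' |
    awk -F '\t' '$1 != "" && $2 != "True" { printf "%s=%s: %s\n", $1, $2, $3 }'
}
```

Call it as agent_failing_conditions followed by the agent name. No output means every reported condition is True.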

Agent pods are crashing (CrashLoopBackOff)

CrashLoopBackOff means the container starts but exits immediately. Kubernetes keeps restarting it with increasing back-off delays.

Read the logs from the most recent (crashed) pod:

kubectl logs <pod-name>
# Or use the previous container's logs if the current one has already restarted
kubectl logs <pod-name> --previous

Common causes and fixes:

Missing API key secret — the most frequent cause. If your agent reads an environment variable backed by a secretKeyRef, verify the Secret exists and the key name is correct:
kubectl get secret <secret-name>
kubectl get secret <secret-name> -o jsonpath='{.data}' | tr ',' '\n'
Missing or misconfigured environment variables — cross-check every env entry in the Agent spec against what your application code expects.
Resource limits too tight — if the container is killed with exit code 137, it was OOM-killed. Increase resources.limits.memory in the Agent spec.
Failing health probe — if the liveness probe fires before the application finishes starting up, Kubernetes kills the container repeatedly. Increase initialDelaySeconds on the livenessProbe.
Application startup error — read the full log output carefully. Stack traces or missing dependency errors appear here.
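
To tell an out-of-memory kill from an ordinary application error, it can help to read the container's last exit code directly. The sketch below uses standard pod status fields. The function names are illustrative, and the hint texts are only a starting point.

```shell
# Print "<container>\t<last exit code>" for each container in a pod.
# The exit code is empty if the container has not terminated yet.
last_exit_codes() {
  kubectl get pod "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Translate an exit code into a hint. Codes above 128 mean the process
# was killed by signal (code - 128); 137 is SIGKILL, which is the signal
# the kernel OOM killer sends.
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 137 ]; then
    echo "SIGKILL (signal 9): often an OOM kill; check memory limits"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  elif [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  else
    echo "application error (exit code $code); read the logs"
  fi
}
```

An exit code of 137 alone does not prove an OOM kill. The Last State section of kubectl describe pod shows the reason OOMKilled when memory was the cause.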

Agent service not reachable

If an agent is Running but you cannot send it requests, work through the network path from the outside in.

Verify the agent has reached Running phase:

kubectl get agent <agent-name> -o jsonpath='{.status.phase}'

Confirm the Service exists and has endpoints:

kubectl get svc -l flokoa.ai/agent=<agent-name>
kubectl get endpoints -l flokoa.ai/agent=<agent-name>

Test connectivity with a port-forward (bypasses Ingress and NetworkPolicies):

kubectl port-forward svc/<service-name> 8080:8080
# In another terminal:
curl http://localhost:8080/health

If the port-forward works but direct cluster traffic does not, a NetworkPolicy is blocking the path. Review any NetworkPolicy resources that select your agent’s pods and ensure your client’s namespace or IP range is permitted in the ingress rules.

If the port-forward also fails, the readiness probe is likely reporting unhealthy. Check the pod events and probe configuration:

kubectl describe pod <pod-name>
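
A quick way to confirm a readiness problem is to count the ready addresses behind the Service, because Kubernetes only lists pods that pass their readiness probe there. This is a minimal sketch using the standard Endpoints object. The function name is invented for illustration.

```shell
# Count the ready pod IPs behind a Service's Endpoints object.
# Pods failing their readiness probe are listed under notReadyAddresses
# instead, so they are not counted here.
service_endpoint_count() {
  kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}' |
    wc -w | tr -d ' '
}
```

Pass the Service name as the only argument. A result of 0 while pods are Running points at the readiness probe rather than at the network path.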

Model and provider issues

ModelProvider not ready

A ModelProvider that stays not-ready prevents every Model that references it from resolving, which in turn blocks any Agent that references those models.

Inspect the provider’s conditions:

kubectl describe modelprovider <provider-name>

Verify the Secret referenced by apiKeySecretRef exists:

kubectl get secret <secret-name>

Check that the key name in the Secret matches apiKeySecretRef.key:

# Decode and print the value to verify it is non-empty and correctly formatted
# (replace api-key with the key named in apiKeySecretRef.key)
kubectl get secret <secret-name> -o jsonpath='{.data.api-key}' | base64 -d

Common causes and fixes:

Secret does not exist — create it with kubectl create secret generic <name> --from-literal=api-key=<value>.
Wrong key name — the key field in apiKeySecretRef must exactly match a key in the Secret’s data map.
Invalid API key — decode the stored value and confirm it is valid by testing it directly against the provider’s API outside of Kubernetes.
Wrong namespace — Secrets are namespaced. Make sure the Secret and the ModelProvider are in the same namespace.
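
If you would rather not print the decoded key to your terminal, the key-name check can be done without revealing any values. The sketch below lists only the key names in the Secret. The function name secret_has_key is hypothetical, and api-key is simply the key name used in the examples above.

```shell
# Succeed if the Secret ($1) contains a data key named exactly $2.
# Only key names are listed; the secret values are never printed.
secret_has_key() {
  kubectl get secret "$1" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' |
    grep -Fxq -- "$2"
}
```

For example, `secret_has_key my-provider-secret api-key && echo "key present"`, where my-provider-secret is a placeholder name. Add -n and a namespace to the kubectl call if the Secret is not in your current namespace.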

Model not found error in agent

This error appears in agent logs or in the ModelResolved status condition when the operator cannot locate the referenced Model resource.

Verify the Model exists:

kubectl get model <model-name>
# If you suspect a namespace issue:
kubectl get models --all-namespaces

Check whether the model is ready:

kubectl get model <model-name> -o jsonpath='{.status.ready}'

Confirm the ModelProvider that backs this model is also ready:

kubectl get modelprovider <provider-name> -o jsonpath='{.status.ready}'

Common causes and fixes:

Namespace mismatch — if the Model is in a different namespace from the Agent, you must set spec.model.namespace in the Agent spec. The operator does not search other namespaces by default.
Typo in the model name — spec.model.name is case-sensitive and must exactly match the Model resource’s metadata.name.
Provider not ready — a Model cannot become ready until its ModelProvider is ready. Fix the provider first.
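
Namespace mismatches and capitalization typos can be checked in one pass. The following sketch searches every namespace for a Model whose name matches the value in spec.model.name, ignoring case, and flags matches that differ only in capitalization. The function name find_model is an assumption made for this example.

```shell
# List namespace/name for every Model whose name matches $1, ignoring case.
# Matches that differ only in capitalization are flagged, since
# spec.model.name must match metadata.name exactly.
find_model() {
  kubectl get models --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}' |
    awk -F '\t' -v want="$1" 'tolower($2) == tolower(want) {
      note = ($2 == want) ? "" : "  (case differs)"
      print $1 "/" $2 note
    }'
}
```

Pass the name exactly as it is written in the Agent spec. No output means no Model with that name exists in any namespace you can list.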

Tool issues

AgentTool not working

When an agent logs errors about a tool or the tool produces no results, start with the tool’s own status and then test the underlying endpoint directly.

Inspect the tool’s conditions:

kubectl describe agenttool <tool-name>

Test the tool’s endpoint from inside the cluster (avoids false negatives caused by your local network):

kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl -v <tool-url>/openapi.json

Common causes and fixes:

Invalid OpenAPI spec — if the Validated condition is False, the operator could not parse the spec. Use an online validator (for example, editor.swagger.io) to check the spec at the path returned by endpointPath.
Wrong endpoint path — confirm the service actually serves its OpenAPI document at the path you specified in openApiSchema.endpointPath.
ConfigMap key missing — if you use openApiSchema.valueFrom, verify the ConfigMap exists and contains the expected key:
kubectl get configmap <configmap-name> -o jsonpath='{.data}'
Internal service not reachable — for serviceRef tools, confirm the referenced Service exists and is in the correct namespace:
kubectl get svc <service-name> -n <namespace>
Timeout too short — if the backing API is slow to respond, the agent may be hitting the default 30-second timeout before the response arrives. Increase timeoutSeconds in the tool spec.

Tool timeout errors

Tool timeouts typically mean the upstream service is slow or unreachable from within the cluster.

Increase the timeout in the AgentTool spec:

spec:
  openApi:
    timeoutSeconds: 120  # Raised from the default 30s

Apply the change with a patch:

kubectl patch agenttool <tool-name> --type='json' \
  -p='[{"op": "replace", "path": "/spec/openApi/timeoutSeconds", "value": 120}]'

Test connectivity from an agent pod:

# Get a pod name
kubectl get pods -l flokoa.ai/agent=<agent-name> -o name

# Curl the tool endpoint from inside the pod
kubectl exec -it <pod-name> -- curl -v <tool-url>/health
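
Before settling on a new timeoutSeconds value, it may be worth measuring how long the endpoint actually takes to respond. This is a sketch only: both function names are invented, and the 80% threshold is an arbitrary rule of thumb rather than anything Flokoa defines. Run it somewhere that has curl and can reach the tool, ideally inside the cluster.

```shell
# Time a single request to URL $1, giving up after $2 seconds.
# Prints the total time in seconds as reported by curl.
tool_response_seconds() {
  curl -s -o /dev/null --max-time "$2" -w '%{time_total}' "$1"
}

# Succeed if a measured time ($1) is at or above 80% of the timeout ($2).
# The 80% margin is an assumption chosen for illustration.
near_timeout() {
  awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 >= limit * 0.8) }'
}
```

For example, `near_timeout "$(tool_response_seconds "$TOOL_URL" 30)" 30 && echo "consider raising timeoutSeconds"`, where TOOL_URL is a variable you set to the tool's endpoint. A single measurement is noisy, so repeat it a few times before changing the spec.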

Check NetworkPolicies — if the agent namespace has egress restrictions, verify that traffic to the tool endpoint’s IP range and port is explicitly allowed. DNS (UDP 53) must also be permitted so that service names resolve.

Operator issues

Operator pod not running

If the operator is not running, no CRD reconciliation occurs, meaning new agents won’t start and existing ones won’t be updated.

Check the operator pod status:

kubectl get pods -n flokoa-system

Describe the pod for events (image pull errors, OOM, etc.):

kubectl describe pod -n flokoa-system <operator-pod-name>

Read the operator logs for startup errors:

kubectl logs -n flokoa-system -l control-plane=controller-manager

If the pod is in ImagePullBackOff, your cluster cannot reach ghcr.io. Check your cluster’s egress rules and any image pull secrets configured via the Helm images.pullSecrets value.

If the pod is in CrashLoopBackOff, the --previous flag retrieves logs from the container that just crashed:

kubectl logs -n flokoa-system -l control-plane=controller-manager --previous
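
The two cases above can be folded into a small triage helper. The sketch below reads the waiting reason of the first container in each operator pod and prints the matching next step. The function names are illustrative, and it assumes the operator is the first container listed in the pod status.

```shell
# Print "<pod>\t<waiting reason>" for each operator pod.
# The reason is empty when the container is not in a waiting state.
operator_waiting_reasons() {
  kubectl get pods -n flokoa-system -l control-plane=controller-manager \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].state.waiting.reason}{"\n"}{end}'
}

# Map a waiting reason to the next troubleshooting step.
operator_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "check egress to ghcr.io and the images.pullSecrets Helm value" ;;
    CrashLoopBackOff)
      echo "read the crashed container's logs with --previous" ;;
    "")
      echo "container is not waiting; check logs and events" ;;
    *)
      echo "see kubectl describe pod for details on $1" ;;
  esac
}

# Print one hint line per operator pod.
diagnose_operator() {
  operator_waiting_reasons | while IFS="$(printf '\t')" read -r pod reason; do
    [ -n "$pod" ] || continue
    printf '%s: %s\n' "$pod" "$(operator_hint "$reason")"
  done
}
```

Running diagnose_operator with no arguments prints one line per operator pod.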

CRDs not found (no kind 'Agent' found)

This error from kubectl means the CRDs were never installed, were accidentally deleted, or kubectl is pointing at the wrong Kubernetes context.

Verify which CRDs are present:

kubectl get crds | grep flokoa
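
When you know which CRDs should exist, the check can be scripted so that only the missing ones are reported. This is a minimal sketch. The function name missing_crds is made up, and you supply the expected names yourself. Only agents.agent.flokoa.ai appears on this page, so take any other names from your install bundle rather than guessing them.

```shell
# Print each expected CRD name (given as arguments) that is absent
# from the cluster. Prints nothing when all of them are installed.
missing_crds() {
  present="$(kubectl get crds -o name)"
  for crd in "$@"; do
    printf '%s\n' "$present" |
      grep -Fxq -- "customresourcedefinition.apiextensions.k8s.io/$crd" ||
      echo "$crd"
  done
}
```

For example, `missing_crds agents.agent.flokoa.ai` prints that name if the Agent CRD is not installed and prints nothing otherwise.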

Re-apply the install bundle to restore any missing CRDs:

kubectl apply -f https://github.com/danielnyari/flokoa/releases/latest/download/install.yaml

If you installed via Helm and CRDs are missing, they may have been removed during a helm uninstall. Re-install with crds.install=true (the default):

helm upgrade --install flokoa oci://ghcr.io/danielnyari/charts/flokoa \
  --namespace flokoa-system \
  --create-namespace \
  --set crds.install=true

Check your kubeconfig context — if you manage multiple clusters, confirm you are pointing at the right one:

kubectl config current-context
kubectl config get-contexts

Getting more help

If you have worked through the items above and still cannot resolve your issue, the following resources are available:

GitHub Repository

Browse the source code, read open issues, and check the changelog for recent fixes that may address your problem.

File an Issue

Open a new GitHub issue. Include the output of kubectl describe agent <name>, relevant pod logs, and your Flokoa version (kubectl get crds agents.agent.flokoa.ai -o jsonpath='{.metadata.annotations}').
