Model: configure LLM models and parameters

The Model CRD connects a specific LLM (such as gpt-4o or claude-sonnet-4-20250514) to its provider credentials and generation parameters. By separating model configuration from the Agent and ModelProvider resources, you can reuse the same model definition across many agents, adjust parameters independently, and manage versions through GitOps workflows. When an agent references a Model, the operator injects the model name and all parameters into the agent’s runtime environment at reconcile time.

API reference

apiVersion: agent.flokoa.ai/v1alpha1
kind: Model

Basic structure

apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: gpt-4o-default
spec:
  model: "gpt-4o"

  providerRef:
    name: openai-provider
    namespace: shared-resources   # optional

  parameters:
    temperature: "0.7"
    maxTokens: 4096
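
Because the Model is a separate resource, several agents can point at the same definition. The sketch below assumes the gpt-4o-default Model above was applied to the default namespace. The agent names and namespaces are hypothetical, and the spec.model reference follows the form used in the cross-namespace example on this page.

```yaml
# Two hypothetical agents sharing one Model definition.
apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: support-agent        # hypothetical name
  namespace: default
spec:
  model:
    name: gpt-4o-default
    namespace: default       # assumes the Model lives in "default"
  # ... rest of agent spec
---
apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: triage-agent         # hypothetical name
  namespace: default
spec:
  model:
    name: gpt-4o-default
    namespace: default       # same assumption as above
  # ... rest of agent spec
```

With this layout, a change to temperature or maxTokens on gpt-4o-default should reach both agents the next time they are reconciled, without editing either Agent manifest.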

Model names by provider

Use exactly these model identifier strings in spec.model. The value is passed directly to the provider API, so spelling and casing must be precise.

OpenAI

spec:
  model: "gpt-4o"          # GPT-4 Omni: general-purpose, recommended
  # model: "gpt-4o-mini"   # Smaller, faster, lower cost variant
  # model: "o1"            # OpenAI o1 reasoning model
  # model: "o3-mini"       # OpenAI o3-mini reasoning model

Anthropic

spec:
  model: "claude-sonnet-4-20250514"    # Claude Sonnet 4: latest generation
  # model: "claude-opus-4-20250514"    # Claude Opus 4: most capable
  # model: "claude-3-5-sonnet-20241022"
  # model: "claude-3-5-haiku-20241022"

Google

spec:
  model: "gemini-2.0-flash-exp"   # Gemini 2.0 Flash: fast multimodal
  # model: "gemini-1.5-pro"
  # model: "gemini-1.5-flash"

AWS Bedrock

spec:
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
  # model: "anthropic.claude-3-5-haiku-20241022-v1:0"
  # model: "amazon.nova-pro-v1:0"
  # model: "amazon.nova-lite-v1:0"

Common parameters

These parameters use the same field names for every provider, although support for an individual parameter can vary by provider (seed, for example). All values are placed under spec.parameters:

spec:
  parameters:
    # Controls output randomness: 0.0 = most deterministic, 2.0 = highly random
    # (the accepted upper bound varies by provider)
    temperature: "0.7"

    # Maximum tokens the model generates in a single response
    maxTokens: 4096

    # Nucleus sampling threshold (0.0–1.0)
    topP: "0.9"

    # Limits the vocabulary considered at each token step
    topK: 40

    # Penalises tokens that have already appeared in the output (-2.0–2.0)
    presencePenalty: "0.0"

    # Penalises tokens proportionally to how often they have appeared (-2.0–2.0)
    frequencyPenalty: "0.0"

    # Response generation timeout in seconds
    timeOut: 60

    # Allow the model to call multiple tools simultaneously
    parallelToolCalls: true

    # Seed for deterministic outputs (provider support varies)
    seed: 42

Provider-specific parameters

Place OpenAI-specific settings under spec.parameters.openai:

spec:
  model: "o1"
  providerRef:
    name: openai-provider

  parameters:
    maxTokens: 16000

    openai:
      # Reasoning effort for o1/o3 models
      # Values: none | minimal | low | medium | high | xhigh
      reasoningEffort: "high"

      # Service tier for prioritised or flexible capacity
      # Values: auto | default | flex | priority
      serviceTier: "auto"

Place Anthropic-specific settings under spec.parameters.anthropic:

spec:
  model: "claude-sonnet-4-20250514"
  providerRef:
    name: anthropic-provider

  parameters:
    temperature: "0.7"
    maxTokens: 8192

    anthropic:
      # Extended thinking: requires budgetTokens >= 1024 and below maxTokens
      thinking:
        type: "enabled"   # enabled | disabled
        budgetTokens: 4096

      # Prompt caching to reduce cost and latency
      # Values: true | "5m" | "1h"
      cacheToolDefinitions: "5m"
      cacheInstructions: "5m"
      cacheMessages: "5m"

Place Google-specific settings under spec.parameters.google:

spec:
  model: "gemini-2.0-flash-exp"
  providerRef:
    name: google-provider

  parameters:
    temperature: "0.7"
    maxTokens: 8192
    topK: 40

    google:
      # Thinking budget configuration
      thinkingConfig:
        includeThoughts: true
        # -1 = automatic, 0 = disabled, >0 = token budget
        thinkingBudget: -1
        # Values: unspecified | minimal | low | medium | high
        thinkingLevel: "medium"

      # Content safety filters
      safetySettings:
        - category: "HARM_CATEGORY_HARASSMENT"
          threshold: "BLOCK_MEDIUM_AND_ABOVE"
          method: "PROBABILITY"
        - category: "HARM_CATEGORY_HATE_SPEECH"
          threshold: "BLOCK_MEDIUM_AND_ABOVE"

Place Bedrock-specific settings under spec.parameters.bedrock:

spec:
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
  providerRef:
    name: bedrock-provider

  parameters:
    temperature: "0.7"
    maxTokens: 4096

    bedrock:
      # Apply a Bedrock Guardrail for content moderation
      guardrailConfig:
        guardrailIdentifier: "guardrail-id"
        guardrailVersion: "1"
        # Values: disabled | enabled | enabled_full
        trace: "enabled"

Cross-namespace models

You can create Model resources in a shared namespace and reference them from agents in any other namespace. This is the recommended pattern for team-wide model management:

# Create the model in a shared namespace
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: shared-gpt-4o
  namespace: shared-models
spec:
  model: "gpt-4o"
  providerRef:
    name: openai-provider
    namespace: shared-resources
  parameters:
    temperature: "0.7"
    maxTokens: 8192
---
# Reference it from an agent in a different namespace
apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: my-agent
  namespace: my-app
spec:
  model:
    name: shared-gpt-4o
    namespace: shared-models
  # ... rest of agent spec

Status fields

status:
  ready: true

  resolvedProvider:
    provider: openai
    namespace: default
    name: openai-provider

  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2026-01-15T10:30:00Z"
      reason: ProviderFound
      message: "Model is configured and ready"

  observedGeneration: 1

| Field | Description |
| --- | --- |
| `ready` | `true` when the referenced provider is found and ready |
| `resolvedProvider` | The provider name, namespace, and type that were resolved |
| `conditions` | Standard condition array; the `Ready` condition carries error details |
| `observedGeneration` | The last spec generation reconciled by the operator |

Parameter guidelines

Use this table as a quick reference when tuning your model parameters:

| Parameter | Range / Values | Recommended starting point |
| --- | --- | --- |
| `temperature` | `0.0` – `2.0` | `0.2` for code/math, `0.7` for general use, `1.2` for creative tasks |
| `maxTokens` | Provider-dependent | `2048` for short replies, `8192` for analysis, `16384+` for code generation |
| `topP` | `0.0` – `1.0` | `0.1`–`0.3` for focused outputs, `0.9`–`1.0` for diversity |
| `topK` | `1` – provider max | `40` is a safe default; lower values increase focus |
| `presencePenalty` | `-2.0` – `2.0` | `0.0` unless you need to encourage topic diversity |
| `frequencyPenalty` | `-2.0` – `2.0` | `0.0` unless you need to reduce repetition |
| `timeOut` | seconds | `60` for interactive agents, `120`+ for batch or reasoning models |
| `seed` | any integer | Set for reproducible test outputs; omit in production |
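
As a sketch of how the starting points in the table translate into a manifest, the following Model targets code generation. The resource name is hypothetical and the provider reference reuses openai-provider from the earlier examples. Decimal values are written as quoted strings and integers are bare, matching the other examples on this page.

```yaml
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: gpt-4o-code            # hypothetical name
spec:
  model: "gpt-4o"
  providerRef:
    name: openai-provider
  parameters:
    temperature: "0.2"         # low randomness for code/math
    maxTokens: 16384           # room for long code output
    timeOut: 120               # above the 60 s suggested for interactive agents
    # seed is left unset: the table suggests it only for reproducible tests
```

Treat these values as a first guess and adjust them against observed output quality, latency, and token usage.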

Best practices

Name models descriptively using provider and use case, such as gpt-4o-code or claude-creative, so the purpose is clear at a glance.
Create shared models in a dedicated namespace so all teams reference the same configuration without duplication.
Start with default parameters — only override temperature, maxTokens, or other values when you have a specific reason.
Match model size to task complexity — use cheaper models like gpt-4o-mini for simple classification and reserve large models for complex reasoning.
Set explicit timeOut values that reflect the expected response time for your workload — do not rely on provider defaults.
Enable provider caching (e.g., cacheInstructions, cacheToolDefinitions on Anthropic) to reduce cost and latency for repeated system prompts.
Version-control your Model manifests alongside application code so parameter changes are auditable and reversible.
Monitor token consumption regularly, especially when maxTokens is set to large values such as 16384 or above.
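
Several of these practices can be combined in a pair of manifests. The sketch below is illustrative only: the resource names, the placement in shared-models, the provider namespaces, and the parameter values are assumptions chosen to match the examples earlier on this page, not recommended production settings.

```yaml
# A small, cheaper model for simple classification.
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: gpt-4o-mini-classify     # hypothetical: model plus use case
  namespace: shared-models       # dedicated shared namespace
spec:
  model: "gpt-4o-mini"
  providerRef:
    name: openai-provider
    namespace: shared-resources
  parameters:
    temperature: "0.2"
    maxTokens: 2048              # short replies
    timeOut: 60                  # explicit rather than a provider default
---
# A larger model with prompt caching for repeated system prompts.
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: claude-sonnet-analysis   # hypothetical name
  namespace: shared-models
spec:
  model: "claude-sonnet-4-20250514"
  providerRef:
    name: anthropic-provider
    namespace: shared-resources  # assumed location of the provider
  parameters:
    temperature: "0.7"
    maxTokens: 8192
    timeOut: 120
    anthropic:
      cacheInstructions: "5m"
      cacheToolDefinitions: "5m"
```

Keeping both manifests in the same repository as the agents that reference them makes each parameter change reviewable and easy to revert.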
