The Model CRD connects a specific LLM (such as gpt-4o or claude-sonnet-4-20250514) to its provider credentials and generation parameters. By separating model configuration from the Agent and ModelProvider resources, you can reuse the same model definition across many agents, adjust parameters independently, and manage versions through GitOps workflows. When an agent references a Model, the operator injects the model name and all parameters into the agent’s runtime environment at reconcile time.

API reference

apiVersion: agent.flokoa.ai/v1alpha1
kind: Model

Basic structure

apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: gpt-4o-default
spec:
  model: "gpt-4o"

  providerRef:
    name: openai-provider
    namespace: shared-resources   # optional

  parameters:
    temperature: "0.7"
    maxTokens: 4096

Model names by provider

Set spec.model to the provider's exact model identifier. The value is passed directly to the provider API, so spelling and casing must match. The following are OpenAI identifiers:
spec:
  model: "gpt-4o"          # GPT-4 Omni — general-purpose, recommended
  # model: "gpt-4o-mini"   # Smaller, faster, lower cost variant
  # model: "o1"            # OpenAI o1 reasoning model
  # model: "o3-mini"       # OpenAI o3-mini reasoning model

Common parameters

These parameters are supported across all providers. All values are placed under spec.parameters:
spec:
  parameters:
    # Controls output randomness: 0.0 = most deterministic, 2.0 = highly random
    temperature: "0.7"

    # Maximum tokens the model generates in a single response
    maxTokens: 4096

    # Nucleus sampling threshold (0.0–1.0)
    topP: "0.9"

    # Limits the vocabulary considered at each token step
    topK: 40

    # Penalises tokens that have already appeared in the output (-2.0–2.0)
    presencePenalty: "0.0"

    # Penalises tokens proportionally to how often they have appeared (-2.0–2.0)
    frequencyPenalty: "0.0"

    # Response generation timeout in seconds
    timeOut: 60

    # Allow the model to call multiple tools simultaneously
    parallelToolCalls: true

    # Seed for deterministic outputs (provider support varies)
    seed: 42

Provider-specific parameters

Place OpenAI-specific settings under spec.parameters.openai:
spec:
  model: "o1"
  providerRef:
    name: openai-provider

  parameters:
    maxTokens: 16000

    openai:
      # Reasoning effort for o1/o3 models
      # Values: none | minimal | low | medium | high | xhigh
      reasoningEffort: "high"

      # Service tier for prioritised or flexible capacity
      # Values: auto | default | flex | priority
      serviceTier: "auto"
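
As a further illustration, the same openai block can be paired with a smaller reasoning model. This is a minimal sketch that assumes o3-mini accepts the reasoningEffort and serviceTier values listed above. The resource name o3-mini-batch is hypothetical, and whether a given service tier is available depends on your OpenAI account.

```yaml
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: o3-mini-batch          # hypothetical name, for illustration only
spec:
  model: "o3-mini"
  providerRef:
    name: openai-provider

  parameters:
    maxTokens: 8192
    timeOut: 120               # reasoning models may need a longer timeout

    openai:
      reasoningEffort: "low"   # one of the values listed above
      serviceTier: "flex"      # one of the values listed above
```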

Cross-namespace models

You can create Model resources in a shared namespace and reference them from agents in any other namespace. This is the recommended pattern for team-wide model management:
# Create the model in a shared namespace
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: shared-gpt-4o
  namespace: shared-models
spec:
  model: "gpt-4o"
  providerRef:
    name: openai-provider
    namespace: shared-resources
  parameters:
    temperature: "0.7"
    maxTokens: 8192
---
# Reference it from an agent in a different namespace
apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: my-agent
  namespace: my-app
spec:
  model:
    name: shared-gpt-4o
    namespace: shared-models
  # ... rest of agent spec
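
Because the reference is made by name and namespace, more than one agent can point at the same Model. A minimal sketch, assuming a second agent named reporting-agent in a namespace called other-app (both names are hypothetical):

```yaml
# A second agent, in another namespace, reusing the same shared Model
apiVersion: agent.flokoa.ai/v1alpha1
kind: Agent
metadata:
  name: reporting-agent        # hypothetical
  namespace: other-app         # hypothetical
spec:
  model:
    name: shared-gpt-4o
    namespace: shared-models
  # ... rest of agent spec
```

Since the operator injects the model name and parameters at reconcile time, a change to shared-gpt-4o should reach both agents the next time they are reconciled.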

Status fields

status:
  ready: true

  resolvedProvider:
    provider: openai
    namespace: default
    name: openai-provider

  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2026-01-15T10:30:00Z"
      reason: ProviderFound
      message: "Model is configured and ready"

  observedGeneration: 1

| Field | Description |
| --- | --- |
| ready | true when the referenced provider is found and ready |
| resolvedProvider | The provider name, namespace, and type that were resolved |
| conditions | Standard condition array; the Ready condition carries error details |
| observedGeneration | The last spec generation reconciled by the operator |

Parameter guidelines

Use this table as a quick reference when tuning your model parameters:

| Parameter | Range / Values | Recommended starting point |
| --- | --- | --- |
| temperature | 0.0–2.0 | 0.2 for code/math, 0.7 for general use, 1.2 for creative tasks |
| maxTokens | Provider-dependent | 2048 for short replies, 8192 for analysis, 16384+ for code generation |
| topP | 0.0–1.0 | 0.1–0.3 for focused outputs, 0.9–1.0 for diversity |
| topK | 1 – provider max | 40 is a safe default; lower values increase focus |
| presencePenalty | -2.0–2.0 | 0.0 unless you need to encourage topic diversity |
| frequencyPenalty | -2.0–2.0 | 0.0 unless you need to reduce repetition |
| timeOut | seconds | 60 for interactive agents, 120+ for batch or reasoning models |
| seed | any integer | Set for reproducible test outputs; omit in production |
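
To show how these starting points translate into manifests, here is a sketch of two Model resources built only from the values in the table: one tuned for code or math, one for creative work. The resource names are illustrative, and the provider reference reuses openai-provider from the earlier examples.

```yaml
# Focused profile for code and math tasks
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: gpt-4o-code            # illustrative name
spec:
  model: "gpt-4o"
  providerRef:
    name: openai-provider
  parameters:
    temperature: "0.2"         # low randomness for code/math
    maxTokens: 16384           # room for long code output
    timeOut: 120               # longer responses need a longer timeout
---
# More varied profile for creative tasks
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: gpt-4o-creative        # illustrative name
spec:
  model: "gpt-4o"
  providerRef:
    name: openai-provider
  parameters:
    temperature: "1.2"         # higher randomness for creative output
    topP: "0.9"                # keep sampling diverse
    maxTokens: 8192
    timeOut: 60
```

Treat these values as starting points to adjust against your own workload.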

Best practices

  1. Name models descriptively using provider and use case, such as gpt-4o-code or claude-creative, so the purpose is clear at a glance.
  2. Create shared models in a dedicated namespace so all teams reference the same configuration without duplication.
  3. Start with default parameters — only override temperature, maxTokens, or other values when you have a specific reason.
  4. Match model size to task complexity — use cheaper models like gpt-4o-mini for simple classification and reserve large models for complex reasoning.
  5. Set explicit timeOut values that reflect the expected response time for your workload — do not rely on provider defaults.
  6. Enable provider caching (e.g., cacheInstructions, cacheToolDefinitions on Anthropic) to reduce cost and latency for repeated system prompts.
  7. Version-control your Model manifests alongside application code so parameter changes are auditable and reversible.
  8. Monitor token consumption regularly, especially when maxTokens is set to large values like 16384 or above.
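
Several of these practices can be combined in a single manifest. The sketch below is an illustration under stated assumptions, not a recommended default: the name gpt-4o-mini-classify is hypothetical, the namespaces are reused from the cross-namespace example, and the timeout value comes from the guidelines table.

```yaml
apiVersion: agent.flokoa.ai/v1alpha1
kind: Model
metadata:
  name: gpt-4o-mini-classify   # provider + use case in the name (practice 1)
  namespace: shared-models     # dedicated shared namespace (practice 2)
spec:
  model: "gpt-4o-mini"         # cheaper model for simple classification (practice 4)
  providerRef:
    name: openai-provider
    namespace: shared-resources
  parameters:
    timeOut: 60                # explicit timeout for an interactive agent (practice 5)
    # other parameters are left at their defaults (practice 3)
```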