The Model CRD connects a specific LLM (such as gpt-4o or claude-sonnet-4-20250514) to its provider credentials and generation parameters. Because model configuration is separate from the Agent and ModelProvider resources, you can reuse the same model definition across many agents, adjust parameters independently, and manage versions through GitOps workflows. When an agent references a Model, the operator injects the model name and all parameters into the agent’s runtime environment at reconcile time.
API reference
Basic structure
Model names by provider
Use exactly these model identifier strings in spec.model. The value is passed directly to the provider API, so spelling and casing must be precise.
- OpenAI
- Anthropic
- Google
- AWS Bedrock
Common parameters
These parameters are supported across all providers. All values are placed under spec.parameters:
Provider-specific parameters
- OpenAI
- Anthropic
- Google
- AWS Bedrock
Place OpenAI-specific settings under spec.parameters.openai:
Cross-namespace models
You can create Model resources in a shared namespace and reference them from agents in any other namespace. This is the recommended pattern for team-wide model management:
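
The lookup behind a cross-namespace reference can be sketched in Python as follows. This is only an illustration: the reference shape (a mapping with `name` and an optional `namespace`) and the fallback to the agent's own namespace are assumptions, not fields confirmed by this page, and `resolve_model_ref` is a hypothetical helper.

```python
from typing import Mapping, Tuple


def resolve_model_ref(ref: Mapping[str, str], agent_namespace: str) -> Tuple[str, str]:
    """Return (namespace, name) for a model reference.

    Assumed shape: {"name": ..., "namespace": ...}. Falling back to the
    agent's namespace when "namespace" is missing is also an assumption.
    """
    name = ref.get("name")
    if not name:
        raise ValueError("a model reference needs a name")
    namespace = ref.get("namespace") or agent_namespace
    return namespace, name


# An agent in "team-a" pointing at a Model kept in a shared namespace.
print(resolve_model_ref({"name": "gpt-4o-code", "namespace": "shared-models"}, "team-a"))
```

Keeping the namespace explicit in every reference makes it obvious in review which agents depend on the shared definitions.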
Status fields
| Field | Description |
|---|---|
| ready | true when the referenced provider is found and ready |
| resolvedProvider | The provider name, namespace, and type that were resolved |
| conditions | Standard condition array; the Ready condition carries error details |
| observedGeneration | The last spec generation reconciled by the operator |
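
As a rough illustration of how these fields might be read together, the Python sketch below summarizes a Model object that has already been fetched as a plain dictionary. It assumes that condition entries follow the usual Kubernetes layout with `type` and `message` keys and that `metadata.generation` is available for comparison; `summarize` and `ready_condition` are hypothetical helpers, not part of the operator.

```python
from typing import Any, Dict, Optional


def ready_condition(status: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the condition whose type is "Ready", or None if it is absent."""
    for condition in status.get("conditions", []):
        if condition.get("type") == "Ready":
            return condition
    return None


def summarize(model: Dict[str, Any]) -> str:
    """Describe whether a Model object (as a plain dict) looks usable."""
    status = model.get("status", {})
    generation = model.get("metadata", {}).get("generation")
    # A status written for an older generation says nothing about the current spec.
    if status.get("observedGeneration") != generation:
        return "pending: operator has not reconciled the latest spec"
    if status.get("ready"):
        return "ready"
    # Assumption: the error details live in the condition's "message" key.
    condition = ready_condition(status) or {}
    return "not ready: " + condition.get("message", "no details reported")
```

Checking observedGeneration first avoids trusting a ready value that was computed before the most recent spec change.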
Parameter guidelines
Use this table as a quick reference when tuning your model parameters:
| Parameter | Range / Values | Recommended starting point |
|---|---|---|
| temperature | 0.0 – 2.0 | 0.2 for code/math, 0.7 for general use, 1.2 for creative tasks |
| maxTokens | Provider-dependent | 2048 for short replies, 8192 for analysis, 16384+ for code generation |
| topP | 0.0 – 1.0 | 0.1–0.3 for focused outputs, 0.9–1.0 for diversity |
| topK | 1 – provider max | 40 is a safe default; lower values increase focus |
| presencePenalty | -2.0 – 2.0 | 0.0 unless you need to encourage topic diversity |
| frequencyPenalty | -2.0 – 2.0 | 0.0 unless you need to reduce repetition |
| timeOut | seconds | 60 for interactive agents, 120+ for batch or reasoning models |
| seed | any integer | Set for reproducible test outputs; omit in production |
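
The ranges in this table can be checked before a manifest is committed. The Python sketch below is one possible pre-commit check, not something the operator provides: `check_parameters` is a hypothetical helper, the provider-dependent upper bounds for maxTokens and topK are not modeled, and treating maxTokens as an integer of at least 1 and timeOut as a positive number are assumptions.

```python
from typing import Any, Dict, List

# Ranges copied from the guideline table above.
NUMERIC_RANGES = {
    "temperature": (0.0, 2.0),
    "topP": (0.0, 1.0),
    "presencePenalty": (-2.0, 2.0),
    "frequencyPenalty": (-2.0, 2.0),
}
# Assumption: both must be integers >= 1; provider maximums are not checked.
MIN_ONE_INTEGERS = ("maxTokens", "topK")


def is_number(value: Any) -> bool:
    """True for int or float, but not bool (which is an int subclass)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_parameters(params: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params:
            value = params[name]
            if not is_number(value):
                problems.append(f"{name} must be a number")
            elif not low <= value <= high:
                problems.append(f"{name}={value} is outside {low} to {high}")
    for name in MIN_ONE_INTEGERS:
        if name in params:
            value = params[name]
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                problems.append(f"{name} must be an integer of at least 1")
    if "timeOut" in params:
        value = params["timeOut"]
        # Assumption: timeOut is a positive number of seconds.
        if not is_number(value) or value <= 0:
            problems.append("timeOut must be a positive number of seconds")
    if "seed" in params:
        value = params["seed"]
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append("seed must be an integer")
    return problems


print(check_parameters({"temperature": 0.2, "maxTokens": 2048, "timeOut": 60}))
```

Returning a list of messages instead of raising on the first problem lets a CI job report every out-of-range value in one pass.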
Best practices
- Name models descriptively using provider and use case, such as gpt-4o-code or claude-creative, so the purpose is clear at a glance.
- Create shared models in a dedicated namespace so all teams reference the same configuration without duplication.
- Start with default parameters; only override temperature, maxTokens, or other values when you have a specific reason.
- Match model size to task complexity: use cheaper models like gpt-4o-mini for simple classification and reserve large models for complex reasoning.
- Set explicit timeOut values that reflect the expected response time for your workload; do not rely on provider defaults.
- Enable provider caching (e.g., cacheInstructions, cacheToolDefinitions on Anthropic) to reduce cost and latency for repeated system prompts.
- Version-control your Model manifests alongside application code so parameter changes are auditable and reversible.
- Monitor token consumption regularly, especially when maxTokens is set to large values like 16384 or above.
