modelparams.dev
API GitHub

Every LLM parameter, for every model.

An open, community-maintained catalog of LLM model parameters. Search, filter, and link straight to the knobs you can turn. API-key and subscription variants of the same model are listed separately, because they behave differently.

Filter by provider

Filter by parameter

174 of 174 models

OpenAI 41

OpenAI Chatgpt 4o Latest 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI Gpt 3.5 Turbo 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI Gpt 4 Turbo 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI Gpt 4 Turbo 2024-04-09 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI Gpt 4.1 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI Gpt 4.1 Mini 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI Gpt 4.1 Nano 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI GPT-4o 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI Gpt 4o 2024-11-20 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI GPT-4o mini 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
OpenAI Gpt 5 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5 Chat Latest 1 param
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
OpenAI Gpt 5 Mini 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5 Nano 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.1 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (none | low | medium | high) "none" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.1 Codex Max Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.1 Codex Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.2 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.2 Codex Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.2 Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.3 Codex 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.3 Codex Spark Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.3 Codex Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.4 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.4 Mini 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.4 Mini Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.4 Nano 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.4 Pro 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.4 Pro Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.4 Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.5 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.5 Pro 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI Gpt 5.5 Pro Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI Gpt 5.5 Subscription 3 params
Parameter Type Default Description Condition
Reasoning · 2 params
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output · 1 param
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
OpenAI o1 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI o1-mini 2 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI O1 Preview 2 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI o3 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI o3-mini 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI O3 Pro 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
OpenAI o4-mini 2 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.

Anthropic 36

Anthropic Claude 3.5 Haiku 20241022 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Anthropic Claude 3.5 Haiku Latest 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Anthropic Claude 3.5 Sonnet 20241022 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Anthropic Claude 3.5 Sonnet Latest 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Anthropic Claude 3.7 Sonnet 20250219 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude 3.7 Sonnet Latest 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude 3 Opus 20240229 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Anthropic Claude 3 Opus Latest 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Anthropic Claude Haiku 4 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Haiku 4.5 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Haiku 4.5 20251001 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Haiku 4.5 20251001 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Haiku 4.5 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Haiku 4 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Opus 4.1 20250805 7 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Anthropic Claude Opus 4.1 20250805 Subscription 7 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Anthropic Claude Opus 4 20250514 7 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled"
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Anthropic Claude Opus 4 20250514 Subscription 7 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled"
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Anthropic Claude Opus 4.5 20251101 8 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 4 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Effort
output_config.effort
enum (low | medium | high) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Opus 4.5 20251101 Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 4 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Effort
output_config.effort
enum (low | medium | high) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Opus 4.6 8 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"enabled", "adaptive"}
Reasoning · 4 params
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type ∈ {"adaptive", "enabled"}
Effort
output_config.effort
enum (low | medium | high | max) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Opus 4.6 Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"enabled", "adaptive"}
Reasoning · 4 params
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type ∈ {"adaptive", "enabled"}
Effort
output_config.effort
enum (low | medium | high | max) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Opus 4.7 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | adaptive) "disabled" Controls the Anthropic thinking mode values supported by this model.
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Opus 4.7 Subscription 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | adaptive) "disabled" Controls the Anthropic thinking mode values supported by this model.
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Opus 4.8 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | adaptive) "disabled" Controls the Anthropic thinking mode values supported by this model.
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Opus 4.8 Subscription 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | adaptive) "disabled" Controls the Anthropic thinking mode values supported by this model.
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Opus 4 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Sonnet 4 20250514 7 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled"
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Anthropic Claude Sonnet 4 20250514 Subscription 7 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled"
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 3 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Anthropic Claude Sonnet 4.5 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Sonnet 4.5 20250929 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Sonnet 4.5 20250929 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Sonnet 4.5 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Anthropic Claude Sonnet 4.6 8 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"enabled", "adaptive"}
Reasoning · 4 params
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type ∈ {"adaptive", "enabled"}
Effort
output_config.effort
enum (low | medium | high | max) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Sonnet 4.6 Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"enabled", "adaptive"}
Reasoning · 4 params
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type ∈ {"adaptive", "enabled"}
Effort
output_config.effort
enum (low | medium | high | max) "high" Controls Anthropic response thoroughness and token spend.
Anthropic Claude Sonnet 4 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning · 2 params
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"

Z.ai 19

Z.ai GLM-4.5 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.5-Air 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.5-Air Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.5-AirX 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.5-Flash 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.5 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.5-X 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.6 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.6 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.7 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.7-Flash 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.7-FlashX 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-4.7 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-5 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-5 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-5-Turbo 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-5-Turbo Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-5.1 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Z.ai GLM-5.1 Subscription 6 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 3 params
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.

MiniMax 16

MiniMax MiniMax M2 4 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax MiniMax M2 Subscription 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax MiniMax M2.1 4 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax MiniMax M2.1 Highspeed 4 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax MiniMax M2.1 Highspeed Subscription 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax MiniMax M2.1 Subscription 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax MiniMax M2.5 4 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax MiniMax M2.5 Highspeed 4 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax MiniMax M2.5 Highspeed Subscription 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax MiniMax M2.5 Subscription 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax MiniMax M2.7 4 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax MiniMax M2.7 Highspeed 4 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax MiniMax M2.7 Highspeed Subscription 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax MiniMax M2.7 Subscription 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax Minimax M3 4 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax MiniMax M3 Subscription 3 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling · 2 params
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.

Mistral 13

Mistral Codestral Latest 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Devstral 2512 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Devstral Latest 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Magistral Medium Latest 10 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Reasoning · 1 param
Prompt mode
prompt_mode
enum (reasoning) Enables Mistral's reasoning system prompt; leave unset to disable the default reasoning behavior.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Magistral Small Latest 10 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Reasoning · 1 param
Prompt mode
prompt_mode
enum (reasoning) Enables Mistral's reasoning system prompt; leave unset to disable the default reasoning behavior.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Ministral 14b Latest 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Ministral 3b Latest 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Ministral 8b Latest 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Mistral Large Latest 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Mistral Medium 3.5 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Mistral Medium Latest 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Mistral Small Latest 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Open Mistral Nemo 9 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling · 5 params
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata · 1 param
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.

Google 11

Google Gemini 2.5 Flash 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer (-1…24576) -1 Number of thinking tokens Gemini should use; 0 disables thinking and -1 uses dynamic thinking.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 2.5 Flash Lite 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer 0 Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 2.5 Flash Lite Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer 0 Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 2.5 Flash Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer (-1…24576) -1 Number of thinking tokens Gemini should use; 0 disables thinking and -1 uses dynamic thinking.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 2.5 Pro 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer (128…32768) Maximum number of thinking tokens Gemini should use before producing the final answer.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 2.5 Pro Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer (128…32768) Maximum number of thinking tokens Gemini should use before producing the final answer.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 3 Flash Preview Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (minimal | low | medium | high) "high" Controls Gemini 3 Flash reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 3.1 Flash Lite Preview Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (minimal | low | medium | high) "high" Controls Gemini 3.1 Flash-Lite reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 3.1 Flash Lite Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (minimal | low | medium | high) "high" Controls Gemini 3.1 Flash-Lite reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 3.1 Pro Preview Subscription 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (low | high) "high" Controls Gemini 3 Pro reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Google Gemini 3.5 Flash 8 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling · 4 params
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 2 params
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (minimal | low | medium | high) "medium" Controls Gemini 3.5 Flash reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output · 1 param
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.

Alibaba 8

Alibaba Qwen Flash 5 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning · 1 param
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean true Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Alibaba Qwen Plus 5 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning · 1 param
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean true Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Alibaba Qwen3 Coder Flash 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Alibaba Qwen3 Coder Plus 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Alibaba Qwen3 Max 5 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning · 1 param
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean false Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Alibaba Qwen3.5 5 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning · 1 param
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean true Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Alibaba Qwen3.5 Flash 5 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning · 1 param
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean true Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Alibaba Qwq Plus 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.

Cohere 8

Cohere Command A 03 2025 12 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling · 6 params
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools · 1 param
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability · 1 param
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata · 1 param
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Cohere Command A Plus 05 2026 12 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling · 6 params
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools · 1 param
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability · 1 param
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata · 1 param
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Cohere Command A Reasoning 08 2025 14 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling · 6 params
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Reasoning · 2 params
Thinking mode
thinking.type
enum (enabled | disabled) "disabled" Controls whether the model reasons step by step before producing its final answer.
Thinking token budget
thinking.token_budget
integer (1…+∞) Maximum number of tokens the model may spend on reasoning before answering.
Only when thinking.type = "enabled"
Tools · 1 param
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability · 1 param
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata · 1 param
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Cohere Command A Translate 08 2025 12 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling · 6 params
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools · 1 param
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability · 1 param
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata · 1 param
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Cohere Command A Vision 07 2025 12 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling · 6 params
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools · 1 param
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability · 1 param
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata · 1 param
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Cohere Command R 08 2024 11 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling · 6 params
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability · 1 param
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata · 1 param
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT | OFF) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Cohere Command R Plus 08 2024 11 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling · 6 params
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability · 1 param
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata · 1 param
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT | OFF) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Cohere Command R7b 12 2024 12 params
Parameter Type Default Description Condition
Length · 2 params
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling · 6 params
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools · 1 param
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability · 1 param
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata · 1 param
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.

Moonshot AI 5

Moonshot AI Kimi K2.5 3 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) Controls whether Kimi reasons step by step before answering, or responds directly when set to disabled.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Moonshot AI Kimi K2.6 3 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Reasoning · 1 param
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Controls whether Kimi reasons step by step before answering. Thinking is enabled by default; set disabled to respond directly.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Moonshot AI Moonshot v1 128K 7 params
Parameter Type Default Description Condition
Length · 2 params
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Number of completions
n
integer (1…5) 1 How many chat completion choices to generate for the request.
Sampling · 4 params
Temperature
temperature
number (0…1 step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes tokens that have already appeared, encouraging the model to talk about new topics.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes tokens by how often they have appeared, reducing verbatim repetition.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Moonshot AI Moonshot v1 32K 7 params
Parameter Type Default Description Condition
Length · 2 params
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Number of completions
n
integer (1…5) 1 How many chat completion choices to generate for the request.
Sampling · 4 params
Temperature
temperature
number (0…1 step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes tokens that have already appeared, encouraging the model to talk about new topics.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes tokens by how often they have appeared, reducing verbatim repetition.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Moonshot AI Moonshot v1 8K 7 params
Parameter Type Default Description Condition
Length · 2 params
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Number of completions
n
integer (1…5) 1 How many chat completion choices to generate for the request.
Sampling · 4 params
Temperature
temperature
number (0…1 step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes tokens that have already appeared, encouraging the model to talk about new topics.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes tokens by how often they have appeared, reducing verbatim repetition.
Output · 1 param
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.

xAI 5

xAI Grok 4.20 0309 Non Reasoning 6 params
Parameter Type Default Description Condition
Length · 2 params
Max completion tokens
max_completion_tokens
integer (1…+∞) Upper bound for visible output tokens generated in the chat completion.
Stop sequence
stop
string Stops generation when this sequence is produced. xAI accepts up to four stop sequences.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Seed
seed
integer Optional seed used for decoding when reproducible sampling is desired.
Output · 1 param
Response format
response_format.type
enum (text | json_object | json_schema) "text" Controls whether the model returns text, JSON mode output, or structured JSON schema output.
xAI Grok 4.20 0309 Reasoning 5 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Upper bound for visible output tokens generated in the chat completion.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Seed
seed
integer Optional seed used for decoding when reproducible sampling is desired.
Output · 1 param
Response format
response_format.type
enum (text | json_object | json_schema) "text" Controls whether the model returns text, JSON mode output, or structured JSON schema output.
xAI Grok 4.20 Multi Agent 0309 5 params
Parameter Type Default Description Condition
Length · 1 param
Max output tokens
max_output_tokens
integer (1…+∞) Upper bound for output tokens generated in the Responses API response.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 0.7 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Reasoning effort
reasoning.effort
enum (low | medium | high | xhigh) Controls whether the Responses API request uses the 4-agent or 16-agent multi-agent setup.
Output · 1 param
Text format
text.format.type
enum (text | json_object | json_schema) "text" Controls whether the Responses API returns free-form text, JSON mode output, or structured JSON schema output.
xAI Grok 4.3 6 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Upper bound for visible output tokens generated in the chat completion.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Seed
seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (none | low | medium | high) "low" Controls how much reasoning Grok performs before responding. Set to none for non-reasoning requests.
Output · 1 param
Response format
response_format.type
enum (text | json_object | json_schema) "text" Controls whether the model returns text, JSON mode output, or structured JSON schema output.
xAI Grok Build 0.1 5 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Upper bound for visible output tokens generated in the chat completion.
Sampling · 3 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Seed
seed
integer Optional seed used for decoding when reproducible sampling is desired.
Output · 1 param
Response format
response_format.type
enum (text | json_object | json_schema) "text" Controls whether the model returns text, JSON mode output, or structured JSON schema output.

DeepSeek 4

DeepSeek Deepseek Chat 4 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Reasoning · 1 param
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls whether DeepSeek uses thinking mode before producing the final answer.
DeepSeek Deepseek Reasoner 5 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Reasoning · 2 params
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Controls whether DeepSeek uses thinking mode before producing the final answer.
Reasoning effort
reasoning_effort
enum (high | max) "high" Controls DeepSeek thinking effort when thinking mode is enabled.
Only when thinking.type = "enabled"
DeepSeek Deepseek V4 Flash 5 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Reasoning · 2 params
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Controls whether DeepSeek uses thinking mode before producing the final answer.
Reasoning effort
reasoning_effort
enum (high | max) "high" Controls DeepSeek thinking effort when thinking mode is enabled.
Only when thinking.type = "enabled"
DeepSeek Deepseek V4 Pro 5 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Reasoning · 2 params
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Controls whether DeepSeek uses thinking mode before producing the final answer.
Reasoning effort
reasoning_effort
enum (high | max) "high" Controls DeepSeek thinking effort when thinking mode is enabled.
Only when thinking.type = "enabled"

Meta 4

Meta Llama 3.3 70B Instruct 7 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 4 params
Temperature
temperature
number Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
top_k
integer Limits generation to the selected number of highest-probability tokens.
Repetition penalty
repetition_penalty
number Penalizes tokens that have already appeared to reduce repetition in the output.
Tools · 1 param
Tool choice
tool_choice
enum (auto | none | required) Controls whether the model may call tools, must call one, or skips tool calls.
Output · 1 param
Response format
response_format.type
enum (text | json_schema) "text" Controls whether the model returns normal text or a schema-constrained JSON object.
Meta Llama 3.3 8B Instruct 7 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 4 params
Temperature
temperature
number Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
top_k
integer Limits generation to the selected number of highest-probability tokens.
Repetition penalty
repetition_penalty
number Penalizes tokens that have already appeared to reduce repetition in the output.
Tools · 1 param
Tool choice
tool_choice
enum (auto | none | required) Controls whether the model may call tools, must call one, or skips tool calls.
Output · 1 param
Response format
response_format.type
enum (text | json_schema) "text" Controls whether the model returns normal text or a schema-constrained JSON object.
Meta Llama 4 Maverick 17B 128E Instruct FP8 7 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 4 params
Temperature
temperature
number Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
top_k
integer Limits generation to the selected number of highest-probability tokens.
Repetition penalty
repetition_penalty
number Penalizes tokens that have already appeared to reduce repetition in the output.
Tools · 1 param
Tool choice
tool_choice
enum (auto | none | required) Controls whether the model may call tools, must call one, or skips tool calls.
Output · 1 param
Response format
response_format.type
enum (text | json_schema) "text" Controls whether the model returns normal text or a schema-constrained JSON object.
Meta Llama 4 Scout 17B 16E Instruct FP8 7 params
Parameter Type Default Description Condition
Length · 1 param
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling · 4 params
Temperature
temperature
number Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
top_k
integer Limits generation to the selected number of highest-probability tokens.
Repetition penalty
repetition_penalty
number Penalizes tokens that have already appeared to reduce repetition in the output.
Tools · 1 param
Tool choice
tool_choice
enum (auto | none | required) Controls whether the model may call tools, must call one, or skips tool calls.
Output · 1 param
Response format
response_format.type
enum (text | json_schema) "text" Controls whether the model returns normal text or a schema-constrained JSON object.

Perplexity 4

Perplexity Sonar 12 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…128000) Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Metadata · 9 params
Search mode
search_mode
enum (web | academic | sec) Selects the corpus the model searches when grounding its answer.
Search recency filter
search_recency_filter
enum (hour | day | week | month | year) Restricts web search results to a recent time window.
Search domain filter
search_domain_filter
string Limits search to, or excludes, specific domains.
Search after date
search_after_date_filter
string Restricts search results to content published after this date (MM/DD/YYYY).
Search before date
search_before_date_filter
string Restricts search results to content published before this date (MM/DD/YYYY).
Search context size
web_search_options.search_context_size
enum (low | medium | high) "low" Controls how much web search context is retrieved before generating the answer.
Return images
return_images
boolean false Controls whether the response may include related images from the search.
Return related questions
return_related_questions
boolean false Controls whether the response includes suggested follow-up questions.
Disable search
disable_search
boolean false Turns off web search so the model answers from its own knowledge only.
Perplexity Sonar Deep Research 12 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…128000) Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning · 1 param
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) Controls how much reasoning and searching the model performs before producing the report.
Metadata · 8 params
Search mode
search_mode
enum (web | academic | sec) Selects the corpus the model searches when grounding its answer.
Search recency filter
search_recency_filter
enum (hour | day | week | month | year) Restricts web search results to a recent time window.
Search domain filter
search_domain_filter
string Limits search to, or excludes, specific domains.
Search after date
search_after_date_filter
string Restricts search results to content published after this date (MM/DD/YYYY).
Search before date
search_before_date_filter
string Restricts search results to content published before this date (MM/DD/YYYY).
Search context size
web_search_options.search_context_size
enum (low | medium | high) "low" Controls how much web search context is retrieved before generating the answer.
Return images
return_images
boolean false Controls whether the response may include related images from the search.
Return related questions
return_related_questions
boolean false Controls whether the response includes suggested follow-up questions.
Perplexity Sonar Pro 12 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…128000) Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Metadata · 9 params
Search mode
search_mode
enum (web | academic | sec) Selects the corpus the model searches when grounding its answer.
Search recency filter
search_recency_filter
enum (hour | day | week | month | year) Restricts web search results to a recent time window.
Search domain filter
search_domain_filter
string Limits search to, or excludes, specific domains.
Search after date
search_after_date_filter
string Restricts search results to content published after this date (MM/DD/YYYY).
Search before date
search_before_date_filter
string Restricts search results to content published before this date (MM/DD/YYYY).
Search context size
web_search_options.search_context_size
enum (low | medium | high) "low" Controls how much web search context is retrieved before generating the answer.
Return images
return_images
boolean false Controls whether the response may include related images from the search.
Return related questions
return_related_questions
boolean false Controls whether the response includes suggested follow-up questions.
Disable search
disable_search
boolean false Turns off web search so the model answers from its own knowledge only.
Perplexity Sonar Reasoning Pro 12 params
Parameter Type Default Description Condition
Length · 1 param
Max tokens
max_tokens
integer (1…128000) Maximum number of output tokens the model may generate.
Sampling · 2 params
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Metadata · 9 params
Search mode
search_mode
enum (web | academic | sec) Selects the corpus the model searches when grounding its answer.
Search recency filter
search_recency_filter
enum (hour | day | week | month | year) Restricts web search results to a recent time window.
Search domain filter
search_domain_filter
string Limits search to, or excludes, specific domains.
Search after date
search_after_date_filter
string Restricts search results to content published after this date (MM/DD/YYYY).
Search before date
search_before_date_filter
string Restricts search results to content published before this date (MM/DD/YYYY).
Search context size
web_search_options.search_context_size
enum (low | medium | high) "low" Controls how much web search context is retrieved before generating the answer.
Return images
return_images
boolean false Controls whether the response may include related images from the search.
Return related questions
return_related_questions
boolean false Controls whether the response includes suggested follow-up questions.
Disable search
disable_search
boolean false Turns off web search so the model answers from its own knowledge only.

How to use

Building with an AI agent? Hit Copy to grab this whole guide as Markdown and paste it in — or point your agent straight at /llms.txt.

modelparams.dev is an open, community-maintained catalog of LLM model parameters. Each entry shows the knobs you can turn — type, default, range, and the conditions that gate it.

The same model accessed via an API key and via a subscription usually exposes a different set of parameters. We list both as separate entries so the data stays honest.

Catalog API

The full catalog is static JSON, CORS-enabled, served from the edge.

curl https://modelparams.dev/api/v1/models.json

Each entry is keyed by provider/model for API-key variants; subscription variants append -subscription.

Single model

curl https://modelparams.dev/api/v1/models/anthropic/claude-opus-4-7.json
curl https://modelparams.dev/api/v1/models/anthropic/claude-opus-4-7-subscription.json

JSON Schema

Every entry validates against a JSON Schema you can use in your editor or pipeline.

curl https://modelparams.dev/api/v1/schema.json

Add this header to any YAML you author for autocomplete in VS Code:

# yaml-language-server: $schema=https://modelparams.dev/api/v1/schema.json

Logos

Provider logos are available at /assets/logos/{provider}.svg where {provider} is the provider slug. They use currentColor so they inherit your text color.

curl https://modelparams.dev/assets/logos/anthropic.svg

Logos are sourced from the models.dev repo (MIT) and used under nominative fair use.

Contribute

The data lives in YAML under models/{provider}/{model}-{auth}.yaml in the GitHub repo. Open a PR; CI validates against the schema and rebuilds.

Edit on GitHub MIT licensed