modelparams.dev

Every LLM parameter,
for every model.

An open, community-maintained catalog of LLM model parameters. Browse the UI below, query the API, or install the npm package.

$ npm i modelparams
GitHub Use the API

Access type

Providers

Parameters

191 of 191 models

OpenAI

Chatgpt 4o Latest OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Gpt 3.5 Turbo OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Gpt 4 Turbo OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Gpt 4 Turbo 2024-04-09 OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Gpt 4.1 OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Gpt 4.1 Mini OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Gpt 4.1 Nano OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
GPT-4o OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Gpt 4o 2024-11-20 OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
GPT-4o mini OpenAI 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Gpt 5 OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5 Chat Latest OpenAI 1 param
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Gpt 5 Mini OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5 Nano OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.1 OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | low | medium | high) "none" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.1 Codex Max OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.1 Codex OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.2 OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.2 Codex OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.2 OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.3 Codex OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.3 Codex Spark OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.3 Codex OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.4 OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.4 Mini OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.4 Mini OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.4 Nano OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.4 Pro OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.4 Pro OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.4 OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.5 OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.5 Pro OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Gpt 5.5 Pro OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
Gpt 5.5 OpenAI Subscription 3 params
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (minimal | low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
Reasoning summary
reasoning.summary
enum (auto | concise | detailed | none) "auto" Controls the level of reasoning summary returned with the response.
Output 1 param
Parameter Type Default Description Condition
Verbosity
text.verbosity
enum (low | medium | high) "medium" Controls how concise or detailed the model's final text response should be.
o1 OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
o1-mini OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
O1 Preview OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) "medium" Controls how much reasoning the model should perform before producing an answer.
o3 OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
o3-mini OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
O3 Pro OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.
o4-mini OpenAI 2 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (16…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (low | medium | high | xhigh) "medium" Controls how much reasoning the model should perform before producing an answer.

Anthropic

Claude 3.5 Haiku 20241022 Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Claude 3.5 Haiku Latest Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Claude 3.5 Sonnet 20241022 Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Claude 3.5 Sonnet Latest Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Claude 3.7 Sonnet 20250219 Anthropic 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude 3.7 Sonnet Latest Anthropic 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude 3 Opus 20240229 Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Claude 3 Opus Latest Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Claude Fable 5 Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (adaptive) Only adaptive thinking is supported; omit the parameter entirely to run without thinking (an explicit disabled value is rejected).
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Fable 5 Anthropic Subscription 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (adaptive) Only adaptive thinking is supported; omit the parameter entirely to run without thinking (an explicit disabled value is rejected).
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Haiku 4 Anthropic 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Haiku 4.5 Anthropic 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Haiku 4.5 20251001 Anthropic 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Haiku 4.5 20251001 Anthropic Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Haiku 4.5 Anthropic Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Haiku 4 Anthropic Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Opus 4.1 20250805 Anthropic 7 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Claude Opus 4.1 20250805 Anthropic Subscription 7 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Claude Opus 4 20250514 Anthropic 7 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled"
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Claude Opus 4 20250514 Anthropic Subscription 7 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled"
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Claude Opus 4.5 20251101 Anthropic 8 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 4 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Effort
output_config.effort
enum (low | medium | high) "high" Controls Anthropic response thoroughness and token spend.
Claude Opus 4.5 20251101 Anthropic Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 4 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Effort
output_config.effort
enum (low | medium | high) "high" Controls Anthropic response thoroughness and token spend.
Claude Opus 4.6 Anthropic 8 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"enabled", "adaptive"}
Reasoning 4 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type ∈ {"adaptive", "enabled"}
Effort
output_config.effort
enum (low | medium | high | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Opus 4.6 Anthropic Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"enabled", "adaptive"}
Reasoning 4 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type ∈ {"adaptive", "enabled"}
Effort
output_config.effort
enum (low | medium | high | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Opus 4.7 Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive) "disabled" Controls the Anthropic thinking mode values supported by this model.
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Opus 4.7 Anthropic Subscription 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive) "disabled" Controls the Anthropic thinking mode values supported by this model.
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Opus 4.8 Anthropic 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive) "disabled" Controls the Anthropic thinking mode values supported by this model.
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Opus 4.8 Anthropic Subscription 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive) "disabled" Controls the Anthropic thinking mode values supported by this model.
Thinking display
thinking.display
enum (summarized | omitted) "omitted" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "adaptive"
Effort
output_config.effort
enum (low | medium | high | xhigh | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Opus 4 Anthropic Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Sonnet 4 20250514 Anthropic 7 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled"
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Claude Sonnet 4 20250514 Anthropic Subscription 7 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled"
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 3 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type = "enabled"
Claude Sonnet 4.5 Anthropic 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Sonnet 4.5 20250929 Anthropic 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Sonnet 4.5 20250929 Anthropic Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type = "enabled" or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type = "enabled" or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type = "enabled"
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Sonnet 4.5 Anthropic Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Claude Sonnet 4.6 Anthropic 8 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"enabled", "adaptive"}
Reasoning 4 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type ∈ {"adaptive", "enabled"}
Effort
output_config.effort
enum (low | medium | high | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Sonnet 4.6 Anthropic Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"enabled", "adaptive"}
Reasoning 4 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"
Thinking display
thinking.display
enum (summarized | omitted) "summarized" Controls whether Anthropic returns summarized or omitted thinking content.
Only when thinking.type ∈ {"adaptive", "enabled"}
Effort
output_config.effort
enum (low | medium | high | max) "high" Controls Anthropic response thoroughness and token spend.
Claude Sonnet 4 Anthropic Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when thinking.type ∈ {"adaptive", "enabled"}
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.
Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K
top_k
integer (0…+∞) 0 Limits token sampling to the top K most likely next tokens.
Not when thinking.type ∈ {"adaptive", "enabled"}
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | adaptive | enabled) "disabled" Controls the Anthropic thinking mode values supported by this model.
Budget tokens
thinking.budget_tokens
integer (1024…+∞) 4096 Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Only when thinking.type = "enabled"

Z.ai

GLM-4.5 Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.5-Air Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.5-Air Z.ai Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.5-AirX Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.5-Flash Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.5 Z.ai Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.5-X Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.6 Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.6 Z.ai Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.7 Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.7-Flash Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.7-FlashX Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-4.7 Z.ai Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-5 Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-5 Z.ai Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-5-Turbo Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-5-Turbo Z.ai Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-5.1 Z.ai 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
GLM-5.1 Z.ai Subscription 6 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Not when do_sample = false
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Not when do_sample = false
Do sample
do_sample
boolean true When false, the model uses greedy decoding and ignores temperature and top_p.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Toggles the model's extended reasoning before it produces the final answer.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.

MiniMax

MiniMax M2 MiniMax 4 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax M2 MiniMax Subscription 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax M2.1 MiniMax 4 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax M2.1 Highspeed MiniMax 4 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax M2.1 Highspeed MiniMax Subscription 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax M2.1 MiniMax Subscription 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax M2.5 MiniMax 4 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax M2.5 Highspeed MiniMax 4 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax M2.5 Highspeed MiniMax Subscription 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax M2.5 MiniMax Subscription 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax M2.7 MiniMax 4 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax M2.7 Highspeed MiniMax 4 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax M2.7 Highspeed MiniMax Subscription 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
MiniMax M2.7 MiniMax Subscription 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Minimax M3 MiniMax 4 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Split reasoning
reasoning_split
boolean false Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.
MiniMax M3 MiniMax Subscription 3 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0.01…1 step 0.01) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.
Top P
top_p
number (0.01…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.

Nvidia

Gliner Pii Nvidia 4 params
Sampling 1 param
Parameter Type Default Description Condition
Threshold
threshold
number (0…1) 0.5 Confidence threshold for entity detection. Lower values detect more entities but may include false positives.
Metadata 3 params
Parameter Type Default Description Condition
Chunk length
chunk_length
integer (1…2048) 384 Context window size for processing. Longer texts are automatically split into chunks with overlap for complete coverage. Must be greater than overlap.
Overlap
overlap
integer (0…512) 128 Token overlap between chunks to prevent entity clipping. Must be less than chunk_length.
Flat NER
flat_ner
boolean false When true, prevents overlapping entity spans. When false, may return nested entities such as both a full name and its constituent first name.
Llama 3.1 Nemoguard 8b Topic Control Nvidia 6 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 1024 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2) 0.5 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Frequency penalty
frequency_penalty
number (-2…2) 0 Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Presence penalty
presence_penalty
number (-2…2) 0 Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Llama 3.1 Nemotron Nano 8b V1 Nvidia 7 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…16384) 4096 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Frequency penalty
frequency_penalty
number (-2…2) 0 Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Presence penalty
presence_penalty
number (-2…2) 0 Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Seed
seed
integer (0…18446744073709552000) 0 Best-effort deterministic sampling seed. Changing the seed produces a different response with similar characteristics. Fix the seed to reproduce results.
Llama 3.1 Nemotron Safety Guard 8b V3 Nvidia 1 param
Sampling 1 param
Parameter Type Default Description Condition
Temperature
temperature
number (0…1) 0 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Llama 3.1 Nemotron Ultra 253b V1 Nvidia 7 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…16384) 4096 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Frequency penalty
frequency_penalty
number (-2…2) 0 Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Presence penalty
presence_penalty
number (-2…2) 0 Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Seed
seed
integer (0…18446744073709552000) 0 Best-effort deterministic sampling seed. Changing the seed produces a different response with similar characteristics. Fix the seed to reproduce results.
Llama 3.3 Nemotron Super 49b V1 Nvidia 7 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…16384) 4096 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Frequency penalty
frequency_penalty
number (-2…2) 0 Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Presence penalty
presence_penalty
number (-2…2) 0 Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Seed
seed
integer (0…18446744073709552000) 0 Best-effort deterministic sampling seed. Changing the seed produces a different response with similar characteristics. Fix the seed to reproduce results.
Llama 3.3 Nemotron Super 49b V1.5 Nvidia 7 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…65536) 65536 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1) 0.6 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Frequency penalty
frequency_penalty
number (-2…2) 0 Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Presence penalty
presence_penalty
number (-2…2) 0 Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Seed
seed
integer (0…18446744073709552000) 0 Best-effort deterministic sampling seed. Changing the seed produces a different response with similar characteristics. Fix the seed to reproduce results.
Nemoguard Jailbreak Detect Nvidia 0 params

No parameters documented yet.

Nemotron 3 Nano 30b A3b Nvidia 5 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…32768) 16384 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (-∞…1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Seed
seed
integer (0…18446744073709552000) Best-effort deterministic sampling seed. Repeated requests with the same seed and parameters should return the same result.
Nemotron 3 Super 120b A12b Nvidia 7 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…32768) 16384 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (-∞…1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Seed
seed
integer (0…18446744073709552000) Best-effort deterministic sampling seed. Repeated requests with the same seed and parameters should return the same result.
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | low | high) "high" Controls the reasoning mode. 'none' disables reasoning tokens, 'low' enables low-effort reasoning, and 'high' enables full reasoning.
Reasoning budget
reasoning_budget
integer (-1…32768) 16384 Maximum number of tokens the model may use for internal reasoning before being forced to end the reasoning trace. Use -1 to disable budget enforcement.
Nemotron 3 Ultra 550b A55b Nvidia 7 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…32768) 16384 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (-∞…1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Seed
seed
integer (0…18446744073709552000) Best-effort deterministic sampling seed. Repeated requests with the same seed and parameters should return the same result.
Reasoning 2 params
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | medium | high) "high" Controls the reasoning mode. 'none' disables reasoning tokens, 'medium' enables efficient reasoning, and 'high' enables full reasoning.
Reasoning budget
reasoning_budget
integer (-1…32768) 16384 Maximum number of tokens the model may use for internal reasoning before being forced to end the reasoning trace. Use -1 to disable budget enforcement.
Nemotron Content Safety Reasoning 4b Nvidia 5 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…32768) 16384 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (-∞…1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Seed
seed
integer (0…18446744073709552000) Best-effort deterministic sampling seed. Repeated requests with the same seed and parameters should return the same result.
Nemotron Mini 4b Instruct Nvidia 6 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…4096) 1024 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1) 0.2 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 0.7 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Frequency penalty
frequency_penalty
number (-2…2) 0 Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Presence penalty
presence_penalty
number (-2…2) 0 Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Riva Translate 4b Instruct V1.1 Nvidia 6 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…4096) 512 Maximum number of tokens to generate. Generation stops when this limit is reached.
Stop
stop
string A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1) 0 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 0.9 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Frequency penalty
frequency_penalty
number (-2…2) 0 Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Presence penalty
presence_penalty
number (-2…2) 0 Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Usdcode Llama 3.1 70b Instruct Nvidia 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…2048) 1024 Maximum number of tokens to generate. Generation stops when this limit is reached.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1) 0.1 Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.
Top P
top_p
number (-∞…1) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.
Metadata 1 param
Parameter Type Default Description Condition
Expert type
expert_type
enum (auto | code | knowledge | helperfunction) "auto" The type of expert to use. 'knowledge' answers with USD knowledge, 'code' responds with vanilla OpenUSD code, 'helperfunction' uses high-level helper functions, and 'auto' lets the LLM determine which expert to use.

Mistral

Codestral Latest Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Devstral 2512 Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Devstral Latest Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Magistral Medium Latest Mistral 10 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Reasoning 1 param
Parameter Type Default Description Condition
Prompt mode
prompt_mode
enum (reasoning) Enables Mistral's reasoning system prompt; leave unset to disable the default reasoning behavior.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Magistral Small Latest Mistral 10 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Reasoning 1 param
Parameter Type Default Description Condition
Prompt mode
prompt_mode
enum (reasoning) Enables Mistral's reasoning system prompt; leave unset to disable the default reasoning behavior.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Ministral 14b Latest Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Ministral 3b Latest Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Ministral 8b Latest Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Large Latest Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Medium 3.5 Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Medium Latest Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Mistral Small Latest Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.
Open Mistral Nemo Mistral 9 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of tokens to generate in the completion.
Stop sequence
stop
string Stops generation when this string is detected.
Sampling 5 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1.5 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Random seed
random_seed
integer (0…+∞) Seed used for deterministic sampling when reproducible outputs are desired.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes repeated words or phrases to encourage a wider variety of generated content.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes words based on how often they already appear in the generated text.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON mode output.
Metadata 1 param
Parameter Type Default Description Condition
Safe prompt
safe_prompt
boolean false Controls whether Mistral injects its safety prompt before the conversation.

Google

Gemini 2.5 Flash Google 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer (-1…24576) -1 Number of thinking tokens Gemini should use; 0 disables thinking and -1 uses dynamic thinking.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 2.5 Flash Lite Google 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer 0 Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 2.5 Flash Lite Google Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer 0 Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 2.5 Flash Google Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer (-1…24576) -1 Number of thinking tokens Gemini should use; 0 disables thinking and -1 uses dynamic thinking.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 2.5 Pro Google 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer (128…32768) Maximum number of thinking tokens Gemini should use before producing the final answer.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 2.5 Pro Google Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking budget
generationConfig.thinkingConfig.thinkingBudget
integer (128…32768) Maximum number of thinking tokens Gemini should use before producing the final answer.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 3 Flash Preview Google Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (minimal | low | medium | high) "high" Controls Gemini 3 Flash reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 3.1 Flash Lite Preview Google Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (minimal | low | medium | high) "high" Controls Gemini 3.1 Flash-Lite reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 3.1 Flash Lite Google Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (minimal | low | medium | high) "high" Controls Gemini 3.1 Flash-Lite reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 3.1 Pro Preview Google Subscription 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (low | high) "high" Controls Gemini 3 Pro reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.
Gemini 3.5 Flash Google 8 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
generationConfig.maxOutputTokens
integer (1…65536) Maximum number of tokens to include in a response candidate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
generationConfig.temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
generationConfig.topP
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
generationConfig.topK
integer (0…+∞) 64 Limits token sampling to the top K most likely next tokens.
Seed
generationConfig.seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking level
generationConfig.thinkingConfig.thinkingLevel
enum (minimal | low | medium | high) "medium" Controls Gemini 3.5 Flash reasoning effort.
Include thoughts
generationConfig.thinkingConfig.includeThoughts
boolean false Controls whether Gemini returns available thought summaries in the response parts.
Output 1 param
Parameter Type Default Description Condition
Response MIME type
generationConfig.responseMimeType
enum (text/plain | application/json) "text/plain" MIME type for generated text candidates.

Alibaba

Qwen Flash Alibaba 5 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning 1 param
Parameter Type Default Description Condition
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean true Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Qwen Plus Alibaba 5 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning 1 param
Parameter Type Default Description Condition
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean true Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Qwen3 Coder Flash Alibaba 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Qwen3 Coder Plus Alibaba 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Qwen3 Max Alibaba 5 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning 1 param
Parameter Type Default Description Condition
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean false Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Qwen3.5 Alibaba 5 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning 1 param
Parameter Type Default Description Condition
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean true Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Qwen3.5 Flash Alibaba 5 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.
Reasoning 1 param
Parameter Type Default Description Condition
Enable thinking
extra_body.chat_template_kwargs.enable_thinking
boolean true Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.
Qwq Plus Alibaba 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
extra_body.top_k
integer (1…+∞) 20 Limits generation to the selected number of highest-probability tokens.

Cohere

Command A 03 2025 Cohere 12 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling 6 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability 1 param
Parameter Type Default Description Condition
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata 1 param
Parameter Type Default Description Condition
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Command A Plus 05 2026 Cohere 12 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling 6 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability 1 param
Parameter Type Default Description Condition
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata 1 param
Parameter Type Default Description Condition
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Command A Reasoning 08 2025 Cohere 14 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling 6 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "disabled" Controls whether the model reasons step by step before producing its final answer.
Thinking token budget
thinking.token_budget
integer (1…+∞) Maximum number of tokens the model may spend on reasoning before answering.
Only when thinking.type = "enabled"
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability 1 param
Parameter Type Default Description Condition
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata 1 param
Parameter Type Default Description Condition
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Command A Translate 08 2025 Cohere 12 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling 6 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability 1 param
Parameter Type Default Description Condition
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata 1 param
Parameter Type Default Description Condition
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Command A Vision 07 2025 Cohere 12 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling 6 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability 1 param
Parameter Type Default Description Condition
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata 1 param
Parameter Type Default Description Condition
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Command R 08 2024 Cohere 11 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling 6 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability 1 param
Parameter Type Default Description Condition
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata 1 param
Parameter Type Default Description Condition
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT | OFF) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Command R Plus 08 2024 Cohere 11 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling 6 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability 1 param
Parameter Type Default Description Condition
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata 1 param
Parameter Type Default Description Condition
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT | OFF) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.
Command R7b 12 2024 Cohere 12 params
Length 2 params
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Stop sequences
stop_sequences
string Stops generation when one of these sequences is detected; up to five are allowed.
Sampling 6 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…+∞ step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
p
number (0.01…0.99 step 0.01) 0.75 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
k
integer (0…500) 0 Limits sampling to the K most likely tokens; 0 disables top-k sampling.
Frequency penalty
frequency_penalty
number (0…1 step 0.1) 0 Penalizes tokens proportional to how often they have already appeared to reduce repetition.
Presence penalty
presence_penalty
number (0…1 step 0.1) 0 Penalizes tokens that have already appeared to encourage a wider variety of content.
Seed
seed
integer Seed used for best-effort deterministic sampling when reproducible outputs are desired.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (REQUIRED | NONE) Forces the model to either call a tool or skip tool calls for this request.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Controls whether the model returns normal text or JSON object output.
Observability 1 param
Parameter Type Default Description Condition
Log probabilities
logprobs
boolean false Controls whether the response includes log probabilities for the generated tokens.
Metadata 1 param
Parameter Type Default Description Condition
Safety mode
safety_mode
enum (CONTEXTUAL | STRICT) "CONTEXTUAL" Controls Cohere's built-in safety instructions applied to the generation.

Moonshot AI

Kimi K2.5 Moonshot AI 3 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) Controls whether Kimi reasons step by step before answering, or responds directly when set to disabled.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Kimi K2.6 Moonshot AI 3 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Controls whether Kimi reasons step by step before answering. Thinking is enabled by default; set disabled to respond directly.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Moonshot v1 128K Moonshot AI 7 params
Length 2 params
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Number of completions
n
integer (1…5) 1 How many chat completion choices to generate for the request.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes tokens that have already appeared, encouraging the model to talk about new topics.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes tokens by how often they have appeared, reducing verbatim repetition.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Moonshot v1 32K Moonshot AI 7 params
Length 2 params
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Number of completions
n
integer (1…5) 1 How many chat completion choices to generate for the request.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes tokens that have already appeared, encouraging the model to talk about new topics.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes tokens by how often they have appeared, reducing verbatim repetition.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.
Moonshot v1 8K Moonshot AI 7 params
Length 2 params
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of tokens to generate in the chat completion.
Number of completions
n
integer (1…5) 1 How many chat completion choices to generate for the request.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…1 step 0.1) 0.3 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Presence penalty
presence_penalty
number (-2…2 step 0.1) 0 Penalizes tokens that have already appeared, encouraging the model to talk about new topics.
Frequency penalty
frequency_penalty
number (-2…2 step 0.1) 0 Penalizes tokens by how often they have appeared, reducing verbatim repetition.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object) "text" Forces the response into plain text or a JSON object.

xAI

Grok 4.20 0309 Non Reasoning xAI 6 params
Length 2 params
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Upper bound for visible output tokens generated in the chat completion.
Stop sequence
stop
string Stops generation when this sequence is produced. xAI accepts up to four stop sequences.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Seed
seed
integer Optional seed used for decoding when reproducible sampling is desired.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object | json_schema) "text" Controls whether the model returns text, JSON mode output, or structured JSON schema output.
Grok 4.20 0309 Reasoning xAI 5 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Upper bound for visible output tokens generated in the chat completion.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Seed
seed
integer Optional seed used for decoding when reproducible sampling is desired.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object | json_schema) "text" Controls whether the model returns text, JSON mode output, or structured JSON schema output.
Grok 4.20 Multi Agent 0309 xAI 5 params
Length 1 param
Parameter Type Default Description Condition
Max output tokens
max_output_tokens
integer (1…+∞) Upper bound for output tokens generated in the Responses API response.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 0.7 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 0.95 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning.effort
enum (low | medium | high | xhigh) Controls whether the Responses API request uses the 4-agent or 16-agent multi-agent setup.
Output 1 param
Parameter Type Default Description Condition
Text format
text.format.type
enum (text | json_object | json_schema) "text" Controls whether the Responses API returns free-form text, JSON mode output, or structured JSON schema output.
Grok 4.3 xAI 6 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Upper bound for visible output tokens generated in the chat completion.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Seed
seed
integer Optional seed used for decoding when reproducible sampling is desired.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (none | low | medium | high) "low" Controls how much reasoning Grok performs before responding. Set to none for non-reasoning requests.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object | json_schema) "text" Controls whether the model returns text, JSON mode output, or structured JSON schema output.
Grok Build 0.1 xAI 5 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Upper bound for visible output tokens generated in the chat completion.
Sampling 3 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Seed
seed
integer Optional seed used for decoding when reproducible sampling is desired.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_object | json_schema) "text" Controls whether the model returns text, JSON mode output, or structured JSON schema output.

DeepSeek

Deepseek Chat DeepSeek 4 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Reasoning 1 param
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (disabled | enabled) "disabled" Controls whether DeepSeek uses thinking mode before producing the final answer.
Deepseek Reasoner DeepSeek 5 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Controls whether DeepSeek uses thinking mode before producing the final answer.
Reasoning effort
reasoning_effort
enum (high | max) "high" Controls DeepSeek thinking effort when thinking mode is enabled.
Only when thinking.type = "enabled"
Deepseek V4 Flash DeepSeek 5 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Controls whether DeepSeek uses thinking mode before producing the final answer.
Reasoning effort
reasoning_effort
enum (high | max) "high" Controls DeepSeek thinking effort when thinking mode is enabled.
Only when thinking.type = "enabled"
Deepseek V4 Pro DeepSeek 5 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…+∞) 4096 Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) 1 Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Top P
top_p
number (0…1 step 0.01) 1 Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.
Not when thinking.type = "enabled"
Reasoning 2 params
Parameter Type Default Description Condition
Thinking mode
thinking.type
enum (enabled | disabled) "enabled" Controls whether DeepSeek uses thinking mode before producing the final answer.
Reasoning effort
reasoning_effort
enum (high | max) "high" Controls DeepSeek thinking effort when thinking mode is enabled.
Only when thinking.type = "enabled"

Meta

Llama 3.3 70B Instruct Meta 7 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
top_k
integer Limits generation to the selected number of highest-probability tokens.
Repetition penalty
repetition_penalty
number Penalizes tokens that have already appeared to reduce repetition in the output.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (auto | none | required) Controls whether the model may call tools, must call one, or skips tool calls.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_schema) "text" Controls whether the model returns normal text or a schema-constrained JSON object.
Llama 3.3 8B Instruct Meta 7 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
top_k
integer Limits generation to the selected number of highest-probability tokens.
Repetition penalty
repetition_penalty
number Penalizes tokens that have already appeared to reduce repetition in the output.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (auto | none | required) Controls whether the model may call tools, must call one, or skips tool calls.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_schema) "text" Controls whether the model returns normal text or a schema-constrained JSON object.
Llama 4 Maverick 17B 128E Instruct FP8 Meta 7 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
top_k
integer Limits generation to the selected number of highest-probability tokens.
Repetition penalty
repetition_penalty
number Penalizes tokens that have already appeared to reduce repetition in the output.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (auto | none | required) Controls whether the model may call tools, must call one, or skips tool calls.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_schema) "text" Controls whether the model returns normal text or a schema-constrained JSON object.
Llama 4 Scout 17B 16E Instruct FP8 Meta 7 params
Length 1 param
Parameter Type Default Description Condition
Max completion tokens
max_completion_tokens
integer (1…+∞) Maximum number of output tokens the model may generate.
Sampling 4 params
Parameter Type Default Description Condition
Temperature
temperature
number Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Top K
top_k
integer Limits generation to the selected number of highest-probability tokens.
Repetition penalty
repetition_penalty
number Penalizes tokens that have already appeared to reduce repetition in the output.
Tools 1 param
Parameter Type Default Description Condition
Tool choice
tool_choice
enum (auto | none | required) Controls whether the model may call tools, must call one, or skips tool calls.
Output 1 param
Parameter Type Default Description Condition
Response format
response_format.type
enum (text | json_schema) "text" Controls whether the model returns normal text or a schema-constrained JSON object.

Perplexity

Sonar Perplexity 12 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…128000) Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Metadata 9 params
Parameter Type Default Description Condition
Search mode
search_mode
enum (web | academic | sec) Selects the corpus the model searches when grounding its answer.
Search recency filter
search_recency_filter
enum (hour | day | week | month | year) Restricts web search results to a recent time window.
Search domain filter
search_domain_filter
string Limits search to, or excludes, specific domains.
Search after date
search_after_date_filter
string Restricts search results to content published after this date (MM/DD/YYYY).
Search before date
search_before_date_filter
string Restricts search results to content published before this date (MM/DD/YYYY).
Search context size
web_search_options.search_context_size
enum (low | medium | high) "low" Controls how much web search context is retrieved before generating the answer.
Return images
return_images
boolean false Controls whether the response may include related images from the search.
Return related questions
return_related_questions
boolean false Controls whether the response includes suggested follow-up questions.
Disable search
disable_search
boolean false Turns off web search so the model answers from its own knowledge only.
Sonar Deep Research Perplexity 12 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…128000) Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Reasoning 1 param
Parameter Type Default Description Condition
Reasoning effort
reasoning_effort
enum (minimal | low | medium | high) Controls how much reasoning and searching the model performs before producing the report.
Metadata 8 params
Parameter Type Default Description Condition
Search mode
search_mode
enum (web | academic | sec) Selects the corpus the model searches when grounding its answer.
Search recency filter
search_recency_filter
enum (hour | day | week | month | year) Restricts web search results to a recent time window.
Search domain filter
search_domain_filter
string Limits search to, or excludes, specific domains.
Search after date
search_after_date_filter
string Restricts search results to content published after this date (MM/DD/YYYY).
Search before date
search_before_date_filter
string Restricts search results to content published before this date (MM/DD/YYYY).
Search context size
web_search_options.search_context_size
enum (low | medium | high) "low" Controls how much web search context is retrieved before generating the answer.
Return images
return_images
boolean false Controls whether the response may include related images from the search.
Return related questions
return_related_questions
boolean false Controls whether the response includes suggested follow-up questions.
Sonar Pro Perplexity 12 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…128000) Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Metadata 9 params
Parameter Type Default Description Condition
Search mode
search_mode
enum (web | academic | sec) Selects the corpus the model searches when grounding its answer.
Search recency filter
search_recency_filter
enum (hour | day | week | month | year) Restricts web search results to a recent time window.
Search domain filter
search_domain_filter
string Limits search to, or excludes, specific domains.
Search after date
search_after_date_filter
string Restricts search results to content published after this date (MM/DD/YYYY).
Search before date
search_before_date_filter
string Restricts search results to content published before this date (MM/DD/YYYY).
Search context size
web_search_options.search_context_size
enum (low | medium | high) "low" Controls how much web search context is retrieved before generating the answer.
Return images
return_images
boolean false Controls whether the response may include related images from the search.
Return related questions
return_related_questions
boolean false Controls whether the response includes suggested follow-up questions.
Disable search
disable_search
boolean false Turns off web search so the model answers from its own knowledge only.
Sonar Reasoning Pro Perplexity 12 params
Length 1 param
Parameter Type Default Description Condition
Max tokens
max_tokens
integer (1…128000) Maximum number of output tokens the model may generate.
Sampling 2 params
Parameter Type Default Description Condition
Temperature
temperature
number (0…2 step 0.1) Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Top P
top_p
number (0…1 step 0.01) Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.
Metadata 9 params
Parameter Type Default Description Condition
Search mode
search_mode
enum (web | academic | sec) Selects the corpus the model searches when grounding its answer.
Search recency filter
search_recency_filter
enum (hour | day | week | month | year) Restricts web search results to a recent time window.
Search domain filter
search_domain_filter
string Limits search to, or excludes, specific domains.
Search after date
search_after_date_filter
string Restricts search results to content published after this date (MM/DD/YYYY).
Search before date
search_before_date_filter
string Restricts search results to content published before this date (MM/DD/YYYY).
Search context size
web_search_options.search_context_size
enum (low | medium | high) "low" Controls how much web search context is retrieved before generating the answer.
Return images
return_images
boolean false Controls whether the response may include related images from the search.
Return related questions
return_related_questions
boolean false Controls whether the response includes suggested follow-up questions.
Disable search
disable_search
boolean false Turns off web search so the model answers from its own knowledge only.

How to use

Building with an AI agent? Hit Copy to grab this whole guide as Markdown and paste it in — or point your agent straight at /llms.txt.

modelparams.dev is an open, community-maintained catalog of LLM model parameters. Each entry shows the knobs you can turn — type, default, range, and the conditions that gate it.

The same model accessed via an API key and via a subscription usually exposes a different set of parameters. We list both as separate entries so the data stays honest.

Catalog API

The full catalog is static JSON, CORS-enabled, served from the edge.

curl https://modelparams.dev/api/v1/models.json

Each entry is keyed by provider/model for API-key variants; subscription variants append -subscription.

If you only need the params for one model contract, use the providerless endpoint. Subscription contracts are model slugs with -subscription.

curl https://modelparams.dev/api/v1/params/gpt-5.5.json
curl https://modelparams.dev/api/v1/params/gpt-5.5-subscription.json

Single model

curl https://modelparams.dev/api/v1/models/anthropic/claude-opus-4-7.json
curl https://modelparams.dev/api/v1/models/anthropic/claude-opus-4-7-subscription.json

JSON Schema

Every entry validates against a JSON Schema you can use in your editor or pipeline.

curl https://modelparams.dev/api/v1/schema.json

Add this header to any YAML you author for autocomplete in VS Code:

# yaml-language-server: $schema=https://modelparams.dev/api/v1/schema.json

Logos

Provider logos are available at /assets/logos/{provider}.svg where {provider} is the provider slug. They use currentColor so they inherit your text color.

curl https://modelparams.dev/assets/logos/anthropic.svg

Logos are sourced from the models.dev repo (MIT) and used under nominative fair use.

Contribute

The data lives in YAML under models/{provider}/{model}-{auth}.yaml in the GitHub repo. Open a PR; CI validates against the schema and rebuilds.

Edit on GitHub MIT licensed