LLM parameter glossary
28 parameters appear across the catalog. This page defines each one, grouped by what it controls, and notes its type and how many models expose it. Definitions come from the same community-maintained data as the JSON API.
Length
-
Max tokens
max_tokens · integer · 52 models
Maximum number of output tokens the model may generate.
-
Max tokens
max_completion_tokens · integer · 13 models
Maximum number of output tokens the model may generate.
Supported by: OpenAI
-
Stop sequence
stop · string · 13 models
Stops generation when this string appears in the output.
Supported by: Mistral
-
Max output tokens
generationConfig.maxOutputTokens · integer · 4 models
Maximum number of tokens to include in a response candidate.
Supported by: Google
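The two length controls above can be pictured with a small client-side emulation. This is only a sketch: providers apply the token cap and the stop string during decoding on the server, `truncate_output` is a hypothetical helper rather than part of any SDK, and dropping the stop string from the returned text is an assumption that may not hold for every provider.

```python
def truncate_output(tokens, max_tokens, stop=None):
    """Emulate an output-token cap and a stop string on a list of string tokens."""
    text = ""
    # The cap counts tokens, not characters.
    for token in tokens[:max_tokens]:
        text += token
        if stop is not None and stop in text:
            # Assumption: the stop string itself is not returned.
            return text[: text.index(stop)]
    return text


if __name__ == "__main__":
    stream = ["Hello", ",", " world", "!", " END", " more"]
    print(repr(truncate_output(stream, max_tokens=10, stop="END")))
    print(repr(truncate_output(stream, max_tokens=2)))
```

Whichever limit is hit first ends the output, which is why a low token cap can cut a response before the stop string ever appears.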
Sampling
-
Temperature
temperature · number · 49 models
Controls randomness. Lower values make outputs more focused; higher values make them more varied.
-
Top P
top_p · number · 49 models
Controls nucleus sampling by limiting generation to the smallest set of most likely tokens whose cumulative probability reaches this value.
-
Top K
top_k · integer · 22 models
Limits token sampling to the top K most likely next tokens.
Supported by: Anthropic
-
Frequency penalty
frequency_penalty · number · 13 models
Penalizes tokens in proportion to how often they already appear in the generated text.
Supported by: Mistral
-
Presence penalty
presence_penalty · number · 13 models
Penalizes tokens that have already appeared in the generated text, regardless of how often, to encourage a wider variety of generated content.
Supported by: Mistral
-
Random seed
random_seed · integer · 13 models
Seed used for deterministic sampling when reproducible outputs are desired.
Supported by: Mistral
-
Seed
generationConfig.seed · integer · 4 models
Optional seed used for decoding when reproducible sampling is desired.
Supported by: Google
-
Temperature
generationConfig.temperature · number · 4 models
Controls randomness. Lower values make outputs more focused; higher values make them more varied.
Supported by: Google
-
Top K
generationConfig.topK · integer · 4 models
Limits token sampling to the top K most likely next tokens.
Supported by: Google
-
Top P
generationConfig.topP · number · 4 models
Controls nucleus sampling by limiting generation to the smallest set of most likely tokens whose cumulative probability reaches this value.
Supported by: Google
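To make the sampling controls concrete, here is a toy sketch of how they could act on next-token scores. It is an illustration under stated assumptions, not any provider's implementation: the function names are hypothetical, the order of operations (penalties, then temperature, then top-k, then top-p) is one common arrangement, and the penalty arithmetic is one widely used formulation that a given provider may not follow exactly.

```python
import math
import random
from collections import Counter


def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in `generated`."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        seen = counts[token]
        # Frequency penalty grows with each repeat; presence penalty is a
        # flat cost once the token has appeared at all (assumed formulation).
        adjusted[token] = logit - seen * frequency_penalty - (presence_penalty if seen else 0.0)
    return adjusted


def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top-k, and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Temperature rescales the logits before the softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda item: item[1], reverse=True)
    if top_k is not None:
        # Keep only the K most likely tokens.
        ranked = ranked[:top_k]
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept
    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


def sample(distribution, seed=None):
    """Draw one token; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    tokens = list(distribution)
    return rng.choices(tokens, weights=[distribution[t] for t in tokens], k=1)[0]
```

In this sketch, lowering `temperature` concentrates probability on the top-ranked token, while `top_k` and `top_p` remove low-probability tokens entirely before the draw. Passing the same `seed` to `sample` repeats the same choice, which is the idea behind the seed parameters above.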
Reasoning
-
Thinking mode
thinking.type · enum · 21 models
Selects the Anthropic thinking mode; the accepted values depend on the model.
-
Reasoning effort
reasoning_effort · enum · 17 models
Controls how much reasoning the model should perform before producing an answer.
-
Budget tokens
thinking.budget_tokens · integer · 16 models
Maximum token budget Anthropic may use for extended thinking before producing the final answer.
Supported by: Anthropic
-
Reasoning effort
reasoning.effort · enum · 9 models
Controls how much reasoning the model should perform before producing an answer.
Supported by: OpenAI
-
Reasoning summary
reasoning.summary · enum · 9 models
Controls the level of reasoning summary returned with the response.
Supported by: OpenAI
-
Thinking display
thinking.display · enum · 7 models
Controls whether Anthropic returns summarized or omitted thinking content.
Supported by: Anthropic
-
Include thoughts
generationConfig.thinkingConfig.includeThoughts · boolean · 4 models
Controls whether Gemini returns available thought summaries in the response parts.
Supported by: Google
-
Effort
output_config.effort · enum · 4 models
Controls Anthropic response thoroughness and token spend.
Supported by: Anthropic
-
Thinking budget
generationConfig.thinkingConfig.thinkingBudget · integer · 3 models
Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.
Supported by: Google
-
Thinking level
generationConfig.thinkingConfig.thinkingLevel · enum · 1 model
Controls Gemini 3.5 Flash reasoning effort.
Supported by: Google
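The thinking budget rule described above (-1 for dynamic thinking, 0 to disable, fixed budgets from 512 tokens) can be checked before a request is sent. The sketch below encodes only what the definition states: both function names are hypothetical, no upper bound is enforced because the glossary does not give one, and the nesting of the returned fragment is assumed from the dotted parameter names.

```python
def classify_thinking_budget(budget):
    """Map a thinkingBudget value to the behaviour the glossary describes."""
    if isinstance(budget, bool) or not isinstance(budget, int):
        raise TypeError("thinkingBudget must be an integer")
    if budget == -1:
        return "dynamic"
    if budget == 0:
        return "disabled"
    if budget >= 512:
        return "fixed"
    # Values from 1 to 511 and anything below -1 fall outside the stated rule.
    raise ValueError(f"unsupported thinkingBudget {budget}: use -1, 0, or at least 512")


def thinking_config(budget, include_thoughts=False):
    """Build a nested generationConfig fragment for a validated budget."""
    classify_thinking_budget(budget)
    return {
        "generationConfig": {
            "thinkingConfig": {
                "thinkingBudget": budget,
                "includeThoughts": include_thoughts,
            }
        }
    }
```

Validating locally like this turns an out-of-range value into an immediate, readable error instead of a rejected request.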
Output
-
Response format
response_format.type · enum · 13 models
Controls whether the model returns plain text or JSON-mode output.
Supported by: Mistral
-
Verbosity
text.verbosity · enum · 9 models
Controls how concise or detailed the model's final text response should be.
Supported by: OpenAI
-
Response MIME type
generationConfig.responseMimeType · enum · 4 models
MIME type for generated text candidates.
Supported by: Google
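Many parameter names in this glossary are dotted paths such as `response_format.type` or `generationConfig.responseMimeType`. Assuming each dot marks one level of nesting in the JSON request body, a small helper can expand a flat mapping into that shape. `expand_dotted` is a hypothetical name, and the values in the usage lines are illustrative only; check each provider's documentation for the accepted enum values.

```python
import json


def expand_dotted(params):
    """Expand {"a.b.c": value} style keys into nested dictionaries."""
    body = {}
    for path, value in params.items():
        *parents, leaf = path.split(".")
        node = body
        for key in parents:
            # Create intermediate objects on demand and descend into them.
            node = node.setdefault(key, {})
        node[leaf] = value
    return body


if __name__ == "__main__":
    flat = {
        "generationConfig.responseMimeType": "application/json",  # illustrative value
        "generationConfig.maxOutputTokens": 256,
    }
    print(json.dumps(expand_dotted(flat), indent=2))
```

Keeping settings flat and expanding them at the last step makes it easier to compare the same setting across providers that nest it differently.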
Metadata
-
Safe prompt
safe_prompt · boolean · 13 models
Controls whether Mistral injects its safety prompt before the conversation.
Supported by: Mistral