beginnerconcepts

What Is Temperature?

Temperature controls how random or predictable an AI's responses are.

What Temperature Actually Does

At each step, a language model produces a probability distribution over every possible next token. Temperature is a multiplier applied to that distribution before the model samples from it. A low temperature (close to 0) sharpens the distribution, making the highest-probability tokens overwhelmingly more likely to be chosen. A high temperature flattens the distribution, giving lower-probability tokens a better shot. In plain terms: low temperature = predictable, high temperature = surprising.

The 0–2 Range

Most APIs expose temperature as a value between 0 and 2. At exactly 0, the model becomes greedy — it always picks the highest-probability token, making outputs nearly fully deterministic. At 1.0, you get the model's "natural" behaviour, sampling according to its raw probability distribution. Above 1.0, the model starts taking increasingly unusual word choices, which can produce creative or incoherent outputs depending on the task. Values above 1.5 are rarely useful in production.

When to Use Low vs High Temperature

Low temperature (0.0–0.3): factual Q&A, code generation, data extraction, classification, anything where consistency and accuracy matter more than variety.
Medium temperature (0.5–0.8): general chat, summarisation, writing assistance — a balance between coherence and naturalness.
High temperature (0.9–1.3): brainstorming, creative writing, generating diverse options, poetry — where you want the model to take risks.

Top-P as a Sibling Setting

Top-P (nucleus sampling) is a related parameter that works differently. Instead of scaling the whole distribution, it restricts sampling to the smallest set of tokens whose cumulative probability exceeds P. Setting top-p to 0.9 means the model only samples from the top 90% of the probability mass, cutting off the long tail of weird tokens. Most practitioners tune one or the other, not both simultaneously — pick temperature for simplicity, or top-p if you want finer-grained control over the tail.

Example

Low temp (0.1): 'The capital of France is Paris.' — deterministic High temp (1.2): 'The capital of France is Paris, though some might poetically argue it's wherever the croissants are freshest.' — creative

Try this skill with our AI assistant

Try it →