Temperature controls how random or predictable an AI's responses are.
At each step, a language model produces a probability distribution over every possible next token. Temperature rescales that distribution before the model samples from it: the model's raw scores (logits) are divided by the temperature before being converted into probabilities. A low temperature (close to 0) sharpens the distribution, making the highest-probability tokens overwhelmingly more likely to be chosen. A high temperature flattens the distribution, giving lower-probability tokens a better shot. In plain terms: low temperature = predictable, high temperature = surprising.
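To make the sharpening and flattening concrete, here is a minimal Python sketch that applies temperature to a toy set of scores. The helper name `softmax_with_temperature` and the three example logits are illustrative assumptions, not part of any particular API; a real model does the same thing over a vocabulary of many thousands of tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities after temperature scaling.

    Hypothetical helper for illustration; assumes temperature > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by the temperature is the actual "temperature" step.
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)
neutral = softmax_with_temperature(logits, 1.0)
high = softmax_with_temperature(logits, 2.0)
```

With these toy numbers, the top token's probability is roughly 0.87 at temperature 0.5, 0.67 at 1.0, and 0.51 at 2.0: the same ranking, but a progressively flatter distribution.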
Many APIs expose temperature as a value between 0 and 2, though some cap it at 1. At exactly 0, the model becomes greedy: it always picks the highest-probability token, which makes outputs almost deterministic (small variations can still occur in practice). At 1.0, you get the model's "natural" behaviour, sampling according to its unmodified probability distribution. Above 1.0, the model makes increasingly unusual word choices, which can produce creative or incoherent outputs depending on the task. Values above 1.5 are rarely useful in production.
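The difference between greedy decoding at 0 and sampling at higher values can be sketched in a few lines of Python. The function `sample_token` is a hypothetical helper, and treating temperature 0 as a plain argmax is an assumption made for illustration; how a given provider handles 0 internally may differ.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Return the index of the chosen token.

    Hypothetical helper: temperature 0 is treated as greedy argmax
    (an assumption), anything above 0 samples from the scaled distribution.
    """
    if temperature == 0:
        # Greedy: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Unnormalised softmax weights; random.choices normalises them itself.
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

toy_logits = [0.1, 3.0, 1.0]  # made-up scores for three candidate tokens
greedy_pick = sample_token(toy_logits, 0)
sampled_pick = sample_token(toy_logits, 1.5, random.Random(0))
```

Calling the function repeatedly at temperature 0 always returns the same index, while at 1.5 the lower-scoring tokens are picked some of the time.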
Top-p (nucleus sampling) is a related parameter that works differently. Instead of rescaling the whole distribution, it restricts sampling to the smallest set of tokens whose cumulative probability reaches P. Setting top-p to 0.9 means the model samples only from the tokens that make up the top 90% of the probability mass, cutting off the long tail of unlikely tokens. Most practitioners tune one or the other, not both at once: pick temperature for simplicity, or top-p if you want finer-grained control over the tail.
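The truncation step can be illustrated with a short Python sketch. The name `top_p_filter` and the four-token distribution are assumptions for illustration only, and implementations differ on details such as whether the threshold comparison is inclusive.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p.

    Hypothetical helper: returns {token_index: renormalised_probability}.
    """
    # Rank tokens from most to least likely.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept = []
    cumulative = 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete; drop the remaining tail
    # Renormalise so the surviving probabilities sum to 1.
    total = sum(prob for _, prob in kept)
    return {index: prob / total for index, prob in kept}

probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token probabilities
nucleus = top_p_filter(probs, 0.9)
```

Here the first three tokens cover 95% of the mass, so the fourth (5%) is discarded and the rest are rescaled before sampling.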
Example
Low temp (0.1): 'The capital of France is Paris.' (predictable)
High temp (1.2): 'The capital of France is Paris, though some might poetically argue it's wherever the croissants are freshest.' (creative)