Top-P (Nucleus Sampling)

Top-P sampling—also called nucleus sampling—is a decoding strategy used in language models to generate more coherent and diverse text. It selects the next token from the smallest possible set of candidates whose cumulative probability exceeds a threshold p.

Purpose

Top-P limits randomness while maintaining flexibility. Unlike Top-K (which always selects from the K most probable tokens), Top-P sizes its candidate set dynamically based on probability mass, so it adapts better to varying uncertainty in the model's predictions.

How It Works

  1. Sort all possible next tokens by probability (highest to lowest).
  2. Accumulate probabilities until the sum exceeds the threshold p (e.g., 0.9).
  3. Renormalize the probabilities within this truncated set and sample the next token from it (see the sketch below).
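
A minimal NumPy sketch of these three steps follows; the function name top_p_sample and the toy five-token distribution are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    """Sample one token index using Top-P (nucleus) sampling."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)

    # 1. Sort token indices by probability, highest first.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]

    # 2. Accumulate probabilities and keep the smallest prefix whose sum exceeds p.
    cumulative = np.cumsum(sorted_probs)
    cutoff = int(np.searchsorted(cumulative, p)) + 1  # always keeps at least one token
    nucleus, nucleus_probs = order[:cutoff], sorted_probs[:cutoff]

    # 3. Renormalize the truncated distribution and sample from it.
    return int(rng.choice(nucleus, p=nucleus_probs / nucleus_probs.sum()))

# Toy 5-token vocabulary: with p=0.9 the nucleus is the 4 most probable tokens,
# because the top 3 only reach 0.45 + 0.25 + 0.15 = 0.85.
print(top_p_sample([0.45, 0.25, 0.15, 0.10, 0.05], p=0.9))
```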

Parameters

  • p (probability threshold): A float between 0 and 1.
    • Lower p (e.g., 0.7): More focused, deterministic output.
    • Higher p (e.g., 0.95): More diverse, creative output.
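
To make the effect of p concrete, the short sketch below counts how many tokens fall inside the nucleus for different thresholds; the six-token distribution is invented purely for illustration.

```python
import numpy as np

# Invented probability distribution over a 6-token vocabulary, sorted descending.
probs = np.array([0.33, 0.22, 0.18, 0.12, 0.09, 0.06])
cumulative = np.cumsum(probs)

for p in (0.7, 0.9, 0.95):
    # Size of the smallest prefix whose cumulative probability exceeds p.
    nucleus_size = int(np.searchsorted(cumulative, p)) + 1
    print(f"p={p}: sample from the {nucleus_size} most probable tokens")
```

With these numbers, p=0.7 keeps 3 of the 6 tokens while p=0.95 keeps all 6, which is why lower p reads as more focused and higher p as more diverse.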

Comparison: Top-K vs Top-P

Strategy   Description                                             Behavior
Top-K      Pick from the fixed top-K tokens                        Consistent, but may miss nuance
Top-P      Pick from a dynamic set with cumulative probability p   More adaptive and fluent output
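
To see the adaptivity in practice, the sketch below compares the two strategies on invented distributions: a peaked one (a confident model) and a flat one (an uncertain model). The numbers and the helper nucleus_size are illustrative only.

```python
import numpy as np

def nucleus_size(probs, p):
    """Number of tokens in the smallest prefix whose cumulative probability exceeds p."""
    cumulative = np.cumsum(np.sort(probs)[::-1])
    return int(np.searchsorted(cumulative, p)) + 1

peaked = np.array([0.86, 0.06, 0.04, 0.02, 0.01, 0.01])  # model is confident
flat   = np.array([0.20, 0.18, 0.17, 0.16, 0.15, 0.14])  # model is uncertain

# Top-K with K=3 considers exactly 3 tokens in both cases.
# Top-P adapts: a small nucleus when confident, a large one when uncertain.
print(nucleus_size(peaked, p=0.9), nucleus_size(flat, p=0.9))  # -> 2 6
```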

Use Cases

  • Creative writing
  • Conversational agents
  • Any LLM generation task where a balance of randomness and relevance is needed

Tips

  • Often combined with Temperature to fine-tune the trade-off between creativity and precision (see the sketch after this list).
  • Set Top-P ~0.8–0.95 for human-like fluency without excessive randomness.
  • Low values (0.1–0.5) produce focused output; medium values (0.6–0.9) balance creativity and coherence; high values (0.9–0.99) allow the most creative diversity.
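
A minimal sketch of the common ordering when combining the two controls: temperature reshapes the full distribution first, then Top-P truncates it before sampling. The function and parameter names are assumptions for illustration; real inference APIs usually expose these as configuration options rather than code.

```python
import numpy as np

def sample_temperature_top_p(logits, temperature=0.8, top_p=0.9, rng=None):
    """Apply temperature scaling, then Top-P truncation, then sample one token index."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)

    # Temperature reshapes the distribution: <1 sharpens it, >1 flattens it.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-P then keeps only the smallest set of tokens whose mass exceeds top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    nucleus, nucleus_probs = order[:cutoff], probs[order][:cutoff]
    return int(rng.choice(nucleus, p=nucleus_probs / nucleus_probs.sum()))

# Invented raw logits for a 5-token vocabulary.
print(sample_temperature_top_p([2.0, 1.5, 0.5, -1.0, -2.0], temperature=0.8, top_p=0.9))
```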