Top-P (Nucleus Sampling)

Top-P sampling—also called nucleus sampling—is a decoding strategy used in language models to generate more coherent and diverse text. It selects the next token from the smallest possible set of candidates whose cumulative probability exceeds a threshold p.

Purpose

Top-P limits randomness while maintaining flexibility. Unlike Top-K (which always selects from the K most probable tokens), Top-P sizes its candidate set dynamically based on probability mass, so it adapts better to varying uncertainty in the model's predictions.

How It Works

  1. Sort all possible next tokens by probability (highest to lowest).
  2. Accumulate probabilities until the sum exceeds the threshold p (e.g., 0.9).
  3. Renormalize the probabilities within this truncated set and sample the next token from it (see the sketch below).
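
A minimal NumPy sketch of these three steps follows; the function name top_p_sample and the toy five-token distribution are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    """Sample one token index using Top-P (nucleus) sampling."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)

    # 1. Sort token indices by probability, highest first.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]

    # 2. Accumulate probabilities and keep the smallest prefix whose sum exceeds p.
    cumulative = np.cumsum(sorted_probs)
    cutoff = int(np.searchsorted(cumulative, p)) + 1  # always keeps at least one token
    nucleus, nucleus_probs = order[:cutoff], sorted_probs[:cutoff]

    # 3. Renormalize the truncated distribution and sample from it.
    return int(rng.choice(nucleus, p=nucleus_probs / nucleus_probs.sum()))

# Toy 5-token vocabulary: with p=0.9 the nucleus is the 4 most probable tokens,
# because the top 3 only reach 0.45 + 0.25 + 0.15 = 0.85.
print(top_p_sample([0.45, 0.25, 0.15, 0.10, 0.05], p=0.9))
```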

Parameters

  • p (probability threshold): A float between 0 and 1.
    • Lower p (e.g., 0.7): More focused, deterministic output.
    • Higher p (e.g., 0.95): More diverse, creative output.
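
To make the effect of p concrete, the short sketch below counts how many tokens fall inside the nucleus for different thresholds; the six-token distribution is invented purely for illustration.

```python
import numpy as np

# Invented probability distribution over a 6-token vocabulary, sorted descending.
probs = np.array([0.33, 0.22, 0.18, 0.12, 0.09, 0.06])
cumulative = np.cumsum(probs)

for p in (0.7, 0.9, 0.95):
    # Size of the smallest prefix whose cumulative probability exceeds p.
    nucleus_size = int(np.searchsorted(cumulative, p)) + 1
    print(f"p={p}: sample from the {nucleus_size} most probable tokens")
```

With these numbers, p=0.7 keeps 3 of the 6 tokens while p=0.95 keeps all 6, which is why lower p reads as more focused and higher p as more diverse.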

Comparison: Top-K vs Top-P

Strategy   Description                                             Behavior
Top-K      Pick from the fixed top-K tokens                        Consistent, but may miss nuance
Top-P      Pick from a dynamic set with cumulative probability p   More adaptive and fluent output
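
To see the adaptivity in practice, the sketch below compares the two strategies on invented distributions: a peaked one (a confident model) and a flat one (an uncertain model). The numbers and the helper nucleus_size are illustrative only.

```python
import numpy as np

def nucleus_size(probs, p):
    """Number of tokens in the smallest prefix whose cumulative probability exceeds p."""
    cumulative = np.cumsum(np.sort(probs)[::-1])
    return int(np.searchsorted(cumulative, p)) + 1

peaked = np.array([0.86, 0.06, 0.04, 0.02, 0.01, 0.01])  # model is confident
flat   = np.array([0.20, 0.18, 0.17, 0.16, 0.15, 0.14])  # model is uncertain

# Top-K with K=3 considers exactly 3 tokens in both cases.
# Top-P adapts: a small nucleus when confident, a large one when uncertain.
print(nucleus_size(peaked, p=0.9), nucleus_size(flat, p=0.9))  # -> 2 6
```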

Use Cases

  • Creative writing
  • Conversational agents
  • Any LLM generation task where a balance of randomness and relevance is needed

Tips

  • Often combined with Temperature to fine-tune the trade-off between creativity and precision (see the sketch after this list).
  • Set Top-P ~0.8–0.95 for human-like fluency without excessive randomness.
  • Low values (0.1–0.5) produce focused output; medium values (0.6–0.9) balance creativity and coherence; high values (0.9–0.99) allow the most creative diversity.
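
A minimal sketch of the common ordering when combining the two controls: temperature reshapes the full distribution first, then Top-P truncates it before sampling. The function and parameter names are assumptions for illustration; real inference APIs usually expose these as configuration options rather than code.

```python
import numpy as np

def sample_temperature_top_p(logits, temperature=0.8, top_p=0.9, rng=None):
    """Apply temperature scaling, then Top-P truncation, then sample one token index."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)

    # Temperature reshapes the distribution: <1 sharpens it, >1 flattens it.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-P then keeps only the smallest set of tokens whose mass exceeds top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    nucleus, nucleus_probs = order[:cutoff], probs[order][:cutoff]
    return int(rng.choice(nucleus, p=nucleus_probs / nucleus_probs.sum()))

# Invented raw logits for a 5-token vocabulary.
print(sample_temperature_top_p([2.0, 1.5, 0.5, -1.0, -2.0], temperature=0.8, top_p=0.9))
```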