Greedy Decoding
Greedy decoding is the simplest decoding strategy for language models. At each generation step, it selects the token with the highest probability, without considering alternative candidates.
Purpose
To produce deterministic, fast, and low-variance outputs—typically when correctness and consistency matter more than creativity.
How It Works
- Start with the input prompt.
- At each step, compute the probability distribution over the vocabulary.
- Select the single most probable token.
- Append it to the sequence and repeat until reaching a stop condition.
This process greedily maximizes probability one token at a time; it approximates, but does not guarantee, the globally most likely sequence. A minimal sketch of the loop follows.
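The snippet below is one way to sketch this loop with Hugging Face transformers; the model name "gpt2", the 20-token budget, and the prompt are illustrative assumptions, not part of this note.

```python
# Minimal greedy-decoding loop (illustrative model and budget).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Greedy decoding picks"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

max_new_tokens = 20
with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits                 # (1, seq_len, vocab_size)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # most probable token
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:  # stop condition
            break

print(tokenizer.decode(input_ids[0]))
```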
Characteristics
- Deterministic: Same prompt = same output every time.
- Fast: Only a single argmax per step; no sampling or candidate re-ranking.
- Low diversity: Can lead to bland or repetitive text.
- Locally optimal: A greedy choice now can rule out a more probable sequence overall (see the toy example below).
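A toy two-step example with made-up probabilities shows how the locally best first token can lead to a much less probable sequence overall:

```python
# Made-up two-step probabilities illustrating greedy's local optimality.
step1 = {"A": 0.5, "B": 0.4}                # P(first token)
step2 = {"A": {"x": 0.1}, "B": {"y": 0.9}}  # P(second token | first token)

greedy_first = max(step1, key=step1.get)    # greedy picks "A"
greedy_joint = step1[greedy_first] * max(step2[greedy_first].values())  # 0.5 * 0.1 = 0.05

best_joint = max(step1[t] * p for t, nxt in step2.items() for p in nxt.values())  # 0.4 * 0.9 = 0.36

print(greedy_joint, best_joint)  # the greedy sequence is far less probable than the best one
```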
Use Cases
- Structured tasks where consistency is critical
- Code generation
- Factual Q&A
- Finishing templated responses
Limitations
- Can fall into loops or repetitive phrasing
- Lacks variation
- Doesn’t explore alternative paths that might lead to better long-term outcomes
Alternatives
- Beam search: tracks several candidate sequences in parallel and scores them jointly
- Temperature, top-k, and top-p (nucleus) sampling: introduce controlled randomness for more varied output (see Sampling Parameters)
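With the transformers `generate` API, switching between these strategies is a matter of parameters; the sketch below reuses the objects from the greedy loop above, and the specific values are illustrative:

```python
# Assumes `model`, `tokenizer`, and `input_ids` from the greedy sketch above.

# Greedy decoding: no sampling, single beam (the default).
out_greedy = model.generate(input_ids, do_sample=False, max_new_tokens=20)

# Beam search: keep 5 candidate sequences and return the best-scoring one.
out_beam = model.generate(input_ids, num_beams=5, max_new_tokens=20)

# Nucleus (top-p) sampling with temperature: diverse, non-deterministic output.
out_sample = model.generate(
    input_ids, do_sample=True, top_p=0.9, temperature=0.8, max_new_tokens=20
)

print(tokenizer.decode(out_greedy[0], skip_special_tokens=True))
```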
Related Notes
- Sampling Parameters
- LLM Generation
- Deterministic Decoding
- Prompt Engineering Roadmap
- Decoding Strategies