Greedy Decoding
Greedy decoding is the simplest decoding strategy for language models. At each generation step, it selects the token with the highest probability, without considering alternative candidates.
Purpose
To produce deterministic, fast, and low-variance outputs—typically when correctness and consistency matter more than creativity.
How It Works
- Start with the input prompt.
- At each step, compute the probability distribution over the vocabulary.
- Select the single most probable token.
- Append it to the sequence and repeat until reaching a stop condition.
This process greedily maximizes probability one token at a time; it approximates, but does not guarantee, the globally most likely sequence. A minimal sketch of the loop follows.
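The snippet below is one way to sketch this loop with Hugging Face transformers; the model name "gpt2", the 20-token budget, and the prompt are illustrative assumptions, not part of this note.

```python
# Minimal greedy-decoding loop (illustrative model and budget).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Greedy decoding picks"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

max_new_tokens = 20
with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits                 # (1, seq_len, vocab_size)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # most probable token
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:  # stop condition
            break

print(tokenizer.decode(input_ids[0]))
```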
Characteristics
- Deterministic: Same prompt = same output every time.
- Fast: Only a single argmax per step; no sampling or candidate re-ranking.
- Low diversity: Can lead to bland or repetitive text.
- Locally optimal: A greedy choice now can rule out a more probable sequence overall (see the toy example below).
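A toy two-step example with made-up probabilities shows how the locally best first token can lead to a much less probable sequence overall:

```python
# Made-up two-step probabilities illustrating greedy's local optimality.
step1 = {"A": 0.5, "B": 0.4}                # P(first token)
step2 = {"A": {"x": 0.1}, "B": {"y": 0.9}}  # P(second token | first token)

greedy_first = max(step1, key=step1.get)    # greedy picks "A"
greedy_joint = step1[greedy_first] * max(step2[greedy_first].values())  # 0.5 * 0.1 = 0.05

best_joint = max(step1[t] * p for t, nxt in step2.items() for p in nxt.values())  # 0.4 * 0.9 = 0.36

print(greedy_joint, best_joint)  # the greedy sequence is far less probable than the best one
```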
Use Cases
- Structured tasks where consistency is critical
- Code generation
- Factual Q&A
- Finishing templated responses
Limitations
- Can fall into loops or repetitive phrasing
- Lacks variation
- Doesn’t explore alternative paths that might lead to better long-term outcomes
Alternatives
- Beam search: tracks several candidate sequences in parallel and scores them jointly
- Temperature, top-k, and top-p (nucleus) sampling: introduce controlled randomness for more varied output (see Sampling Parameters)
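With the transformers `generate` API, switching between these strategies is a matter of parameters; the sketch below reuses the objects from the greedy loop above, and the specific values are illustrative:

```python
# Assumes `model`, `tokenizer`, and `input_ids` from the greedy sketch above.

# Greedy decoding: no sampling, single beam (the default).
out_greedy = model.generate(input_ids, do_sample=False, max_new_tokens=20)

# Beam search: keep 5 candidate sequences and return the best-scoring one.
out_beam = model.generate(input_ids, num_beams=5, max_new_tokens=20)

# Nucleus (top-p) sampling with temperature: diverse, non-deterministic output.
out_sample = model.generate(
    input_ids, do_sample=True, top_p=0.9, temperature=0.8, max_new_tokens=20
)

print(tokenizer.decode(out_greedy[0], skip_special_tokens=True))
```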
Related Notes
- Sampling Parameters
- LLM Generation
- Deterministic Decoding
- Prompt Engineering Roadmap
- Decoding Strategies