Context Window
The context window refers to the maximum number of tokens a language model can process in a single request. It includes both the input prompt and the generated output.
Why It Matters
LLMs do not have unlimited memory; they have a fixed context budget. If the combined input and output exceeds this limit, the request fails or content must be dropped (in chat applications, typically the oldest messages), which can lead to loss of important context and degraded output quality.
Example
- Model: GPT-4-1106-preview
- Max context window: 128,000 tokens
- If your input prompt is 100,000 tokens, only 28,000 tokens (128,000 - 100,000) remain for the response; in practice, models often enforce an even smaller per-response output limit.
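For illustration, here is a minimal sketch of checking that budget before sending a request. It assumes the tiktoken package and the cl100k_base encoding used by GPT-4-family models; the variable names are placeholders.

```python
import tiktoken

# cl100k_base is the encoding used by GPT-4-family models (assumed for this sketch)
enc = tiktoken.get_encoding("cl100k_base")

MAX_CONTEXT = 128_000  # context window of GPT-4-1106-preview

prompt = "Summarize the following report..."  # placeholder input
prompt_tokens = len(enc.encode(prompt))

# Whatever is left over is the most the model can generate in this request
remaining_for_output = MAX_CONTEXT - prompt_tokens
print(f"Prompt uses {prompt_tokens} tokens; up to {remaining_for_output} remain for the response.")
```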
Implications
- Prompt Engineering: You must strategically fit your instructions, examples, and context within the limit.
- RAG Pipelines: Retrieved content must be chunked to fit within available space.
- Long Conversations: Older messages may be forgotten unless explicitly re-injected (see the trimming sketch after this list).
- Streaming / Iterative Outputs: Large tasks may need to be broken up across multiple calls.
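For the long-conversation case, one common approach (not the only one) is to drop the oldest messages until the history fits a token budget. A rough sketch, again assuming tiktoken; trim_history and count_tokens are hypothetical helper names, not a library API:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding, as above

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the conversation fits within the token budget."""
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)  # oldest message goes first
    return trimmed

# Example: keep only the most recent messages that fit in ~1,000 tokens
history = ["first message", "second message", "a much longer third message ..."]
recent = trim_history(history, budget=1_000)
```

An alternative is to summarize the dropped messages and re-inject the summary, which preserves more context at the cost of an extra model call.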
Token Budgeting Tips
- Pre-tokenize your input using tools like tiktoken or Hugging Face tokenizers (see the sketch after this list).
- Keep prompts concise and remove redundant content.
- Consider truncating or summarizing less-relevant context.
- Structure your prompt to prioritize high-value content near the end (least likely to be truncated).
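Putting the first and third tips together, here is a sketch (again assuming tiktoken and the cl100k_base encoding) that truncates a less-relevant context block to a fixed token allowance before it is placed in the prompt; truncate_to_budget is an illustrative helper, not a library function:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding

def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Keep only the first max_tokens tokens of a context block."""
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return enc.decode(tokens[:max_tokens])

# Example: cap a retrieved document at 2,000 tokens before adding it to the prompt
long_retrieved_document = "..."  # placeholder for retrieved text
context = truncate_to_budget(long_retrieved_document, max_tokens=2_000)
```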
Related Notes
- Hallucinations
- Tokens
- Prompt Engineering
- Models by Anthropic