Tokens

Tokens are the basic units of text that Large Language Models (LLMs) use for processing and generation. A token can represent a full word, part of a word (subword), punctuation, or even whitespace—depending on the model’s tokenizer.

Purpose

LLMs don’t operate on raw text; they operate on sequences of tokens. Generation, context-window limits, and billing are all measured in tokens.

How Tokenization Works

  • Input text is broken down using a tokenizer.
  • Common schemes: Byte Pair Encoding (BPE), WordPiece, SentencePiece.
  • For example (exact splits vary by tokenizer; see the sketch after this list):
    • “ChatGPT is great!” → ["Chat", "G", "PT", " is", " great", "!"]
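
A minimal sketch of inspecting how a tokenizer splits text, using OpenAI’s tiktoken library (the cl100k_base encoding is an assumption; other models ship different encodings and will split the same string differently):

    import tiktoken

    # Load a BPE encoding; cl100k_base is used by several OpenAI chat models.
    enc = tiktoken.get_encoding("cl100k_base")

    text = "ChatGPT is great!"
    token_ids = enc.encode(text)                        # list of integer token IDs
    pieces = [enc.decode([tid]) for tid in token_ids]   # text piece for each ID

    print(token_ids)   # the exact IDs depend on the encoding
    print(pieces)      # shows how the string was split into tokens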

Why Tokens Matter

  • Prediction Unit: LLMs predict one token at a time.
  • Cost Unit: OpenAI, Anthropic, etc., bill usage per token.
  • Limit Unit: Each model has a maximum context window measured in tokens (e.g., 4k, 8k, 128k); see the helper after this list.
  • Performance Impact: More tokens = more compute and latency.
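
To make the “Limit Unit” point concrete, here is a small helper that checks whether a prompt plus the requested completion would fit in a context window (the 8,192-token window, the default max_output_tokens, and the encoding name are assumptions for the example):

    import tiktoken

    def fits_in_context(prompt: str, max_output_tokens: int = 512,
                        context_window: int = 8192,
                        encoding_name: str = "cl100k_base") -> bool:
        """True if prompt tokens plus requested output tokens fit in the window."""
        enc = tiktoken.get_encoding(encoding_name)
        prompt_tokens = len(enc.encode(prompt))
        return prompt_tokens + max_output_tokens <= context_window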

Estimating Token Counts

  • English: ~1 token ≈ ¾ of a word (roughly 4 characters).
  • 100 tokens ≈ 75 words ≈ 5–7 sentences (rough estimate).
  • Tools: OpenAI’s online tokenizer, tiktoken (Python library), Hugging Face tokenizers (see the counting sketch after this list).
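
A small sketch that compares an exact count from tiktoken with the ¾-word heuristic above (the encoding name is an assumption; counts differ across tokenizers):

    import tiktoken

    def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
        """Exact token count under a given tiktoken encoding."""
        enc = tiktoken.get_encoding(encoding_name)
        return len(enc.encode(text))

    text = "Tokens are the basic units of text that LLMs process."
    exact = count_tokens(text)
    estimate = round(len(text.split()) / 0.75)   # ~1 token per 3/4 word
    print(f"exact={exact}, heuristic={estimate}")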

Use Cases

  • Token counting for cost estimation.
  • Truncating or chunking input to fit within model limits (see the chunking sketch after this list).
  • Token-level control in generation tasks (e.g., summaries, classification).
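
For the chunking use case, a minimal sketch that splits long input so each chunk stays under a token budget (the 512-token chunk size and the encoding name are assumptions chosen for illustration; decoding an arbitrary slice can split a multi-byte character, which matters for non-English text):

    import tiktoken

    def chunk_by_tokens(text: str, max_tokens: int = 512,
                        encoding_name: str = "cl100k_base") -> list[str]:
        """Split text into pieces of at most max_tokens tokens each."""
        enc = tiktoken.get_encoding(encoding_name)
        token_ids = enc.encode(text)
        return [enc.decode(token_ids[i:i + max_tokens])
                for i in range(0, len(token_ids), max_tokens)]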

Practical Considerations

  • Always account for both input and output tokens in API usage; a cost sketch follows this list.
  • Use tokenizer libraries to test and preview token behavior.
  • Prompt structure can greatly affect tokenization and model efficiency.
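
A back-of-the-envelope cost sketch that accounts for both input and output tokens (the per-1k-token prices are placeholders, not real rates; check your provider’s pricing page, and note that the true output length is only known once the response comes back):

    import tiktoken

    def estimate_cost(prompt: str, expected_output_tokens: int,
                      price_in_per_1k: float = 0.001,    # placeholder $ per 1k input tokens
                      price_out_per_1k: float = 0.002,   # placeholder $ per 1k output tokens
                      encoding_name: str = "cl100k_base") -> float:
        """Rough dollar estimate for one request: input tokens plus expected output tokens."""
        enc = tiktoken.get_encoding(encoding_name)
        input_tokens = len(enc.encode(prompt))
        return (input_tokens / 1000) * price_in_per_1k \
             + (expected_output_tokens / 1000) * price_out_per_1k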