RAG

Retrieval-Augmented Generation

RAG is an AI architecture that combines retrieval (searching external data) with generation (language model output) to produce more accurate and informed responses.

Core Components

1. Retriever

  • Finds relevant documents from a knowledge base (vector store, database, etc.)
  • Uses vector embeddings to perform semantic similarity search (see the sketch after this list)

2. Generator

  • A large language model (LLM) that takes the retrieved documents and query
  • Produces a final, informed answer
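
As a concrete sketch of the retriever half, the class below keeps documents and their embedding vectors in memory and searches them by cosine similarity. The embed() helper is a toy stand-in for a real embedding model (e.g. OpenAI or Sentence Transformers), and every name here is illustrative rather than any particular library's API; the generator half is sketched under How It Works.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy stand-in for a real embedding model: hashes character bigrams
    # into a fixed-size vector so the sketch runs without external APIs.
    vec = np.zeros(dim)
    for a, b in zip(text.lower(), text.lower()[1:]):
        vec[hash(a + b) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class Retriever:
    """In-memory vector store searched by cosine similarity."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, doc: str) -> None:
        self.docs.append(doc)
        self.vectors.append(embed(doc))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # On unit vectors, cosine similarity reduces to a dot product.
        scores = [float(q @ v) for v in self.vectors]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.docs[i] for i in top]
```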

How It Works

  1. User query → converted to an embedding vector
  2. Retriever searches for relevant documents using similarity search
  3. Top-k documents + original query → passed to the Generator
  4. LLM uses both to generate the response
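
The four steps can be wired together as below, reusing the Retriever sketch from Core Components. The generate() function is a hypothetical stub for an LLM API call, and the prompt layout is an assumption for illustration, not a fixed standard.

```python
def generate(prompt: str) -> str:
    # Stand-in for an LLM call; a real system would invoke a chat or
    # completions API here.
    return f"[LLM answer grounded in {len(prompt)} chars of prompt]"

def rag_answer(retriever: Retriever, query: str, k: int = 3) -> str:
    # Steps 1-2: embed the query and retrieve the top-k similar documents.
    context = retriever.search(query, k=k)
    # Step 3: inject the retrieved documents plus the query into the prompt.
    prompt = (
        "Answer using only the context below.\n\n"
        + "\n".join(f"- {doc}" for doc in context)
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    # Step 4: the LLM generates a response informed by the retrieved context.
    return generate(prompt)

retriever = Retriever()
for doc in [
    "RAG combines retrieval with generation.",
    "Vector stores index document embeddings.",
    "Top-k search returns the most similar documents.",
]:
    retriever.add(doc)

print(rag_answer(retriever, "How does RAG find relevant documents?", k=2))
```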

Benefits

  • Grounds the LLM in factual, external data that can be kept current
  • Works with custom knowledge bases
  • Avoids retraining the LLM when new information is added (only the knowledge base needs updating)
  • Improves accuracy and reduces hallucination

Use Cases

  • Chatbots with proprietary knowledge
  • AI assistants with document search
  • Customer service agents
  • Legal, research, and data analysis tools

Key Technologies

  • Vector databases (e.g. Pinecone, FAISS)
  • Embedding models (e.g. OpenAI, Cohere, Sentence Transformers)
  • Prompt engineering for context injection
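
For the last point, here is a minimal sketch of context injection, assuming a numbered-chunk prompt layout; the exact instructions and formatting are illustrative choices rather than a standard.

```python
def build_prompt(context_docs: list[str], question: str) -> str:
    # One common injection pattern: numbered context chunks, a grounding
    # instruction, then the user question.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(context_docs, 1))
    return (
        "Answer using only the numbered context below. "
        "If the answer is not present, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```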