intermediate · 18 min

Retrieval-Augmented Generation (RAG)

Learn how RAG combines search with AI language models to answer questions accurately using real, up-to-date knowledge. It is the technique behind much of today's enterprise AI.

🧑 For teens & curious minds
Standard AI language models are trained on data up to a cutoff date, and on their own they cannot access the internet or your private documents. RAG fixes this by adding a retrieval step: before generating an answer, the system searches a knowledge base (your documents, a database, or a search engine), pulls back the most relevant chunks of text, then feeds those chunks plus your question into the large language model (LLM) as context. The result is answers grounded in real, current information rather than hallucinated guesses. This is how tools like Perplexity, NotebookLM, and many enterprise chatbots work.
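The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the knowledge base, the `retrieve` and `build_prompt` helpers, and the prompt wording are all hypothetical, and simple word overlap stands in for real semantic search. The final call to an LLM is left out; the sketch stops at the prompt that would be sent.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base; a real system would hold many document chunks.
KNOWLEDGE_BASE = [
    "The refund window for online orders is 30 days from delivery.",
    "Support is available by chat from 9am to 5pm on weekdays.",
    "Gift cards cannot be exchanged for cash.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question: str, chunk: str) -> int:
    """Count shared word occurrences (an assumed stand-in for semantic search)."""
    shared = Counter(tokenize(question)) & Counter(tokenize(chunk))
    return sum(shared.values())


def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the question."""
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda chunk: overlap_score(question, chunk),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer using only the context below and cite passage numbers.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


question = "How many days is the refund window?"
top_chunks = retrieve(question, k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)  # this string is what would be sent to the LLM
```

Numbering the passages in the prompt is one simple way to let the model point back to the text it used, which is where the cited sources in RAG tools come from.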
💡 Visual Analogy

Think of RAG like a brilliant research assistant. Before answering any question, they sprint to the library, grab the three most relevant books, open them to the right pages, and then give you an answer, pointing to the exact passages they used. They rarely make things up because the books are right there in front of them.

Key Terms

Vector Embedding: A numerical representation of text (or images) as a point in high-dimensional space, where similar meanings are close together.
Vector Database: A database optimised for storing and searching embeddings using similarity search (e.g. Pinecone, Weaviate, pgvector).
Chunking: Breaking documents into smaller pieces so they fit in the LLM context window and can be retrieved individually.
Semantic Search: Finding documents based on meaning similarity rather than exact keyword matching.
Grounding: Anchoring AI responses in retrieved factual sources to reduce hallucination.
Reranking: A second pass in which a model reorders the retrieved chunks by relevance before they are passed to the LLM.
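Several of these terms can be seen working together in a small Python sketch: chunking a document into overlapping word windows, turning each chunk into a vector, and ranking chunks by cosine similarity to a query. Everything here is illustrative and assumed rather than taken from a real library. In particular, `toy_embed` simply counts words over a vocabulary, so it only captures word overlap; a real embedding model is learned and captures meaning. The chunk size, overlap, and sample document are hypothetical too.

```python
import math
import re


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: word counts over a vocabulary (NOT a learned model)."""
    tokens = tokenize(text)
    return [float(tokens.count(term)) for term in vocab]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, chunks: list[str], k: int = 1) -> list[tuple[float, str]]:
    """Return the k (score, chunk) pairs most similar to the query."""
    vocab = sorted({tok for chunk in chunks for tok in tokenize(chunk)})
    query_vec = toy_embed(query, vocab)
    scored = [
        (cosine_similarity(query_vec, toy_embed(chunk, vocab)), chunk)
        for chunk in chunks
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


document = (
    "Refunds are accepted within 30 days of delivery. "
    "Gift cards cannot be exchanged for cash. "
    "Support chat is open on weekdays only."
)
chunks = chunk_words(document, size=8, overlap=2)
results = search("When is support chat open?", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

The overlap between windows is a common design choice: it lowers the chance that a sentence split across a chunk boundary becomes unfindable. A vector database plays the role of `search` at scale, and a reranker would re-score the returned top-k chunks with a stronger model before they reach the LLM.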

🎯 Fun Facts

  • RAG was introduced in a 2020 Facebook AI Research paper and is now widely used in enterprise AI products.
  • Vector database companies like Pinecone grew from near-zero to valuations approaching a billion dollars, in large part because of RAG adoption.
  • Without retrieval, ChatGPT would not know anything that happened after its training cutoff; RAG is what makes AI assistants feel 'current'.
  • A single well-tuned RAG pipeline can reduce AI hallucination rates by over 50% compared to a standalone LLM, though the exact figure depends on the task and on how hallucination is measured.
  • pgvector lets you run the vector search behind RAG directly in PostgreSQL, so no separate vector database is needed.

Real World Examples

  • Perplexity AI retrieves live web pages before generating its answers, citing sources.
  • NotebookLM (Google) lets you upload PDFs and ask questions; it uses RAG over your documents.
  • Enterprise chatbots at banks and insurers use RAG over internal policy documents so the AI gives accurate, policy-compliant answers.
  • GitHub Copilot Chat uses RAG over your open files and project context to give code-aware suggestions.
  • Customer support bots use RAG over FAQ databases, which can reduce escalations to human agents by 40-70%.