How RAG Works: Give Your AI a Memory

RAG stands for Retrieval-Augmented Generation. Fancy name, simple idea:

Give your AI Google for its brain.

Here's why that matters and how it works.

The Problem: AI Can't Remember Everything

Language models like ChatGPT are trained on data up to a fixed cutoff date. Ask about your company's internal docs, or anything newer than the cutoff? It has no clue.

Options:

  1. Retrain the whole model (costs $1M+, takes months)
  2. Fine-tune it (costs $10K+, forgets other stuff)
  3. RAG (costs $10, takes 1 hour) ← We're doing this

What RAG Does (Simple Explanation)

Instead of cramming everything into the AI's brain during training:

  1. Store your documents in a searchable database
  2. When user asks a question, search for relevant docs
  3. Give AI the search results + the question
  4. AI answers using the docs you just handed it

Analogy: Open book test vs. closed book test.

The 3 Steps of RAG

Step 1: Retrieval (Find Relevant Stuff)

User asks: "What's our return policy?"

System searches your docs and finds:

  • Returns accepted within 30 days
  • Original packaging required
  • Refund processed in 5-7 business days
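The retrieval step above can be sketched in a few lines. This toy version ranks docs by plain word overlap with the question; real systems rank by embedding similarity instead (covered in the tech stack section), but the shape of the step is the same: score every doc, return the top few.

```python
# Toy retrieval: score each doc by how many query words it shares.
# Real RAG systems rank by embedding similarity, but the idea is the same.
docs = [
    "Returns accepted within 30 days of purchase.",
    "Shipping takes 5-7 business days.",
    "Refunds are processed in 5-7 business days.",
]

def retrieve(query, docs, k=2):
    q_words = set(query.lower().split())
    # Rank docs by word overlap with the query, highest first
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

print(retrieve("How long do refunds take?", docs))
```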

Step 2: Augmentation (Add Context to Prompt)

Original prompt:

User: What's our return policy?

Augmented prompt:

Context: [Return policy doc text here]

User: What's our return policy?

Answer using only the context above.
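Augmentation is just string formatting. A minimal sketch of the prompt builder (the function name `augment` is mine, not a library API):

```python
# Build the augmented prompt: retrieved context + original question.
# The final instruction line is what keeps the model grounded in the context.
def augment(question, context_docs):
    context = "\n".join(context_docs)
    return (
        f"Context: {context}\n\n"
        f"User: {question}\n\n"
        "Answer using only the context above."
    )

prompt = augment(
    "What's our return policy?",
    ["Returns accepted within 30 days", "Original packaging required"],
)
print(prompt)
```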

Step 3: Generation (AI Answers with Facts)

AI responds using the context, not making stuff up.

Result: Accurate answer with citations.

Why RAG Beats Fine-Tuning

Feature          RAG                    Fine-Tuning
Cost             ~$10                   ~$10,000
Time             ~1 hour                ~1 week
Updating docs    Just add files         Retrain the model
Hallucinations   Rare (cites sources)   Common
Private data     Stays private          Goes into the model

The Tech Stack (What You Actually Need)

1. Embedding Model

Turns text into numbers (vectors).

  • Free option: all-MiniLM-L6-v2 (small, runs fine on CPU)
  • Better option: bge-large-en-v1.5 (higher quality; much faster with a GPU)
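An embedding is just a vector, and "similar meaning" means "nearby vectors," usually measured with cosine similarity. A minimal sketch with made-up 3-d vectors (real embedders output hundreds of dimensions):

```python
import math

# Cosine similarity: the standard "how close are these two vectors" measure.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors; a real model like all-MiniLM-L6-v2 outputs 384 dims.
refund_doc = [0.9, 0.1, 0.0]
refund_query = [0.8, 0.2, 0.1]
shipping_doc = [0.1, 0.9, 0.2]

print(cosine(refund_query, refund_doc))   # high: similar meaning
print(cosine(refund_query, shipping_doc)) # lower: different topic
```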

2. Vector Database

Stores embeddings, finds similar ones fast.

Options:

  • ChromaDB (easiest, runs locally)
  • Pinecone (cloud, free tier: 1GB)
  • Weaviate (open source, self-hosted)
  • FAISS (Facebook's library, fastest)
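Under the hood, every option above answers the same question: which stored vectors are closest to the query vector? A brute-force sketch shows the idea; the real databases add indexing structures (HNSW, IVF) so the search stays fast at millions of vectors:

```python
import math

# What a vector database does, minus the indexing: rank stored vectors
# by cosine similarity to the query and return the k closest IDs.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, store, k=2):
    ranked = sorted(store.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 2-d "embeddings" keyed by document ID
store = {
    "doc_returns": [0.9, 0.1],
    "doc_shipping": [0.1, 0.9],
    "doc_support": [0.5, 0.5],
}
print(top_k([0.8, 0.2], store))
```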

3. LLM

The actual AI that generates answers.

Options:

  • OpenAI API (pay per token; roughly $0.002/1K tokens for the cheaper models, varies by model)
  • Local models (free but needs GPU)
  • Groq API (fastest, free tier available)

Simple RAG in 50 Lines of Python

from sentence_transformers import SentenceTransformer
import chromadb
from openai import OpenAI

# 1. Initialize
embedder = SentenceTransformer('all-MiniLM-L6-v2')
db = chromadb.Client()
collection = db.create_collection("docs")
client = OpenAI()

# 2. Add documents
docs = [
    "Our return policy allows 30-day returns...",
    "Shipping takes 5-7 business days...",
    "Customer support available 9-5 EST..."
]

for i, doc in enumerate(docs):
    embedding = embedder.encode(doc)
    collection.add(
        embeddings=[embedding.tolist()],
        documents=[doc],
        ids=[f"doc_{i}"]
    )

# 3. Query
question = "What's the return policy?"
q_embedding = embedder.encode(question)

results = collection.query(
    query_embeddings=[q_embedding.tolist()],
    n_results=2
)

# 4. Augment and generate
context = "\n".join(results['documents'][0])
prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer using only the context above."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)

That's it. RAG in under 50 lines.

Advanced RAG Techniques (For Later)

Once you master basics:

  • Hybrid search (semantic + keyword)
  • Reranking (improve search quality)
  • Query decomposition (break complex questions)
  • Agentic RAG (AI decides when to search)
  • Multi-hop (combine info from multiple docs)
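Hybrid search, the first technique in the list, is worth a quick sketch. The idea: blend a keyword score with a semantic score so exact-match terms and paraphrases both rank well. The semantic scores here are made up for illustration; in practice they come from embedding similarity:

```python
# Hybrid search sketch: blend keyword overlap with a semantic score.
def keyword_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def hybrid_score(kw, sem, alpha=0.5):
    # alpha controls the blend: 1.0 = pure keyword, 0.0 = pure semantic
    return alpha * kw + (1 - alpha) * sem

# Pretend semantic scores (would come from an embedding model)
docs = {
    "Returns accepted within 30 days": 0.91,
    "Shipping takes 5-7 business days": 0.30,
}

query = "returns policy"
ranked = sorted(
    docs,
    key=lambda d: hybrid_score(keyword_score(query, d), docs[d]),
    reverse=True,
)
print(ranked[0])
```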

Common Mistakes

1. Chunks too big - AI gets confused with 2000-word chunks. Keep it 200-500 words.

2. Bad search - Returns irrelevant docs. Use better embeddings or hybrid search.

3. No citations - Always return source documents. Users want to verify.

4. Ignoring cost - Embedding 1M documents costs real money and compute time. Cache embeddings so you only pay once per document.
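Mistake 1 is the easiest to fix. A minimal word-count chunker matching the 200-500 word guideline above (the overlap keeps sentences that straddle a boundary findable from either chunk):

```python
# Split text into fixed-size word chunks with overlap between neighbors.
def chunk_words(text, chunk_size=300, overlap=50):
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# 700 dummy words -> three chunks of 300, 300, 200 words
doc = " ".join(f"word{i}" for i in range(700))
chunks = chunk_words(doc)
print(len(chunks), [len(c.split()) for c in chunks])
```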

Why You Should Care

Every company needs RAG for:

  • Customer support (search knowledge base)
  • Internal docs (find that one Slack message)
  • Code search (find similar bugs)
  • Legal/compliance (cite regulations accurately)

Market size: analysts project the RAG market growing from $1.2B (2024) to $30B+ (2030)

This skill pays.

Next Week

Part 2: We'll scrape LinkedIn, Twitter, and Medium to collect YOUR writing. Bring your login credentials!

Homework:

  • Run the 50-line example above
  • Join our Discord
  • Star the GitHub repo

Welcome to Science Church. Class is in session.


Series Progress: Part 1 of 6 Complete ✓
Next: Part 2 - Scraping Your Digital Self (Aug 31)
GitHub: Full code + notebooks
Discord: Live Q&A every Sunday