Part 3: Vector Embeddings Made Simple

You have your writing collected (Part 2). Now we turn it into numbers AI can understand.

What Are Embeddings?

Simple version: Math that captures meaning.

"King" and "Queen" are similar → their embedding vectors are close. "King" and "Banana" are different → vectors are far apart.

Technically: Arrays of floating-point numbers, typically 384 to 1,536 values long depending on the model.

"Hello world" → [0.23, -0.45, 0.67, ..., 0.12]  # 384 numbers

Similar meanings = similar numbers, usually measured by the cosine similarity or distance between the two vectors.
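To make "similar numbers" concrete, here is a minimal sketch of cosine similarity, one common way to compare two vectors. The three vectors are hand-made 3-dimensional toys chosen for illustration, not real model output, and the function name is just a placeholder.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, NOT real embeddings (real ones have hundreds of values).
king = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.2]
banana = [0.1, 0.0, 0.9]

print(cosine_similarity(king, queen))   # close to 1: similar direction
print(cosine_similarity(king, banana))  # much lower: different direction
```

Real embeddings work the same way, just with far more dimensions per vector.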

Why This Matters

Your AI twin needs to know:

"How do I..." and "What's the process for..." mean similar things
Your formal LinkedIn posts and your casual tweets are both "you"
Technical writing and storytelling have different embeddings

Embeddings capture meaning first, along with some signal of style and tone.

Create Embeddings (15 Lines)

from sentence_transformers import SentenceTransformer
import json

# Load model (downloads ~100MB first time)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Load your writing
with open('data/raw/combined.json') as f:
    writings = json.load(f)

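# Create embeddings (one batched call is faster than encoding item by item)
texts = [item['text'] for item in writings]
embeddings = model.encode(texts)
for item, embedding in zip(writings, embeddings):
    item['embedding'] = embedding.tolist()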

# Save
with open('data/processed/embeddings.json', 'w') as f:
    json.dump(writings, f)

That's it. Every piece of your writing now has a vector you can search by meaning.
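Before moving on, it can help to sanity-check what you saved. The sketch below is one possible check, not part of the original pipeline: the helper name `check_embeddings` is hypothetical, and the sample records are tiny hand-made stand-ins for the contents of the real file.

```python
def check_embeddings(records):
    """Return the shared embedding dimension, or raise ValueError on a bad record."""
    dims = set()
    for i, item in enumerate(records):
        emb = item.get('embedding')
        if not emb:
            raise ValueError(f"record {i} has no embedding")
        dims.add(len(emb))
    if len(dims) != 1:
        raise ValueError(f"inconsistent dimensions: {sorted(dims)}")
    return dims.pop()

# Hand-made records for illustration; real ones come from embeddings.json.
sample = [
    {'text': 'first post', 'embedding': [0.1, 0.2, 0.3]},
    {'text': 'second post', 'embedding': [0.4, 0.5, 0.6]},
]
print(check_embeddings(sample))
```

On the real file, load it with json.load and pass the list in. With all-MiniLM-L6-v2 the returned dimension should be 384.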

Store in ChromaDB (Vector Database)

import chromadb

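# In-memory client: the collection is gone when the script exits
client = chromadb.Client()
# get_or_create avoids an error if the collection already exists
collection = client.get_or_create_collection("my_writing")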

for i, item in enumerate(writings):
    collection.add(
        ids=[f"doc_{i}"],
        embeddings=[item['embedding']],
        documents=[item['text']],
        metadatas=[{'platform': item['platform']}]
    )
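The loop above adds one document per call. Since collection.add() takes parallel lists, you could also gather everything first and add it in a single call. Here is a rough sketch of that idea: `build_batch` is a hypothetical helper, and the fallback to "unknown" for a missing platform field is an assumption, not something from Part 2.

```python
def build_batch(writings, prefix="doc"):
    """Collect per-item fields into the parallel lists that collection.add() accepts."""
    return {
        'ids': [f"{prefix}_{i}" for i in range(len(writings))],
        'embeddings': [item['embedding'] for item in writings],
        'documents': [item['text'] for item in writings],
        # Assumption: fall back to "unknown" when an item has no platform field.
        'metadatas': [{'platform': item.get('platform', 'unknown')} for item in writings],
    }

# Hand-made records for illustration only.
sample = [
    {'text': 'first post', 'embedding': [0.1, 0.2], 'platform': 'linkedin'},
    {'text': 'second post', 'embedding': [0.3, 0.4]},
]
batch = build_batch(sample)
print(batch['ids'])
print(batch['metadatas'])
```

With a real collection you would then call collection.add(**batch). For very large archives you may need to split the lists into smaller chunks.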

Test Semantic Search

# Query
query = "how to build web applications"
query_embedding = model.encode(query)

# Search
results = collection.query(
    query_embeddings=[query_embedding.tolist()],
    n_results=5
)

# Results are your posts most similar to the query!
for doc in results['documents'][0]:
    print(doc[:200])
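Printing the text tells you what matched, but not how closely. Assuming the default query settings, the results dictionary also carries a 'distances' list next to 'documents', where a smaller number means a closer match. The sketch below pairs the two; `summarize_results` is a hypothetical helper and `fake_results` is a hand-made dictionary that only mimics the shape of a real query result.

```python
def summarize_results(results, preview_chars=200):
    """Pair each returned document with its distance for the first query."""
    docs = results['documents'][0]
    dists = results['distances'][0]
    return [(round(d, 3), doc[:preview_chars]) for doc, d in zip(docs, dists)]

# Hand-made stand-in for a query result; the values are invented for illustration.
fake_results = {
    'ids': [['doc_3', 'doc_7']],
    'documents': [['How I structure a Flask app', 'My sourdough routine']],
    'distances': [[0.41, 1.37]],
}
for distance, preview in summarize_results(fake_results):
    print(f"{distance:.3f}  {preview}")
```

If the top results have small distances and are clearly on topic, your semantic search is working.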

Next Week

Part 4: Fine-tune a model on your writing. Make it write like you!

Homework: Run the code above and verify that semantic search works.

Series Progress: Part 3 of 6 ✓ Next: Part 4 - Fine-Tuning (Aug 10)
