# RAG

(26/02/18) Although I still primarily use the Lunr XML search feature, I occasionally use the AI helper, and I want to get rid of the remote API calls. I'm adding this page to document the process.

## Overview

Build a fully local RAG system that ingests ~313 markdown files from the Docusaurus site, stores them in ChromaDB with Ollama embeddings, and provides a web UI for semantic search and Q&A.

## Stack

- Python 3 with FastAPI for the web server
- Ollama with `nomic-embed-text` for embeddings
- ChromaDB for vector storage (persisted to disk)
- Simple HTML/JS frontend served by FastAPI

## Directory Structure

```sh
daw_til/
└── rag/
    ├── ingest.py          # Markdown parser + ChromaDB ingestion
    ├── server.py          # FastAPI web server + query endpoint
    ├── requirements.txt   # Python dependencies
    ├── templates/
    │   └── index.html     # Web UI (search box + results)
    └── chroma_db/         # ChromaDB persistence (gitignored)
```
+
|
|
|
|
|
+## Implementation Steps
|
|
|
|
|
+1. Create rag/requirements.txt
|
|
|
|
|
+```sh
|
|
|
|
|
+chromadb
|
|
|
|
|
+fastapi
|
|
|
|
|
+uvicorn[standard]
|
|
|
|
|
+ollama
|
|
|
|
|
+python-frontmatter
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+2. Create ```rag/ingest.py``` โ Markdown Ingestion Pipeline
|
|
|
|
|
+- Walk docs/, notes/, lists/, posts/ directories
|
|
|
|
|
+- Parse each .md file:
|
|
|
|
|
+ - Extract YAML frontmatter (posts have title, slug, description, tags)
|
|
|
|
|
+ - For docs/notes/lists: derive title from first # heading
|
|
|
|
|
+ - Extract date from post filenames (YYYY-MM-DD pattern)
|
|
|
|
|
+ - Categorize by directory: docs, notes, lists, posts
|
|
|
|
|
+- Chunking strategy:
|
|
|
|
|
+ - Split on markdown headers (##, ###) to keep semantic sections intact
|
|
|
|
|
+ - For sections exceeding ~1000 chars, further split on paragraphs
|
|
|
|
|
+ - Each chunk gets metadata: source (file path), category, section (header path), title, tags (if post)
|
|
|
|
|
+- Generate embeddings via Ollama (nomic-embed-text)
|
|
|
|
|
+- Upsert into ChromaDB collection with metadata
|
|
|
|
|
+- Print progress (file count, chunk count, timing)
|
|
|
|
|
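The chunking strategy can be sketched roughly like this; it is a standalone illustration, not the actual `ingest.py` code, with `max_chars` mirroring the ~1000-char limit:

```python
import re

def chunk_markdown(text, max_chars=1000):
    """Split markdown into chunks on ##/### headers, then split
    oversized sections further on paragraph boundaries."""
    # Lookahead split keeps each header attached to its own section
    sections = re.split(r"(?m)^(?=#{2,3} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: accumulate paragraphs up to max_chars
        current = ""
        for para in section.split("\n\n"):
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks
```

Splitting on headers first means a chunk's text stays aligned with the `section` breadcrumb stored in its metadata.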
+3. Create ```rag/server.py``` โ FastAPI Query Server
|
|
|
|
|
+- ```GET /``` โ serve the HTML UI
|
|
|
|
|
+- ```POST /query``` โ accept a search query, embed it with Ollama, query ChromaDB for top-k results (default 8), return JSON with chunks + metadata + distances
|
|
|
|
|
+- ```GET /stats``` โ return collection stats (doc count, chunk count)
|
|
|
|
|
+- CORS enabled for local dev
|
|
|
|
|
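The core of `server.py` can be sketched as below. The `chroma_db` path matches the directory structure above, but the collection name `docs` is an assumption, and the UI/CORS wiring is omitted:

```python
import chromadb
import ollama
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("docs")  # name is illustrative

class Query(BaseModel):
    q: str
    k: int = 8  # top-k results, default 8

@app.post("/query")
def query(body: Query):
    # Embed the query with the same model used at ingest time,
    # then run a vector search against the stored chunks
    emb = ollama.embeddings(model="nomic-embed-text", prompt=body.q)["embedding"]
    res = collection.query(query_embeddings=[emb], n_results=body.k)
    return {
        "documents": res["documents"][0],
        "metadatas": res["metadatas"][0],
        "distances": res["distances"][0],
    }

@app.get("/stats")
def stats():
    return {"chunks": collection.count()}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8808)
</imports>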
+4. Create ```rag/templates/index.html``` โ Web UI
|
|
|
|
|
+- Clean, minimal search interface
|
|
|
|
|
+- Search box with submit button
|
|
|
|
|
+- Results displayed as cards showing:
|
|
|
|
|
+ - Matched text snippet
|
|
|
|
|
+ - Source file path (clickable link to Docusaurus page)
|
|
|
|
|
+ - Category badge (docs/notes/lists/posts)
|
|
|
|
|
+ - Relevance score
|
|
|
|
|
+- Loading spinner during search
|
|
|
|
|
+5. Add ```rag/``` to ```.gitignore```
|
|
|
|
|
+Add ```rag/chroma_db/``` to gitignore to keep vector data out of git
|
|
|
|
|
+
|
|
|
|
|
+## Metadata Schema (ChromaDB)
|
|
|
|
|
+Each chunk stored with:
|
|
|
|
|
+```js
|
|
|
|
|
+{
|
|
|
|
|
+ "source": "docs/lang/JavaScript.md", # relative file path
|
|
|
|
|
+ "category": "docs", # docs|notes|lists|posts
|
|
|
|
|
+ "title": "JavaScript", # doc/post title
|
|
|
|
|
+ "section": "Arrays > Map and Filter", # header breadcrumb
|
|
|
|
|
+ "tags": "tech", # comma-separated (posts only)
|
|
|
|
|
+ "date": "2025-03-03", # posts only
|
|
|
|
|
+ "url": "/til/docs/lang/JavaScript" # reconstructed Docusaurus URL
|
|
|
|
|
+}
|
|
|
|
|
+```
|
|
|
|
|
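The path-derived fields can be reconstructed along these lines; the `/til` base path comes from the example above, while the exact Docusaurus routing for posts is an assumption:

```python
import re
from pathlib import PurePosixPath

def chunk_metadata(source: str) -> dict:
    """Derive category, date, and a Docusaurus URL from a relative file path."""
    path = PurePosixPath(source)
    category = path.parts[0]  # docs | notes | lists | posts
    meta = {"source": source, "category": category}
    # Posts encode their date in the filename: YYYY-MM-DD-slug.md
    m = re.match(r"(\d{4}-\d{2}-\d{2})-", path.stem)
    if category == "posts" and m:
        meta["date"] = m.group(1)
    # Docusaurus URL: drop the .md extension, keep the directory structure
    meta["url"] = f"/til/{path.with_suffix('')}"
    return meta
```

Ingestion would merge this dict with the frontmatter-derived fields (`title`, `tags`) and the `section` breadcrumb before upserting.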
+
|
|
|
|
|
+## Prerequisites
|
|
|
|
|
+- Ollama installed and running with nomic-embed-text model pulled
|
|
|
|
|
+- Python 3.10+
|
|
|
|
|
+
|
|
|
|
|
+## Usage
|
|
|
|
|
+```sh
|
|
|
|
|
+cd rag
|
|
|
|
|
+pip install -r requirements.txt
|
|
|
|
|
+ollama pull nomic-embed-text
|
|
|
|
|
+
|
|
|
|
|
+# Ingest all markdown files (run once, re-run to rebuild)
|
|
|
|
|
+python ingest.py
|
|
|
|
|
+
|
|
|
|
|
+# Start the web UI
|
|
|
|
|
+python server.py
|
|
|
|
|
+# Open http://localhost:8808
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+## Verification
|
|
|
|
|
+1. Run ```python ingest.py``` โ should report ~300+ files processed and ~1000+ chunks created
|
|
|
|
|
+2. Run ```python server.py``` โ should start on port 8808
|
|
|
|
|
+3. Open browser to ```http://localhost:8808```
|
|
|
|
|
+4. Search for "PostgreSQL" โ should return relevant chunks from ```docs/db/PostgreSQL.md```
|
|
|
|
|
+5. Search for "Docker" โ should return chunks from docs/server/Docker.md
|
|
|
|
|
+6. Search for a conceptual query like "how to set up SSL certificates" โ should return relevant Let's Encrypt / Apache / Nginx docs
|