(26/02/18) Although I still primarily use the Lunr XML search feature, I occasionally use the AI helper, and I want to get rid of the remote API calls. I'm adding this page to document the process.
Build a fully local RAG system that ingests ~313 markdown files from the Docusaurus site, stores them in ChromaDB with Ollama embeddings, and provides a web UI for semantic search and Q&A.
```
daw_til/
└── rag/
    ├── ingest.py          # Markdown parser + ChromaDB ingestion
    ├── server.py          # FastAPI web server + query endpoint
    ├── requirements.txt   # Python dependencies
    ├── templates/
    │   └── index.html     # Web UI (search box + results)
    └── chroma_db/         # ChromaDB persistence (gitignored)
```
Create `rag/requirements.txt`:

```
chromadb
fastapi
uvicorn[standard]
ollama
python-frontmatter
```
`rag/ingest.py` — Markdown Ingestion Pipeline

`rag/server.py` — FastAPI Query Server
- `GET /` — serve the HTML UI
- `POST /query` — accept a search query, embed it with Ollama, query ChromaDB for the top-k results (default 8), return JSON with chunks + metadata + distances
- `GET /stats` — return collection stats (doc count, chunk count)

`rag/templates/index.html` — Web UI

Add `rag/chroma_db/` to `.gitignore` to keep the vector data out of git.

Each chunk is stored with:
```python
{
    "source": "docs/lang/JavaScript.md",   # relative file path
    "category": "docs",                    # docs|notes|lists|posts
    "title": "JavaScript",                 # doc/post title
    "section": "Arrays > Map and Filter",  # header breadcrumb
    "tags": "tech",                        # comma-separated (posts only)
    "date": "2025-03-03",                  # posts only
    "url": "/til/docs/lang/JavaScript",    # reconstructed Docusaurus URL
}
```
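Two pieces of the ingestion are pure string-munging and worth sketching: splitting a markdown file into header-scoped chunks while carrying the `section` breadcrumb, and reconstructing the Docusaurus URL from the file path. A minimal sketch, assuming ATX (`#`) headers and the `/til` base path shown above; the function names are mine, not taken from `ingest.py`:

```python
import re
from pathlib import PurePosixPath

def chunk_by_headers(text: str) -> list[dict]:
    """Split markdown into one chunk per header section, each tagged
    with a 'Parent > Child' breadcrumb built from the header trail."""
    breadcrumb: dict[int, str] = {}  # header level -> header text
    chunks, current = [], {"section": "", "lines": []}
    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            if current["lines"]:
                chunks.append(current)
            level = len(m.group(1))
            # drop deeper levels, then record this header at its level
            breadcrumb = {k: v for k, v in breadcrumb.items() if k < level}
            breadcrumb[level] = m.group(2).strip()
            trail = " > ".join(breadcrumb[k] for k in sorted(breadcrumb))
            current = {"section": trail, "lines": []}
        else:
            current["lines"].append(line)
    if current["lines"]:
        chunks.append(current)
    return [{"section": c["section"], "text": "\n".join(c["lines"]).strip()}
            for c in chunks]

def doc_url(rel_path: str, base: str = "/til") -> str:
    """docs/lang/JavaScript.md -> /til/docs/lang/JavaScript"""
    return f"{base}/{PurePosixPath(rel_path).with_suffix('')}"
```

In the real pipeline each chunk's text is embedded via `nomic-embed-text` and upserted into the ChromaDB collection with this metadata dict attached.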
```bash
cd rag
pip install -r requirements.txt
ollama pull nomic-embed-text

# Ingest all markdown files (run once, re-run to rebuild)
python ingest.py

# Start the web UI
python server.py
# Open http://localhost:8808
```
Verification:
- `python ingest.py` — should report ~300+ files processed and ~1000+ chunks created
- `python server.py` — should start on port 8808
- Open http://localhost:8808 and run a test search (e.g. one that should surface `docs/db/PostgreSQL.md`)
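The only non-obvious part of the `POST /query` handler is reshaping ChromaDB's column-oriented query result (parallel lists of documents, metadatas, and distances, nested once per query) into the flat per-chunk JSON the UI renders. A stdlib-only sketch of that reshaping, with the FastAPI wiring and the Ollama embedding call left out; the response field names are my assumption, not taken from `server.py`:

```python
def format_results(chroma_result: dict) -> list[dict]:
    """Flatten ChromaDB's collection.query() output (lists-of-lists,
    one inner list per query) into one dict per matching chunk."""
    docs = chroma_result["documents"][0]
    metas = chroma_result["metadatas"][0]
    dists = chroma_result["distances"][0]
    return [
        {"text": doc, **meta, "distance": dist}
        for doc, meta, dist in zip(docs, metas, dists)
    ]
```

In the server this would sit between `collection.query(query_embeddings=[...], n_results=8)` and the JSON response.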