
From zero to a RAG system: successes and failures

This post chronicles an engineer's real-world odyssey building a RAG system from scratch for a massive internal document corpus, detailing the many practical challenges and hard-won solutions. It's a pragmatic, step-by-step account of navigating technical hurdles from document chaos to GPU bottlenecks. Hacker News readers appreciate this type of hands-on, problem-solving narrative that demystifies complex AI implementations.

Score: 19
Comments: 5
Highest Rank: #3
Time on Front Page: 9h
First Seen: Mar 26, 11:00 AM
Last Seen: Mar 26, 7:00 PM

The Lowdown

The author shares a detailed, candid account of their journey to develop an internal Retrieval-Augmented Generation (RAG) system backed by a locally hosted LLM. Tasked with creating a chat tool to answer questions drawn from a decade's worth of company projects and 1TB of unstructured data, the author hit numerous unexpected challenges that turned a seemingly simple task into an "emotional roller coaster."

  • Technology Stack Selection: Initially, the author chose Ollama for local LLaMA models, nomic-embed-text for embeddings, LlamaIndex for RAG orchestration, and Python for development. Early tests were promising, leading to initial overconfidence.
  • Document Chaos Mitigation: Faced with hundreds of gigabytes of unorganized documents, the system crashed due to processing irrelevant large files (videos, images, backups). The solution involved implementing robust filtering by file extension and name patterns, reducing the data by 54% and resolving memory issues.
  • Scaling Indexing: The default LlamaIndex JSON storage proved unmanageable for 451GB of data, leading to slow processing and data corruption. Migrating to ChromaDB as a dedicated vector database enabled batch processing, checkpointing, and reliable storage, fundamentally changing the indexing approach.
  • Hardware Bottleneck: Local CPU processing proved far too slow, so the author rented a virtual machine with an NVIDIA RTX 4000 SFF Ada GPU. This significantly accelerated indexing (which still took several weeks) at a total cost of 184 euros.
  • User Experience & Deployment: A Flask API and Streamlit UI were built for the frontend. A key challenge was serving document references without storing the entire 451GB corpus on the production VM's limited disk space. This was solved by using Azure Blob Storage to serve original documents via SAS tokens on demand.
  • Final Architecture: The robust solution integrated Ollama, nomic-embed-text, ChromaDB (HNSW), LlamaIndex, Flask, Streamlit, Docker Compose, NVIDIA Container Toolkit, and Azure Blob Storage.
  • Lessons Learned: Key takeaways included managing memory with batch processing and explicit garbage collection, implementing error tolerance for problematic files, using checkpoints for long-running processes, and thorough monitoring.
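The file-filtering step described above can be sketched in Python. The allow-list of extensions and the skip patterns below are illustrative assumptions; the article does not publish the author's exact rules:

```python
from pathlib import Path

# Illustrative allow-list and skip patterns (assumptions, not the
# author's actual configuration).
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".pptx", ".xlsx"}
SKIP_NAME_PATTERNS = ("backup", "~$", ".tmp")

def is_indexable(path: Path) -> bool:
    """Keep only document types worth embedding; drop videos,
    images, and backup copies like those that crashed the pipeline."""
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        return False
    name = path.name.lower()
    return not any(pattern in name for pattern in SKIP_NAME_PATTERNS)
```

Applied with something like `[p for p in corpus_root.rglob("*") if p.is_file() and is_indexable(p)]`, a filter of this shape is what cut the author's corpus by 54% before indexing.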
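The batch processing, per-file error tolerance, checkpointing, and explicit garbage collection from the lessons above could look roughly like this minimal sketch. Here `embed_and_store` is a hypothetical placeholder for the real LlamaIndex/ChromaDB pipeline, and the batch size is arbitrary:

```python
import gc
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")

def load_done() -> set:
    """Resume from the last checkpoint if a long run was interrupted."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def index_corpus(files, embed_and_store, batch_size=64):
    """Index files in small batches, skipping problem files and
    checkpointing progress so weeks-long runs can be resumed."""
    done = load_done()
    todo = [f for f in files if f not in done]
    for i in range(0, len(todo), batch_size):
        for f in todo[i:i + batch_size]:
            try:
                embed_and_store(f)        # placeholder for the real pipeline
                done.add(f)
            except Exception as exc:      # error tolerance: log and move on
                print(f"skipping {f}: {exc}")
        CHECKPOINT.write_text(json.dumps(sorted(done)))
        gc.collect()                      # keep memory flat between batches
```

Writing the checkpoint after every batch rather than every file keeps I/O overhead low while still bounding how much work a crash can lose.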

While acknowledging it's not a perfect system, the author expresses satisfaction with the fast, reliable, and useful tool delivered to colleagues. The primary advice for others building similar systems is to invest heavily in building and curating high-quality source data.

The Gossip

ChromaDB Clarification Conundrum

Commenters quickly jumped to correct the author's initial, slightly confusing mention of ChromaDB. While the author later clarified it's an open-source database (Apache-2.0), an early phrasing implying a connection to 'Google's database' or 'Chrome/Chromium' caused some to question the article's credibility, despite the overall quality of the technical detail.

Literary RAG's Local Longings

One significant discussion thread revolved around applying RAG principles to personal or academic literature review. A commenter shared their struggle to find an 'out-of-the-box' solution for running RAG on local bibliographic collections (like Zotero PDFs), reflecting the practical challenges the article itself highlighted and seeking alternatives to building a system from scratch.