# RAG Pipeline — Qdrant + Document Ingestion
Make your AI answer questions from your actual documents, codebase, and knowledge base.
- **Prerequisites:** Completed Parts 1–2 (Ollama + Open WebUI), Docker, basic Python familiarity
- **Time:** 35–45 minutes
- **RAM:** 4GB ($20/mo) minimum; 8GB ($40/mo) for larger document sets
## Introduction
LLMs are powerful but they hallucinate and don't know your data. RAG (Retrieval-Augmented Generation) fixes this by feeding your actual documents into the LLM's context at query time. This part builds a production-grade pipeline — not a toy demo.
> 💰 **Cost comparison:** Pinecone Serverless costs $70+/month for vector storage. Qdrant on your VPS: $0 — unlimited vectors, unlimited queries.
## RAG Architecture Overview
Documents → Chunking → Embedding → Vector DB → Query → Context Injection → LLM Response
```
┌──────────┐    ┌──────────┐    ┌──────────────┐    ┌─────────┐
│  Upload  │───▸│ Chunking │───▸│  Embedding   │───▸│ Qdrant  │
│  PDFs,   │    │  Split   │    │ nomic-embed  │    │ Vector  │
│  Docs,   │    │  into    │    │  via Ollama  │    │   DB    │
│  Code    │    │ segments │    │              │    │         │
└──────────┘    └──────────┘    └──────────────┘    └────┬────┘
                                                         │
┌──────────┐    ┌──────────┐    ┌──────────────┐         │
│   LLM    │◂───│ Context  │◂───│   Retrieve   │◂────────┘
│ Response │    │  Inject  │    │  Top-K docs  │
└──────────┘    └──────────┘    └──────────────┘
```

The embedding model (nomic-embed-text) runs locally via Ollama — no external API calls needed. Pull it first:

```bash
ollama pull nomic-embed-text
```
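To confirm the model works before building on top of it, you can call Ollama's embeddings endpoint directly. A minimal check, assuming Ollama listens on localhost:11434 as in Parts 1–2:

```python
# verify nomic-embed-text is serving 768-dimensional vectors
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello world"},
)
resp.raise_for_status()
vector = resp.json()["embedding"]
print(f"embedding length: {len(vector)}")  # expect 768 for nomic-embed-text
```

That 768 matters: it must match the `size` you give Qdrant when creating the collection below.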
## Deploying Qdrant

Add Qdrant to your AI stack:
```bash
mkdir -p ~/ai-stack/qdrant && cd ~/ai-stack/qdrant
```

Create a `docker-compose.yml`:

```yaml
version: "3.8"

services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333"
      - "6334:6334"
    environment:
      - QDRANT__SERVICE__API_KEY=your-qdrant-api-key-change-this
    volumes:
      - qdrant-data:/qdrant/storage

volumes:
  qdrant-data:
```

Start the container:

```bash
docker compose up -d
```

The Qdrant dashboard is available at http://your-server-ip:6333/dashboard (you'll need the API key you set above).
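Before wiring anything to it, confirm the service is reachable and the key is accepted. A quick check from Python (assumes the `qdrant-client` package is installed via pip):

```python
# confirm Qdrant is up and the API key works
from qdrant_client import QdrantClient

client = QdrantClient(
    url="http://localhost:6333",
    api_key="your-qdrant-api-key-change-this",  # the key from docker-compose.yml
)
print(client.get_collections())  # a fresh install reports no collections
```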
### Why Qdrant over ChromaDB?
| Feature | Qdrant | ChromaDB |
|---|---|---|
| Performance at scale | Excellent | Degrades |
| Filtering | Advanced payload filtering | Basic metadata |
| Production readiness | Built for production | Better for prototyping |
| Memory efficiency | On-disk + quantization | Primarily in-memory |
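The filtering row deserves a concrete example: Qdrant can combine vector similarity with structured payload conditions in a single query. A sketch — the `documents` collection and `source` payload field are created by the ingestion script later in this part, and the question text is just an example:

```python
from langchain_community.embeddings import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

embeddings = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")
client = QdrantClient(url="http://localhost:6333", api_key="your-qdrant-api-key-change-this")

# Similarity search restricted to chunks ingested from one specific file
hits = client.search(
    collection_name="documents",
    query_vector=embeddings.embed_query("How do I rotate the API key?"),
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="/path/to/document.pdf"))]
    ),
    limit=5,
)
for hit in hits:
    print(f"{hit.score:.3f}  {hit.payload['source']}")
```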
## Document Ingestion Pipeline
Create a Python-based ingestion script using LangChain:
Install the Python dependencies (note that PyPDFLoader requires `pypdf`):

```bash
pip install langchain langchain-community qdrant-client pypdf pdfplumber
```

Save the following as `ingest.py`:

```python
#!/usr/bin/env python3
"""Document ingestion pipeline for Qdrant + Ollama RAG."""
import os
import uuid

from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Configuration
OLLAMA_URL = "http://localhost:11434"
QDRANT_URL = "http://localhost:6333"
QDRANT_API_KEY = "your-qdrant-api-key-change-this"  # must match docker-compose.yml
COLLECTION_NAME = "documents"
EMBEDDING_MODEL = "nomic-embed-text"
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50

# Initialize
embeddings = OllamaEmbeddings(
    base_url=OLLAMA_URL,
    model=EMBEDDING_MODEL
)
qdrant = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

# Create collection if it doesn't exist (768 = nomic-embed-text's dimension)
collections = [c.name for c in qdrant.get_collections().collections]
if COLLECTION_NAME not in collections:
    qdrant.create_collection(
        collection_name=COLLECTION_NAME,
        vectors_config=VectorParams(size=768, distance=Distance.COSINE)
    )

# Text splitter
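# Separators are tried in order: paragraph breaks first, then line breaks,
# then sentence boundaries, then single spaces.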
splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP,
    separators=["\n\n", "\n", ". ", " "]
)

def ingest_file(filepath: str):
    """Ingest a single file into the vector database."""
    ext = os.path.splitext(filepath)[1].lower()
    if ext == ".pdf":
        loader = PyPDFLoader(filepath)
    elif ext in [".txt", ".md", ".py", ".js", ".ts"]:
        loader = TextLoader(filepath)
    else:
        print(f"Skipping unsupported file: {filepath}")
        return
    docs = loader.load()
    chunks = splitter.split_documents(docs)
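    # NOTE: embedding and upserting one chunk at a time keeps the example
    # simple; for large documents, embeddings.embed_documents() plus a single
    # batched qdrant.upsert() call is considerably faster.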
    for chunk in chunks:
        vector = embeddings.embed_query(chunk.page_content)
        qdrant.upsert(
            collection_name=COLLECTION_NAME,
            points=[PointStruct(
                id=str(uuid.uuid4()),
                vector=vector,
                payload={
                    "text": chunk.page_content,
                    "source": filepath,
                    "page": chunk.metadata.get("page", 0)
                }
            )]
        )
    print(f"Ingested {len(chunks)} chunks from {filepath}")

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        ingest_file(path)
```

```bash
# Ingest a single PDF
python3 ingest.py /path/to/document.pdf
# Ingest multiple files
python3 ingest.py docs/*.pdf docs/*.md
```

### Chunking Strategy
| Chunk size (chars) | Overlap (chars) | Best for |
|---|---|---|
| 200–300 | 30 | FAQ, short-form Q&A |
| 500 (default) | 50 | General documents, best balance |
| 1000–1500 | 200 | Technical docs, code files |
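Not sure which row fits your corpus? It is cheap to experiment before ingesting anything. A throwaway sketch (`docs/handbook.md` is a hypothetical stand-in for any file of yours):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text = open("docs/handbook.md").read()  # hypothetical sample document

for size, overlap in [(250, 30), (500, 50), (1200, 200)]:
    splitter = RecursiveCharacterTextSplitter(chunk_size=size, chunk_overlap=overlap)
    chunks = splitter.split_text(text)
    avg = sum(len(c) for c in chunks) // len(chunks)
    print(f"chunk_size={size}: {len(chunks)} chunks, ~{avg} chars each")
```

Reading a few chunks by eye (do they end mid-sentence? do they stand alone?) tells you more than the counts do.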
## Connecting RAG to Open WebUI
Configure Open WebUI to use your Qdrant-based RAG pipeline:
- Navigate to Admin Panel → Settings → Documents
- Set the Embedding Model to `nomic-embed-text`
- Configure the vector database connection to Qdrant
- Set Top-K to 5 (retrieve 5 most relevant chunks)
- Set Similarity Threshold to 0.7 (filter out weak matches)
Test by uploading a document through Open WebUI and asking questions about its content. Compare the quality of responses with and without RAG.
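Under the hood, the query side mirrors ingestion: embed the question, retrieve the top-K chunks above the threshold, and inject them into the prompt. A minimal standalone sketch, assuming a chat model such as `llama3` is already pulled in Ollama (swap in whichever model you installed earlier in the series):

```python
#!/usr/bin/env python3
"""Minimal RAG query against the 'documents' collection."""
import requests
from langchain_community.embeddings import OllamaEmbeddings
from qdrant_client import QdrantClient

OLLAMA_URL = "http://localhost:11434"
embeddings = OllamaEmbeddings(base_url=OLLAMA_URL, model="nomic-embed-text")
qdrant = QdrantClient(url="http://localhost:6333", api_key="your-qdrant-api-key-change-this")

question = "What does the handbook say about refunds?"  # example question

# Retrieve: top-5 chunks, dropping weak matches (mirrors the Open WebUI settings)
hits = qdrant.search(
    collection_name="documents",
    query_vector=embeddings.embed_query(question),
    limit=5,
    score_threshold=0.7,
)
context = "\n\n".join(hit.payload["text"] for hit in hits)

# Inject: prepend the retrieved chunks, then ask the LLM
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "llama3", "prompt": prompt, "stream": False},
)
print(resp.json()["response"])
```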
## Optimizing Retrieval Quality

### Troubleshooting Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Irrelevant results | Chunks too large | Reduce chunk size to 300–500 |
| Missing context | Chunks too small | Increase chunk size, add overlap |
| Hallucination despite RAG | Low similarity threshold | Increase threshold to 0.75+ |
| Slow retrieval | Large collection | Enable on-disk storage, add indexes |
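The last fix in the table maps to two Qdrant calls: store vectors on disk instead of RAM, and index the payload fields you filter on. A sketch (collection and field names match the ingestion script; recreating the collection drops existing data, so do this before ingesting):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333", api_key="your-qdrant-api-key-change-this")

# Store vectors on disk instead of RAM (drops and recreates the collection)
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE, on_disk=True),
)

# Index the payload field used in filters so filtered searches stay fast
client.create_payload_index(
    collection_name="documents",
    field_name="source",
    field_schema="keyword",
)
```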
## Batch Ingestion Script

A script that watches a directory and automatically ingests new documents as they arrive. Save it as `watch-and-ingest.sh`:

```bash
#!/bin/bash
# Watch a directory and auto-ingest new documents
WATCH_DIR="/home/user/documents"
LOG_FILE="/var/log/rag-ingestion.log"

inotifywait -m -r -e create -e moved_to "$WATCH_DIR" |
while read -r dir action file; do
    filepath="$dir$file"
    echo "$(date): New file detected: $filepath" >> "$LOG_FILE"
    python3 /home/user/ai-stack/rag/ingest.py "$filepath" >> "$LOG_FILE" 2>&1
done
```

Install inotify-tools (which provides `inotifywait`) and make the script executable:

```bash
sudo apt install -y inotify-tools
chmod +x watch-and-ingest.sh
```

## What's Next?
Your AI now answers from your actual documents — company wikis, technical docs, contracts, code. Zero data sent to third parties. In Part 4: AnythingLLM, we'll put this power in the hands of non-technical team members with a no-code AI app builder.