# Building Production-Ready RAG Systems with Gaia and Weaviate
> **How to replace OpenAI APIs with decentralized infrastructure while using Weaviate as a superior vector database alternative**
---
## TL;DR
This post demonstrates how to build a production-ready Retrieval Augmented Generation (RAG) system using:
- **Gaia**: Decentralized AI infrastructure with OpenAI-compatible APIs
- **Weaviate**: Advanced vector database replacing traditional solutions
- **Real-World Data**: Live integration with Wikipedia, ArXiv, GitHub, and news sources
**Key Result**: A complete RAG pipeline that processes 50+ documents, performs semantic search, and generates responses using decentralized AI infrastructure.
---
## Why This Matters
Traditional RAG systems rely on centralized providers like OpenAI, creating single points of failure and vendor lock-in. This architecture demonstrates:
1. **Decentralization**: Use public Gaia nodes instead of centralized APIs
2. **Flexibility**: Replace built-in vector stores with specialized solutions
3. **Real-World Data**: Process live data from multiple internet sources
4. **Production Ready**: Environment configuration, health monitoring, error handling
---
## Understanding the Platforms
### Gaia: Decentralized AI Infrastructure
**What is Gaia?**
[Gaia](https://gaianet.ai) is a decentralized infrastructure for AI agents that provides OpenAI-compatible APIs while running on distributed nodes.
**Key Features:**
- **OpenAI Compatibility**: Drop-in replacement for OpenAI APIs
- **Decentralized**: No single point of failure
- **Model Flexibility**: Support for Llama, Qwen, Gemma, and other open models
**Example Gaia Node:**
```
https://0x299eae67ba6bbae8d61faad2d70115dc5a6855c8.gaia.domains/v1
```
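Because the node speaks the OpenAI wire protocol, the standard `openai` Python client can talk to it directly. A minimal sketch using the node above (the placeholder `test-key` mirrors the demo's `GAIA_API_KEY`; substitute your own node URL and key):
```python
from openai import OpenAI

# Point the standard OpenAI client at a Gaia node instead of api.openai.com
client = OpenAI(
    base_url="https://0x299eae67ba6bbae8d61faad2d70115dc5a6855c8.gaia.domains/v1",
    api_key="test-key",  # public nodes accept a placeholder key
)

response = client.chat.completions.create(
    model="Gemma-3.4B-IT",
    messages=[{"role": "user", "content": "What is retrieval augmented generation?"}],
    max_tokens=300,
)
print(response.choices[0].message.content)
```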
**Example Gaia Node Config:**
```json
{
"address": "",
"chat": "https://huggingface.co/gaianet/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q5_K_M.gguf",
"chat_batch_size": "128",
"chat_ctx_size": "8192",
"chat_name": "Gemma-3.4B-IT",
"chat_ubatch_size": "128",
"context_window": "1",
"description": "Gaia node running with Gemma-3.4B-IT model without any knowledgebase.",
"domain": "gaia.domains",
"embedding": "https://huggingface.co/gaianet/gte-Qwen2-1.5B-instruct-GGUF/resolve/main/gte-Qwen2-1.5B-instruct-f16.gguf",
"embedding_batch_size": "8192",
"embedding_collection_name": "default",
"embedding_ctx_size": "8192",
"embedding_name": "gte-Qwen2-1.5B-instruct-f16",
"embedding_ubatch_size": "8192",
"llamaedge_chat_port": "9075",
"llamaedge_embedding_port": "9076",
"llamaedge_port": "8086",
"prompt_template": "gemma-3",
"qdrant_limit": "1",
"qdrant_score_threshold": "0.5",
"rag_policy": "system-message",
"rag_prompt": "Use the following information to answer the question.\n----------------\n",
"reverse_prompt": "",
"snapshot": "",
"system_prompt": "You're a helpful assistant"
}
```
### Weaviate: Advanced Vector Database
**What is Weaviate?**
[Weaviate](https://weaviate.io) is an open-source vector database designed for AI applications, offering advanced features beyond simple vector storage.
**Why Choose Weaviate Over Qdrant?**
| Feature | Weaviate | Qdrant (Gaia Default) |
|---------|----------|----------------------|
| **GraphQL API** | Yes, native support | No, REST only |
| **Multi-tenancy** | Yes, built-in | Limited |
| **Vectorizers** | Yes, 10+ options | Basic |
| **Hybrid Search** | Yes, vector + BM25 | Vector only |
| **Real-time Updates** | Yes, live indexing | Batch-oriented |
| **Schema Flexibility** | Yes, dynamic schemas | Static |
**Weaviate Vectorizer Options:**
```bash
# Local embeddings (no API key needed)
VECTORIZER_MODULE=text2vec-transformers
# OpenAI embeddings
VECTORIZER_MODULE=text2vec-openai
OPENAI_API_KEY=your-key
# Cohere embeddings
VECTORIZER_MODULE=text2vec-cohere
COHERE_API_KEY=your-key
```
---
## System Architecture
```mermaid
graph TB
    subgraph "Data Sources"
        A[Wikipedia<br/>AI Articles]
        B[ArXiv<br/>Research Papers]
        C[GitHub<br/>Documentation]
        D[RSS Feeds<br/>Tech News]
    end
    subgraph "Processing Pipeline"
        E[Data Fetcher<br/>Rate Limited]
        F[Text Chunker<br/>Smart Splitting]
        G[Metadata Extractor<br/>Structured Data]
    end
    subgraph "Storage Layer"
        H[Weaviate Vector DB<br/>text2vec-transformers]
        I[Semantic Search<br/>Similarity Matching]
    end
    subgraph "Inference Layer"
        J[Gaia Node<br/>Gemma-3.4B-IT]
        K[Context Integration<br/>RAG Pipeline]
    end
    subgraph "Application Layer"
        L[Interactive Demo<br/>CLI Interface]
        M[Health Monitoring<br/>System Status]
        N[Environment Config<br/>.env Management]
    end
    A --> E
    B --> E
    C --> E
    D --> E
    E --> F
    F --> G
    G --> H
    H --> I
    I --> K
    J --> K
    K --> L
    L --> M
    N --> L
    style A fill:#e1f5fe
    style H fill:#f3e5f5
    style J fill:#fff3e0
    style L fill:#e8f5e8
```
---
## Implementation Deep Dive
### 1. Start Weaviate with Docker Compose
Create a `docker-compose.yml` file with this production-ready configuration:
```yaml
---
services:
weaviate:
command:
- --host
- 0.0.0.0
- --port
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.30.0
ports:
- 8080:8080
- 50051:50051
restart: on-failure:0
environment:
TRANSFORMERS_INFERENCE_API: 'http://t2v-transformers:8080'
QNA_INFERENCE_API: 'http://qna-transformers:8080'
OPENAI_APIKEY: $OPENAI_APIKEY
QUERY_DEFAULTS_LIMIT: 25
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
ENABLE_MODULES: 'text2vec-transformers,qna-transformers,generative-openai'
CLUSTER_HOSTNAME: 'node1'
t2v-transformers:
image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
environment:
ENABLE_CUDA: '0'
qna-transformers:
image: cr.weaviate.io/semitechnologies/qna-transformers:distilbert-base-uncased-distilled-squad
environment:
ENABLE_CUDA: '0'
```
Start Weaviate:
```bash
docker compose up -d
```
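Once the containers are up, you can confirm Weaviate is ready before ingesting any data (assuming the default port mapping above):
```bash
# Readiness and liveness probes exposed by Weaviate
curl http://localhost:8080/v1/.well-known/ready
curl http://localhost:8080/v1/.well-known/live
```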
This configuration provides:
- **Recent Weaviate release**: version 1.30.0
- **Multiple vectorizers**: text2vec-transformers for embeddings plus QnA transformers
- **Production defaults**: restart policy and persistent data path
- **GPU support**: set `ENABLE_CUDA: '1'` if you have an NVIDIA GPU
### 2. Environment Configuration
The system uses a comprehensive `.env` configuration for production readiness:
```bash
# Gaia Node Configuration
GAIA_BASE_URL=https://0x299eae67ba6bbae8d61faad2d70115dc5a6855c8.gaia.domains/v1
GAIA_API_KEY=test-key
GAIA_MODEL_NAME=Gemma-3.4B-IT
# Weaviate Configuration
WEAVIATE_HOST=localhost
WEAVIATE_PORT=8080
WEAVIATE_USE_AUTH=false
# Vector Configuration
VECTORIZER_MODULE=text2vec-transformers
DEFAULT_COLLECTION_NAME=RealWorldKnowledgeBase
# Generation Parameters
MAX_TOKENS=300
TEMPERATURE=0.7
SEARCH_LIMIT=3
# Performance Tuning
BATCH_SIZE=100
CONNECTION_TIMEOUT=30
DEBUG=true
```
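How these variables reach the application is up to you; here is a minimal sketch using `python-dotenv` (the `Config` class and field names are illustrative, mirroring the variables above):
```python
import os
from dataclasses import dataclass
from dotenv import load_dotenv

load_dotenv()  # pull variables from .env into the process environment

@dataclass
class Config:
    GAIA_BASE_URL: str = os.getenv("GAIA_BASE_URL", "")
    GAIA_API_KEY: str = os.getenv("GAIA_API_KEY", "test-key")
    GAIA_MODEL_NAME: str = os.getenv("GAIA_MODEL_NAME", "Gemma-3.4B-IT")
    WEAVIATE_HOST: str = os.getenv("WEAVIATE_HOST", "localhost")
    WEAVIATE_PORT: int = int(os.getenv("WEAVIATE_PORT", "8080"))
    DEFAULT_COLLECTION_NAME: str = os.getenv("DEFAULT_COLLECTION_NAME", "RealWorldKnowledgeBase")
    MAX_TOKENS: int = int(os.getenv("MAX_TOKENS", "300"))
    TEMPERATURE: float = float(os.getenv("TEMPERATURE", "0.7"))
    SEARCH_LIMIT: int = int(os.getenv("SEARCH_LIMIT", "3"))

config = Config()
```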
### 3. Data Source Integration
The system fetches real-world data from multiple sources:
#### Wikipedia Integration
```python
import requests
from typing import Any, Dict, List

class WikipediaSource(DataSource):
    API_URL = "https://en.wikipedia.org/w/api.php"

    def fetch_data(self, topics: List[str]) -> List[Dict[str, Any]]:
        documents = []
        for topic in topics:
            # Fetch the full plain-text article via the MediaWiki API
            params = {
                'action': 'query',
                'format': 'json',
                'titles': topic,
                'prop': 'extracts',
                'explaintext': True
            }
            response = requests.get(self.API_URL, params=params, timeout=30)
            pages = response.json()['query']['pages']
            content = next(iter(pages.values())).get('extract', '')
            # Process and chunk content
            for chunk in self.chunk_text(content, max_length=1500):
                documents.append({'title': topic, 'content': chunk})
        return documents
```
#### ArXiv Research Papers
```python
class ArXivSource(DataSource):
def fetch_data(self, search_terms: List[str]) -> List[Dict[str, Any]]:
for term in search_terms:
params = {
'search_query': f'all:{term}',
'sortBy': 'submittedDate',
'sortOrder': 'descending'
}
# Parse XML response and extract metadata
```
#### GitHub Documentation
```python
class GitHubSource(DataSource):
def fetch_data(self, repos: List[str]) -> List[Dict[str, Any]]:
for repo in repos:
# Fetch README via GitHub API
readme_url = f"https://api.github.com/repos/{repo}/readme"
# Decode base64 content and process
```
### 4. Weaviate Schema Design
Advanced schema with nested properties for rich metadata:
```python
collection = weaviate_client.collections.create(
name="RealWorldKnowledgeBase",
vectorizer_config=Configure.Vectorizer.text2vec_transformers(),
properties=[
Property(name="title", data_type=DataType.TEXT),
Property(name="content", data_type=DataType.TEXT),
Property(name="source", data_type=DataType.TEXT),
Property(name="category", data_type=DataType.TEXT),
Property(
name="metadata",
data_type=DataType.OBJECT,
nested_properties=[
Property(name="url", data_type=DataType.TEXT),
Property(name="author", data_type=DataType.TEXT),
Property(name="published", data_type=DataType.TEXT),
Property(name="difficulty", data_type=DataType.TEXT),
Property(name="topic", data_type=DataType.TEXT),
Property(name="tags", data_type=DataType.TEXT_ARRAY),
Property(name="fetched_at", data_type=DataType.TEXT),
Property(name="chunk_index", data_type=DataType.INT),
Property(name="total_chunks", data_type=DataType.INT),
]
),
]
)
```
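With the collection defined, ingesting the fetched documents is a straightforward batch insert. A sketch with the v4 Python client, assuming `documents` is the list produced by the data sources above:
```python
import weaviate

# Connect to the local Docker instance started earlier
weaviate_client = weaviate.connect_to_local(host="localhost", port=8080)
collection = weaviate_client.collections.get("RealWorldKnowledgeBase")

# Batch-insert documents; vectors are computed server-side by text2vec-transformers
with collection.batch.dynamic() as batch:
    for doc in documents:
        batch.add_object(properties={
            "title": doc["title"],
            "content": doc["content"],
            "source": doc["source"],
            "category": doc["category"],
            "metadata": doc.get("metadata", {}),
        })

weaviate_client.close()
```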
### 5. RAG Pipeline Implementation
Complete RAG flow with context integration:
```python
def rag_query(self, query: str, collection_name: str = None) -> Dict[str, Any]:
# Step 1: Vector search in Weaviate
relevant_docs = self.search_knowledge(query, collection_name)
# Step 2: Prepare context for LLM
context_parts = []
for doc in relevant_docs:
context_parts.append(f"Title: {doc['title']}\nContent: {doc['content']}")
context = "\n\n".join(context_parts)
# Step 3: Generate response with Gaia node
response = self.llm_client.chat.completions.create(
model="Gemma-3.4B-IT",
messages=[
{"role": "system", "content": f"Use this context: {context}"},
{"role": "user", "content": query}
],
max_tokens=self.config.MAX_TOKENS,
temperature=self.config.TEMPERATURE
)
return {
"query": query,
"response": response.choices[0].message.content,
"sources": relevant_docs,
"model_used": "Gemma-3.4B-IT"
}
```
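The `search_knowledge` step referenced above can be implemented as a plain `near_text` query; a minimal sketch (field names follow the schema defined earlier):
```python
def search_knowledge(self, query: str, collection_name: str = None) -> List[Dict[str, Any]]:
    # Semantic search in Weaviate; SEARCH_LIMIT comes from the .env configuration
    collection = self.weaviate_client.collections.get(
        collection_name or self.config.DEFAULT_COLLECTION_NAME
    )
    result = collection.query.near_text(query=query, limit=self.config.SEARCH_LIMIT)
    return [
        {
            "title": obj.properties.get("title"),
            "content": obj.properties.get("content"),
            "source": obj.properties.get("source"),
        }
        for obj in result.objects
    ]
```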
---
## Real-World Results
### Performance Metrics
Here are the actual results from our demo run:
```
Total documents collected: 57
  - Wikipedia: 7 articles (chunked into multiple docs)
  - ArXiv: 6 research papers
  - GitHub: 37 documentation chunks
  - News: 6 recent articles
Processing time: 92.0 seconds (quick mode)
Search performance: ~50 ms per query
Generation speed: ~2-5 seconds per response
Storage: 63,479 characters across 57 documents
Average document size: 1,113 characters
```
### Sample Query Results
Here are real responses from our system:
#### **Query**: *"What are the latest developments in large language models?"*
**Response**: Based on the provided context, here's a summary of recent developments in large language models:
* **GPT Models:** The largest and most capable LLMs are generative pretrained transformers (GPTs), like ChatGPT, Gemini, and Claude, which are widely used in generative chatbots.
* **Fine-tuning & Prompt Engineering:** LLMs can be adapted for specific tasks through fine-tuning or guided by prompt engineering.
* **Integration with other tools**: Semantic Kernel integrates with LLMs from OpenAI, Azure OpenAI, Hugging Face, and Nvidia. Weaviate also supports Cohere Multilingual ML Models.
The documents don't provide a comprehensive list of *all* the latest developments, but this captures the key trends highlighted in the given information.
**Sources Used**:
1. Large language model (Wikipedia) - Category: encyclopedia
2. weaviate Documentation (Part 6) - Category: documentation
3. semantic-kernel Documentation (Part 2) - Category: documentation
#### **Query**: *"How does retrieval augmented generation work?"*
**Response**: Here's how retrieval-augmented generation (RAG) works, based on the provided context:
1. **Collect and Embed Documents:** You gather text documents related to a specific topic or domain.
2. **Create Feature Vectors:** For each document (or section of a document), you use a deep learning network to create a "feature vector" โ often called an "embedding." This vector represents the meaning of the text in a numerical format.
3. **Store in a Vector Database:** These feature vectors are stored in a special database designed for quickly searching through large amounts of data based on similarity.
4. **User Prompt:** When a user asks a question or provides a prompt, the same process is applied to *that* prompt โ it's converted into a feature vector.
5. **Retrieve Relevant Documents:** The vector database searches for documents whose feature vectors are most similar to the prompt's feature vector. This identifies the most relevant pieces of information.
6. **Add to Context:** These retrieved documents are then added to what's called the "context window" โ essentially, they become part of the input provided to a large language model (LLM).
7. **Generate Response:** The LLM uses this expanded context to generate a response to the user's original prompt.
**Sources Used**:
1. Vector database (Part 2) - Category: encyclopedia
2. FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation - Category: research
3. Controlling the false discovery rate under a non-parametric graphical dependence model - Category: research
#### **Query**: *"How do I use the OpenAI API?"*
**Response**: Here's how you can use the OpenAI API with the Python library, based on the provided documentation:
**1. Installation:**
```sh
pip install openai
```
**2. Basic Usage (Text Generation):**
```python
import openai
client = openai.OpenAI()
response = client.chat.completions.create(model="gpt-3.5-turbo")
print(response.choices[0].message.content)
```
**3. Handling Errors:**
```python
import openai
client = openai.OpenAI()
try:
client.fine_tuning.jobs.create(model="gpt-4o", training_file="file-abc123")
except openai.APIConnectionError as e:
print("The server could not be reached")
print(e.__cause__)
except openai.APIStatusError as e:
print(f"API Error: {e.status_code}")
print(e.response)
```
**Sources Used**:
1. openai-python Documentation (Part 1) - Category: documentation
2. openai-python Documentation (Part 11) - Category: documentation
3. openai-python Documentation (Part 19) - Category: documentation
### Data Source Statistics
```
Categories:
  documentation: 37 documents
  encyclopedia: 7 documents
  metadata: 1 document
  research: 6 documents
  tech_news: 6 documents

Sources:
  arxiv: 6 documents
  collection: 1 document
  github: 37 documents
  rss: 6 documents
  wikipedia: 7 documents

Weaviate Collection:
  Collection name: RealWorldKnowledgeBase
  Documents in collection: 57
  Vectorizer: text2vec-transformers
```
---
## Production Use Cases
### 1. AI Research Assistant
**Scenario**: Researchers need up-to-date information about AI developments
**Data Sources**: ArXiv papers, Wikipedia articles, GitHub repositories
**Query Examples**:
- "What are the latest developments in retrieval augmented generation?"
- "How do transformer architectures work?"
- "What are the current challenges in LLM training?"
### 2. Technical Documentation Helper
**Scenario**: Developers need help with API integration and implementation
**Data Sources**: GitHub READMEs, API documentation, technical guides
**Query Examples**:
- "How do I integrate OpenAI API with my application?"
- "What are the best practices for vector database setup?"
- "How to implement RAG with Weaviate?"
### 3. News and Trends Analyzer
**Scenario**: Businesses need insights into industry developments and market trends
**Data Sources**: TechCrunch, Hacker News, AI News feeds, industry reports
**Query Examples**:
- "What are the recent AI funding rounds and acquisitions?"
- "What companies are leading in AI innovation?"
- "What are the current regulatory challenges for AI?"
### 4. Educational Content Generator
**Scenario**: Educators and content creators need accurate, well-sourced explanations
**Data Sources**: Wikipedia, academic papers, documentation, tutorials
**Query Examples**:
- "Explain machine learning to beginners"
- "What is the difference between supervised and unsupervised learning?"
- "How do neural networks process information?"
---
## Technical Implementation Details
**Available Models on Our Node:**
- `Gemma-3.4B-IT`: Google's Gemma 3 instruction-tuned chat model (4B parameters)
- `gte-Qwen2-1.5B-instruct-f16`: the node's GTE-family embedding model, based on Qwen2 (1.5B parameters, f16 weights)
**Actual Configuration from Our Demo**:
```
Current Configuration:
  Gaia URL: https://0x299eae67ba6bbae8d61faad2d70115dc5a6855c8.gaia.domains/v1
  Weaviate: localhost:8080
  Collection: MyKnowledgeBase
  Vectorizer: text2vec-transformers
  Max Tokens: 300
  Temperature: 0.7

Available models: ['Gemma-3.4B-IT', 'gte-Qwen2-1.5B-instruct-f16']
```
### Model Performance Analysis
The Gaia node used in our demo exposes two models, one for chat and one for embeddings:
#### Gemma-3.4B-IT (Google)
- **Size**: 4 billion parameters (Gemma 3, 4B variant)
- **Type**: Instruction-tuned model
- **Strengths**: Excellent for conversational AI and instruction following
- **Performance**: ~2-5 seconds per response (observed in demo)
- **Use Cases**: General Q&A, educational content, technical explanations
- **Quality**: Provides detailed, well-structured responses as seen in our examples
#### gte-Qwen2-1.5B-instruct-f16 (Alibaba)
- **Size**: 1.5 billion parameters
- **Type**: GTE-family text-embedding model served with 16-bit weights
- **Role**: Powers the node's embedding endpoint; in this demo, document vectors come from Weaviate's text2vec-transformers instead
- **Strengths**: Fast inference, good multilingual support
- **Use Cases**: Embedding queries and documents, resource-constrained environments
### Data Processing Pipeline
**Text Chunking Strategy**:
```python
import re
from typing import List

def chunk_text(self, text: str, max_length: int = 1500) -> List[str]:
    # Split by sentences to maintain context
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        if len(current_chunk) + len(sentence) <= max_length:
            current_chunk += " " + sentence if current_chunk else sentence
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence
    # Keep the final, partially filled chunk
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks
```
**Metadata Extraction**:
- **Source tracking**: Wikipedia, ArXiv, GitHub, RSS
- **Category classification**: encyclopedia, research, documentation, news
- **Timestamp tracking**: When content was fetched
- **Author information**: Where available
- **Difficulty levels**: beginner, intermediate, advanced
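Put together, an ingested record looks roughly like this (illustrative values, matching the schema above):
```python
# An illustrative document as stored in the RealWorldKnowledgeBase collection
document = {
    "title": "Large language model (Part 3)",
    "content": "A large language model (LLM) is a type of machine learning model ...",
    "source": "wikipedia",
    "category": "encyclopedia",
    "metadata": {
        "url": "https://en.wikipedia.org/wiki/Large_language_model",
        "author": "Wikipedia contributors",
        "published": "",
        "difficulty": "intermediate",
        "topic": "Large language model",
        "tags": ["llm", "nlp"],
        "fetched_at": "2025-01-01T00:00:00Z",
        "chunk_index": 3,
        "total_chunks": 7,
    },
}
```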
---
## Production Deployment Considerations
### Scaling the System
**Horizontal Scaling Options**:
1. **Multiple Gaia Nodes**: Load balance across different nodes
```python
gaia_nodes = [
"https://node1.gaia.domains/v1",
"https://node2.gaia.domains/v1",
"https://node3.gaia.domains/v1"
]
# Implement round-robin or weighted distribution
```
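A simple way to spread requests over those nodes is to cycle through one OpenAI-compatible client per node (a sketch; the endpoints above are placeholders):
```python
import itertools
from openai import OpenAI

# One client per Gaia node, cycled round-robin per request
clients = itertools.cycle([OpenAI(base_url=url, api_key="test-key") for url in gaia_nodes])

def chat_completion(messages, model="Gemma-3.4B-IT", max_tokens=300):
    client = next(clients)
    return client.chat.completions.create(model=model, messages=messages, max_tokens=max_tokens)
```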
2. **Weaviate Clustering**: Scale vector operations
```yaml
# docker-compose.yml for a multi-node cluster (simplified sketch;
# see Weaviate's horizontal-scaling docs for the full replication settings)
services:
  weaviate-node-1:
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.0
    environment:
      CLUSTER_HOSTNAME: 'node1'
  weaviate-node-2:
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.0
    environment:
      CLUSTER_HOSTNAME: 'node2'
```
3. **Data Source Distribution**: Parallel fetching
```python
# Async data fetching
async def fetch_all_sources():
tasks = [
fetch_wikipedia_async(topics),
fetch_arxiv_async(search_terms),
fetch_github_async(repos),
fetch_rss_async(feeds)
]
results = await asyncio.gather(*tasks)
return flatten(results)
```
### Security and Authentication
**Production Security Checklist**:
- **Environment Variables**: Never commit API keys
- **Weaviate Authentication**: Enable for production
- **Rate Limiting**: Implement client-side throttling
- **Input Validation**: Sanitize user queries
- **Network Security**: Use HTTPS/TLS encryption
- **Access Control**: Implement user permissions
```bash
# Production Weaviate with auth
WEAVIATE_USE_AUTH=true
WEAVIATE_API_KEY=your-secure-production-key
```
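On the client side, the same variables drive an authenticated connection. A sketch with the v4 Python client (API-key auth must also be enabled in the Weaviate deployment itself):
```python
import os
import weaviate
from weaviate.classes.init import Auth

# Use the API key only when WEAVIATE_USE_AUTH is enabled
auth = Auth.api_key(os.environ["WEAVIATE_API_KEY"]) if os.getenv("WEAVIATE_USE_AUTH") == "true" else None

client = weaviate.connect_to_local(
    host=os.getenv("WEAVIATE_HOST", "localhost"),
    port=int(os.getenv("WEAVIATE_PORT", "8080")),
    auth_credentials=auth,
)
```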
### Monitoring and Observability
**Health Check Implementation**:
```python
import time
from typing import Any, Dict

def health_check(self) -> Dict[str, Any]:
    health = {
        "timestamp": time.time(),
        "gaia": {"status": "unknown", "models": []},
        "weaviate": {"status": "unknown", "collections": []},
        "overall": "unknown"
    }
    # Test Gaia connection
    try:
        models = self.llm_client.models.list()
        health["gaia"]["status"] = "healthy"
        health["gaia"]["models"] = [m.id for m in models.data]
    except Exception as e:
        health["gaia"]["status"] = f"error: {e}"
    # Test Weaviate connection
    try:
        if self.weaviate_client.is_ready():
            collections = self.weaviate_client.collections.list_all()
            health["weaviate"]["status"] = "healthy"
            health["weaviate"]["collections"] = list(collections.keys())
        else:
            health["weaviate"]["status"] = "not ready"
    except Exception as e:
        health["weaviate"]["status"] = f"error: {e}"
    # Aggregate the two checks into an overall status
    statuses = (health["gaia"]["status"], health["weaviate"]["status"])
    health["overall"] = "healthy" if all(s == "healthy" for s in statuses) else "degraded"
    return health
```
---
## Performance Optimization Tips
### 1. Vector Search Optimization
**Batch Processing**:
```python
# Run several queries against the collection
# (sequential here; use asyncio or a thread pool for true concurrency)
queries = ["query1", "query2", "query3"]
results = []
for query in queries:
    result = collection.query.near_text(query=query, limit=5)
    results.append(result)
```
**Index Tuning**:
```python
# Configure HNSW index parameters for a better recall/latency trade-off
# (weaviate-client v4: index settings live in vector_index_config, not the vectorizer)
from weaviate.classes.config import Configure

collection = weaviate_client.collections.create(
    name="RealWorldKnowledgeBase",
    vectorizer_config=Configure.Vectorizer.text2vec_transformers(),
    vector_index_config=Configure.VectorIndex.hnsw(
        ef_construction=256,  # Higher = better recall, slower index build
        max_connections=32,   # Higher = better recall, more memory
    ),
)
```
### 2. LLM Response Optimization
**Context Window Management**:
```python
def optimize_context(self, docs: List[Dict], max_chars: int = 2000) -> str:
    # Budget is measured in characters as a rough proxy for tokens
    # (roughly 4 characters per token for English text)
    context_parts = []
    current_length = 0
    for doc in sorted(docs, key=lambda x: x['score'], reverse=True):
        doc_length = len(doc['content'])
        if current_length + doc_length <= max_chars:
            context_parts.append(f"Title: {doc['title']}\n{doc['content']}")
            current_length += doc_length
        else:
            break
    return "\n\n".join(context_parts)
```
**Prompt Engineering**:
```python
system_prompt = """You are an AI assistant specializing in technical documentation and research.
Use the provided context to answer questions accurately and cite your sources when possible.
If the context doesn't contain relevant information, say so clearly.
Context:
{context}
Guidelines:
- Be concise but comprehensive
- Use bullet points for lists
- Cite sources when referencing specific information
- If uncertain, acknowledge limitations
"""
```
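The `{context}` placeholder is filled at query time before the prompt is sent to the Gaia node; a sketch (here `GAIA_MODEL_NAME` is assumed to be exposed on the config object):
```python
messages = [
    {"role": "system", "content": system_prompt.format(context=context)},
    {"role": "user", "content": query},
]
response = self.llm_client.chat.completions.create(
    model=self.config.GAIA_MODEL_NAME,
    messages=messages,
    max_tokens=self.config.MAX_TOKENS,
    temperature=self.config.TEMPERATURE,
)
```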
### 3. Data Ingestion Optimization
**Smart Caching**:
```python
import hashlib
from datetime import datetime, timedelta
def should_refresh_source(source_name: str, max_age_hours: int = 24) -> bool:
cache_file = f"cache/{source_name}_last_update.txt"
try:
with open(cache_file, 'r') as f:
last_update = datetime.fromisoformat(f.read().strip())
age = datetime.now() - last_update
return age > timedelta(hours=max_age_hours)
except FileNotFoundError:
return True
```
**Incremental Updates**:
```python
def get_new_documents_only(self, source: str, since: datetime) -> List[Dict]:
# Only fetch documents newer than the timestamp
# Implement based on source API capabilities
pass
```
---
## Future Enhancements
### 1. Advanced Retrieval Strategies
**Hybrid Search Implementation**:
```python
# Combine vector search with keyword search
def hybrid_search(self, query: str, alpha: float = 0.7):
# Vector search (semantic similarity)
vector_results = collection.query.near_text(query=query, limit=10)
# BM25 search (keyword matching)
bm25_results = collection.query.bm25(query=query, limit=10)
# Combine results with weighted scoring
combined_results = self.combine_results(vector_results, bm25_results, alpha)
return combined_results
```
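Note that the v4 client also exposes hybrid search natively, so the weighting can be delegated to Weaviate itself:
```python
# alpha=1.0 is pure vector search, alpha=0.0 is pure BM25
results = collection.query.hybrid(query=query, alpha=0.7, limit=10)
```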
**Re-ranking with Cross-Encoders**:
```python
from sentence_transformers import CrossEncoder
def rerank_results(self, query: str, documents: List[Dict]) -> List[Dict]:
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
pairs = [(query, doc['content']) for doc in documents]
scores = reranker.predict(pairs)
# Re-order documents by cross-encoder scores
for doc, score in zip(documents, scores):
doc['rerank_score'] = score
return sorted(documents, key=lambda x: x['rerank_score'], reverse=True)
```
### 2. Multi-Modal Capabilities
**Image and Document Processing**:
```python
# Future: Add support for PDFs, images, videos
class MultiModalSource(DataSource):
def process_pdf(self, pdf_path: str) -> List[Dict]:
# Extract text, images, tables from PDFs
pass
def process_image(self, image_path: str) -> Dict:
# OCR + image description
pass
```
### 3. Advanced Analytics
**Query Performance Tracking**:
```python
import time
from collections import defaultdict
class AnalyticsTracker:
def __init__(self):
self.query_times = defaultdict(list)
self.popular_queries = defaultdict(int)
self.source_usage = defaultdict(int)
def track_query(self, query: str, response_time: float, sources: List[str]):
self.query_times[query].append(response_time)
self.popular_queries[query] += 1
for source in sources:
self.source_usage[source] += 1
```
---
## Conclusion
This implementation demonstrates that building production-ready RAG systems with decentralized infrastructure is not only possible but practical. The combination of Gaia and Weaviate provides:
### Key Achievements
- **Decentralized AI**: Successfully replaced OpenAI with public Gaia nodes
- **Advanced Vector Operations**: Weaviate's capabilities exceed basic vector storage
- **Real-World Data**: Live integration with multiple internet sources
- **Production Features**: Configuration management, health monitoring, error handling
- **Performance**: Sub-second search, 2-5 second generation times
- **Scalability**: Architecture supports horizontal scaling
### Business Impact
- **Cost Reduction**: No API fees for LLM inference
- **Vendor Independence**: Avoid lock-in with centralized providers
- **Data Privacy**: Keep sensitive data within your infrastructure
- **Customization**: Full control over models and vectorization
- **Reliability**: Distributed infrastructure reduces single points of failure
### Technical Benefits
- **Modern Architecture**: Microservices-ready with clean separation of concerns
- **Flexibility**: Easy to swap models, vectorizers, or data sources
- **Observability**: Built-in health checks and performance monitoring
- **Developer Experience**: Environment-based configuration, comprehensive logging
### Getting Started
Ready to build your own decentralized RAG system? The complete implementation is available on GitHub with:
- Step-by-step setup instructions
- Interactive demo with real data
- Performance benchmarks and optimization tips
- Production deployment guidelines
- Troubleshooting and debugging tools
**Repository**: https://github.com/GaiaNet-AI/gaia-cookbook/tree/main/python/gaia-weaviate
**Demo Video**: https://youtu.be/zf9_WFhySho