# RAG discussion 24.01.2025
These are notes from a discussion before we build a course/workshop for users: this isn't a "presentation" yet, but preparation for a lesson/workshop.
## Who has done RAG solutions?
- Simo
  - Aalto Papers RAG for HoAI
  - Quick Finnish RAG testing for the Piekkari HoAI project (on hold)
  - RAG API example for the Knowledge graph project
- Hossein
  - Grant RAG for RES: PDFs with an in-memory database and LangChain
- Yu
  - Small-scale ones with an in-memory database
- Thomas
  - SciComp chatbot
## When have you used (or might you want to use) RAG?

## What technologies have you used?
Embeddings:
- OpenAI embeddings
- [bge-m3](https://huggingface.co/BAAI/bge-m3) (used in the sketch at the end of this section)
Frameworks:
- LangChain
- LlamaIndex
- LangServe (for serving functionality via FastAPI)
LLMs:
- Ollama
- vLLM
- Aalto OpenAI endpoints
Vector databases:
- Faiss
- MongoDB
- DuckDB
- [pgvector](https://github.com/pgvector/pgvector) - Embedding vector interface for PostgreSQL
Data processing:
- [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/) (PDF reading)
- [Unstructured](https://docs.unstructured.io/welcome)
- [marker](https://github.com/VikParuchuri/marker) (PDFs to Markdown)
Extra tools:
- [KAG](https://github.com/OpenSPG/KAG) (for knowledge graphs)
- Agentic RAG
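
A minimal sketch tying a few of these together: PDF extraction with PyMuPDF4LLM, then chunk embedding with bge-m3 through LangChain's HuggingFace wrapper. This is untested; the package names (`pymupdf4llm`, `langchain-huggingface`) are assumed to be the current ones, and the file name is a placeholder:

```python
# Sketch (untested): PDF -> Markdown with PyMuPDF4LLM, then chunk and embed
# with bge-m3 via LangChain's HuggingFace wrapper.
# Assumes: pip install pymupdf4llm langchain-huggingface
import pymupdf4llm
from langchain_huggingface import HuggingFaceEmbeddings

# Extract the whole PDF as Markdown text ("grant_call.pdf" is a placeholder).
md_text = pymupdf4llm.to_markdown("grant_call.pdf")

# Naive chunking on blank lines; real pipelines use smarter splitters.
chunks = [c.strip() for c in md_text.split("\n\n") if c.strip()]

# Embed the documents and a query with the same model.
embedder = HuggingFaceEmbeddings(model_name="BAAI/bge-m3")
doc_vectors = embedder.embed_documents(chunks)  # one vector per chunk
query_vector = embedder.embed_query("What is the application deadline?")
print(len(doc_vectors), "chunks embedded, dim =", len(query_vector))
```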
## What features / concepts to use / avoid?
- LangChain is easy to use if you read the documentation
- LlamaIndex's documentation is good in places and bad in others
- An in-memory DB is great for getting started
- DuckDB can be used to store vectors (see the sketch after this list)
- The most time-consuming parts are document cleaning and embedding calculation
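
A sketch of the DuckDB point above, assuming a DuckDB version (>= 0.10) that has fixed-size `FLOAT[n]` arrays and `array_cosine_similarity`; the three-dimensional toy vectors are placeholders for real embeddings:

```python
# Sketch (untested): storing and searching embedding vectors in DuckDB.
import duckdb

con = duckdb.connect()  # in-memory database
con.execute("CREATE TABLE docs (id INTEGER, text VARCHAR, emb FLOAT[3])")
con.execute("""
    INSERT INTO docs VALUES
        (1, 'Submit jobs to the cluster with Slurm', [0.9, 0.1, 0.0]),
        (2, 'The cafeteria opens at 11',             [0.0, 0.2, 0.9])
""")

query_emb = [0.8, 0.2, 0.1]  # would come from the same embedding model
best = con.execute(
    "SELECT text, array_cosine_similarity(emb, ?::FLOAT[3]) AS sim "
    "FROM docs ORDER BY sim DESC LIMIT 1",
    [query_emb],
).fetchall()
print(best)
```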
## What are the key concepts that we should propagate to users?
- RAG != training or fine-tuning
  - You have an LLM and you have data, and you want the LLM to answer based on that data
  - Fine-tuning == the LLM learns the language/structure of the data so that it can produce similar-looking text
  - RAG == making the data searchable via language queries and then using an LLM to answer the queries based on the found data
- Some key concepts:
  - LLMs are (input tokens) -> (output tokens). The inputs are the prompt and the system prompt, but you can also add other relevant data. This is what RAG does.
  - Make a knowledge base searchable
    - One restriction here: searchability depends on the "quality" of your data retriever.
  - Find relevant knowledge based on the query (Retrieval)
  - Insert it into the context window (Augmented Generation)
  - Data that goes in is usually split into pieces
    - We want a way to search for relevant documents (chunks), ideally better than simple keyword search; this is done via vectorized search (see the sketch after this list)
    - These documents are usually vectorized using some embedding model
    - When a user makes a query, the query is embedded using the same embedding
    - Relevant documents are matched based on how similar their embeddings are to the query embedding
    - The found documents and the question are given to the LLM, together with instructions, so that the LLM can answer the question based on the data
- Data processing:
  - Pre-processing (e.g. automatic labeling of images), text extraction, etc. are of major importance for retrievability.
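
A dependency-free toy sketch of this flow. The bag-of-words "embedding" is only a stand-in for a real embedding model (e.g. bge-m3 or OpenAI embeddings), and the final LLM call is only indicated in a comment:

```python
# Toy end-to-end RAG flow: chunk -> embed -> retrieve -> augment.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag of lower-cased words. A real system would call
    # a neural embedding model here (e.g. bge-m3 or OpenAI embeddings).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Split the data into chunks and embed each one (the "vector store").
chunks = [
    "Triton is Aalto's high-performance computing cluster.",
    "Jobs are submitted to Triton with the Slurm scheduler.",
    "The cafeteria serves lunch between 11 and 14.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Embed the query with the same embedding, retrieve the closest chunk.
query = "How do I submit a job to Triton?"
query_emb = embed(query)
best_chunk, _ = max(store, key=lambda item: cosine(query_emb, item[1]))

# 3. Augment: insert the retrieved chunk and the question into the prompt.
prompt = (
    "Answer the question using only the context below.\n"
    f"Context: {best_chunk}\n"
    f"Question: {query}"
)
print(prompt)  # This prompt would then go to an LLM (Ollama, vLLM, ...).
```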
From LlamaIndex ([relevant page](https://docs.llamaindex.ai/en/stable/understanding/rag/)):

- Document = individual piece of text or data
- Document store = stores documents or pieces of them
- Vector store = stores the corresponding embedding vectors for the documents (and usually the documents themselves)
- Retriever = takes a query, produces the corresponding documents
The framework implements these abstractions for you (see the sketch below).
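
For example, a minimal sketch with LlamaIndex (assuming llama-index >= 0.10; note that `VectorStoreIndex` defaults to an in-memory vector store and OpenAI embeddings, so it needs an `OPENAI_API_KEY` unless you configure a local embedding model):

```python
# Sketch (untested): the four concepts above in LlamaIndex.
from llama_index.core import Document, VectorStoreIndex

# Document = individual piece of text or data
docs = [
    Document(text="Triton is Aalto's HPC cluster."),
    Document(text="Jobs are submitted with the Slurm scheduler."),
]

# The index embeds the documents and plays the role of document store
# + vector store (in memory by default).
index = VectorStoreIndex.from_documents(docs)

# Retriever = takes a query, produces the corresponding documents
retriever = index.as_retriever(similarity_top_k=2)
for node in retriever.retrieve("How do I submit a job?"):
    print(node.score, node.node.get_content())
```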

## Lesson plan
- Duration:
- Expected audience:
- NOT included:
- data pre-processing
## Example tutorials
- Semantic search: https://python.langchain.com/docs/tutorials/retrievers/
- RAG examples: https://python.langchain.com/docs/tutorials/rag/