# RAG discussion 24.01.2025

This is a discussion before we make a course/workshop for users: this isn't a "presentation" yet, but preparation for a lesson/workshop.

## Who has done RAG solutions?

- Simo
  - Aalto Papers RAG for HoAI
  - Quick Finnish RAG testing for Piekkari HoAI project (on hold)
  - RAG API example for Knowledge graph project
- Hossein
  - Grant RAG for RES: PDFs with an in-memory database and LangChain
- Yu
  - Small-scale ones with an in-memory database
- Thomas
  - SciComp chatbot

## When have you (or might you want to) use RAGs?

![RAG](https://arxiv.org/html/2312.10997v5/extracted/2312.10997v5/images/RAG_case.png)

## What technologies have you used?

Embeddings:

- OpenAI embeddings
- [bge-m3](https://huggingface.co/BAAI/bge-m3)

Frameworks:

- LangChain
- LlamaIndex
- LangServe (for serving functionality via FastAPI)

LLMs:

- Ollama
- vLLM
- Aalto OpenAI endpoints

Vector databases:

- Faiss
- MongoDB
- DuckDB
- [pgvector](https://github.com/pgvector/pgvector): embedding vector interface for PostgreSQL

Data processing:

- [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/) (PDF reading)
- [Unstructured](https://docs.unstructured.io/welcome)
- [PDFs to markdown](https://github.com/VikParuchuri/marker)

Extra tools:

- https://github.com/OpenSPG/KAG (for knowledge graphs)
- Agentic RAG

## What features / concepts to use / avoid?

- LangChain is easy to use if you read the documentation
- LlamaIndex has some good documentation, but also some bad
- An in-memory DB is great for getting started
- DuckDB can be used to store vectors
- The time-wise costly parts are document cleaning and embedding calculation

## What are the key concepts that we should propagate to users?

- RAG != training or fine-tuning
  - The setting: you have an LLM and you have data, and you want the LLM to answer based on that data
  - Fine-tuning == the LLM learns the language/structure of the data so that it can produce similar-looking text
  - RAG == making the data searchable via language queries and then using an LLM to answer the queries based on the found data
- Some key concepts (a minimal code sketch follows this section):
  - LLMs are (input tokens) -> (output tokens). The inputs are the prompt and system prompt, but you can also add in other relevant data. This is what RAG does.
  - Make a knowledge base searchable
    - There is a restriction here: searchability depends on the "quality" of your data retriever
  - Find relevant knowledge based on the query (Retrieval)
  - Insert it into the context window (Augmented Generation)
  - Data that goes in is usually split into pieces (chunks)
  - We want a way to find relevant documents (chunks), ideally better than simple keyword search. This is done via a vectorized search:
    - The documents are vectorized using some embedding model
    - When the user makes a query, the query is embedded using the same embedding model
    - Relevant documents are matched based on how similar they are to the query embedding
    - The found documents and the question are given to the LLM, together with instructions, so that the LLM can answer the question based on the data
- Data processing:
  - Pre-processing (e.g. automatic labeling of images), text extraction, etc. is of major importance for retrievability
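The retrieve-then-generate loop above, as a minimal framework-free sketch. The `embed()` and `ask_llm()` helpers are hypothetical placeholders to be replaced with your own embedding model (e.g. bge-m3) and LLM endpoint (e.g. Ollama or the Aalto OpenAI endpoints); the chunking and similarity search are deliberately naive.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    """Placeholder: return the embedding vector for `text` (e.g. from bge-m3)."""
    raise NotImplementedError("plug in your embedding model here")


def ask_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its answer."""
    raise NotImplementedError("plug in your LLM endpoint here")


def build_index(chunks: list[str]) -> np.ndarray:
    # Indexing: embed every chunk once and stack the vectors into a matrix.
    return np.vstack([embed(c) for c in chunks])


def retrieve(query: str, chunks: list[str], index: np.ndarray, k: int = 3) -> list[str]:
    # Retrieval: embed the query with the SAME model and rank chunks by cosine similarity.
    q = embed(query)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]


def answer(question: str, chunks: list[str], index: np.ndarray) -> str:
    # Augmented generation: put the retrieved chunks into the prompt and ask the LLM.
    context = "\n\n".join(retrieve(question, chunks, index))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```

In a real project the in-memory matrix would be replaced by a vector database (Faiss, pgvector, DuckDB, ...), and a framework such as LangChain or LlamaIndex handles this plumbing for you.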
From LlamaIndex ([relevant page](https://docs.llamaindex.ai/en/stable/understanding/rag/)); a code sketch mapping these concepts to LlamaIndex calls is at the end of these notes:

![Example RAG structure](https://docs.llamaindex.ai/en/stable/_static/getting_started/basic_rag.png)

- Document = an individual piece of text or data
- Document store = stores documents or pieces of them
- Vector store = stores the corresponding embedding vectors for the documents, and usually the documents themselves
- Retriever = takes a query and produces the corresponding documents

The framework implements these stages:

![RAG process](https://docs.llamaindex.ai/en/stable/_static/getting_started/stages.png)

## Lesson plan

- Duration:
- Expected audience:
- NOT included:
  - data pre-processing

## Example tutorials

- Semantic search: https://python.langchain.com/docs/tutorials/retrievers/
- RAG examples: https://python.langchain.com/docs/tutorials/rag/
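For reference alongside the tutorials above, a minimal sketch of how the terminology maps onto LlamaIndex. This roughly follows the LlamaIndex starter pattern and is untested here: exact imports depend on the installed version (`llama_index.core` in recent releases), the `data/` directory and question are placeholder examples, and by default LlamaIndex calls OpenAI models, so an API key or a locally configured LLM/embedding model (e.g. via Ollama) is needed.

```python
# Rough LlamaIndex starter pattern (assumes a recent llama_index with the
# llama_index.core layout). "data/" and the question are placeholder examples.
# By default this uses OpenAI models; configure local models before running.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()    # Documents
index = VectorStoreIndex.from_documents(documents)       # in-memory vector store

retriever = index.as_retriever()                          # Retriever
print(retriever.retrieve("What is the project about?"))   # matching chunks with scores

query_engine = index.as_query_engine()                    # retrieval + augmented generation
print(query_engine.query("What is the project about?"))
```

The LangChain tutorials linked above build the same structure, just with different class names.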