# RAG ZOO Plan of Action

Repo: [https://github.com/pclubiitk/RAG-Zoo](https://github.com/pclubiitk/RAG-Zoo)

Background: [https://pclub.in/project/2025/05/26/scyllaagnet/](https://pclub.in/project/2025/05/26/scyllaagnet/)

## PHASE 1

### TEAM 1: Soham, Pradyumn, Anany

**TASK: Understanding the Code Base and Refactoring**

Go through `RAG-Zoo/docs`, `RAG-Zoo/rag_src`, and `RAG-Zoo/tests` to understand the work done so far. The current code is rudimentary and ill-structured. Refactor it into `RAG-Zoo/src/ragzoo`. This is the structure we'll likely follow; if you think it needs modifications, feel free to make them. I have created the subfolders and some of the files, which should be self-explanatory. This will help you get the hang of the current code base and bring a fresh perspective to its structure.

Specific work:

1. Design the core interfaces (retriever, embedder, vector store, reranker, chunking, search, generation) and write unit tests for them.
2. Build `base_rag.py` and a simple RAG pipeline, then make an initial simple example and test.

### TEAM 2: Harshit, Harsh, Nideesh, Himanshu

**TASK 1: Work on CI/CD and Packaging**

- Define project-level conventions (code style, packaging, versioning).
- Initialize the repo skeleton and set up the CI and linting configs.
- Set up the project packaging structure.

**TASK 2: RAG Research and Utils Construction**

- Research the different RAG and agent pipelines we can integrate into RAGZoo. Currently we plan to integrate Simple RAG, Adaptive RAG, Branched RAG, Corrective RAG, Cache-Augmented Generation, HyDE, and Self-RAG. Stick to these for now.
- Make flowcharts/diagrams of how these will be implemented using the given infrastructure. The base classes will have more or less the same inputs and outputs, so assume they are already implemented and write the pseudocode and diagrams accordingly.
- Construct the additional utils (functions) these RAGs require.

## PHASE 2

Note: We'll decide the team distribution later based on interests. Unit tests and integration tests are required for all code written.

### Team 1

- Implement various types of chunking and embeddings.
- Add support for advanced vector stores (FAISS, Pinecone, etc.) and hybrid search.
- Expand the generators to include local Hugging Face transformers and LLMs.

### Team 2

- Add rerankers (BGE, BERT).
- Build more advanced pipelines (Self-RAG, Branched RAG, Adaptive RAG, etc., as researched by Team 2 in Phase 1).

### Team 3: DevOps + Utils

1. Integrate the simple pipeline test into CI.
2. Run integration tests combining all parts.
3. Add parallel and streaming utilities.

## PHASE 3 and Onward

1. Agents base class
2. Agent orchestrators
3. Agentic RAGs
4. Add advanced agentic frameworks like MCTS, LATS, and META
5. Add advanced knowledge structures like knowledge graphs
6. Stabilize all modules
7. Caching support
8. Finalize documentation, write tutorials, and prepare the package release
9. Publish to PyPI
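
For Phase 1, Team 1's interface work, one possible starting point is a set of small abstract base classes that a `base_rag.py`-style pipeline depends on, so concrete embedders, stores, and generators can be swapped freely. The sketch below is a hedged, dependency-free illustration, not the existing RAG-Zoo API. Every class name (`Document`, `BaseEmbedder`, `BaseVectorStore`, `BaseRetriever`, `BaseGenerator`, `SimpleRAG`, and the toy implementations) is hypothetical. The hashing embedder stands in for a real embedding model, and `EchoGenerator` stands in for an LLM call. Reranker, chunking, and search interfaces would follow the same pattern.

```python
import math
import re
import zlib
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Sequence

Vector = List[float]

@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

# ---- Core interfaces (hypothetical names, not the repo's current API) ----
class BaseEmbedder(ABC):
    @abstractmethod
    def embed(self, texts: Sequence[str]) -> List[Vector]: ...

class BaseVectorStore(ABC):
    @abstractmethod
    def add(self, docs: Sequence[Document], vectors: Sequence[Vector]) -> None: ...

    @abstractmethod
    def search(self, query_vector: Vector, k: int) -> List[Document]: ...

class BaseRetriever(ABC):
    @abstractmethod
    def retrieve(self, query: str, k: int) -> List[Document]: ...

class BaseGenerator(ABC):
    @abstractmethod
    def generate(self, query: str, context: Sequence[Document]) -> str: ...

# ---- Toy implementations so the sketch runs on the standard library alone ----
class HashingEmbedder(BaseEmbedder):
    """Hashes each word into one of `dim` buckets: captures word overlap, not meaning."""
    def __init__(self, dim: int = 256):
        self.dim = dim

    def embed(self, texts):
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for word in re.findall(r"\w+", text.lower()):
                vec[zlib.crc32(word.encode()) % self.dim] += 1.0
            vectors.append(vec)
        return vectors

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class InMemoryVectorStore(BaseVectorStore):
    def __init__(self):
        self._items = []  # (Document, vector) pairs; brute-force search

    def add(self, docs, vectors):
        self._items.extend(zip(docs, vectors))

    def search(self, query_vector, k):
        ranked = sorted(self._items, key=lambda it: cosine(query_vector, it[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

class DenseRetriever(BaseRetriever):
    def __init__(self, embedder: BaseEmbedder, store: BaseVectorStore):
        self.embedder, self.store = embedder, store

    def retrieve(self, query, k):
        # Embed the query, then delegate nearest-neighbour search to the store.
        return self.store.search(self.embedder.embed([query])[0], k)

class EchoGenerator(BaseGenerator):
    def generate(self, query, context):
        # Stand-in for an LLM call: returns the prompt it would have sent.
        return f"Q: {query}\nContext: " + " | ".join(doc.text for doc in context)

class SimpleRAG:
    """Index -> retrieve top-k -> generate, wired together only through the interfaces."""
    def __init__(self, embedder: BaseEmbedder, store: BaseVectorStore,
                 generator: BaseGenerator, k: int = 2):
        self.embedder, self.store, self.generator, self.k = embedder, store, generator, k
        self.retriever = DenseRetriever(embedder, store)

    def index(self, docs: Sequence[Document]) -> None:
        self.store.add(docs, self.embedder.embed([d.text for d in docs]))

    def run(self, query: str) -> str:
        return self.generator.generate(query, self.retriever.retrieve(query, self.k))

rag = SimpleRAG(HashingEmbedder(), InMemoryVectorStore(), EchoGenerator(), k=1)
rag.index([Document("Paris is the capital of France."),
           Document("FAISS is a library for vector similarity search.")])
answer = rag.run("What is the capital of France?")
print(answer)
```

Because `SimpleRAG` only talks to the abstract types, unit tests can exercise it with toy components like these. A FAISS-backed store or a Hugging Face generator could later be dropped in without touching the pipeline.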
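
For the Phase 2 chunking work, a simple strategy to start from is fixed-size windows with overlap, so that text near a chunk boundary also appears in the neighbouring chunk. This sketch assumes whitespace-separated words as the unit, whereas a real implementation would more likely count model tokens. `chunk_words` is a hypothetical name.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split `text` into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=2)
print(chunks)  # ['a b c d', 'c d e f', 'e f g h', 'g h i j']
```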
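
For hybrid search, one common way to merge a lexical ranking (e.g. BM25) with a dense-vector ranking is reciprocal rank fusion (RRF). RRF scores each document as the sum of 1 / (k + rank) over the input rankings, and k = 60 is the conventional default. The sketch below assumes each retriever returns an ordered list of document IDs. The function name is hypothetical, and RRF is only one option; weighted interpolation of normalized scores is another.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs; each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical = ["d1", "d2", "d3"]  # e.g. from a BM25 retriever
dense = ["d3", "d1", "d4"]    # e.g. from a vector-store retriever
fused = reciprocal_rank_fusion([lexical, dense])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, so it needs no calibration between BM25 scores and cosine similarities, which live on different scales.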
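
For Team 3's parallel and streaming utilities, one possible reading is a pair of helpers. The first fans calls out over a thread pool, and the second feeds a large corpus through in fixed-size batches. Both names below are hypothetical. Threads suit I/O-bound work such as requests to embedding or LLM APIs, but CPU-bound Python code would gain little because of CPython's global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_map(fn: Callable[[T], R], items: Iterable[T], max_workers: int = 4) -> List[R]:
    """Apply `fn` to every item on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Lazily yield lists of up to `batch_size` items without materialising `items`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Stream the input in batches and parallelise the work inside each batch.
totals = [sum(parallel_map(lambda x: x * x, batch)) for batch in iter_batches(range(10), 4)]
print(totals)  # [14, 126, 145]
```

`Executor.map` submits every item up front. Memory therefore stays bounded only when `parallel_map` is called one batch at a time, as in the usage line.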