# LangGraph: Building an Advanced RAG System
This series of articles will explain how to implement an **Advanced RAG system** with **retrieval**, **routing**, **generation**, and **feedback refinement** capabilities using LangChain and LangGraph. The architecture integrates multiple information sources and dynamically selects appropriate retrieval strategies based on the input query. It then evaluates the quality and relevance of generated results, applying self-correction and path adjustment.
In Retrieval-Augmented Generation (RAG) research, several new approaches have recently emerged to enhance generation quality and workflow control, including:
* **Self-RAG** ([arXiv:2310.11511](https://arxiv.org/abs/2310.11511)): Ensures retrieved documents are actually helpful by letting the language model itself evaluate and rerank retrieval results, improving alignment between question and answer.
* **Corrective-RAG** ([arXiv:2401.15884](https://arxiv.org/abs/2401.15884)): Mitigates hallucination via post-processing and feedback mechanisms that automatically check and correct model outputs, enhancing factual consistency.
* **Adaptive-RAG** ([arXiv:2403.14403](https://arxiv.org/abs/2403.14403)): Dynamically chooses the most suitable retrieval sources and strategies depending on question type and task context.
This series will draw inspiration from these research concepts and combine them with LangGraph’s modular design methodology to implement a **traceable and maintainable RAG workflow**.
This first article introduces the overall design logic and module diagram. Later chapters will break down each stage node’s behavior, input/output format, and implementation details in LangGraph.
---
## Learning Resources: LangChain and LangGraph
**LangChain** has become one of the most popular AI frameworks. It integrates multiple large language models (LLMs) such as GPT-4, Claude, and Gemini, connects to external tools and databases, and enables rapid development of applications like chatbots, search Q&A (RAG), and automated tool integrations.
Once you’re comfortable with LangChain, the next step is **LangGraph**, which visualizes and integrates LangChain workflows. LangGraph represents program logic using **nodes** and **edges**, giving you an intuitive, graph-based way to understand and manage your agent.
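To make the node-and-edge idea concrete, here is a minimal, self-contained LangGraph example with a single node (a sketch for illustration only; the state and node names are made up):
```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    message: str


def greet(state: State) -> dict:
    # A node is just a function: it reads the state and returns the fields to update
    return {"message": f"Hello, {state['message']}!"}


builder = StateGraph(State)
builder.add_node("greet", greet)
builder.add_edge(START, "greet")  # edge from the entry point into the node
builder.add_edge("greet", END)    # edge from the node to the terminal state
app = builder.compile()

print(app.invoke({"message": "LangGraph"}))  # {'message': 'Hello, LangGraph!'}
```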
**LangChain learning resources**:
* **[YouTube LangChain Crash Course](https://www.youtube.com/watch?v=yF9kGESAi3M)**
A beginner-friendly walkthrough of LangChain concepts and examples.
* **[LangChain Crash Course GitHub Repo](https://github.com/bhancockio/langchain-crash-course)**
Complete code samples covering everything from chat models to RAG and agents.
**LangGraph learning resources**:
* **[YouTube LangGraph Crash Course](https://www.youtube.com/watch?v=jGg_1h0qzaM&t=62s)**
Quick start visual tutorial introducing StateGraph, node design, and execution flow.
* **[LangGraph Course GitHub Repo](https://github.com/iamvaibhavmehra/LangGraph-Course-freeCodeCamp)**
Repository of code examples aligned with the video tutorials. Great for hands-on practice.
* **[Udemy Course: Build Multi-Agent Systems using LangGraph](https://www.udemy.com/course/langgraph/)**
A complete advanced course focused on multi-agent systems with LangGraph.
---
## From Traditional RAG to Advanced Design
A typical RAG workflow has two stages:
1. **Retrieval** – Retrieve relevant documents from a vector database based on the input question.
2. **Generation** – Concatenate the question with retrieved context and pass it to the language model to generate an answer.
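For reference, a bare-bones version of this two-stage baseline can be written in a few lines of LangChain (a simplified sketch, assuming a pre-populated Chroma vector store and OpenAI models; all names here are illustrative):
```python
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Stage 1: Retrieval - pull the most similar chunks from the vector store
retriever = Chroma(
    collection_name="rag-chroma",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./.chroma",
).as_retriever(search_kwargs={"k": 4})

# Stage 2: Generation - concatenate question and context, then prompt the LLM
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def answer(question: str) -> str:
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    return (prompt | llm | StrOutputParser()).invoke(
        {"context": context, "question": question}
    )
```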
This improves factual grounding but suffers from common issues:
* Retrieved documents may be irrelevant or imprecise.
* Generated answers may ignore the provided context.
* No mechanism to judge the trustworthiness of the generated result.
Recent research has proposed improvements to address these issues. Below are three representative strategies:
---
### Self-RAG: Model Evaluates Retrieval Quality
**Paper**: [Self-RAG (arXiv:2310.11511)](https://arxiv.org/abs/2310.11511)
Self-RAG lets the LLM participate in re-ranking retrieved documents. By self-evaluating the link between generation and context, the model selects the most useful pieces of evidence, enhancing answer quality. The LLM acts as a “retrieval supervisor.”
---
### Corrective-RAG: Post-Review of Generation
**Paper**: [Corrective-RAG (arXiv:2401.15884)](https://arxiv.org/abs/2401.15884)
Corrective-RAG introduces a post-processing pipeline that reviews generated outputs against the supporting documents. If inconsistencies are detected, the system corrects or regenerates the response. This emphasizes **feedback and correction**, boosting reliability.
---
### Adaptive-RAG: Switching Strategies by Question Type
**Paper**: [Adaptive-RAG (arXiv:2403.14403)](https://arxiv.org/abs/2403.14403)
Adaptive-RAG dynamically selects different retrieval sources depending on the query. For factual queries, internal knowledge bases may suffice; for open-ended or time-sensitive questions, external search engines are used. This increases stability and flexibility.
---
These three strategies correspond to improving **retrieval precision**, **generation validation**, and **workflow adaptivity**. In upcoming chapters, we will implement a workflow combining these ideas.
---
## System Architecture
The following diagram illustrates the Advanced RAG pipeline, including document retrieval, answer generation, evaluation, and self-correction:

---
### 1. Retrieval and Initial Filtering: `Retrieve Documents → Grade Documents`
The system first retrieves candidate documents from a vector database. Then, a grading model (`Grade Documents`) evaluates their semantic relevance to the user query.
* If documents are relevant, proceed to generation.
* If documents are irrelevant, trigger an external search.
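The grader can be implemented as a small structured-output chain that returns a binary relevance score (a sketch, not the repository's exact code; the `GradeDocuments` schema, prompt wording, and model are illustrative):
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class GradeDocuments(BaseModel):
    """Binary relevance score for a retrieved document."""

    binary_score: str = Field(
        description="'yes' if the document is relevant to the question, otherwise 'no'"
    )


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

grade_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a grader assessing whether a retrieved document is relevant to a user question."),
    ("human", "Document:\n{document}\n\nQuestion: {question}"),
])

retrieval_grader = grade_prompt | llm.with_structured_output(GradeDocuments)
```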
---
### 2. Supplementary Retrieval: `Web Search`
If internal retrieval is insufficient, the system performs a web search (e.g., `Tavily Search`) to supplement context with public information. This improves coverage for open-ended or novel questions.
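As a node, the web search step simply queries Tavily and appends the hits to the shared document list. A simplified sketch (assuming the tool's response is a dict with a `results` list of content snippets):
```python
from langchain_core.documents import Document
from langchain_tavily import TavilySearch

web_search_tool = TavilySearch(max_results=3)


def web_search(state: dict) -> dict:
    """Query Tavily and append the results to the retrieved documents."""
    response = web_search_tool.invoke({"query": state["question"]})
    joined = "\n".join(hit["content"] for hit in response["results"])
    documents = state.get("documents", []) + [Document(page_content=joined)]
    return {"documents": documents}
```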
---
### 3. Generation and Validation: `Generate Answer → Grade Generation`
The system then generates a response and evaluates it in two phases:
* **Phase 1**: Check for hallucinations (inconsistencies with retrieved context).
* **Phase 2**: Verify whether the answer actually addresses the user’s question.
If the answer fails, the system loops back for additional retrieval and regeneration, forming a self-correcting feedback cycle.
---
### 4. Question Routing and Self-Correction Loop
At the start of the workflow, a routing mechanism sends each query either to vector retrieval or directly to external web search, depending on the question type.
This embodies Adaptive-RAG: dynamically selecting sources based on query characteristics.
---
## Implementation
{%preview https://github.com/WoodyChang21/LangGraph-Advanced-RAG %}
```
graph/
│
├── chains/                    # Subchains for each task (Router / Grader / Generator)
│   ├── router.py
│   ├── generation.py
│   ├── hallucination_grader.py
│   ├── answer_grader.py
│   └── retrieval_grader.py
│
├── nodes/                     # LangGraph nodes wrapping each chain
│   ├── retrieve.py
│   ├── grade_documents.py
│   ├── generate.py
│   └── web_search.py
│
├── consts.py                  # Node constants
├── graph.py                   # Main graph assembly
└── state.py                   # GraphState definition
```
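`graph.py` assembles these modules into a `StateGraph`. The sketch below shows the general shape of that wiring; the import paths follow the layout above, and `route_question`, `decide_to_generate`, and `grade_generation` are the conditional-edge functions sketched later in this article:
```python
from langgraph.graph import END, StateGraph

from graph.consts import GENERATE, GRADE_DOCUMENTS, RETRIEVE, WEBSEARCH
from graph.nodes import generate, grade_documents, retrieve, web_search
from graph.state import GraphState

workflow = StateGraph(GraphState)

# One node per pipeline stage
workflow.add_node(RETRIEVE, retrieve)
workflow.add_node(GRADE_DOCUMENTS, grade_documents)
workflow.add_node(GENERATE, generate)
workflow.add_node(WEBSEARCH, web_search)

# Adaptive entry: vector retrieval or straight to web search
workflow.set_conditional_entry_point(
    route_question,
    {RETRIEVE: RETRIEVE, WEBSEARCH: WEBSEARCH},
)

# Fixed and conditional edges forming the self-correcting loop
workflow.add_edge(RETRIEVE, GRADE_DOCUMENTS)
workflow.add_conditional_edges(
    GRADE_DOCUMENTS,
    decide_to_generate,  # generate if the docs are relevant, otherwise search the web
    {WEBSEARCH: WEBSEARCH, GENERATE: GENERATE},
)
workflow.add_edge(WEBSEARCH, GENERATE)
workflow.add_conditional_edges(
    GENERATE,
    grade_generation,  # hallucination and answer grading
    {"useful": END, "not useful": WEBSEARCH, "not supported": GENERATE},
)

app = workflow.compile()
```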
---
### State Data Structure (GraphState)
LangGraph uses a shared **state object** to connect nodes. Each node reads/updates only the fields it needs, keeping logic modular and traceable.
```python
from typing import List, TypedDict


class GraphState(TypedDict):
    """
    Shared state of the graph.

    Attributes:
        question: user's query
        generation: model-generated response
        web_search: whether external search is required
        documents: retrieved documents
    """

    question: str
    generation: str
    web_search: bool
    documents: List[str]
```
---
### Node Design

| Node File | Purpose | Corresponding Chain |
| -------------------- | --------------------------------------------- | ----------------------- |
| `retrieve.py` | Retrieve documents from vector database | `ingestion.retriever()` |
| `grade_documents.py` | Filter irrelevant docs & decide on web search | `retrieval_grader.py` |
| `generate.py` | Generate response with RAG prompt | `generation.py` |
| `web_search.py` | Call Tavily API for external search | Direct API call |
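Each node is an ordinary function that reads from `GraphState` and returns only the fields it updates. The retrieve node, for example, can be as small as this (a sketch; the exact way the retriever is exposed by the ingestion module may differ in the repository):
```python
from graph.state import GraphState
from ingestion import retriever  # vector-store retriever built at ingestion time


def retrieve(state: GraphState) -> dict:
    """Fetch candidate documents for the question and write them back to state."""
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}
```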
---
## Mapping Self-RAG, Corrective-RAG, Adaptive-RAG to Implementation
The following diagram shows which parts of the system correspond to each strategy:

---
### Adaptive-RAG: Switching Retrieval Strategies
At the start, the query passes through a `route_question()` function using a `question_router` chain:
```python
def route_question(state: GraphState) -> str:
    # Ask the router chain which data source fits this question
    source: RouteQuery = question_router.invoke({"question": state["question"]})
    return WEBSEARCH if source.datasource == WEBSEARCH else RETRIEVE
```
This is Adaptive-RAG in action: **dynamic data source selection**.
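The `question_router` chain behind this function can be built with structured output, so the LLM must reply with a valid `RouteQuery` (a sketch; the repository's prompt wording and model choice may differ):
```python
from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class RouteQuery(BaseModel):
    """Route a user question to the most appropriate data source."""

    datasource: Literal["vectorstore", "websearch"] = Field(
        description="Use 'vectorstore' for questions about the indexed documents, otherwise 'websearch'."
    )


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

route_prompt = ChatPromptTemplate.from_messages([
    ("system", "Route the question to the vector store if it concerns the indexed documents; otherwise route it to web search."),
    ("human", "{question}"),
])

question_router = route_prompt | llm.with_structured_output(RouteQuery)
```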
---
### Corrective-RAG: Hallucination Checking
After generation, a `hallucination_grader` verifies whether the output is grounded in retrieved context:
```python
score = hallucination_grader.invoke({"documents": documents, "generation": generation})
if not score.binary_score:
    return "not supported"  # regenerate
```
If hallucination is detected, the system retries with corrected inputs.
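The `hallucination_grader` itself is another small structured-output chain that compares the generation against the supporting documents (a sketch consistent with the snippet above; the prompt wording and model are illustrative):
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class GradeHallucinations(BaseModel):
    """Whether the generation is grounded in the retrieved documents."""

    binary_score: bool = Field(
        description="True if the answer is supported by the documents, False otherwise."
    )


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

hallucination_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a grader checking whether an LLM answer is grounded in the given facts."),
    ("human", "Facts:\n{documents}\n\nAnswer: {generation}"),
])

hallucination_grader = hallucination_prompt | llm.with_structured_output(GradeHallucinations)
```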
---
### Self-RAG: Model Evaluates Information and Response
Self-RAG appears in two places:
1. **Document grading**: `grade_documents.py` ensures retrieved docs are relevant; irrelevant documents trigger a web search instead.
2. **Answer grading**: `answer_grader` checks if the generated answer addresses the question.
```python
score = answer_grader.invoke({"question": question, "generation": generation})
return "useful" if score.binary_score else "not useful"
```
Together, these checks ensure the system only delivers relevant and useful answers.
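In the graph, the two checks are combined into a single conditional-edge function that decides what happens after generation (a sketch using the graders above; the return values match the edge map wired in `graph.py`):
```python
def grade_generation(state: GraphState) -> str:
    """Decide the next step after generation: regenerate, search again, or finish."""
    documents, generation = state["documents"], state["generation"]

    # Phase 1: is the answer grounded in the retrieved context?
    hallucination = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    if not hallucination.binary_score:
        return "not supported"  # regenerate

    # Phase 2: does the answer actually address the question?
    answer = answer_grader.invoke(
        {"question": state["question"], "generation": generation}
    )
    return "useful" if answer.binary_score else "not useful"
```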
---
### Summary of the Three Layers
1. **Adaptive-RAG**: Choose the right information source.
2. **Self-RAG**: Validate retrievals and responses for quality and relevance.
3. **Corrective-RAG**: Automatically correct flawed outputs.
---
## Extension: Domain-Specific RAG (e.g., Customer Support)
For enterprise use cases like internal knowledge bases or product support, it's best to **restrict web search domains** to avoid unreliable sources. The Tavily API supports domain filters:
```python
from langchain_tavily import TavilySearch
search = TavilySearch(max_results=3)
results = search.invoke({
    "query": "How to reset my DotDotWatch?",
    "include_domains": ["support.dotdot.com", "docs.dotdot.com"],
})
```