# Natural Language to MongoDB and Auto-Generated Visualizations with LangGraph

## Introduction

Querying databases and building plots often requires technical skills: writing MongoDB pipelines, validating queries, and then manually coding visualizations. This process is not only time-consuming but also inaccessible to non-technical users.

With **LangGraph**, we can streamline the workflow: ask a question in plain English, and automatically receive both **query results** and a **visual chart**. The system is designed around a lightweight **Supervisor** that coordinates two specialized subgraphs:

1. **Search subgraph** — converts natural language into validated MongoDB queries with `langchain_mongodb.agent_toolkit`.
2. **Code/Plot subgraph** — takes structured results and generates charts through **code generation, validation, execution, and reflection**.

This post explains the design of these subgraphs, how their nodes interact, and how they come together into a reliable pipeline for data exploration.

---

### System overview

```mermaid
flowchart TD
    start([start])
    supervisor[supervisor]
    code_plot_agent[code_plot_agent]
    search_agent[search_agent]
    end_node([end])

    start --> supervisor
    supervisor -.-> code_plot_agent
    supervisor -.-> search_agent
    supervisor --> end_node
    code_plot_agent -.-> supervisor
    search_agent -.-> supervisor
```

**Use case example:**

> **“Create a pie chart showing the top 5 airlines by flight count in the past month.”**

---

## Search Subgraph — NL → MongoDB (with `langchain_mongodb.agent_toolkit`)

### `langchain_mongodb.agent_toolkit`

A built-in toolkit that turns natural language into MongoDB queries (MQL) through a ReAct agent plus a set of tools. Detailed documentation can be found at the following links:

**Workflow Explanation:**
{%preview https://www.mongodb.com/company/blog/technical/natural-language-agents-mongodb-text-mql-langchain %}

**Implementation:**
{%preview https://www.mongodb.com/docs/atlas/ai-integrations/langchain/natural-language-to-mql/ %}

### How it works

With the predefined toolkit, a ReAct agent handles tool usage and planning:

```mermaid
graph TD
    START([Start])
    AGENT[ReAct Agent]
    TOOLS{Tools Available}
    END([End])

    START --> AGENT
    AGENT --> TOOLS
    TOOLS --> AGENT
    AGENT --> END

    %% Available Tools (4 core tools)
    mongodb_query[mongodb_query<br/>Execute MQL operations]
    mongodb_schema[mongodb_schema<br/>Get collection structure]
    mongodb_list_collections[mongodb_list_collections<br/>Discover collections]
    mongodb_query_checker[mongodb_query_checker<br/>Validate generated queries]

    TOOLS -.-> mongodb_query
    TOOLS -.-> mongodb_schema
    TOOLS -.-> mongodb_list_collections
    TOOLS -.-> mongodb_query_checker

    classDef toolClass fill:#e1f5fe,stroke:#1976d2,stroke-width:2px
    class mongodb_query,mongodb_schema,mongodb_list_collections,mongodb_query_checker toolClass
```

`mongodb_query`
- **Purpose**: Execution
- **Goal**: Run the query

`mongodb_query_checker`
- **Purpose**: Validation
- **Goal**: Validate MQL (syntax, logic, performance) before execution

`mongodb_list_collections`
- **Purpose**: Discovery
- **Goal**: List available collections

`mongodb_schema`
- **Purpose**: Inspection
- **Goal**: Analyze a collection’s structure

### MongoDB Query Generation Workflow

```mermaid
graph TD
    START([User Query: Which airline had the most flights in the past two weeks])
    AGENT[ReAct Agent<br/>Text-to-MQL Processing]

    %% Step 1: Discovery
    STEP1[Step 1: Discover Collections]
    LIST[mongodb_list_collections]

    %% Step 2: Schema Analysis
    STEP2[Step 2: Analyze Structure]
    SCHEMA[mongodb_schema]

    %% Step 3: Query Generation & Validation
    STEP3[Step 3: Generate & Validate Query]
    CHECKER[mongodb_query_checker<br/>Validates: syntax, logic, performance]

    %% Step 4: Execution
    STEP4[Step 4: Execute Query]
    QUERY[mongodb_query<br/>Executes: Aggregation pipeline]

    END([Response: xxx airline ....])

    %% Main workflow
    START --> AGENT
    AGENT --> STEP1
    STEP1 --> LIST
    LIST --> STEP2
    STEP2 --> SCHEMA
    SCHEMA --> STEP3
    STEP3 --> CHECKER
    CHECKER --> STEP4
    STEP4 --> QUERY
    QUERY --> AGENT
    AGENT --> END

    %% Feedback loops
    CHECKER -.->|Invalid Query| STEP3
    QUERY -.->|Error/Retry| STEP3

    %% Styling
    classDef toolClass fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef stepClass fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef agentClass fill:#e8f5e8,stroke:#1b5e20,stroke-width:3px
    classDef startEnd fill:#fff3e0,stroke:#e65100,stroke-width:2px

    class LIST,SCHEMA,QUERY,CHECKER toolClass
    class STEP1,STEP2,STEP3,STEP4 stepClass
    class AGENT agentClass
    class START,END startEnd
```

### Streaming & State (what runs under the hood)

* The agent is created via `create_react_agent(llm, toolkit.get_tools())`.
* You call `agent.stream({"messages": [("user", query)]}, stream_mode="values")`.
* Each streamed event carries the updated `messages` (user → agent thoughts → tool calls → tool results → final answer).
* You can scan those messages to extract **tool outputs** (e.g., the raw JSON returned by `mongodb_query`) and pass a **stable JSON** onward to the plotter, as in the sketch below.
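Put together, the loop looks roughly like this. The toolkit classes (`MongoDBDatabase`, `MongoDBDatabaseToolkit`) follow the MongoDB docs linked above; the connection string, database name, and model choice are placeholders, and filtering on `ToolMessage` is one plausible way to pull the raw `mongodb_query` output out of the stream, not necessarily the post's exact code:

```python
from langchain.chat_models import init_chat_model
from langchain_core.messages import ToolMessage
from langchain_mongodb.agent_toolkit import MongoDBDatabase, MongoDBDatabaseToolkit
from langgraph.prebuilt import create_react_agent

# Placeholder connection details — swap in your own deployment and database.
db = MongoDBDatabase.from_connection_string(
    "mongodb+srv://<user>:<password>@<cluster>/", database="flights"
)
llm = init_chat_model("openai:gpt-4o-mini", temperature=0)  # placeholder model

toolkit = MongoDBDatabaseToolkit(db=db, llm=llm)
agent = create_react_agent(llm, toolkit.get_tools())

query = "Which airline had the most flights in the past month?"
raw_rows = None
for event in agent.stream({"messages": [("user", query)]}, stream_mode="values"):
    message = event["messages"][-1]
    # Keep the latest mongodb_query result as the stable JSON for the plotter
    if isinstance(message, ToolMessage) and message.name == "mongodb_query":
        raw_rows = message.content

final_answer = event["messages"][-1].content  # agent's natural-language reply
```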
### Examples (inputs/outputs)

**Input**

```json
{
  "messages": [["user", "Which airline had the most flights in the past month?"]]
}
```

**Output** (passed to the next subgraph)

```json
{
  "search_result": [
    { "_id": "B7", "totalFlights": 321 },
    { "_id": "AE", "totalFlights": 304 },
    { "_id": "CI", "totalFlights": 233 },
    { "_id": "BR", "totalFlights": 152 },
    { "_id": "IT", "totalFlights": 146 }
  ],
  "metadata": { "collection": "arrivals", "window": "last_30d" }
}
```

> **Tip:** Keep the output **schema-stable** (e.g., `search_result` with `_id` + `totalFlights`) to make the downstream plotter trivial to implement.

---

## Code Generation, Checks, and Reflection

While the search subgraph translates natural language into MongoDB queries, the **Code/Plot subgraph** takes the query results and turns them into a validated, executable visualization. This is achieved by chaining specialized **LangGraph nodes**, each handling a distinct responsibility.

---

### Subgraph Design

```mermaid
flowchart LR
    START([Start])
    GEN[Generate<br/>Code Node]
    CHECK[Check<br/>Code Node]
    REFLECT[Reflect<br/>Node]
    END([End])

    START --> GEN --> CHECK
    CHECK -- success --> END
    CHECK -- error --> REFLECT --> GEN
```

---

### Node Responsibilities

**1. Generate Node**

* Uses a **prompted LLM chain (`code_gen_chain`)** to generate plotting code. The chain is straightforward: a well-structured prompt instructs the LLM to produce code from the given input data. This node only generates code; no validation happens here (see the sketch after this list).
* Input: MongoDB rows, query summary, record count, insights.
* Output: a structured object with:
  * `prefix`: analysis of the data & chosen chart type
  * `imports`: required libraries
  * `code`: full Python plotting code
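The post doesn't reproduce `code_gen_chain` itself; a common way to obtain this three-field structure is a Pydantic schema bound via `with_structured_output`. Here is a minimal sketch under that assumption — the class name, field wording, and prompt text are illustrative, not the original implementation:

```python
from langchain.chat_models import init_chat_model
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class CodeSolution(BaseModel):
    """Structured output for the generate node (illustrative schema)."""
    prefix: str = Field(description="Analysis of the data and the chosen chart type")
    imports: str = Field(description="Import statements required by the code")
    code: str = Field(description="Python plotting code, excluding imports")

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a visualization assistant. Given MongoDB query results, write "
     "matplotlib code that charts them. Respond with prefix, imports, and code."),
    ("placeholder", "{messages}"),
])

llm = init_chat_model("openai:gpt-4o-mini", temperature=0)  # placeholder model
code_gen_chain = prompt | llm.with_structured_output(CodeSolution)
```

Keeping `imports` separate from `code` is what enables the two-stage validation in the next node.

---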
**2. Check Code Node**

* Runs two levels of validation:
  1. **Import Check** → ensures required libraries are safe and available.
  2. **Execution Check** → attempts to run `imports + code`.

The check itself is a few lines of plain Python:

```python
try:
    exec(imports)                  # 1. import check
    exec(imports + "\n" + code)    # 2. execution check
except Exception as e:
    # Surface the failure to the graph state for the reflect node
    messages.append(("user", f"Your solution failed to execute: {e}"))
    error = "yes"
```

If either step fails, the node appends the error message to state and sets `error="yes"`. If execution succeeds, it appends a completion message and sets `error="no"`. This effectively serves as a lightweight **runtime sandbox**.

---

**3. Reflect Node**

* Activated only if code execution fails and retry attempts remain.
* Re-invokes the code generation chain, but with prior errors included in the conversation history.
* Produces a revised generation attempt that is routed back into the **Generate Node**.

This creates a **self-correcting loop**: `generate → check → reflect → generate → …`

---

### Orchestration Logic

The LangGraph orchestration is defined as follows:

```python
from langgraph.graph import StateGraph, START, END

workflow = StateGraph(GraphState)
workflow.add_node("generate", generate)
workflow.add_node("check_code", code_check)
workflow.add_node("reflect", reflect)

workflow.add_edge(START, "generate")
workflow.add_edge("generate", "check_code")
workflow.add_conditional_edges(
    "check_code",
    decide_to_finish,  # decide: end / reflect / regenerate
    {"end": END, "reflect": "reflect", "generate": "generate"},
)
workflow.add_edge("reflect", "generate")

app = workflow.compile(name="code_plot_agent")
```

**Decision policy:**

* If `error == "no"` → finish.
* If `error == "yes"` and `iterations < max_iterations` → retry (reflect or regenerate).
* If retries are exhausted → finish with an error state.

---

### Why This Design Works

* **Composability:** Each node is single-purpose (generation, validation, reflection).
* **Safety:** Execution happens only after imports & code pass checks.
* **Recoverability:** Failures are logged and re-routed into retries instead of crashing.
* **Determinism:** By preserving `GraphState` (messages, error flags, iterations, search result), the pipeline is reproducible and debuggable.

---

## Results

The system brings everything together:

1. **User Query**: top 5 airlines by flight count in the last month

2. **Search Agent (MongoDB query)**

   Example query result:

   ```json
   {
     "search_result": [
       { "_id": "B7", "totalFlights": 321 },
       { "_id": "AE", "totalFlights": 304 },
       { "_id": "CI", "totalFlights": 233 },
       { "_id": "BR", "totalFlights": 152 },
       { "_id": "IT", "totalFlights": 146 }
     ],
     "metadata": { "collection": "arrivals", "window": "last_30d" }
   }
   ```

3. **Code Agent (chart generation)**

   Using the above data, the code agent generated the following visualization:

   ![image](https://hackmd.io/_uploads/B1HOLVhYge.png)
   *Figure: Pie chart of the top 5 airlines by flight count in the past month.*

## Conclusion

This project demonstrates how **LangGraph** can orchestrate specialized agents into a seamless workflow:

* **Search Subgraph**: Converts natural language into validated MongoDB queries.
* **Plot Subgraph**: Transforms structured results into safe, insightful charts.

The result: a **fully automated NL → MongoDB → Visualization pipeline**. Users ask in plain English, and the system delivers **data-backed answers with visual context** — an approach that generalizes well beyond flight analytics to any domain where **data exploration + visualization** are critical.