# Natural Language to MongoDB and Auto-Generated Visualizations with LangGraph
## Introduction
Querying databases and building plots often require technical skills: writing MongoDB pipelines, validating queries, and then manually coding visualizations. This process is not only time-consuming but also inaccessible to non-technical users.
With **LangGraph**, we can streamline the workflow: ask a question in plain English, and automatically receive both **query results** and a **visual chart**.
The system is designed around a lightweight **Supervisor** that coordinates two specialized subgraphs:
1. **Search subgraph** — converts natural language into validated MongoDB queries with `langchain_mongodb.agent_toolkit`.
2. **Code/Plot subgraph** — takes structured results and generates charts through **code generation, validation, execution, and reflection**.
This post explains the design of these subgraphs, how their nodes interact, and how they come together into a reliable pipeline for data exploration.
---
### System overview
```mermaid
flowchart TD
start([start])
supervisor[supervisor]
code_plot_agent[code_plot_agent]
search_agent[search_agent]
end_node([end])
start --> supervisor
supervisor -.-> code_plot_agent
supervisor -.-> search_agent
supervisor --> end_node
code_plot_agent -.-> supervisor
search_agent -.-> supervisor
```
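Before diving into each subgraph, here is a minimal sketch of how the top-level graph could be wired. The state fields, routing policy, and stub nodes are illustrative; the real subgraphs are described in the sections below:
```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class AppState(TypedDict, total=False):
    question: str
    search_result: list   # filled in by the search subgraph
    chart_path: str       # filled in by the code/plot subgraph
    next: str             # supervisor's routing decision

def supervisor(state: AppState) -> AppState:
    # Illustrative policy: search first, then plot, then finish
    if not state.get("search_result"):
        return {"next": "search_agent"}
    if not state.get("chart_path"):
        return {"next": "code_plot_agent"}
    return {"next": "end"}

# Stubs standing in for the compiled subgraphs described below
def search_agent(state: AppState) -> AppState:
    return {"search_result": [{"_id": "B7", "totalFlights": 321}]}

def code_plot_agent(state: AppState) -> AppState:
    return {"chart_path": "top_airlines.png"}

graph = StateGraph(AppState)
graph.add_node("supervisor", supervisor)
graph.add_node("search_agent", search_agent)
graph.add_node("code_plot_agent", code_plot_agent)
graph.add_edge(START, "supervisor")
graph.add_conditional_edges(
    "supervisor",
    lambda state: state["next"],
    {"search_agent": "search_agent", "code_plot_agent": "code_plot_agent", "end": END},
)
graph.add_edge("search_agent", "supervisor")
graph.add_edge("code_plot_agent", "supervisor")
app = graph.compile()
```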
**Use case example:**
> **“Create a pie chart showing the top 5 airlines by flight count in the past month.”**
---
## Search Subgraph — NL → MongoDB (with `langchain_mongodb.agent_toolkit`)
### `langchain_mongodb.agent_toolkit`
A built-in toolkit that turns natural language into MongoDB queries (MQL) through a ReAct agent + tools.
Detailed documentation can be found at the following links:
**Workflow Explanation:**
{%preview https://www.mongodb.com/company/blog/technical/natural-language-agents-mongodb-text-mql-langchain %}
**Implementation:**
{%preview https://www.mongodb.com/docs/atlas/ai-integrations/langchain/natural-language-to-mql/ %}
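For reference, a minimal setup sketch following the docs linked above. The connection string, database name, and model are placeholders, and the `prompt` keyword is named `state_modifier` in older LangGraph versions:
```python
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
from langchain_mongodb.agent_toolkit import (
    MONGODB_AGENT_SYSTEM_PROMPT,
    MongoDBDatabase,
    MongoDBDatabaseToolkit,
)

# Placeholders: point these at your own cluster, database, and model
db = MongoDBDatabase.from_connection_string(
    "mongodb+srv://<user>:<password>@<cluster>/", database="flights"
)
llm = init_chat_model("gpt-4o", model_provider="openai")

toolkit = MongoDBDatabaseToolkit(db=db, llm=llm)
system_message = MONGODB_AGENT_SYSTEM_PROMPT.format(top_k=5)
agent = create_react_agent(llm, toolkit.get_tools(), prompt=system_message)
```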
### How it works
With the predefined toolkit, a ReAct agent handles tool usage and planning:
```mermaid
graph TD
START([Start])
AGENT[ReAct Agent]
TOOLS{Tools Available}
END([End])
START --> AGENT
AGENT --> TOOLS
TOOLS --> AGENT
AGENT --> END
%% Available Tools (4 core tools)
mongodb_query[mongodb_query<br/>Execute MQL operations]
mongodb_schema[mongodb_schema<br/>Get collection structure]
mongodb_list_collections[mongodb_list_collections<br/>Discover collections]
mongodb_query_checker[mongodb_query_checker<br/>Validate generated queries]
TOOLS -.-> mongodb_query
TOOLS -.-> mongodb_schema
TOOLS -.-> mongodb_list_collections
TOOLS -.-> mongodb_query_checker
classDef toolClass fill:#e1f5fe,stroke:#1976d2,stroke-width:2px
class mongodb_query,mongodb_schema,mongodb_list_collections,mongodb_query_checker toolClass
```
| Tool | Purpose | Goal |
| --- | --- | --- |
| `mongodb_query` | Execution | Run the query |
| `mongodb_query_checker` | Validation | Validate MQL (syntax, logic, performance) before execution |
| `mongodb_list_collections` | Discovery | List available collections |
| `mongodb_schema` | Inspection | Analyze a collection’s structure |
### MongoDB Query Generation Workflow
```mermaid
graph TD
START([User Query: Which airline had the most flights in the past two weeks])
AGENT[ReAct Agent<br/>Text-to-MQL Processing]
%% Step 1: Discovery
STEP1[Step 1: Discover Collections]
LIST[mongodb_list_collections<br/>Discovers: available collections]
%% Step 2: Schema Analysis
STEP2[Step 2: Analyze Structure]
SCHEMA[mongodb_schema<br/>Inspects: collection structure]
%% Step 3: Query Generation & Validation
STEP3[Step 3: Generate & Validate Query]
CHECKER[mongodb_query_checker<br/>Validates: syntax, logic, performance]
%% Step 4: Execution
STEP4[Step 4: Execute Query]
QUERY[mongodb_query<br/>Executes: Aggregation pipeline]
END([Response: xxx airline ....])
%% Main workflow
START --> AGENT
AGENT --> STEP1
STEP1 --> LIST
LIST --> STEP2
STEP2 --> SCHEMA
SCHEMA --> STEP3
STEP3 --> CHECKER
CHECKER --> STEP4
STEP4 --> QUERY
QUERY --> AGENT
AGENT --> END
%% Feedback loops
CHECKER -.->|Invalid Query| STEP3
QUERY -.->|Error/Retry| STEP3
%% Styling
classDef toolClass fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef stepClass fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef agentClass fill:#e8f5e8,stroke:#1b5e20,stroke-width:3px
classDef startEnd fill:#fff3e0,stroke:#e65100,stroke-width:2px
class LIST,SCHEMA,QUERY,CHECKER toolClass
class STEP1,STEP2,STEP3,STEP4 stepClass
class AGENT agentClass
class START,END startEnd
```
### Streaming & State (what runs under the hood)
* The agent is created via `create_react_agent(llm, toolkit.get_tools())`.
* You call `agent.stream({"messages": [("user", query)]}, stream_mode="values")`.
* Each streamed event carries updated `messages` (user → agent thoughts → tool calls → tool results → final answer).
* You can scan those messages to extract **tool outputs** (e.g., the raw JSON returned by `mongodb_query`) and pass a **stable JSON** onward to the plotter.
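Concretely, a sketch of the streaming loop; the extraction logic is illustrative and assumes the `agent` built in the toolkit section above:
```python
from langchain_core.messages import ToolMessage

query = "Which airline had the most flights in the past month?"
events = agent.stream({"messages": [("user", query)]}, stream_mode="values")

search_result = None
for event in events:
    message = event["messages"][-1]
    message.pretty_print()  # user -> agent thoughts -> tool calls -> results
    # Keep the raw JSON returned by the mongodb_query tool for the plotter
    if isinstance(message, ToolMessage) and message.name == "mongodb_query":
        search_result = message.content
```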
### Examples (inputs/outputs)
**Input**
```json
{
"messages": [["user", "Which airline had the most flights in the past month?"]]
}
```
**Output** (passed to the next subgraph)
```json
{
"search_result": [
{ "_id": "B7", "totalFlights": 321 },
{ "_id": "AE", "totalFlights": 304 },
{ "_id": "CI", "totalFlights": 233 },
{ "_id": "BR", "totalFlights": 152 },
{ "_id": "IT", "totalFlights": 146 }
],
"metadata": { "collection": "arrivals", "window": "last_30d" }
}
```
> **Tip:** Keep the output **schema-stable** (e.g., `search_result` with `_id` + `totalFlights`) to make the downstream plotter trivial to implement.
---
## Code Generation, Checks, and Reflection
While the search subgraph translates natural language into MongoDB queries, the **Code/Plot subgraph** takes the query results and turns them into a validated, executable visualization. This is achieved by chaining specialized **LangGraph nodes**, each handling a distinct responsibility.
---
### Subgraph Design
```mermaid
flowchart LR
START([Start])
GEN[Generate<br/>Code Node]
CHECK[Check<br/>Code Node]
REFLECT[Reflect<br/>Node]
END([End])
START --> GEN --> CHECK
CHECK -- success --> END
CHECK -- error --> REFLECT --> GEN
```
---
### Node Responsibilities
**1. Generate Node**
* Uses a **prompted LLM chain (`code_gen_chain`)** to generate plotting code.
The chain is straightforward: a well-structured prompt instructs the LLM to generate code from the given input data. This node only generates code, with no validation; a minimal sketch of such a chain appears after this list.
* Input: MongoDB rows, query summary, record count, insights.
* Output: Structured object with:
* `prefix`: analysis of data & chosen chart type
* `imports`: required libraries
* `code`: full Python plotting code
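As referenced above, here is a minimal sketch of such a chain. The `CodeOutput` name and prompt wording are illustrative; only the three fields come from the design, and `llm` is assumed to be any chat model that supports structured output:
```python
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate

class CodeOutput(BaseModel):
    """Structured solution returned by the generate node."""
    prefix: str = Field(description="Analysis of the data and chosen chart type")
    imports: str = Field(description="Required import statements")
    code: str = Field(description="Full Python plotting code (excluding imports)")

code_gen_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a data-visualization assistant. Given MongoDB query results, "
     "write matplotlib code that charts them. Respond with a prefix, the "
     "imports, and the plotting code."),
    ("placeholder", "{messages}"),
])

# Bind the schema so the LLM returns a validated CodeOutput object
code_gen_chain = code_gen_prompt | llm.with_structured_output(CodeOutput)
```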
---
**2. Check Code Node**
* Runs two levels of validation:
1. **Import Check** → ensures required libraries are safe and available.
2. **Execution Check** → attempts to run `imports + code`.
The check boils down to a few lines of Python:
```python
try:
    # Run the generated imports first, then imports + code together
    exec(imports)
    exec(f"{imports}\n{code}")
except Exception as e:
    # Record the failure in state so the router can trigger a retry
    messages += [("user", f"Your solution failed the execution test: {e}")]
    return {"messages": messages, "error": "yes"}
```
If either fails, it appends error messages to state and sets `error="yes"`.
If execution succeeds, it appends a completion message and sets `error="no"`.
This effectively serves as a lightweight **runtime sandbox**.
---
**3. Reflect Node**
* Activated only if code execution fails and retry attempts remain.
* Re-invokes the code generation chain, but with prior errors included in the conversation history.
* Produces a revised generation attempt that is routed back into **Generate Node**.
This creates a **self-correcting loop**:
`generate → check → reflect → generate → …`
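A minimal sketch of what the reflect node might look like, assuming the `code_gen_chain` sketched earlier and the `GraphState` fields described under Orchestration Logic below:
```python
def reflect(state: GraphState) -> GraphState:
    """Re-invoke the generation chain with the error trail in the history."""
    messages = state["messages"]  # already contains the failure messages
    reflection = code_gen_chain.invoke({"messages": messages})
    # Keep the reflection so the next generate pass can build on it
    messages += [("assistant", f"Reflections on the failure: {reflection.prefix}")]
    return {"messages": messages, "iterations": state["iterations"]}
```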
---
### Orchestration Logic
The LangGraph orchestration is defined as follows:
```python
from langgraph.graph import StateGraph, START, END

workflow = StateGraph(GraphState)
workflow.add_node("generate", generate)
workflow.add_node("check_code", code_check)
workflow.add_node("reflect", reflect)

workflow.add_edge(START, "generate")
workflow.add_edge("generate", "check_code")
workflow.add_conditional_edges(
    "check_code",
    decide_to_finish,  # decide: end / reflect / regenerate
    {"end": END, "reflect": "reflect", "generate": "generate"},
)
workflow.add_edge("reflect", "generate")
app = workflow.compile(name="code_plot_agent")
```
**Decision policy:**
* If `error == "no"` → finish.
* If `error == "yes"` and `iterations < max_iterations` → retry (reflect or regenerate).
* If retries exhausted → finish with error state.
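A minimal sketch of `decide_to_finish` implementing this policy, assuming `max_iterations` is configured alongside the graph:
```python
def decide_to_finish(state: GraphState) -> str:
    """Route to end, reflect, or generate based on the decision policy."""
    if state["error"] == "no":
        return "end"      # success: stop the loop
    if state["iterations"] >= max_iterations:
        return "end"      # retries exhausted: finish with the error state
    return "reflect"      # otherwise retry ("generate" would skip reflection)
```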
---
### Why This Design Works
* **Composability:** Each node is single-purpose (generation, validation, reflection).
* **Safety:** Execution happens only after imports & code pass checks.
* **Recoverability:** Failures are logged and re-routed into retries instead of crashing.
* **Determinism:** By preserving `GraphState` (messages, error flags, iterations, search result), the pipeline is reproducible and debuggable.
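For reference, a minimal `GraphState` covering the fields listed above (exact field names and types are illustrative):
```python
from typing import List, TypedDict

class GraphState(TypedDict):
    messages: List[tuple]  # conversation history, including error reports
    error: str             # "yes" / "no" flag set by the check node
    iterations: int        # retry counter compared against max_iterations
    search_result: list    # rows handed over by the search subgraph
```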
---
## Results
The system brings everything together:
1. **User Query**: top 5 airlines by flight count in the last month
2. **Search Agent (MongoDB query)**
Example query result:
```json
{
"search_result": [
{ "_id": "B7", "totalFlights": 321 },
{ "_id": "AE", "totalFlights": 304 },
{ "_id": "CI", "totalFlights": 233 },
{ "_id": "BR", "totalFlights": 152 },
{ "_id": "IT", "totalFlights": 146 }
],
"metadata": { "collection": "arrivals", "window": "last_30d" }
}
```
3. **Code Agent (chart generation)**
Using the above data, the code agent generated the following visualization:

*Figure: Pie chart of the top 5 airlines by flight count in the past month.*
## Conclusion
This project demonstrates how **LangGraph** can orchestrate specialized agents into a seamless workflow:
* **Search Subgraph**: Converts natural language into validated MongoDB queries.
* **Plot Subgraph**: Transforms structured results into safe, insightful charts.
The result: a **fully automated NL → MongoDB → Visualization pipeline**.
Users ask in plain English, and the system delivers **data-backed answers with visual context** — an approach that generalizes well beyond flight analytics to any domain where **data exploration + visualization** are critical.