# OpenSearch Workflows & Analytics using Gaia
In modern data systems, search isn’t just about retrieving documents; it’s about answering questions. Retrieval-Augmented Generation (RAG) pairs search with a large language model (LLM): relevant documents are retrieved first and then passed to the model as the context for its answer. In this demo and walkthrough, we’ll explore how to combine OpenSearch with GaiaNet’s OpenAI-compatible LLM APIs to build context-aware question-answering systems.
This is the core of my talk at the conference: “***OpenSearch With Gaia: Building Next-Gen AI-Driven Search and Analytics Workflows.***”
### Pre-requisites:
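Before creating a connector, add `gaia.domains` to the ML Commons allowlist of trusted connector endpoints so that OpenSearch can call your Gaia node. Run this request against your running cluster (for example, from Dev Tools in OpenSearch Dashboards). The request replaces the whole list, so include any existing endpoints you still need, which is why the SageMaker, OpenAI, Cohere, and Bedrock patterns appear alongside the Gaia one: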
```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.trusted_connector_endpoints_regex": [
      "^https://runtime\\.sagemaker\\..*[a-z0-9-]\\.amazonaws\\.com/.*$",
      "^https://api\\.openai\\.com/.*$",
      "^https://api\\.cohere\\.ai/.*$",
      "^https://bedrock-runtime\\..*[a-z0-9-]\\.amazonaws\\.com/.*$",
      "^https://([a-zA-Z0-9_-]+\\.)?gaia\\.domains(/.*)?$"
    ]
  }
}
```
##### Response
```json
{
  "acknowledged": true,
  "persistent": {
    "plugins": {
      "ml_commons": {
        "trusted_connector_endpoints_regex": [
          """^https://runtime\.sagemaker\..*[a-z0-9-]\.amazonaws\.com/.*$""",
          """^https://api\.openai\.com/.*$""",
          """^https://api\.cohere\.ai/.*$""",
          """^https://bedrock-runtime\..*[a-z0-9-]\.amazonaws\.com/.*$""",
          """^https://([a-zA-Z0-9_-]+\.)?gaia\.domains(/.*)?$"""
        ]
      }
    }
  },
  "transient": {}
}
```
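Before sending the settings request, you can check locally that your node's URL passes the Gaia pattern. The Python sketch below is a minimal illustration, assuming OpenSearch tests the full connector URL against each pattern (which `re.fullmatch` mirrors here); `is_trusted_endpoint` is a hypothetical helper. Note that the JSON escape `\\.` becomes `\.` in the actual regular expression.

```python
from __future__ import annotations

import re

# Two of the patterns from the settings request above, with JSON escaping
# removed: "\\." in the JSON body is "\." in the actual regular expression.
TRUSTED_PATTERNS = [
    r"^https://api\.openai\.com/.*$",
    r"^https://([a-zA-Z0-9_-]+\.)?gaia\.domains(/.*)?$",
]


def is_trusted_endpoint(url: str, patterns: list[str]) -> bool:
    """Return True if the whole URL matches at least one allowlist pattern.

    Assumption: OpenSearch compares the full URL against each pattern;
    re.fullmatch mirrors that behaviour in this sketch.
    """
    return any(re.fullmatch(p, url) for p in patterns)


GAIA_URL = (
    "https://0xee7253294f6580c32c3ed745fe578b2eb8220f46.gaia.domains"
    "/v1/chat/completions"
)

print(is_trusted_endpoint(GAIA_URL, TRUSTED_PATTERNS))  # True
print(is_trusted_endpoint("https://gaia.domains.example.com/v1", TRUSTED_PATTERNS))  # False
print(is_trusted_endpoint("https://a.b.gaia.domains/v1", TRUSTED_PATTERNS))  # False
```

The last check shows a limit of this pattern: it accepts at most one subdomain label in front of `gaia.domains`.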
### 🔌 Step 1: Connecting OpenSearch to Gaia via ML Connectors
Using OpenSearch's ML Commons plugin, we first create a connector to Gaia's OpenAI-compatible `chat/completions` API. The `headers` entry sends the stored credential as a bearer token, as OpenSearch's OpenAI connector blueprints do:
```json
POST /_plugins/_ml/connectors/_create
{
  "name": "Gaia Chat Completions Connector",
  "description": "Run local inference using Gaia nodes",
  "version": "0.0.1",
  "protocol": "http",
  "parameters": {
    "endpoint": "0xee7253294f6580c32c3ed745fe578b2eb8220f46.gaia.domains",
    "model": "Qwen3-4B-Q5_K_M"
  },
  "credential": {
    "openAI_key": "gaia"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}",
        "Content-Type": "application/json"
      },
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}
```
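When the model is called, ML Commons fills `${parameters.endpoint}`, `${parameters.model}`, and `${parameters.messages}` into the `predict` action, so OpenSearch sends an HTTPS POST request to the Gaia node's `/v1/chat/completions` endpoint. Set `model` to the model your Gaia node serves; the example responses in this walkthrough came from a node running `Meta-Llama-3.1-8B-Instruct-Q5_K_M`.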
##### Response
```json
{
  "connector_id": "CeOiK5cBCePo0njVuLcx"
}
```
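To make the placeholder substitution concrete, here is a minimal Python sketch of how the connector's templates could expand for the `Hello!` test message used later. `render` is a hypothetical helper, not ML Commons code, and it assumes that a list parameter such as `messages` is inserted as serialized JSON, which is why `${parameters.messages}` is unquoted in `request_body`.

```python
from __future__ import annotations

import json
import re

# Values copied from the connector definition above.
CONNECTOR_PARAMETERS = {
    "endpoint": "0xee7253294f6580c32c3ed745fe578b2eb8220f46.gaia.domains",
    "model": "Qwen3-4B-Q5_K_M",
}
URL_TEMPLATE = "https://${parameters.endpoint}/v1/chat/completions"
BODY_TEMPLATE = '{ "model": "${parameters.model}", "messages": ${parameters.messages} }'

# Matches ${parameters.<name>} and captures <name>.
PLACEHOLDER = re.compile(r"\$\{parameters\.([A-Za-z0-9_]+)\}")


def render(template: str, params: dict) -> str:
    """Replace each ${parameters.<name>} with its value (hypothetical helper).

    Assumption: non-string values such as the messages list are inserted as
    serialized JSON; strings are inserted as-is, without escaping.
    """
    def substitute(match):
        value = params[match.group(1)]
        return value if isinstance(value, str) else json.dumps(value)

    return PLACEHOLDER.sub(substitute, template)


# Connector parameters merged with the parameters sent in a _predict call.
params = {**CONNECTOR_PARAMETERS, "messages": [{"role": "user", "content": "Hello!"}]}

print(render(URL_TEMPLATE, params))
print(render(BODY_TEMPLATE, params))
```

Because strings are inserted without escaping, a value containing a double quote would break the JSON body; a real implementation has to handle that case.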
### Enable access control
```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.connector_access_control_enabled": true
  }
}
```
This enables access control for ML connectors. With it turned on:
- Only users with the appropriate permissions can create, update, delete, or use ML connectors.
- You must assign roles or backend roles to users before they can access specific connectors.
### 📦 Step 2: Register and Deploy the Remote Model
Next, register this connector as a model within OpenSearch:
```json
POST /_plugins/_ml/models/_register
{
  "name": "Gaia Chat Completions Connector",
  "function_name": "remote",
  "description": "Connector for Gaia chat completions API using Meta-Llama-3.1-8B-Instruct-Q5_K_M",
  "connector_id": "CeOiK5cBCePo0njVuLcx"
}
```
##### Response
```json
{
  "task_id": "HeOmK5cBCePo0njVtrdM",
  "status": "CREATED",
  "model_id": "HuOmK5cBCePo0njVtrds"
}
```
Then deploy the model using the returned `model_id`. Deployment makes the LLM available for inference and for search pipelines:
```json
POST /_plugins/_ml/models/HuOmK5cBCePo0njVtrds/_deploy
```
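If you script these steps instead of using Dev Tools, the deploy and predict calls only need the `model_id` from the register response. A minimal Python sketch using the standard library's `urllib.request`, assuming a cluster at the hypothetical address `https://localhost:9200`; the requests are built but not sent, since sending them would also need your cluster's credentials and TLS settings.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical cluster address; adjust for your own OpenSearch instance.
OPENSEARCH_URL = "https://localhost:9200"


def deploy_request(register_response: dict) -> urllib.request.Request:
    """Build (but do not send) the _deploy call for a newly registered model."""
    model_id = register_response["model_id"]
    return urllib.request.Request(
        f"{OPENSEARCH_URL}/_plugins/_ml/models/{model_id}/_deploy",
        method="POST",
    )


def predict_request(model_id: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a _predict call carrying chat messages."""
    body = json.dumps({"parameters": {"messages": messages}}).encode("utf-8")
    return urllib.request.Request(
        f"{OPENSEARCH_URL}/_plugins/_ml/models/{model_id}/_predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Register response copied from above.
registered = {
    "task_id": "HeOmK5cBCePo0njVtrdM",
    "status": "CREATED",
    "model_id": "HuOmK5cBCePo0njVtrds",
}

deploy = deploy_request(registered)
print(deploy.get_method(), deploy.full_url)

predict = predict_request(registered["model_id"], [{"role": "user", "content": "Hello!"}])
print(predict.get_method(), predict.full_url)
```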
#### Test the inference from your Gaia node
```json
POST /_plugins/_ml/models/HuOmK5cBCePo0njVtrds/_predict
{
  "parameters": {
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }
}
```
##### Example Response
```json
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "id": "chatcmpl-9f41fb06-4c38-4b42-a5da-f9b4c2327778",
            "object": "chat.completion",
            "created": 1748784148,
            "model": "Meta-Llama-3.1-8B-Instruct-Q5_K_M",
            "choices": [
              {
                "index": 0,
                "message": {
                  "content": "It's nice to meet you. Is there something I can help you with or would you like to chat?",
                  "role": "assistant"
                },
                "finish_reason": "stop"
              }
            ],
            "usage": {
              "prompt_tokens": 23,
              "completion_tokens": 24,
              "total_tokens": 47
            }
          }
        }
      ],
      "status_code": 200
    }
  ]
}
```
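The assistant's reply is nested inside the ML Commons wrapper. A minimal Python sketch that pulls it out, following the path in the example response above; `extract_reply` is a hypothetical helper, and the sample is trimmed to the fields it reads.

```python
from __future__ import annotations


def extract_reply(predict_response: dict) -> tuple[str, int]:
    """Return (assistant text, total tokens) from a remote-model _predict response.

    Follows the path in the example above: the OpenAI-style body returned by
    the Gaia node sits under inference_results[0].output[0].dataAsMap.
    """
    result = predict_response["inference_results"][0]
    status = result.get("status_code")
    if status is not None and status != 200:
        raise RuntimeError(f"model call failed with status {status}")
    data = result["output"][0]["dataAsMap"]
    return data["choices"][0]["message"]["content"], data["usage"]["total_tokens"]


# Trimmed copy of the example response above.
sample = {
    "inference_results": [
        {
            "output": [
                {
                    "name": "response",
                    "dataAsMap": {
                        "choices": [
                            {
                                "index": 0,
                                "message": {
                                    "content": "It's nice to meet you. Is there something I can help you with or would you like to chat?",
                                    "role": "assistant",
                                },
                                "finish_reason": "stop",
                            }
                        ],
                        "usage": {"prompt_tokens": 23, "completion_tokens": 24, "total_tokens": 47},
                    },
                }
            ],
            "status_code": 200,
        }
    ]
}

text, tokens = extract_reply(sample)
print(text)
print(tokens)
```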
### 🔄 Step 3: Create a RAG Pipeline
We define a search pipeline that passes the retrieved context to the model so it can answer questions:
```json
PUT /_search/pipeline/my-conversation-search-pipeline-openai
{
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "Gaia pipeline",
        "description": "Demo pipeline Using Gaia",
        "model_id": "lgfroJYB8LHbqDxhMJsl",
        "context_field_list": [
          "text"
        ],
        "system_prompt": "You are a helpful assistant answering questions based ONLY on the provided flight data context. Do not use any prior knowledge. If the context doesn't contain the answer, say so.",
        "user_instructions": "Generate a concise and informative answer in less than 100 words for the given question"
      }
    }
  ]
}
```
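The key configuration is the `retrieval_augmented_generation` response processor. It takes the fields listed in `context_field_list` from each search hit and sends them, together with `system_prompt` and `user_instructions`, to the model identified by `model_id`; replace that value with the model ID returned when you registered your model in Step 2. This generic pipeline reads a `text` field; for the flight data, the field list must point at the flight record fields (origin, destination, delay status, and so on), as in the `flights-gaia-rag-pipeline` definition shown below.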
##### Response
```json
{
  "acknowledged": true
}
```
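To inspect a pipeline, retrieve it by name. The flight-specific pipeline, `flights-gaia-rag-pipeline`, has this definition:
```json
GET /_search/pipeline/flights-gaia-rag-pipeline
```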
##### Example Response
```json
{
  "flights-gaia-rag-pipeline": {
    "description": "RAG Pipeline for Flights Data using Gaia Connector",
    "response_processors": [
      {
        "retrieval_augmented_generation": {
          "tag": "gaia_flight_rag",
          "description": "Generates response using Gaia based on selected flight data fields",
          "model_id": "lgfroJYB8LHbqDxhMJsl",
          "context_field_list": [
            "_source.OriginCityName",
            "_source.DestCityName",
            "_source.Carrier",
            "_source.Cancelled",
            "_source.FlightDelay",
            "_source.FlightDelayType",
            "_source.FlightDelayMin",
            "_source.DistanceMiles",
            "_source.FlightTimeMin",
            "_source.dayOfWeek"
          ],
          "system_prompt": "You are a helpful flight data assistant. Answer questions based ONLY on the provided context fields from flight records. The context contains specific fields like OriginCityName, DestCityName, Cancelled status, Delays, etc. Do not use external knowledge. If the context doesn't contain the answer, state that the provided flight data doesn't have the information.",
          "user_instructions": "Generate a concise and informative answer in less than 100 words for the given question based *only* on the context."
        }
      }
    ]
  }
}
```
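As an illustration of what `context_field_list` selects (not the processor's actual implementation), the Python sketch below reads dotted field paths such as `_source.OriginCityName` from each search hit. `get_path` and `collect_context` are hypothetical helpers, and treating the `_source.` prefix as a path into the hit is an assumption made for this example.

```python
from __future__ import annotations

from typing import Any


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path like "_source.OriginCityName"; return None if absent."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def collect_context(hits: list[dict], fields: list[str]) -> list[dict]:
    """For each hit, keep only the requested fields that are present."""
    context = []
    for hit in hits:
        values = {field: get_path(hit, field) for field in fields}
        context.append({f: v for f, v in values.items() if v is not None})
    return context


hit = {
    "_id": "MBuMoJYB4pSSGALzXpsC",
    "_source": {"OriginCityName": "Frankfurt am Main", "DestCityName": "Sydney", "Cancelled": False},
}
print(collect_context([hit], ["_source.OriginCityName", "_source.Cancelled", "_source.dayOfWeek"]))
# [{'_source.OriginCityName': 'Frankfurt am Main', '_source.Cancelled': False}]
```

A missing field simply drops out of the context. For instance, `_source.dayOfWeek` is in the pipeline's field list but not in the `_source` filter of the Step 4 search, so it would be absent from those hits.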
### 🔍 Step 4: Search with Context for RAG
We search the sample flight data for flights from Frankfurt am Main to Sydney, returning only the fields the model needs as context:
```json
POST /opensearch_dashboards_sample_data_flights/_search
{
  "size": 3,
  "query": {
    "bool": {
      "must": [
        { "match": { "OriginCityName": "Frankfurt am Main" } },
        { "match": { "DestCityName": "Sydney" } }
      ]
    }
  },
  "_source": [
    "OriginCityName",
    "DestCityName",
    "Carrier",
    "Cancelled",
    "FlightDelay",
    "FlightDelayType",
    "FlightDelayMin",
    "DistanceMiles",
    "FlightTimeMin",
    "AvgTicketPrice",
    "timestamp"
  ]
}
```
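This returns the flight records we will pass to the model as context. Here we run the search without a search pipeline and, in Step 5, assemble the prompt ourselves so that each stage is visible. To have the RAG processor generate the answer instead, run the search with the `search_pipeline` query parameter (for example, `?search_pipeline=flights-gaia-rag-pipeline`) and supply the question in `ext.generative_qa_parameters.llm_question`.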
##### Example Response
```json
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 8.371025,
    "hits": [
      {
        "_index": "opensearch_dashboards_sample_data_flights",
        "_id": "MBuMoJYB4pSSGALzXpsC",
        "_score": 8.371025,
        "_source": {
          "OriginCityName": "Frankfurt am Main",
          "FlightDelay": false,
          "DistanceMiles": 10247.856675613455,
          "FlightTimeMin": 1030.7704158599038,
          "AvgTicketPrice": 841.2656419677076,
          "Carrier": "OpenSearch Dashboards Airlines",
          "FlightDelayMin": 0,
          "Cancelled": false,
          "FlightDelayType": "No Delay",
          "timestamp": "2025-04-28T00:00:00",
          "DestCityName": "Sydney"
        }
      },
      {
        "_index": "opensearch_dashboards_sample_data_flights",
        "_id": "4huMoJYB4pSSGALzXp2C",
        "_score": 8.371025,
        "_source": {
          "OriginCityName": "Frankfurt am Main",
          "FlightDelay": false,
          "DistanceMiles": 10247.856675613455,
          "FlightTimeMin": 1374.3605544798718,
          "AvgTicketPrice": 931.8356400184891,
          "Carrier": "Logstash Airways",
          "FlightDelayMin": 0,
          "Cancelled": true,
          "FlightDelayType": "No Delay",
          "timestamp": "2025-05-01T18:26:48",
          "DestCityName": "Sydney"
        }
      },
      {
        "_index": "opensearch_dashboards_sample_data_flights",
        "_id": "fKuMoJYBeh8aMc4VYAQi",
        "_score": 8.371025,
        "_source": {
          "OriginCityName": "Frankfurt am Main",
          "FlightDelay": false,
          "DistanceMiles": 10247.856675613455,
          "FlightTimeMin": 1499.302423068951,
          "AvgTicketPrice": 560.3718963819292,
          "Carrier": "OpenSearch Dashboards Airlines",
          "FlightDelayMin": 0,
          "Cancelled": false,
          "FlightDelayType": "No Delay",
          "timestamp": "2025-05-23T14:05:17",
          "DestCityName": "Sydney"
        }
      }
    ]
  }
}
```
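Step 5 passes these hits to the model as one text line per record. A minimal Python sketch of that formatting, assuming floats are rounded to two decimals and booleans written in lowercase, as in the Step 5 prompt; `fmt`, `format_record`, and `format_hits` are hypothetical helpers.

```python
from __future__ import annotations


def fmt(value) -> str:
    """Render a field value the way the Step 5 prompt shows it."""
    if isinstance(value, bool):
        return str(value).lower()  # False -> "false"
    if isinstance(value, float):
        return f"{value:.2f}"  # 10247.856675613455 -> "10247.86"
    return str(value)


def format_record(n: int, src: dict) -> str:
    """Turn one hit's _source into a single 'Record n: Key=value, ...' line."""
    return (
        f"Record {n}: Origin={src['OriginCityName']}, Destination={src['DestCityName']}, "
        f"Carrier={src['Carrier']}, Cancelled={fmt(src['Cancelled'])}, "
        f"Delay={fmt(src['FlightDelay'])}, DelayType={src['FlightDelayType']}, "
        f"DelayMin={fmt(src['FlightDelayMin'])}, Distance={fmt(src['DistanceMiles'])} miles, "
        f"FlightTime={fmt(src['FlightTimeMin'])} min, AvgPrice={fmt(src['AvgTicketPrice'])}, "
        f"Timestamp={src['timestamp']}"
    )


def format_hits(search_response: dict) -> list[str]:
    """Number the hits from 1 and format each one."""
    hits = search_response["hits"]["hits"]
    return [format_record(i, hit["_source"]) for i, hit in enumerate(hits, start=1)]


# First hit from the example response above.
response = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "OriginCityName": "Frankfurt am Main",
                    "FlightDelay": False,
                    "DistanceMiles": 10247.856675613455,
                    "FlightTimeMin": 1030.7704158599038,
                    "AvgTicketPrice": 841.2656419677076,
                    "Carrier": "OpenSearch Dashboards Airlines",
                    "FlightDelayMin": 0,
                    "Cancelled": False,
                    "FlightDelayType": "No Delay",
                    "timestamp": "2025-04-28T00:00:00",
                    "DestCityName": "Sydney",
                }
            }
        ]
    }
}

for line in format_hits(response):
    print(line)
```

Checking for `bool` before anything else matters because Python's `str(False)` is `"False"`, while the prompt uses lowercase JSON-style values.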
### 🤖 Step 5: Answer Questions Using Gaia-Powered LLM
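Next, we pass the retrieved records and the question to the model’s `_predict` endpoint. The instructions, the three hits (one line per record), and the user question are combined into a single user message: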
```json
POST /_plugins/_ml/models/lgfroJYB8LHbqDxhMJsl/_predict
{
  "parameters": {
    "messages": [
      {
        "role": "user",
        "content": "You are a helpful flight data assistant. Answer the user's question based ONLY on the provided context from flight records. Do not use external knowledge. If the context doesn't contain the answer, state that the provided flight data doesn't have the information.\n\nContext from flight records:\n---\nRecord 1: Origin=Frankfurt am Main, Destination=Sydney, Carrier=OpenSearch Dashboards Airlines, Cancelled=false, Delay=false, DelayType=No Delay, DelayMin=0, Distance=10247.86 miles, FlightTime=1030.77 min, AvgPrice=841.27, Timestamp=2025-04-28T00:00:00\n---\nRecord 2: Origin=Frankfurt am Main, Destination=Sydney, Carrier=Logstash Airways, Cancelled=true, Delay=false, DelayType=No Delay, DelayMin=0, Distance=10247.86 miles, FlightTime=1374.36 min, AvgPrice=931.84, Timestamp=2025-05-01T18:26:48\n---\nRecord 3: Origin=Frankfurt am Main, Destination=Sydney, Carrier=OpenSearch Dashboards Airlines, Cancelled=false, Delay=false, DelayType=No Delay, DelayMin=0, Distance=10247.86 miles, FlightTime=1499.30 min, AvgPrice=560.37, Timestamp=2025-05-23T14:05:17\n---\n\nUser Question: What was the delay time for the OpenSearch Dashboards Airlines flight from Frankfurt am Main to Sydney?"
      }
    ]
  }
}
```
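The `content` string in the request above is plain string assembly. A minimal Python sketch that rebuilds the same request body from formatted record lines and a question; `build_predict_body` is a hypothetical helper, and the single record in the usage example is shortened for readability.

```python
from __future__ import annotations

import json

INSTRUCTIONS = (
    "You are a helpful flight data assistant. Answer the user's question based ONLY "
    "on the provided context from flight records. Do not use external knowledge. "
    "If the context doesn't contain the answer, state that the provided flight data "
    "doesn't have the information."
)


def build_predict_body(records: list[str], question: str) -> dict:
    """Assemble the single user message used in the _predict call above."""
    # Each record is preceded by a "---" separator, and one more closes the list.
    context = "".join(f"---\n{record}\n" for record in records) + "---"
    content = (
        f"{INSTRUCTIONS}\n\nContext from flight records:\n{context}"
        f"\n\nUser Question: {question}"
    )
    return {"parameters": {"messages": [{"role": "user", "content": content}]}}


# Shortened record line for illustration.
records = ["Record 1: Origin=Frankfurt am Main, Destination=Sydney, DelayMin=0"]
question = (
    "What was the delay time for the OpenSearch Dashboards Airlines flight "
    "from Frankfurt am Main to Sydney?"
)
print(json.dumps(build_predict_body(records, question), indent=2))
```

Putting the instructions in a separate `system` message, as in the earlier `Hello!` test, is an alternative layout; this sketch keeps the single user message used above.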
##### Example Response
```json
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "id": "chatcmpl-13d17ec9-b9a8-4039-9042-572db95591df",
            "object": "chat.completion",
            "created": 1748581630,
            "model": "Meta-Llama-3.1-8B-Instruct-Q5_K_M",
            "choices": [
              {
                "index": 0,
                "message": {
                  "content": "The provided flight data doesn't have the answer. There are multiple records from OpenSearch Dashboards Airlines, but delay information is not available for any of them as Delay=false in all cases.",
                  "role": "assistant"
                },
                "finish_reason": "stop",
                "logprobs": null
              }
            ],
            "usage": {
              "prompt_tokens": 359,
              "completion_tokens": 40,
              "total_tokens": 399
            }
          }
        }
      ],
      "status_code": 200
    }
  ]
}
```
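Gaia’s LLM stays within the supplied context and responds:
> “The provided flight data doesn't have the answer. There are multiple records from OpenSearch Dashboards Airlines, but delay information is not available for any of them as Delay=false in all cases.”

The reply avoids outside knowledge, but it is overly cautious: both OpenSearch Dashboards Airlines records do contain the answer (`DelayMin=0`, `DelayType=No Delay`), so a better reply would be that neither flight was delayed.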
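A quick way to judge the reply is to compute the expected answer directly from the retrieved records. The values below are copied from the three hits in Step 4 (other fields trimmed); `delays_for` is a hypothetical helper.

```python
from __future__ import annotations

# _source values copied from the three hits returned in Step 4 (fields trimmed).
records = [
    {"Carrier": "OpenSearch Dashboards Airlines", "Cancelled": False, "FlightDelayType": "No Delay", "FlightDelayMin": 0},
    {"Carrier": "Logstash Airways", "Cancelled": True, "FlightDelayType": "No Delay", "FlightDelayMin": 0},
    {"Carrier": "OpenSearch Dashboards Airlines", "Cancelled": False, "FlightDelayType": "No Delay", "FlightDelayMin": 0},
]


def delays_for(carrier: str, rows: list[dict]) -> list[int]:
    """Collect FlightDelayMin for every record flown by the given carrier."""
    return [row["FlightDelayMin"] for row in rows if row["Carrier"] == carrier]


delays = delays_for("OpenSearch Dashboards Airlines", records)
print(delays)  # [0, 0]
```

Both matching flights show 0 minutes of delay, which is the answer the retrieved context supports.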
### ✅ Summary
We’ve now demonstrated a full end-to-end RAG pipeline using OpenSearch and Gaia:
* Remote model connectivity with Gaia
* RAG pipeline definition
* Contextual querying and inference
This lets teams build AI-augmented analytics systems that answer questions from their own indexed data, with the model instructed to answer only from the retrieved context rather than from its prior knowledge.
Check out the [GitHub repo](https://github.com/harishkotra/opensearch-gaia).