---
title: Amazon Bedrock Knowledge Bases + OpenSearch Serverless
---

# Amazon Bedrock Knowledge Bases + OpenSearch Serverless

A practical step‑by‑step guide for future you (and anyone you share this with)

This README walks through the full **console-based** flow for setting up an Amazon Bedrock Knowledge Base backed by **OpenSearch Serverless** and **S3 documents**.

It is written so that:

* You can **recreate the whole setup from scratch** later without having to re-open the workshop.
* You can **switch documents** in your knowledge base cleanly.
* You **avoid the common index / dimension / field-name errors** you ran into.

--- 
### TL;DR Tools
- S3 → document storage
- Bedrock Knowledge Base →
    - reads S3 files
    - chunks + embeds them
    - stores embeddings in a vector database
    - can create the OpenSearch index for you (if using “Quick create”)
- Bedrock Flow → orchestration layer that wires your question → KB retrieval → LLM answer
- OpenSearch Serverless → the actual vector store holding your embeddings *(Bedrock can create this for you via “Quick create”)*


---

## 0. High‑Level Mental Model

Before clicking anything, anchor on this picture:

* **S3 bucket** = where your raw docs live (PDFs, TXT, shareholder letters, etc.).
* **Embeddings model** (Titan Text Embeddings v2) = converts each chunk of text into a 1024‑dimensional vector.
* **OpenSearch Serverless collection + index** = vector database that stores:

  * The **original text chunk** (e.g., in a `text` field)
  * The **embedding vector** (in a `vector` / `Vector` field)
  * **Metadata** (filename, chunk IDs, etc.).
* **Bedrock Knowledge Base** = the glue service that:

  * Reads from S3
  * Chunks text
  * Calls the embeddings model
  * Writes vectors + metadata into OpenSearch
  * Runs **RetrieveAndGenerate** when you query it.
* **Flows (optional)** = a visual orchestration that wires:

  * Your **question** → **Knowledge Base node** → **Prompt node** → **LLM answer**.

If something breaks, it’s almost always one of:

* Wrong **index name**
* Wrong **vector field name**
* Wrong **vector dimension** (1024 vs 1924)
* Missing **permissions** from Bedrock to OpenSearch.

---

## 1. Create OpenSearch Serverless Vector Store (Workshop Path)

> If you use **Bedrock “Quick create a new vector store”**, Bedrock will create the index for you. This section is for when you want to follow the **workshop’s manual OpenSearch setup**.

### 1.1 Create the Collection

1. In the AWS console, go to **Amazon OpenSearch Service**.
2. In the left menu, click **Serverless** → **Dashboard**.
3. Click **Get started** (if this is your first time) or **Create collection**.
4. Configure:

   * **Collection name:** `bedrock-sample-rag`
   * **Collection type:** `Vector Search`
   * **Deployment type:** `Enable redundancy`
   * **Security:** `Easy create` (for a workshop; in prod you’d do fine‑grained access).
5. Click **Next** → **Submit**.
6. After creation, note down:

   * **Collection name:** `bedrock-sample-rag`
   * **Collection ARN** (looks like `arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/COLLECTION_ID`).

### 1.2 Create the Vector Index

Still inside the same collection:

1. Go to **Indexes** and click **Create index**.
2. Set:

   * **Index name:** `bedrock-sample-rag-index`
3. Under **Vector fields**, click **Add vector field** and configure:

   * **Field name:** `Vector` (or `vector`, but be consistent)
   * **Engine:** `faiss`
   * **Precision:** `FP16` (or FP32)
   * **Dimensions:** `1924` (per the workshop; **NOTE:** this will *not* match Titan v2’s 1024 dims – see Gotchas section)
   * **Distance metric:** `Euclidean`
   * Advanced settings (from workshop):

     * **M:** `16`
     * **ef_construction:** `512`
     * **ef_search:** `512` (if exposed)
4. Add **metadata fields** (if your console supports defining them now):

   * `text` (type: `text`)
   * `text-metadata` (type: `object` / `nested`)
5. Create the index.

📌 **Remember:**

* **Index name:** `bedrock-sample-rag-index`
* **Vector field name:** `Vector` (or `vector`)
* **Text field name:** `text`
* **Metadata field name:** `text-metadata`

These names must match what you later tell Bedrock.
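If you ever create the same index by API instead of the console, the mapping body looks roughly like the sketch below. Field names follow the workshop; the dimension is set to **1024** to match Titan v2 (the workshop's 1924 is what triggers the mismatch error in Section 9.3), and the metadata field type is illustrative.

```python
# Sketch: mapping body you'd PUT to the OpenSearch Serverless collection
# (e.g. via opensearch-py with SigV4 auth) to create the vector index.
# Field names mirror the workshop; dimension 1024 matches Titan v2.
def build_index_mapping(vector_field="vector", dimension=1024):
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",  # Euclidean, as chosen in the console
                        "parameters": {"m": 16, "ef_construction": 512},
                    },
                },
                "text": {"type": "text"},
                "text-metadata": {"type": "text"},  # type is illustrative
            }
        },
    }

mapping = build_index_mapping()
# PUT /bedrock-sample-rag-index with `mapping` as the request body.
```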

---

## 2. Upload Documents to S3

1. In the AWS console, go to **Amazon S3**.
2. Create (or reuse) a bucket, e.g.:

   * `aws-bedrock-kb-workshop-aoss-fp67` (workshop example)
3. Upload your documents (e.g., Amazon shareholder letters):

   * `AMZN-2019-Shareholder-Letter.pdf`
   * `AMZN-2020-Shareholder-Letter.pdf`
   * `AMZN-2021-Shareholder-Letter.pdf`
   * `AMZN-2022-Shareholder-Letter.pdf`
4. Note the bucket path you’ll use in the Knowledge Base:

   * `s3://aws-bedrock-kb-workshop-aoss-fp67`

You can change files later without changing the bucket.

---

## 3. Create a Knowledge Base in Amazon Bedrock

You have **two modes** here:

* **Mode A: Quick create a new vector store (recommended for future you).**
* **Mode B: Use an existing vector store (what the workshop shows).**

### 3.1 Start Knowledge Base Creation

1. In the AWS console, go to **Amazon Bedrock**.
2. In the left menu, under **Build**, click **Knowledge Bases**.
3. Click **Create** → **Knowledge base with vector store**.
4. Step 1 – **Provide details**:

   * Knowledge base name: e.g. `amzn-shareholder-kb`
   * Service role:

     * For workshop: **Create and use a new service role**.
   * Data source type: **Amazon S3**.

Click **Next**.

### 3.2 Configure Data Source (S3)

1. **S3 URI:**

   * Browse and select your bucket, e.g. `s3://aws-bedrock-kb-workshop-aoss-fp67`.
2. **Chunking strategy:**

   * Change to **Fixed-size chunking**.
   * **Max tokens:** `512` (per workshop; you can tweak later).
3. Parsing strategy: **Default**.
4. Data deletion policy: usually **DELETE** (if you later delete a document from S3 and re‑sync, its associated vectors are removed from the index).

Click **Next**.
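Bedrock handles chunking internally, but a rough sketch of what fixed-size chunking with overlap does may help when tuning `Max tokens`. Real tokenization differs; whitespace-split words stand in for tokens here, and the overlap value is illustrative.

```python
# Rough illustration of fixed-size chunking (Bedrock does this for you).
# Words approximate tokens; a small overlap keeps context across chunk edges.
def fixed_size_chunks(text, max_tokens=512, overlap=50):
    words = text.split()
    chunks, step = [], max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks

doc = "word " * 1200          # a ~1200-"token" document
print(len(fixed_size_chunks(doc)))  # → 3
```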

### 3.3 Choose Embeddings Model

1. Under **Embeddings model**, click **Select model**.
2. Choose:

   * **`Titan Text Embeddings V2`** (v2.0)
3. Click **Apply**.

👉 Titan v2 produces **1024‑dimensional float vectors**. Your OpenSearch vector field must match this dimension if you are using an existing vector index.
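For reference, the request body you'd send Titan v2 through the `bedrock-runtime` `invoke_model` API looks like the sketch below. The model ID and field names follow the Titan v2 documentation; the boto3 call is shown only in comments.

```python
import json

# Request body shape for Titan Text Embeddings v2
# (model ID "amazon.titan-embed-text-v2:0").
def titan_v2_body(text, dimensions=1024):
    # Titan v2 also supports 512 and 256; whatever you pick here must
    # equal the knn_vector dimension of your index.
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "normalize": True,
    })

body = titan_v2_body("What is Amazon doing in generative AI?")
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(modelId="amazon.titan-embed-text-v2:0", body=body)
# json.loads(resp["body"].read())["embedding"]  # list of 1024 floats
```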

---

## 4. Configure the Vector Store (Important Choices)

This is where most errors happen.

### 4.1 Option A – Quick Create a New Vector Store (Recommended)

Use this when you **don’t care about manually defining the index** and you want Bedrock to handle everything.

1. **Vector store creation method:**

   * Select **Quick create a new vector store – Recommended**.
2. **Vector store type:**

   * Choose **OpenSearch Serverless**.
3. **Collection ARN:**

   * Paste your collection ARN, e.g.:

     * `arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/COLLECTION_ID`
4. Pick an index name for Bedrock to create, e.g.:

   * `kb-index-titan-v2-01`
5. Field mappings:

   * **Vector field name:** `vector`
   * **Text field name:** `text`
   * **Metadata field name:** `metadata` or `text-metadata`

Bedrock will:

* Create the index for you.
* Set `vector` as a `knn_vector` field with dimension **1024**.
* Configure `text` and `metadata` fields.

✅ **This avoids all dimension mismatch errors** and is what you should use going forward unless you have a strong reason not to.

### 4.2 Option B – Use an Existing Vector Store (Workshop Path)

Use this if you **already created** the index in OpenSearch (like `bedrock-sample-rag-index`).

1. **Vector store creation method:**

   * Select **Use an existing vector store**.
2. **Vector store type:**

   * **OpenSearch Serverless**.
3. **Collection ARN:**

   * Same as before, e.g. `arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/osz7...`.
4. **Vector index name:**

   * `bedrock-sample-rag-index` (from the workshop).
5. **Index field mapping:**

   * **Vector field name:** `Vector` or `vector` (must match the field name in that index).
   * **Text field name:** `text`.
   * **Bedrock-managed metadata field name:** `text-metadata`.

⚠️ **Gotcha:** If this index was created with **1924 dimensions** for the `Vector` field, and you’re using **Titan v2 (1024 dims)**, you will get errors like:

> `Query vector has invalid dimension: 1024. Dimension should be: 1924`

To avoid this, either:

* Create a **new index** with a 1024‑dimensional `knn_vector` field, **or**
* Use **Quick create a new vector store** so Bedrock aligns dimensions for you.
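A tiny guard makes the mismatch concrete: the length of the embedding must equal the index's `knn_vector` dimension, or queries fail with exactly the error quoted above.

```python
# Sanity check before pointing Bedrock at an existing index: the embedding
# length must equal the knn_vector field's declared dimension.
def check_dimensions(index_dimension, embedding):
    if len(embedding) != index_dimension:
        raise ValueError(
            f"Query vector has invalid dimension: {len(embedding)}. "
            f"Dimension should be: {index_dimension}"
        )

# A 1024-dim Titan v2 vector against the workshop's 1924-dim index fails:
try:
    check_dimensions(1924, [0.0] * 1024)
except ValueError as e:
    print(e)
```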

---

## 5. Finish Creating and Sync the Knowledge Base

After configuring the vector store:

1. Click **Next**.
2. On **Review and create**, confirm:

   * Knowledge base name
   * S3 URI
   * Embeddings model: **Titan Text Embeddings V2**
   * Vector store details (collection ARN, index name, field names)
3. Click **Create Knowledge Base**.

Once it’s created:

1. Open your new Knowledge Base.
2. Open the **Data source** tab.
3. Select your S3 data source.
4. Click **Sync**.

Bedrock will now:

* Scan the S3 bucket
* Chunk files
* Embed chunks with Titan v2
* Write to your OpenSearch index

If there are errors, check:

* **Index not found** → index name typo or index doesn’t exist when using “existing vector store” mode.
* **Invalid dimension** → mismatch between Titan v2 (1024) and your OpenSearch `knn_vector` dimension.

---

## 6. Test the Knowledge Base from the Console

1. In your Knowledge Base, look for **Test knowledge base**.
2. Click **Select model**.
3. Choose **Claude 3.5 Haiku** (or any supported model).
4. Ask a question like:

   * `What is Amazon doing in the field of generative AI?`
5. Click **Run**.
6. Expand **Show details** to see:

   * Retrieved chunks
   * Their source files
   * How they were used to answer your question.

This uses the **RetrieveAndGenerate** API under the hood: Bedrock retrieves relevant docs from OpenSearch, then feeds them to the LLM to generate an answer.
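If you later want to call this programmatically, the boto3 `bedrock-agent-runtime` call takes roughly the shape below. The KB ID and model ARN are placeholders; the sketch only builds the keyword arguments, with the actual client call left as comments.

```python
# Programmatic equivalent of the console's "Test knowledge base" panel:
# build the kwargs for bedrock-agent-runtime's retrieve_and_generate.
def retrieve_and_generate_kwargs(question, kb_id, model_arn):
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,   # from the KB details page
                "modelArn": model_arn,      # the generation model
            },
        },
    }

kwargs = retrieve_and_generate_kwargs(
    "What is Amazon doing in the field of generative AI?",
    kb_id="KBID12345",                      # placeholder
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/MODEL_ID",
)
# client = boto3.client("bedrock-agent-runtime")
# resp = client.retrieve_and_generate(**kwargs)
# resp["output"]["text"] holds the answer; resp["citations"] the chunks.
```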

---

## 7. Build a Bedrock Flow that Uses the Knowledge Base

Once the Knowledge Base works, you can wire it into a **Flow** so others can use it without knowing any of the plumbing.

### 7.1 Create the Flow

1. In the Bedrock console, go to **Flows**.
2. Click **Create flow**.
3. Name it e.g. `langchain-kb-retriever`.
4. Service role: **Create and use a new service role**.
5. Click **Create Flow**.

You’ll see three default nodes:

* **Flow input**
* **Prompt**
* **Flow output**

### 7.2 Add a Knowledge Base Node

1. Add a new node of type **Knowledge base**.
2. Select the Knowledge Base you created in Section 3.
3. Make sure **Return retrieved results** is enabled.

### 7.3 Configure the Prompt Node

1. Click on the **Prompt** node.
2. Choose a model, e.g. **Nova Micro** or any LLM.
3. Use a prompt like:

```text
Human: You are a financial advisor AI system, and provide answers to questions by using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

<question>
{{question}}
</question>

The response should be specific and use statistics or numbers when possible.

Context: {{context}}

A:
```

4. In **Prompt settings → Inputs**, change the **context** parameter’s data type to **Array**, because the Knowledge Base returns an array of retrieved results.
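Conceptually, the array the KB node hands to `{{context}}` gets flattened into text the LLM can read. The sketch below illustrates the idea; the field names are illustrative, not the exact Flow wire format.

```python
# Why "context" is an Array: the KB node returns a list of retrieved
# results. Conceptually the prompt sees the flattened text below.
def flatten_context(retrieved_results):
    return "\n\n".join(r["content"]["text"] for r in retrieved_results)

results = [
    {"content": {"text": "Chunk from the 2021 letter."}},
    {"content": {"text": "Chunk from the 2022 letter."}},
]
print(flatten_context(results))
```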

### 7.4 Wire the Nodes Together

Connect the nodes so that:

* **Flow input → Prompt input**

  * Map `question` on Flow input → `question` on Prompt.
* **Flow input → Knowledge base input**

  * Map `question` on Flow input → `retrieval query` (or equivalent) on KB node.
* **Knowledge base output → Prompt input**

  * Map retrieved results → `context` input on the Prompt.
* **Prompt output → Flow output**

  * Map `completion` (or `text`) → Flow output.

Click **Save**.

### 7.5 Test the Flow

1. Open the **Test panel** (icon on the right side of the Flow editor).
2. Enter a question, e.g.:

   * `What is Amazon doing in the field of generative AI?`
3. Run the flow.
4. Inspect the trace for:

   * **Knowledge Base node** → what chunks were retrieved.
   * **Prompt node** → how the context and question were combined.
   * **Flow output** → final answer.

---

## 8. Updating Documents Later

If you just want to change which documents are used **without touching the OpenSearch index or KB wiring**:

### 8.1 Replace Docs, Same Bucket

1. Upload new documents into the **same S3 bucket**.
2. Optionally remove old documents if you don’t want them included.
3. In Bedrock → Knowledge Bases → your KB → **Data sources**:

   * Select your S3 data source.
   * Click **Sync**.

Bedrock will re‑index the bucket contents and update the embeddings in your vector store.
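The **Sync** button corresponds to the `StartIngestionJob` API, so the same refresh can be scripted. A minimal sketch with placeholder IDs, the boto3 call left as comments:

```python
# Scripted equivalent of clicking "Sync": start an ingestion job on the
# KB's S3 data source (IDs below are placeholders from the KB console).
def start_sync_kwargs(kb_id, data_source_id):
    return {
        "knowledgeBaseId": kb_id,
        "dataSourceId": data_source_id,
    }

kwargs = start_sync_kwargs("KBID12345", "DSID67890")
# client = boto3.client("bedrock-agent")
# job = client.start_ingestion_job(**kwargs)
# client.get_ingestion_job(..., ingestionJobId=...) to poll its status.
```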

### 8.2 Fully Reset the Knowledge Base

If you want to wipe the KB logic but **keep the bucket**:

1. Delete the **Knowledge Base** in Bedrock.
2. Leave the **S3 bucket** and documents unchanged.
3. Create a **new Knowledge Base** pointing to the same S3 bucket.
4. Prefer **Quick create a new vector store** for fewer errors.

You can reuse the same S3 path; the KB is what you are resetting.

---

## What a Working Screenshot Looks Like

![image](https://hackmd.io/_uploads/ByopPW3eWe.png)

![image](https://hackmd.io/_uploads/SymRvbhgbl.png)

One of our knowledge base sources: https://www.blackrock.com/corporate/literature/whitepaper/bii-global-outlook-in-charts.pdf

---

## 9. Common Errors and How to Decode Them

### 9.1 `no such index [name]`

**Meaning:**

* Bedrock was told to use an index (e.g. `bedrock-kb-index-01`) that **doesn’t exist** in your collection, *and* you chose **Use an existing vector store**.

**Fix:**

* Either **create that index** manually in OpenSearch, or
* Switch to **Quick create a new vector store** and let Bedrock build it.

### 9.2 `Field 'vector' is not knn_vector type`

**Meaning:**

* Bedrock expects the field you named `vector` to be a **knn_vector**, but in the index it’s some other type (e.g. `float`, `text`, or not present).

**Fix:**

* Ensure that in your index mapping, the field is defined as:

  ```json
  "vector": {
    "type": "knn_vector",
    "dimension": 1024
  }
  ```

* Or again, let Bedrock **quick-create** the vector store so it defines this correctly.

### 9.3 `Query vector has invalid dimension: 1024. Dimension should be: 1924`

**Meaning:**

* Your embedding model produces 1024‑dim vectors (Titan v2), but your OpenSearch `knn_vector` field was defined with dimension 1924.

**Fix:**

* Create a **new index** whose vector field has **dimension 1024**, and point Bedrock to that.
* Or use **Quick create** so Bedrock sets the dimension correctly.

### 9.4 Access / permissions errors (403 / dependency failure)

**Meaning:**

* The **Data Access Policy** for your OpenSearch collection does not allow the Bedrock execution role to read/write the index.

**Fix:**

* Add the Bedrock KB role (`AmazonBedrockExecutionRoleForKnowledgeBase_*`) to the collection’s data policy with permissions like:

  ```json
  {
    "Rules": [
      {
        "ResourceType": "index",
        "Resource": [
          "index/bedrock-sample-rag/*"
        ],
        "Permission": [
          "aoss:CreateIndex",
          "aoss:UpdateIndex",
          "aoss:ReadDocument",
          "aoss:WriteDocument"
        ]
      }
    ],
    "Principal": [
      "arn:aws:iam::ACCOUNT_ID:role/AmazonBedrockExecutionRoleForKnowledgeBase_*"
    ]
  }
  ```

---

## 10. Cheat Sheet / TL;DR for Future Reference

When in doubt, follow this minimal happy path:

1. **Docs** → Put PDFs/TXT in `s3://your-bucket/path`.
2. **OpenSearch** → Create a **vector collection** (`Vector Search`).
3. In Bedrock → **Knowledge Bases** → Create:

   * S3 data source → your bucket
   * Embeddings → **Titan Text Embeddings V2**
   * **Vector store creation method → Quick create a new vector store**
4. Let Bedrock create the index and all field mappings.
5. After creation → **Sync** the data source.
6. Test the KB with **Test knowledge base**.
7. (Optional) Build a **Flow** with:

   * Flow input → KB node → Prompt node → Flow output.

If you stick to **Quick create** and keep S3 docs tidy, you’ll:

* Avoid 95% of the dimension and index errors.
* Be able to swap / add docs just by **re‑syncing** the KB.

You can now safely forget the workshop page and use this README as your canonical guide.
