# Amazon Bedrock Knowledge Bases + OpenSearch Serverless
A practical step‑by‑step guide for future you (and anyone you share this with)
This README walks through the full **console-based** flow for setting up an Amazon Bedrock Knowledge Base backed by **OpenSearch Serverless** and **S3 documents**.
It is written so that:
* You can **recreate the whole setup from scratch** later without having to re-open the workshop.
* You can **switch documents** in your knowledge base cleanly.
* You **avoid the common index / dimension / field-name errors** you ran into.
---
### TL;DR Tools
- S3 → document storage
- Bedrock Knowledge Base →
- reads S3 files
- chunks + embeds them
- stores embeddings in a vector database
- can create the OpenSearch index for you (if using “Quick create”)
- Bedrock Flow → orchestration layer that wires your question → KB retrieval → LLM answer
- OpenSearch Serverless → the actual vector store holding your embeddings (Bedrock can create it for you)
---
## 0. High‑Level Mental Model
Before clicking anything, anchor on this picture:
* **S3 bucket** = where your raw docs live (PDFs, TXT, shareholder letters, etc.).
* **Embeddings model** (Titan Text Embeddings v2) = converts each chunk of text into a 1024‑dimensional vector.
* **OpenSearch Serverless collection + index** = vector database that stores:
* The **original text chunk** (e.g., in a `text` field)
* The **embedding vector** (in a `vector` / `Vector` field)
* **Metadata** (filename, chunk IDs, etc.).
* **Bedrock Knowledge Base** = the glue service that:
* Reads from S3
* Chunks text
* Calls the embeddings model
* Writes vectors + metadata into OpenSearch
* Runs **RetrieveAndGenerate** when you query it.
* **Flows (optional)** = a visual orchestration that wires:
* Your **question** → **Knowledge Base node** → **Prompt node** → **LLM answer**.
If something breaks, it’s almost always one of:
* Wrong **index name**
* Wrong **vector field name**
* Wrong **vector dimension** (1024 vs 1924)
* Missing **permissions** from Bedrock to OpenSearch.
---
## 1. Create OpenSearch Serverless Vector Store (Workshop Path)
> If you use **Bedrock “Quick create a new vector store”**, Bedrock will create the index for you. This section is for when you want to follow the **workshop’s manual OpenSearch setup**.
### 1.1 Create the Collection
1. In the AWS console, go to **Amazon OpenSearch Service**.
2. In the left menu, click **Serverless** → **Dashboard**.
3. Click **Get started** (if this is your first time) or **Create collection**.
4. Configure:
* **Collection name:** `bedrock-sample-rag`
* **Collection type:** `Vector Search`
* **Deployment type:** `Enable redundancy`
* **Security:** `Easy create` (for a workshop; in prod you’d do fine‑grained access).
5. Click **Next** → **Submit**.
6. After creation, note down:
* **Collection name:** `bedrock-sample-rag`
* **Collection ARN** (looks like `arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/COLLECTION_ID`).
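If you later want to script this step instead of clicking through the console, the settings above map onto the OpenSearch Serverless `CreateCollection` API roughly as follows. This is a minimal sketch; the boto3 call itself is commented out so nothing gets created by accident.

```python
import json

# Collection settings from the workshop, expressed as a CreateCollection
# request payload (the name is the workshop's; adjust for your account).
def build_collection_request(name="bedrock-sample-rag"):
    return {
        "name": name,
        "type": "VECTORSEARCH",        # matches "Collection type: Vector Search"
        "standbyReplicas": "ENABLED",  # matches "Enable redundancy"
    }

params = build_collection_request()
print(json.dumps(params, indent=2))

# To actually create it (requires credentials and an IAM policy allowing it):
# import boto3
# boto3.client("opensearchserverless").create_collection(**params)
```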
### 1.2 Create the Vector Index
Still inside the same collection:
1. Go to **Indexes** and click **Create index**.
2. Set:
* **Index name:** `bedrock-sample-rag-index`
3. Under **Vector fields**, click **Add vector field** and configure:
* **Field name:** `Vector` (or `vector`, but be consistent)
* **Engine:** `faiss`
* **Precision:** `FP16` (or FP32)
* **Dimensions:** `1924` (per the workshop; **NOTE:** this will *not* match Titan v2’s 1024 dims – see Gotchas section)
* **Distance metric:** `Euclidean`
* Advanced settings (from workshop):
* **M:** `16`
* **ef_construction:** `512`
* **ef_search:** `512` (if exposed)
4. Add **metadata fields** (if your console supports defining them now):
* `text` (type: `text`)
* `text-metadata` (type: `object` / `nested`)
5. Create the index.
📌 **Remember:**
* **Index name:** `bedrock-sample-rag-index`
* **Vector field name:** `Vector` (or `vector`)
* **Text field name:** `text`
* **Metadata field name:** `text-metadata`
These names must match what you later tell Bedrock.
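For reference, the index settings above correspond to an OpenSearch mapping like the one built below. This sketch assumes the 1024‑dimensional Titan v2 case (the value you actually want); the workshop's 1924 is exactly what the Gotchas section warns against.

```python
import json

# OpenSearch vector index mapping matching the field names above.
# dimension defaults to 1024 (Titan v2); the workshop's 1924 will not
# work with Titan v2 -- see the Common Errors section.
def build_index_body(dimension=1024):
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "vector": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",  # Euclidean distance
                        "parameters": {"m": 16, "ef_construction": 512},
                    },
                },
                "text": {"type": "text"},
                "text-metadata": {"type": "text"},
            }
        },
    }

print(json.dumps(build_index_body(), indent=2))
```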
---
## 2. Upload Documents to S3
1. In the AWS console, go to **Amazon S3**.
2. Create (or reuse) a bucket, e.g.:
* `aws-bedrock-kb-workshop-aoss-fp67` (workshop example)
3. Upload your documents (e.g., Amazon shareholder letters):
* `AMZN-2019-Shareholder-Letter.pdf`
* `AMZN-2020-Shareholder-Letter.pdf`
* `AMZN-2021-Shareholder-Letter.pdf`
* `AMZN-2022-Shareholder-Letter.pdf`
4. Note the bucket path you’ll use in the Knowledge Base:
* `s3://aws-bedrock-kb-workshop-aoss-fp67`
You can change files later without changing the bucket.
---
## 3. Create a Knowledge Base in Amazon Bedrock
You have **two modes** here:
* **Mode A: Quick create a new vector store (recommended for future you).**
* **Mode B: Use an existing vector store (what the workshop shows).**
### 3.1 Start Knowledge Base Creation
1. In the AWS console, go to **Amazon Bedrock**.
2. In the left menu, under **Build**, click **Knowledge Bases**.
3. Click **Create** → **Knowledge base with vector store**.
4. Step 1 – **Provide details**:
* Knowledge base name: e.g. `amzn-shareholder-kb`
* Service role:
* For workshop: **Create and use a new service role**.
* Data source type: **Amazon S3**.
Click **Next**.
### 3.2 Configure Data Source (S3)
1. **S3 URI:**
* Browse and select your bucket, e.g. `s3://aws-bedrock-kb-workshop-aoss-fp67`.
2. **Chunking strategy:**
* Change to **Fixed-size chunking**.
* **Max tokens:** `512` (per workshop; you can tweak later).
3. Parsing strategy: **Default**.
4. Data deletion policy: usually **DELETE** (if you delete a document from S3 and re‑sync, its vectors are removed from the index).
Click **Next**.
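Fixed-size chunking as configured above can be illustrated with a toy splitter. Bedrock counts real tokens with its own tokenizer; in this sketch, whitespace-separated words stand in for tokens.

```python
# Toy illustration of fixed-size chunking. Bedrock uses a real tokenizer;
# here each whitespace-separated word stands in for one token.
def fixed_size_chunks(text, max_tokens=512):
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

doc = "word " * 1100  # a document of ~1100 "tokens"
chunks = fixed_size_chunks(doc)
print(len(chunks), [len(c.split()) for c in chunks])  # 3 chunks: 512, 512, 76
```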
### 3.3 Choose Embeddings Model
1. Under **Embeddings model**, click **Select model**.
2. Choose:
* **`Titan Text Embeddings V2`** (v2.0)
3. Click **Apply**.
👉 Titan v2 produces **1024‑dimensional float vectors**. Your OpenSearch vector field must match this dimension if you are using an existing vector index.
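If you ever call Titan v2 directly (outside the Knowledge Base), the request body looks roughly like this. `dimensions` and `normalize` are real Titan v2 options, and 1024 is the size the KB uses; the boto3 invocation is commented out since it needs model access.

```python
import json

# Request body for amazon.titan-embed-text-v2:0. "dimensions" may be
# 256, 512, or 1024; the Knowledge Base uses 1024-dim vectors.
def titan_v2_body(text, dimensions=1024):
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "normalize": True,
    })

body = titan_v2_body("What is Amazon doing in generative AI?")
print(body)

# To invoke for real (needs Titan v2 model access in your region):
# import boto3
# resp = boto3.client("bedrock-runtime").invoke_model(
#     modelId="amazon.titan-embed-text-v2:0", body=body)
# embedding = json.loads(resp["body"].read())["embedding"]  # list of 1024 floats
```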
---
## 4. Configure the Vector Store (Important Choices)
This is where most errors happen.
### 4.1 Option A – Quick Create a New Vector Store (Recommended)
Use this when you **don’t care about manually defining the index** and you want Bedrock to handle everything.
1. **Vector store creation method:**
* Select **Quick create a new vector store – Recommended**.
2. **Vector store type:**
* Choose **OpenSearch Serverless**.
3. **Collection ARN:**
* Paste your collection ARN, e.g.:
* `arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/COLLECTION_ID`
4. Pick an index name for Bedrock to create, e.g.:
* `kb-index-titan-v2-01`
5. Field mappings:
* **Vector field name:** `vector`
* **Text field name:** `text`
* **Metadata field name:** `metadata` or `text-metadata`
Bedrock will:
* Create the index for you.
* Set `vector` as a `knn_vector` field with dimension **1024**.
* Configure `text` and `metadata` fields.
✅ **This avoids all dimension mismatch errors** and is what you should use going forward unless you have a strong reason not to.
### 4.2 Option B – Use an Existing Vector Store (Workshop Path)
Use this if you **already created** the index in OpenSearch (like `bedrock-sample-rag-index`).
1. **Vector store creation method:**
* Select **Use an existing vector store**.
2. **Vector store type:**
* **OpenSearch Serverless**.
3. **Collection ARN:**
* Same as before, e.g. `arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/osz7...`.
4. **Vector index name:**
* `bedrock-sample-rag-index` (from the workshop).
5. **Index field mapping:**
* **Vector field name:** `Vector` or `vector` (must match the field name in that index).
* **Text field name:** `text`.
* **Bedrock-managed metadata field name:** `text-metadata`.
⚠️ **Gotcha:** If this index was created with **1924 dimensions** for the `Vector` field, and you’re using **Titan v2 (1024 dims)**, you will get errors like:
> `Query vector has invalid dimension: 1024. Dimension should be: 1924`
To avoid this, either:
* Create a **new index** with a 1024‑dimensional `knn_vector` field, **or**
* Use **Quick create a new vector store** so Bedrock aligns dimensions for you.
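A quick sanity check you can run before pointing Bedrock at an existing index: a toy helper (not an AWS API) that reproduces the shape of the error message above from the two dimension values.

```python
# Toy pre-flight check: compare the embedding model's output size with the
# knn_vector dimension defined in your existing index mapping.
def check_dimensions(model_dim: int, index_dim: int) -> str:
    if model_dim != index_dim:
        return (f"Query vector has invalid dimension: {model_dim}. "
                f"Dimension should be: {index_dim}")
    return "ok"

print(check_dimensions(1024, 1924))  # the workshop mismatch
print(check_dimensions(1024, 1024))  # what you want
```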
---
## 5. Finish Creating and Sync the Knowledge Base
After configuring the vector store:
1. Click **Next**.
2. On **Review and create**, confirm:
* Knowledge base name
* S3 URI
* Embeddings model: **Titan Text Embeddings V2**
* Vector store details (collection ARN, index name, field names)
3. Click **Create Knowledge Base**.
Once it’s created:
1. Open your Knowledge Base in the Bedrock console.
2. Open the **Data source** tab.
3. Select your S3 data source.
4. Click **Sync**.
Bedrock will now:
* Scan the S3 bucket
* Chunk files
* Embed chunks with Titan v2
* Write to your OpenSearch index
If there are errors, check:
* **Index not found** → index name typo or index doesn’t exist when using “existing vector store” mode.
* **Invalid dimension** → mismatch between Titan v2 (1024) and your OpenSearch `knn_vector` dimension.
---
## 6. Test the Knowledge Base from the Console
1. In your Knowledge Base, look for **Test knowledge base**.
2. Click **Select model**.
3. Choose **Claude 3.5 Haiku** (or any supported model).
4. Ask a question like:
* `What is Amazon doing in the field of generative AI?`
5. Click **Run**.
6. Expand **Show details** to see:
* Retrieved chunks
* Their source files
* How they were used to answer your question.
This uses the **RetrieveAndGenerate** API under the hood: Bedrock retrieves relevant docs from OpenSearch, then feeds them to the LLM to generate an answer.
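The same call can be scripted. Below is a minimal sketch of the `RetrieveAndGenerate` request shape for the `bedrock-agent-runtime` client; the KB ID and model ARN are placeholders you would copy from your console.

```python
import json

# RetrieveAndGenerate request shape (bedrock-agent-runtime client).
# kb_id and model_arn below are placeholders -- fill in your own.
def build_rag_request(question, kb_id, model_arn):
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

req = build_rag_request(
    "What is Amazon doing in the field of generative AI?",
    kb_id="KB1234567890",  # placeholder
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/MODEL_ID",  # placeholder
)
print(json.dumps(req, indent=2))

# resp = boto3.client("bedrock-agent-runtime").retrieve_and_generate(**req)
# print(resp["output"]["text"])
```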
---
## 7. Build a Bedrock Flow that Uses the Knowledge Base
Once the Knowledge Base works, you can wire it into a **Flow** so others can use it without knowing any of the plumbing.
### 7.1 Create the Flow
1. In the Bedrock console, go to **Flows**.
2. Click **Create flow**.
3. Name it e.g. `langchain-kb-retriever`.
4. Service role: **Create and use a new service role**.
5. Click **Create Flow**.
You’ll see three default nodes:
* **Flow input**
* **Prompt**
* **Flow output**
### 7.2 Add a Knowledge Base Node
1. Add a new node of type **Knowledge base**.
2. Select the Knowledge Base you created in Section 3.
3. Make sure **Return retrieved results** is enabled.
### 7.3 Configure the Prompt Node
1. Click on the **Prompt** node.
2. Choose a model, e.g. **Nova Micro** or any LLM.
3. Use a prompt like:
```text
Human: You are a financial advisor AI system, and provide answers to questions by using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know; don't try to make up an answer.
<question>
{{question}}
</question>
The response should be specific and use statistics or numbers when possible.
Context: {{context}}
A:
```
4. In **Prompt settings → Inputs**, change the **context** parameter’s data type to **Array**, because the Knowledge Base returns an array of retrieved results.
### 7.4 Wire the Nodes Together
Connect the nodes so that:
* **Flow input → Prompt input**
* Map `question` on Flow input → `question` on Prompt.
* **Flow input → Knowledge base input**
* Map `question` on Flow input → `retrieval query` (or equivalent) on KB node.
* **Knowledge base output → Prompt input**
* Map retrieved results → `context` input on the Prompt.
* **Prompt output → Flow output**
* Map `completion` (or `text`) → Flow output.
Click **Save**.
### 7.5 Test the Flow
1. Open the **Test panel** (icon on the right side of the Flow editor).
2. Enter a question, e.g.:
* `What is Amazon doing in the field of generative AI?`
3. Run the flow.
4. Inspect the trace for:
* **Knowledge Base node** → what chunks were retrieved.
* **Prompt node** → how the context and question were combined.
* **Flow output** → final answer.
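Once the flow is published behind an alias, it can also be invoked programmatically. This is a sketch of the `InvokeFlow` request shape for `bedrock-agent-runtime`; the flow and alias IDs are placeholders, and the default input node is named `FlowInputNode`.

```python
import json

# InvokeFlow request shape (bedrock-agent-runtime client). Flow and alias
# IDs are placeholders; the default input node is "FlowInputNode".
def build_flow_request(question, flow_id, alias_id):
    return {
        "flowIdentifier": flow_id,
        "flowAliasIdentifier": alias_id,
        "inputs": [{
            "nodeName": "FlowInputNode",
            "nodeOutputName": "document",
            "content": {"document": question},
        }],
    }

req = build_flow_request(
    "What is Amazon doing in the field of generative AI?",
    flow_id="FLOW1234",    # placeholder
    alias_id="ALIAS1234",  # placeholder
)
print(json.dumps(req, indent=2))

# The real call returns a stream of events:
# resp = boto3.client("bedrock-agent-runtime").invoke_flow(**req)
# for event in resp["responseStream"]:
#     print(event)
```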
---
## 8. Updating Documents Later
If you just want to change which documents are used **without touching the OpenSearch index or KB wiring**:
### 8.1 Replace Docs, Same Bucket
1. Upload new documents into the **same S3 bucket**.
2. Optionally remove old documents if you don’t want them included.
3. In Bedrock → Knowledge Bases → your KB → **Data sources**:
* Select your S3 data source.
* Click **Sync**.
Bedrock will re‑index the bucket contents and update the embeddings in your vector store.
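Re-syncing can also be scripted with the `StartIngestionJob` API on the `bedrock-agent` client; a sketch with placeholder IDs you'd copy from the Knowledge Base console page:

```python
# StartIngestionJob request shape (bedrock-agent client). Both IDs are
# placeholders -- copy them from your Knowledge Base and data source pages.
def build_sync_request(kb_id, data_source_id):
    return {
        "knowledgeBaseId": kb_id,
        "dataSourceId": data_source_id,
    }

req = build_sync_request("KB1234567890", "DS1234567890")
print(req)

# boto3.client("bedrock-agent").start_ingestion_job(**req)
```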
### 8.2 Fully Reset the Knowledge Base
If you want to wipe the KB logic but **keep the bucket**:
1. Delete the **Knowledge Base** in Bedrock.
2. Leave the **S3 bucket** and documents unchanged.
3. Create a **new Knowledge Base** pointing to the same S3 bucket.
4. Prefer **Quick create a new vector store** for fewer errors.
You can reuse the same S3 path; the KB is what you are resetting.
---
## What a Working Screenshot Looks Like


One of our knowledge base sources: https://www.blackrock.com/corporate/literature/whitepaper/bii-global-outlook-in-charts.pdf
---
## 9. Common Errors and How to Decode Them
### 9.1 `no such index [name]`
**Meaning:**
* Bedrock was told to use an index (e.g. `bedrock-kb-index-01`) that **doesn’t exist** in your collection, *and* you chose **Use an existing vector store**.
**Fix:**
* Either **create that index** manually in OpenSearch, or
* Switch to **Quick create a new vector store** and let Bedrock build it.
### 9.2 `Field 'vector' is not knn_vector type`
**Meaning:**
* Bedrock expects the field you named `vector` to be a **knn_vector**, but in the index it’s some other type (e.g. `float`, `text`, or not present).
**Fix:**
* Ensure that in your index mapping, the field is defined as:
```json
"vector": {
"type": "knn_vector",
"dimension": 1024
}
```
* Or again, let Bedrock **quick-create** the vector store so it defines this correctly.
### 9.3 `Query vector has invalid dimension: 1024. Dimension should be: 1924`
**Meaning:**
* Your embedding model produces 1024‑dim vectors (Titan v2), but your OpenSearch `knn_vector` field was defined with dimension 1924.
**Fix:**
* Create a **new index** whose vector field has **dimension 1024**, and point Bedrock to that.
* Or use **Quick create** so Bedrock sets the dimension correctly.
### 9.4 Access / permissions errors (403 / dependency failure)
**Meaning:**
* The **Data Access Policy** for your OpenSearch collection does not allow the Bedrock execution role to read/write the index.
**Fix:**
* Add the Bedrock KB role (`AmazonBedrockExecutionRoleForKnowledgeBase_*`) to the collection’s data policy with permissions like:
```json
{
  "Rules": [
    {
      "ResourceType": "index",
      "Resource": [
        "index/bedrock-sample-rag/*"
      ],
      "Permission": [
        "aoss:CreateIndex",
        "aoss:UpdateIndex",
        "aoss:ReadDocument",
        "aoss:WriteDocument"
      ]
    }
  ],
  "Principal": [
    "arn:aws:iam::ACCOUNT_ID:role/AmazonBedrockExecutionRoleForKnowledgeBase_*"
  ]
}
```
---
## 10. Cheat Sheet / TL;DR for Future Reference
When in doubt, follow this minimal happy path:
1. **Docs** → Put PDFs/TXT in `s3://your-bucket/path`.
2. **OpenSearch** → Create a **vector collection** (`Vector Search`).
3. In Bedrock → **Knowledge Bases** → Create:
* S3 data source → your bucket
* Embeddings → **Titan Text Embeddings V2**
* **Vector store creation method → Quick create a new vector store**
4. Let Bedrock create the index and all field mappings.
5. After creation → **Sync** the data source.
6. Test the KB with **Test knowledge base**.
7. (Optional) Build a **Flow** with:
* Flow input → KB node → Prompt node → Flow output.
If you stick to **Quick create** and keep S3 docs tidy, you’ll:
* Avoid 95% of the dimension and index errors.
* Be able to swap / add docs just by **re‑syncing** the KB.
You can now safely forget the workshop page and use this README as your canonical guide.