# Dynamic Pinecone Index Selection & Dataset-Level Migration
This document provides a full technical overview of the new functionality that enables platform administrators to assign Pinecone indexes at the **dataset level**, migrate vectors across indexes, and dynamically retrieve indexes from Pinecone.
---
## 1. 🎯 Purpose of the Feature
The platform currently uses a single global Pinecone index for all datasets.
This enhancement introduces support for **multiple Pinecone indexes** and allows **each dataset** inside a workspace to be individually assigned to any existing or newly created index.
Key goals:
* Allow platform admins to assign an index per dataset
* Allow creation of new indexes directly from the admin UI
* Migrate existing vectors of a dataset to the selected index
* Preserve namespace-per-dataset structure
* Fetch all available Pinecone indexes dynamically
This enables improved performance distribution, scalability, and operational flexibility.
---
## 2. 🧩 Current System Behavior
* Only **one global Pinecone index** is used (e.g., `default-index`).
* Every dataset in every workspace is stored as a **namespace** within this index.
* Namespace = `dataset.id`.
Example:
```
default-index
├── namespace: dataset_1
├── namespace: dataset_2
└── namespace: dataset_3
```
All vector operations — upsert, delete, search — operate within this single index.
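For concreteness, a minimal sketch of today's behavior using the Pinecone v3+ SDK; the API key, vector IDs, and 3072-dimension values are placeholders, not production code.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")   # placeholder credential
index = pc.Index("default-index")                # the single global index
namespace = "dataset_1"                          # namespace = dataset.id

# Every dataset's reads and writes differ only by namespace, never by index.
index.upsert(vectors=[{"id": "doc-1#chunk-0", "values": [0.1] * 3072}], namespace=namespace)
index.query(vector=[0.1] * 3072, top_k=4, namespace=namespace, include_metadata=True)
index.delete(ids=["doc-1#chunk-0"], namespace=namespace)
```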
---
## 3. 🧩 New Required Behavior
Each dataset becomes independently assignable to any Pinecone index.
### When a platform admin selects or creates an index:
1. **Only the selected dataset** is affected.
2. The dataset’s existing vector data is **migrated** to the newly selected index.
3. Namespace remains unchanged (the dataset ID).
4. Other datasets in the workspace remain in their current indexes.
5. All future operations for that dataset use the newly assigned index.
Resulting layout example:
```
index_A
└── namespace: dataset_1
index_B
└── namespace: dataset_2
default-index
└── namespace: dataset_3
```
This allows flexible distribution of datasets across indexes, as sketched below.
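A minimal routing sketch under the new model, assuming the assigned index name is stored in the dataset's `index_struct_dict`; the helper name `get_index_for_dataset` is illustrative, not existing code.

```python
from pinecone import Pinecone

def get_index_for_dataset(pc: Pinecone, dataset):
    # Each dataset carries its own assigned index name; the namespace stays dataset.id.
    index_name = dataset.index_struct_dict["vector_store"]["index_name"]
    return pc.Index(index_name)

# Usage (dataset and query_embedding come from the calling context):
#   index = get_index_for_dataset(pc, dataset)
#   index.query(vector=query_embedding, top_k=4, namespace=dataset.id, include_metadata=True)
```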
---
## 4. 🖥️ Admin-Only UI Behavior (Dataset-Level Settings)
A new control appears inside **each dataset’s Knowledge Base configuration screen**.
### Admin capabilities:
* **Select an existing Pinecone index** (from dynamic list)
* **Create a new Pinecone index**
* Apply selection → triggers dataset vector migration
### Visibility:
* Only visible to **platform administrators**
* Regular workspace users do not see or control index selection
### Scope:
* Selection applies **only to the dataset currently being configured**
* A workspace may have datasets spread across multiple indexes
---
## 5. 🔄 Dataset-Level Migration Flow
When the administrator selects a new index for a dataset, the system performs the following steps (a code sketch follows them):
### 1. Detect old and new index names
The old index name is read from the dataset metadata (`index_struct_dict`); the new index name is the admin's selection.
### 2. Fetch all vector IDs from the old index
Using the dataset’s namespace (dataset ID).
### 3. Fetch dense vectors, sparse vectors, and metadata in batches
### 4. Upsert vectors into the new index
Namespace remains:
```
namespace = dataset.id
```
### 5. Delete vectors from the old index upon successful migration
### 6. Update dataset metadata
`index_struct_dict['vector_store']['index_name'] = <selected index>`
### 7. All future writes go to the new index
Only one dataset is affected per migration.
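The sketch below walks through steps 2–5 with the Pinecone v3+ SDK. It is illustrative rather than the platform's actual migration code: the function name is made up, it assumes serverless indexes (since `Index.list()` is only available there), and the metadata update of step 6 is left to the caller.

```python
from pinecone import Pinecone

def migrate_dataset_vectors(pc: Pinecone, dataset_id: str,
                            old_index_name: str, new_index_name: str) -> None:
    """Copy one dataset's vectors into the new index, then remove them from the old one."""
    old_index = pc.Index(old_index_name)
    new_index = pc.Index(new_index_name)
    namespace = dataset_id  # the namespace-per-dataset structure is preserved

    # Index.list() yields the namespace's vector IDs in pages (serverless indexes only)
    for id_batch in old_index.list(namespace=namespace):
        fetched = old_index.fetch(ids=list(id_batch), namespace=namespace)
        records = []
        for vec_id, vec in fetched.vectors.items():
            record = {"id": vec_id, "values": vec.values, "metadata": vec.metadata or {}}
            sparse = getattr(vec, "sparse_values", None)
            if sparse:  # keep sparse vectors so hybrid search continues to work
                record["sparse_values"] = {"indices": list(sparse.indices),
                                           "values": list(sparse.values)}
            records.append(record)
        new_index.upsert(vectors=records, namespace=namespace)

    # Remove the dataset's namespace from the old index only after all upserts succeed;
    # the caller then updates index_struct_dict['vector_store']['index_name'] (step 6).
    old_index.delete(delete_all=True, namespace=namespace)
```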
---
## 6. 🔌 Dynamic Retrieval of Existing Pinecone Indexes
To enable index selection from the UI, the platform provides a new endpoint.
### **Endpoint**
```
GET /api/v1/vector-store/pinecone/indexes
```
### **Purpose**
* Fetch all existing Pinecone indexes directly from Pinecone
* Provide up-to-date data for admin index selection UI
### **Backend logic**
```python
from pinecone import Pinecone

pc = Pinecone(api_key=PINECONE_API_KEY)
indexes = pc.list_indexes().names()  # names of all indexes in the project
```
Optionally, call `describe_index` for each index to populate the richer fields shown in the response below:
```python
pc.describe_index(name)  # cloud, region, dimension, and status for a single index
```
### **Example Response**
```json
{
"indexes": [
{ "name": "default-index", "cloud": "aws", "region": "us-east-1", "dimension": 3072, "status": "ready" },
{ "name": "customer-xyz", "cloud": "aws", "region": "us-east-1", "dimension": 3072, "status": "ready" }
]
}
```
This ensures the UI always displays an accurate, real-time list of indexes.
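A minimal handler sketch for this endpoint, assuming a Flask app for illustration; the cloud/region extraction assumes serverless indexes, and platform-admin authorization is assumed to be enforced by existing middleware.

```python
import os
from flask import Flask, jsonify
from pinecone import Pinecone

app = Flask(__name__)
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

@app.get("/api/v1/vector-store/pinecone/indexes")
def list_pinecone_indexes():
    indexes = []
    for name in pc.list_indexes().names():
        info = pc.describe_index(name)                       # dimension, status, and spec details
        serverless = getattr(info.spec, "serverless", None)  # pod-based indexes expose spec.pod instead
        indexes.append({
            "name": info.name,
            "cloud": serverless.cloud if serverless else None,
            "region": serverless.region if serverless else None,
            "dimension": info.dimension,
            "status": "ready" if info.status.ready else info.status.state,
        })
    return jsonify({"indexes": indexes})
```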
---
## 7. 🛠 Code-Level Highlights (Required Adjustments)
Below are targeted updates to support the feature.
References include direct pointers to the existing code.
---
## 7.1 `vector_factory.py`
### **a. Store dataset-specific index_name in metadata**
`index_struct_dict` must include:
```json
{
  "vector_store": {
    "index_name": "<selected_index_name>",
    "class_prefix": "<collection_name>"
  }
}
```
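A sketch of how this structure could be populated when an admin assigns an index; `Dataset.gen_collection_name_by_id` is the existing helper referenced in section 7.2 (its import is omitted here), while the function name and how the result is persisted are assumptions.

```python
import json

def build_index_struct(dataset, selected_index_name: str) -> str:
    """Return the serialized index_struct carrying the newly assigned index name."""
    index_struct = dict(dataset.index_struct_dict or {})
    vector_store = index_struct.setdefault("vector_store", {})
    vector_store["index_name"] = selected_index_name
    vector_store.setdefault("class_prefix", Dataset.gen_collection_name_by_id(dataset.id))
    # Persisting this string back onto the dataset record is platform-specific.
    return json.dumps(index_struct)
```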
### **b. Load index_name dynamically**
Replace use of:
```
config.get('PINECONE_INDEX_NAME')
```
with:
```python
index_name = self._dataset.index_struct_dict['vector_store']['index_name']
```
This ensures each dataset initializes Pinecone using its own assigned index.
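A possible sketch with a backward-compatibility fallback, assuming datasets created before this feature (still living in the global index) may not yet carry an `index_name`; whether such a fallback is wanted is an assumption, not a stated requirement.

```python
# Fall back to the global default for datasets that predate per-dataset assignment (assumption).
vector_store_meta = (self._dataset.index_struct_dict or {}).get('vector_store', {})
index_name = vector_store_meta.get('index_name') or config.get('PINECONE_INDEX_NAME')
```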
### **c. Inject into PineconeConfig dynamically**
```python
config=PineconeConfig(
api_key=config.get('PINECONE_API_KEY'),
cloud=config.get('PINECONE_CLOUD'),
region=config.get('PINECONE_REGION'),
index_name=index_name,
dimension=int(config.get('PINECONE_DIMENSIONS')),
batch_size=int(config.get('PINECONE_BATCH_SIZE'))
)
```
No other modifications required.
---
## 7.2 `pinecone_vector.py`
### **a. Ensure dynamic index usage**
Since `PineconeConfig.index_name` is now dataset-specific, the existing upsert, delete, and query paths automatically target the dataset's assigned index; no further changes are needed here.
### **b. Correct `get_collection_name()` behavior**
Currently:
```python
index_name = dataset.index_struct_dict['vector_store']['index_name']
return index_name
```
This should instead continue to use the dataset ID as the namespace:
```python
return Dataset.gen_collection_name_by_id(dataset.id)
```
### **c. Add dataset migration helper**
A new internal method will handle the migration:
```python
def migrate_to_new_index(self, new_index_name):
# fetch from old index → upsert into new index → delete old
pass
```
This is triggered by the admin's index selection (section 4) and implements the migration flow described in section 5.
### **d. Namespace handling remains unchanged**
All upserts and searches correctly use:
```python
namespace=self._dataset_id
```
which aligns with dataset-specific separation.
---
## 8. 📚 Updated System Behavior Summary
After implementing this feature:
* Each dataset can reside in a different Pinecone index
* Admins can dynamically assign or create indexes
* Only the selected dataset’s vectors are migrated
* Namespace structure remains unchanged
* Search, hybrid search, upsert, and delete all operate through the dataset’s assigned index
* Index lists are fetched dynamically from Pinecone via API
This enables horizontal scaling, improved index distribution, and operational flexibility without altering user workflow.
---