### [AI / ML Study Notes — Index Page](https://hackmd.io/@YungHuiHsu/BySsb5dfp)
#### [Deeplearning.ai GenAI/LLM Course Series Notes](https://learn.deeplearning.ai/)
##### GenAI
- [Large Language Models with Semantic Search](https://hackmd.io/@YungHuiHsu/rku-vjhZT)
- [LangChain for LLM Application Development](https://hackmd.io/1r4pzdfFRwOIRrhtF9iFKQ)
- [Finetuning Large Language Models](https://hackmd.io/@YungHuiHsu/HJ6AT8XG6)
##### RAG
- [Preprocessing Unstructured Data for LLM Applications](https://hackmd.io/@YungHuiHsu/BJDAbgpgR)
- [Building and Evaluating Advanced RAG](https://hackmd.io/@YungHuiHsu/rkqGpCDca)
- [[GenAI][RAG] Multi-Modal Retrieval-Augmented Generation and Evaluation](https://hackmd.io/@YungHuiHsu/B1LJcOlfA)
- [Multi-Modal Retrieval with LlamaIndex and LangChain](https://hackmd.io/@YungHuiHsu/S1f8gEBD0)
##### AI Agents
- [AI Agents in LangGraph](https://hackmd.io/@YungHuiHsu/BJTKpkEHC)
##### Framework
- [LlamaIndex - Data Storage Architecture Design and Base Components](https://hackmd.io/@YungHuiHsu/HJ6iqpmPC)
- [Multi-Modal Retrieval with LlamaIndex and LangChain](https://hackmd.io/@YungHuiHsu/S1f8gEBD0)
---
# Multi-Modal Retrieval with LlamaIndex and LangChain
![image](https://hackmd.io/_uploads/ry5DTPcvA.png =800x)
Details are given below.
### LlamaIndex `MultiModalVectorStoreIndex`
- [llama_index/llama-index-core/llama_index/core/indices/multi_modal/base.py](https://github.com/run-llama/llama_index/blob/4cfd28ef84bffc90779638c1e29e42814822dab0/llama-index-core/llama_index/core/indices/multi_modal/base.py#L43)
::: info
- With `is_image_to_text=True`, the text stored in an `ImageDocument` is converted to an embedding (using the text embed model) and stored
    - For example, an image summary can be stored in the `ImageNode`s inside an `ImageDocument`
- At retrieval time, you can search against either the text or the image embeddings in the image vector store; the raw images held in the `ImageDocument`s are returned
:::
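The mechanism described above can be sketched with a small, dependency-free toy. The hash-based embedders, the `images`/`text_index`/`image_index` dictionaries, and the `retrieve` helper below are illustrative stand-ins, not LlamaIndex APIs:

```python
import math

# Toy embedders: stand-ins for a text embed model and a CLIP-like image embed model.
def embed_text(text: str) -> list[float]:
    # Hash characters into a tiny fixed-size vector (illustrative only).
    vec = [0.0] * 4
    for i, ch in enumerate(text):
        vec[i % 4] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def embed_image(image_bytes: bytes) -> list[float]:
    vec = [0.0] * 4
    for i, b in enumerate(image_bytes):
        vec[i % 4] += b
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# One record per image: the raw image plus a text summary of it.
images = {
    "img-1": {"raw": b"\x89PNG...cat", "summary": "a cat sitting on a mat"},
    "img-2": {"raw": b"\x89PNG...chart", "summary": "a bar chart of sales"},
}

# Analogue of is_image_to_text=True: embed the summary with the *text* model,
# alongside the image embedding, both keyed by the same node id.
text_index = {nid: embed_text(rec["summary"]) for nid, rec in images.items()}
image_index = {nid: embed_image(rec["raw"]) for nid, rec in images.items()}

def retrieve(query: str, use_text: bool = True) -> bytes:
    """Search either index; always return the raw image of the best match."""
    qvec = embed_text(query)
    index = text_index if use_text else image_index
    best = max(index, key=lambda nid: sum(a * b for a, b in zip(qvec, index[nid])))
    return images[best]["raw"]
```

In `MultiModalVectorStoreIndex`, the text index is populated when `is_image_to_text=True` and the image index by `image_embed_model`; either way, retrieval resolves back to the original image, which is the behaviour the toy `retrieve` imitates.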
```mermaid
classDiagram
    class VectorStoreIndex {
        +nodes: Optional[Sequence[BaseNode]]
        +index_struct: Optional[IndexDict]
        +embed_model: Optional[BaseEmbedding]
        +storage_context: Optional[StorageContext]
        +use_async: bool
        +store_nodes_override: bool
        +show_progress: bool
    }
    class MultiModalVectorStoreIndex {
        +image_namespace: str
        +index_struct_cls: MultiModelIndexDict
        +nodes: Optional[Sequence[BaseNode]]
        +index_struct: Optional[MultiModelIndexDict]
        +embed_model: Optional[BaseEmbedding]
        +storage_context: Optional[StorageContext]
        +use_async: bool
        +store_nodes_override: bool
        +show_progress: bool
        +image_vector_store: Optional[VectorStore]
        +image_embed_model: EmbedType
        +is_image_to_text: bool
        +is_image_vector_store_empty: bool
        +is_text_vector_store_empty: bool
    }
    VectorStoreIndex <|-- MultiModalVectorStoreIndex
    MultiModalVectorStoreIndex *-- MultiModelIndexDict
    MultiModalVectorStoreIndex *-- ImageNode
    MultiModalVectorStoreIndex *-- EmbedType
    MultiModalVectorStoreIndex *-- VectorStore
    MultiModalVectorStoreIndex *-- StorageContext
```
### LangChain MultiVector Retriever
- [MultiVector Retriever](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/multi_vector/?ref=blog.langchain.dev#summary)
- [Multi-modal eval: GPT-4 w/ multi-modal embeddings and multi-vector retriever](https://langchain-ai.github.io/langchain-benchmarks/notebooks/retrieval/multi_modal_benchmarking/multi_modal_eval.html)
    - See "Option 2: Multi-vector retriever"
    - Retrieval runs over the image summaries; the raw images are returned
> This approach will generate and index image summaries.
> It will then retrieve the raw image to pass to GPT-4V for final synthesis.
```python=
import uuid

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema.document import Document
from langchain.storage import InMemoryStore
from langchain.vectorstores import Chroma


def create_multi_vector_retriever(vectorstore, image_summaries, images):
    """
    Create retriever that indexes summaries, but returns raw images or texts

    :param vectorstore: Vectorstore to store embedded image summaries
    :param image_summaries: Image summaries
    :param images: Base64-encoded images
    :return: Retriever
    """
    # Initialize the storage layer for the raw contents
    store = InMemoryStore()
    id_key = "doc_id"

    # Create the multi-vector retriever
    retriever = MultiVectorRetriever(
        vectorstore=vectorstore,
        docstore=store,
        id_key=id_key,
    )

    # Helper function to add documents to the vectorstore and docstore
    def add_documents(retriever, doc_summaries, doc_contents):
        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
        summary_docs = [
            Document(page_content=s, metadata={id_key: doc_ids[i]})
            for i, s in enumerate(doc_summaries)
        ]
        retriever.vectorstore.add_documents(summary_docs)
        retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

    add_documents(retriever, image_summaries, images)

    return retriever


# The vectorstore to use to index the summaries
vectorstore_mvr = Chroma(
    collection_name="multi-modal-rag-mv", embedding_function=OpenAIEmbeddings()
)

# Create retriever
retriever_multi_vector_img = create_multi_vector_retriever(
    vectorstore_mvr,
    image_summaries,
    images_base_64_processed,
)
```
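The round trip that `create_multi_vector_retriever` sets up (index the summaries, return the raw contents) can be mimicked without LangChain. Everything below, including the word-overlap "similarity", is an illustrative stand-in for the real vector search:

```python
import uuid

# Toy stand-ins for the LangChain components: the "vectorstore" matches a query
# against summaries, the "docstore" holds the raw contents, and doc_id is the
# join key between them.
def create_toy_retriever(image_summaries, images):
    id_key = "doc_id"
    vectorstore = []   # list of {"page_content": summary, "doc_id": id}
    docstore = {}      # doc_id -> raw (base64) image

    doc_ids = [str(uuid.uuid4()) for _ in images]
    for i, s in enumerate(image_summaries):
        vectorstore.append({"page_content": s, id_key: doc_ids[i]})
    docstore.update(zip(doc_ids, images))

    def retrieve(query):
        # Crude "similarity": count words shared between query and summary.
        def score(doc):
            return len(set(query.split()) & set(doc["page_content"].split()))
        best = max(vectorstore, key=score)
        # Return the raw image, not the summary that matched.
        return docstore[best[id_key]]

    return retrieve

retrieve = create_toy_retriever(
    ["a cat sitting on a mat", "a bar chart of quarterly sales"],
    ["<base64-cat-image>", "<base64-chart-image>"],
)
```

The essential point is the last line of `retrieve`: the match happens on the summary, but the `doc_id` stored in its metadata is what gets looked up, so the caller receives the raw image.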
#### Program Logic and Flow
1. **Generate the image summaries**:
    ```python
    image_summaries, images_base_64_processed = generate_img_summaries(images_base_64)
    ```
2. **Create the multi-vector retriever**:
    ```python
    retriever_multi_vector_img = create_multi_vector_retriever(
        vectorstore_mvr,
        image_summaries,
        images_base_64_processed,
    )
    ```
3. **Inside `create_multi_vector_retriever`**:
    - Initialize the storage layer:
        ```python
        store = InMemoryStore()
        ```
    - Create the multi-vector retriever:
        ```python
        retriever = MultiVectorRetriever(
            vectorstore=vectorstore,
            docstore=store,
            id_key=id_key,
        )
        ```
    - Add the documents to the `vectorstore` and the `docstore`:
        ```python
        def add_documents(retriever, doc_summaries, doc_contents):
            doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
            summary_docs = [
                Document(page_content=s, metadata={id_key: doc_ids[i]})
                for i, s in enumerate(doc_summaries)
            ]
            retriever.vectorstore.add_documents(summary_docs)
            retriever.docstore.mset(list(zip(doc_ids, doc_contents)))
        ```
    - `doc_ids = [str(uuid.uuid4()) for _ in doc_contents]` and `uuid`
        - `uuid` is a module for generating universally unique identifiers (UUIDs). A UUID is a 128-bit identifier commonly used to mark a unique entity, such as a record in a database; UUIDs are effectively collision-free even when generated on different systems.
        - `uuid.uuid4()` generates a random UUID (based on random numbers). Here, one unique identifier is generated for each document in `doc_contents`; these identifiers are used in the subsequent storage and retrieval steps.
    - The `doc_ids` establish the link between the `vectorstore` and the `docstore`:
        - The `vectorstore` holds the document summaries (`summary_docs`); each summary carries its unique `doc_id` in its metadata.
        - The `docstore` holds the document contents (`doc_contents`), in this case the processed images, each associated with its corresponding `doc_id`.
    - At retrieval time, the `doc_id` of the best-matching summary in the `vectorstore` is used to fetch the corresponding full content (the image) from the `docstore`.

This flow uses the unique `doc_ids` to associate each processed image (`images_base_64_processed`) with its summary (`image_summaries`), storing and retrieving them through the `vectorstore` and `docstore`. The `doc_ids` serve as the join key, guaranteeing that at retrieval time each summary is correctly matched to its full image.
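The uniqueness and format of the generated join keys can be checked directly with the standard library:

```python
import uuid

doc_contents = ["<base64 image 1>", "<base64 image 2>", "<base64 image 3>"]

# One random 128-bit UUID per document, rendered in the canonical
# 8-4-4-4-12 hex form, e.g. '1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed'.
doc_ids = [str(uuid.uuid4()) for _ in doc_contents]

assert len(set(doc_ids)) == len(doc_ids)                          # all distinct
assert all(len(d) == 36 and d.count("-") == 4 for d in doc_ids)   # canonical form
```

Because `uuid4` draws from a 122-bit random space, ids generated in separate runs, or even on separate machines, can safely share one docstore without coordination.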