### [AI / ML Learning Notes Entry Page](https://hackmd.io/@YungHuiHsu/BySsb5dfp)

#### [Deeplearning.ai GenAI/LLM Course Series Notes](https://learn.deeplearning.ai/)

##### GenAI
- [Large Language Models with Semantic Search](https://hackmd.io/@YungHuiHsu/rku-vjhZT)
- [LangChain for LLM Application Development](https://hackmd.io/1r4pzdfFRwOIRrhtF9iFKQ)
- [Finetuning Large Language Models](https://hackmd.io/@YungHuiHsu/HJ6AT8XG6)

##### RAG
- [Preprocessing Unstructured Data for LLM Applications](https://hackmd.io/@YungHuiHsu/BJDAbgpgR)
- [Building and Evaluating Advanced RAG](https://hackmd.io/@YungHuiHsu/rkqGpCDca)
- [[GenAI][RAG] Multi-Modal Retrieval-Augmented Generation and Evaluation](https://hackmd.io/@YungHuiHsu/B1LJcOlfA)
- [Multimodal Retrieval with LlamaIndex and LangChain](https://hackmd.io/@YungHuiHsu/S1f8gEBD0)

##### AI Agents
- [AI Agents in LangGraph](https://hackmd.io/@YungHuiHsu/BJTKpkEHC)

##### Framework
- [LlamaIndex - Data Storage Architecture Design and Base Components](https://hackmd.io/@YungHuiHsu/HJ6iqpmPC)
- [Multimodal Retrieval with LlamaIndex and LangChain](https://hackmd.io/@YungHuiHsu/S1f8gEBD0)

---

# Multimodal Retrieval with LlamaIndex and LangChain

![image](https://hackmd.io/_uploads/ry5DTPcvA.png =800x)

The two implementations are detailed below.

### LlamaIndex

`class MultiModalVectorStoreIndex`
- [llama_index/llama-index-core/llama_index/core/indices/multi_modal/base.py](https://github.com/run-llama/llama_index/blob/4cfd28ef84bffc90779638c1e29e42814822dab0/llama-index-core/llama_index/core/indices/multi_modal/base.py#L43)

::: info
- Setting `is_image_to_text=True` embeds the text stored in an `ImageDocument` with the **text** embed model and stores it as a text embedding
    - For example, an image summary can be stored in the `ImageNode`s inside an `ImageDocument`
- At retrieval time you can search against either the text or the image embeddings in the image vector store; the raw images inside the `ImageDocuments` are returned
:::

```mermaid
classDiagram
    class VectorStoreIndex {
        +nodes: Optional[Sequence[BaseNode]]
        +index_struct: Optional[IndexDict]
        +embed_model: Optional[BaseEmbedding]
        +storage_context: Optional[StorageContext]
        +use_async: bool
        +store_nodes_override: bool
        +show_progress: bool
    }
    class MultiModalVectorStoreIndex {
        +image_namespace: str
        +index_struct_cls: MultiModelIndexDict
        +nodes: Optional[Sequence[BaseNode]]
        +index_struct: Optional[MultiModelIndexDict]
        +embed_model: Optional[BaseEmbedding]
        +storage_context: Optional[StorageContext]
        +use_async: bool
        +store_nodes_override: bool
        +show_progress: bool
        +image_vector_store: Optional[VectorStore]
        +image_embed_model: EmbedType
        +is_image_to_text: bool
        +is_image_vector_store_empty: bool
        +is_text_vector_store_empty: bool
    }
    VectorStoreIndex <|-- MultiModalVectorStoreIndex
    MultiModalVectorStoreIndex *-- MultiModelIndexDict
    MultiModalVectorStoreIndex *-- ImageNode
    MultiModalVectorStoreIndex *-- EmbedType
    MultiModalVectorStoreIndex *-- VectorStore
    MultiModalVectorStoreIndex *-- StorageContext
```

### LangChain MultiVector Retriever
- [MultiVector Retriever](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/multi_vector/?ref=blog.langchain.dev#summary)
- [Multi-modal eval: GPT-4 w/ multi-modal embeddings and multi-vector retriever](https://langchain-ai.github.io/langchain-benchmarks/notebooks/retrieval/multi_modal_benchmarking/multi_modal_eval.html)
    - See Option 2: Multi-vector retriever
    - Retrieve by image summary, return the raw image
        > This approach will generate and index image summaries. See detail here. It will then retrieve the raw image to pass to GPT-4V for final synthesis.
```python=
import uuid

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema.document import Document
from langchain.schema.output_parser import StrOutputParser
from langchain.storage import InMemoryStore
from langchain.vectorstores import Chroma  # used below; missing from the original listing


def create_multi_vector_retriever(vectorstore, image_summaries, images):
    """
    Create retriever that indexes summaries, but returns raw images or texts

    :param vectorstore: Vectorstore to store embedded image summaries
    :param image_summaries: Image summaries
    :param images: Base64 encoded images
    :return: Retriever
    """

    # Initialize the storage layer
    store = InMemoryStore()
    id_key = "doc_id"

    # Create the multi-vector retriever
    retriever = MultiVectorRetriever(
        vectorstore=vectorstore,
        docstore=store,
        id_key=id_key,
    )

    # Helper function to add documents to the vectorstore and docstore
    def add_documents(retriever, doc_summaries, doc_contents):
        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
        summary_docs = [
            Document(page_content=s, metadata={id_key: doc_ids[i]})
            for i, s in enumerate(doc_summaries)
        ]
        retriever.vectorstore.add_documents(summary_docs)
        retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

    add_documents(retriever, image_summaries, images)

    return retriever


# The vectorstore to use to index the summaries
vectorstore_mvr = Chroma(
    collection_name="multi-modal-rag-mv", embedding_function=OpenAIEmbeddings()
)

# Create retriever
retriever_multi_vector_img = create_multi_vector_retriever(
    vectorstore_mvr,
    image_summaries,
    images_base_64_processed,
)
```

#### Program Logic and Flow

1. **Generate image summaries**:
    ```python
    image_summaries, images_base_64_processed = generate_img_summaries(images_base_64)
    ```
2. **Create the multi-vector retriever**:
    ```python
    retriever_multi_vector_img = create_multi_vector_retriever(
        vectorstore_mvr,
        image_summaries,
        images_base_64_processed,
    )
    ```
3. 
**Inside the `create_multi_vector_retriever` function**:
    - Initialize the storage layer:
        ```python
        store = InMemoryStore()
        ```
    - Create the multi-vector retriever:
        ```python
        retriever = MultiVectorRetriever(
            vectorstore=vectorstore,
            docstore=store,
            id_key=id_key,
        )
        ```
    - Add documents to the `vectorstore` and the `docstore`:
        ```python
        def add_documents(retriever, doc_summaries, doc_contents):
            doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
            summary_docs = [
                Document(page_content=s, metadata={id_key: doc_ids[i]})
                for i, s in enumerate(doc_summaries)
            ]
            retriever.vectorstore.add_documents(summary_docs)
            retriever.docstore.mset(list(zip(doc_ids, doc_contents)))
        ```
        - `doc_ids = [str(uuid.uuid4()) for _ in doc_contents]` and `uuid`
            - `uuid` is a module for generating universally unique identifiers (UUIDs). A UUID is a 128-bit identifier commonly used to tag unique entities, such as records in a database; UUIDs are effectively collision-free even when generated on different systems.
            - `uuid.uuid4()` generates a random UUID. Here it assigns every document in `doc_contents` a unique identifier, which is used in the subsequent storage and retrieval steps.
        - The `doc_ids` establish the link between the `vectorstore` and the `docstore`:
            - The `vectorstore` holds the document summaries (`summary_docs`); each summary carries its unique `doc_id` as part of its metadata.
            - The `docstore` holds the document contents (`doc_contents`), here the processed images, each associated with its corresponding `doc_id`.

When a document needs to be retrieved, the `doc_id` lets you match the summary found in the `vectorstore` with the full content (the image) stored in the `docstore`.

In short, this flow uses unique `doc_ids` to associate the processed images (`images_base_64_processed`) with their summaries (`image_summaries`), storing and retrieving both through the `vectorstore` and the `docstore`. The `doc_ids` act as the join key, guaranteeing that at retrieval time each summary is correctly matched to its corresponding full image.
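The `doc_id` linkage described above can be sketched without any LangChain dependency. In this minimal stand-in, a plain `dict` plays the role of the docstore, a list of summary records plays the role of the vectorstore, and a naive keyword match stands in for vector similarity search; `build_multi_vector_store` and `retrieve` are illustrative names, not LangChain APIs.

```python
import uuid


def build_multi_vector_store(summaries, contents):
    """Link each summary to its raw content via a shared doc_id."""
    doc_ids = [str(uuid.uuid4()) for _ in contents]
    # "vectorstore": summaries indexed for search, tagged with doc_id metadata
    summary_index = [
        {"page_content": s, "metadata": {"doc_id": doc_ids[i]}}
        for i, s in enumerate(summaries)
    ]
    # "docstore": doc_id -> raw content (e.g. a base64-encoded image)
    docstore = dict(zip(doc_ids, contents))
    return summary_index, docstore


def retrieve(query, summary_index, docstore):
    """Naive stand-in for similarity search: match the query against the
    summaries, then follow each hit's doc_id into the docstore so the
    RAW content is returned, not the summary that was searched."""
    hits = [d for d in summary_index if query.lower() in d["page_content"].lower()]
    return [docstore[d["metadata"]["doc_id"]] for d in hits]


summaries = ["A bar chart of quarterly revenue", "A photo of a cat"]
images = ["<base64 image 1>", "<base64 image 2>"]
index, store = build_multi_vector_store(summaries, images)
print(retrieve("cat", index, store))  # raw image content, not the summary
```

Swapping the keyword match for real embeddings and the `dict` for a persistent store recovers exactly the `MultiVectorRetriever` pattern: search happens over the small, embeddable summaries, while the bulky raw artifacts come back through the `doc_id` join.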