
Entry page for AI / ML study notes

Deeplearning.ai GenAI/LLM course series notes

GenAI
RAG
AI Agents
Framework

Multimodal Retrieval Implementations in Llamaindex and Langchain


Details follow.

Llamaindex class MultiModalVectorStoreIndex

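  • Setting is_image_to_text=True embeds the text stored in an ImageDocument (using the text embed model) and stores that embedding
    • For example, a summary of the image can be stored in the ImageNodes of an ImageDocument
  • At retrieval time you can choose to search either the text embedding or the image embedding in the image vector store; the original images held in the ImageDocuments are returned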
Class diagram (attributes as listed):

VectorStoreIndex
    +nodes: Optional[Sequence[BaseNode]]
    +index_struct: Optional[IndexDict]
    +embed_model: Optional[BaseEmbedding]
    +storage_context: Optional[StorageContext]
    +use_async: bool
    +store_nodes_override: bool
    +show_progress: bool

MultiModalVectorStoreIndex
    +image_namespace: str
    +index_struct_cls: MultiModelIndexDict
    +nodes: Optional[Sequence[BaseNode]]
    +index_struct: Optional[MultiModelIndexDict]
    +embed_model: Optional[BaseEmbedding]
    +storage_context: Optional[StorageContext]
    +use_async: bool
    +store_nodes_override: bool
    +show_progress: bool
    +image_vector_store: Optional[VectorStore]
    +image_embed_model: EmbedType
    +is_image_to_text: bool
    +is_image_vector_store_empty: bool
    +is_text_vector_store_empty: bool

Related classes: MultiModelIndexDict, ImageNode, EmbedType, VectorStore, StorageContext
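To make the is_image_to_text idea more concrete, here is a minimal pure-Python sketch. It does not use LlamaIndex: ToyImageNode, embed_text, build_index, and retrieve are hypothetical names, the sample nodes are made up, and the bag-of-words "embedding" is only a stand-in for a real text embed model. It illustrates one thing: an image is indexed by the embedding of its text, and the original image is what retrieval returns.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToyImageNode:
    """Hypothetical stand-in for an image node: raw image plus optional text."""
    image: str       # e.g. a base64 string or a file path
    text: str = ""   # e.g. a summary of the image


def embed_text(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a text embed model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(nodes, is_image_to_text=True):
    """Store (embedding, node) pairs. Only the text path is sketched here."""
    if not is_image_to_text:
        raise NotImplementedError("image embeddings need a real image embed model")
    return [(embed_text(n.text), n) for n in nodes]


def retrieve(index, query, top_k=1):
    """Rank nodes by query/text similarity and return their original images."""
    q = embed_text(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [node.image for _, node in ranked[:top_k]]


nodes = [
    ToyImageNode(image="cat.png", text="a cat sleeping on a sofa"),
    ToyImageNode(image="chart.png", text="bar chart of quarterly revenue"),
]
index = build_index(nodes, is_image_to_text=True)
result = retrieve(index, "quarterly revenue chart")
```

The query never touches the image itself; only the stored text is compared, which is why a text embed model is sufficient on this path.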

Langchain MultiVector Retriever

- MultiVector Retriever

    import uuid

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.retrievers.multi_vector import MultiVectorRetriever
    from langchain.schema.document import Document
    from langchain.schema.output_parser import StrOutputParser
    from langchain.storage import InMemoryStore
    from langchain.vectorstores import Chroma


    def create_multi_vector_retriever(vectorstore, image_summaries, images):
        """
        Create retriever that indexes summaries, but returns raw images or texts

        :param vectorstore: Vectorstore to store embedded image summaries
        :param image_summaries: Image summaries
        :param images: Base64 encoded images
        :return: Retriever
        """

        # Initialize the storage layer
        store = InMemoryStore()
        id_key = "doc_id"

        # Create the multi-vector retriever
        retriever = MultiVectorRetriever(
            vectorstore=vectorstore,
            docstore=store,
            id_key=id_key,
        )

        # Helper function to add documents to the vectorstore and docstore
        def add_documents(retriever, doc_summaries, doc_contents):
            # One random UUID per document, shared by its summary and its content
            doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
            summary_docs = [
                Document(page_content=s, metadata={id_key: doc_ids[i]})
                for i, s in enumerate(doc_summaries)
            ]
            retriever.vectorstore.add_documents(summary_docs)
            retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

        add_documents(retriever, image_summaries, images)

        return retriever


    # The vectorstore to use to index the summaries
    vectorstore_mvr = Chroma(
        collection_name="multi-modal-rag-mv", embedding_function=OpenAIEmbeddings()
    )

    # Create retriever
    retriever_multi_vector_img = create_multi_vector_retriever(
        vectorstore_mvr,
        image_summaries,
        images_base_64_processed,
    )

Program logic and flow

  1. Generate image summaries

    image_summaries, images_base_64_processed = generate_img_summaries(images_base_64)
    
  2. Create the multi-vector retriever

    retriever_multi_vector_img = create_multi_vector_retriever(
        vectorstore_mvr,
        image_summaries,
        images_base_64_processed,
    )
    
  3. Inside the create_multi_vector_retriever function

    • Initialize the storage layer:

      store = InMemoryStore()
      
    • Create the multi-vector retriever:

      retriever = MultiVectorRetriever(
          vectorstore=vectorstore,
          docstore=store,
          id_key=id_key,
      )
      
    • Add documents to the vectorstore and the docstore

      def add_documents(retriever, doc_summaries, doc_contents):
          doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
          summary_docs = [
              Document(page_content=s, metadata={id_key: doc_ids[i]})
              for i, s in enumerate(doc_summaries)
          ]
          retriever.vectorstore.add_documents(summary_docs)
          retriever.docstore.mset(list(zip(doc_ids, doc_contents)))
      
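      • The uuid in doc_ids = [str(uuid.uuid4()) for _ in doc_contents]

        • uuid is the Python standard-library module for generating universally unique identifiers (UUIDs). A UUID is a 128-bit identifier, typically used to identify a unique entity such as a database record. Collisions are extremely unlikely, even between UUIDs generated independently on different systems.
        • uuid.uuid4() generates a random UUID (based on random numbers)

        A unique identifier (UUID) is generated for each document in doc_contents; these identifiers are used in the subsequent storage and retrieval steps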

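      • Use doc_ids to link the vectorstore and the docstore

        • The vectorstore holds the document summaries (summary_docs); each summary carries a unique doc_id as part of its metadata.
        • The docstore holds the document contents (doc_contents), here the processed images; each content is associated with its doc_id.

        At retrieval time, the query is matched against the summaries in the vectorstore, and the doc_id in each matching summary's metadata is used to fetch the corresponding full content (the image) from the docstore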
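The ID linkage can be sketched with the standard library alone. In this sketch a list of dicts and a plain dict stand in for the vectorstore and the docstore (an assumption for illustration; the real objects are the LangChain ones used earlier), and the sample summaries and image strings are made up.

```python
import uuid

id_key = "doc_id"

# Made-up sample data standing in for image_summaries and images_base_64_processed.
doc_summaries = ["a cat sleeping on a sofa", "bar chart of quarterly revenue"]
doc_contents = ["<base64 of cat image>", "<base64 of chart image>"]

# One random (version 4) UUID per document.
doc_ids = [str(uuid.uuid4()) for _ in doc_contents]

# Stand-in for the vectorstore: each summary carries its doc_id as metadata.
summary_docs = [
    {"page_content": s, "metadata": {id_key: doc_ids[i]}}
    for i, s in enumerate(doc_summaries)
]

# Stand-in for the docstore: doc_id -> full content.
docstore = dict(zip(doc_ids, doc_contents))

# Following the doc_id from a summary leads to the matching image.
hit = summary_docs[1]
image = docstore[hit["metadata"][id_key]]
```

Because the summary and the content share the same position in their lists when the IDs are assigned, the two inputs must be in the same order.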

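    This flow uses unique doc_ids to associate the processed images (images_base_64_processed) with their summaries (image_summaries), storing and retrieving them through the vectorstore and the docstore. The doc_ids serve as the join key, so a retrieved summary can be matched to its full image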
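As a rough end-to-end illustration of this flow, the sketch below mimics the "search summaries, return full contents" pattern without LangChain. ToyMultiVectorRetriever and its methods are hypothetical names, the data is made up, and the word-overlap score is a stand-in for the embedding similarity a real vectorstore would compute.

```python
import uuid


class ToyMultiVectorRetriever:
    """Hypothetical stand-in: searches summaries, returns full contents."""

    def __init__(self, id_key="doc_id"):
        self.id_key = id_key
        self.summaries = []   # plays the role of the vectorstore
        self.docstore = {}    # plays the role of the docstore

    def add_documents(self, doc_summaries, doc_contents):
        """Assign one UUID per document and store summary and content under it."""
        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
        for doc_id, summary in zip(doc_ids, doc_summaries):
            self.summaries.append({"text": summary, self.id_key: doc_id})
        self.docstore.update(zip(doc_ids, doc_contents))

    def retrieve(self, query, top_k=1):
        # Toy relevance score: number of words shared with the query
        # (assumption: a real vectorstore would compare embeddings instead).
        words = set(query.lower().split())
        ranked = sorted(
            self.summaries,
            key=lambda s: len(words & set(s["text"].lower().split())),
            reverse=True,
        )
        # Resolve each matching summary to its full content via doc_id.
        return [self.docstore[s[self.id_key]] for s in ranked[:top_k]]


retriever = ToyMultiVectorRetriever()
retriever.add_documents(
    ["a cat sleeping on a sofa", "bar chart of quarterly revenue"],
    ["<base64 of cat image>", "<base64 of chart image>"],
)
images = retriever.retrieve("revenue chart")
```

The caller only ever sees the full contents; the summaries exist purely to make the images searchable by text.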