Details are as follows.
class MultiModalVectorStoreIndex
With is_image_to_text=True, the text stored in each ImageDocument is converted to an embedding (using the text embed model) and stored.
The ImageNodes inside an ImageDocument
The raw images inside the ImageDocuments
Multi-modal eval: GPT-4 w/ multi-modal embeddings and multi-vector retriever
This approach will generate and index image summaries. See detail here.
It will then retrieve the raw image to pass to GPT-4V for final synthesis.
import uuid

from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema.document import Document
from langchain.schema.output_parser import StrOutputParser
from langchain.storage import InMemoryStore
from langchain.vectorstores import Chroma  # needed for Chroma(...) below


def create_multi_vector_retriever(vectorstore, image_summaries, images):
    """
    Create retriever that indexes summaries, but returns raw images or texts

    :param vectorstore: Vectorstore to store embedded image summaries
    :param image_summaries: Image summaries
    :param images: Base64 encoded images
    :return: Retriever
    """
    # Initialize the storage layer
    store = InMemoryStore()
    id_key = "doc_id"

    # Create the multi-vector retriever
    retriever = MultiVectorRetriever(
        vectorstore=vectorstore,
        docstore=store,
        id_key=id_key,
    )

    # Helper function to add documents to the vectorstore and docstore
    def add_documents(retriever, doc_summaries, doc_contents):
        # One shared ID per (summary, content) pair
        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
        summary_docs = [
            Document(page_content=s, metadata={id_key: doc_ids[i]})
            for i, s in enumerate(doc_summaries)
        ]
        retriever.vectorstore.add_documents(summary_docs)
        retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

    add_documents(retriever, image_summaries, images)
    return retriever


# The vectorstore to use to index the summaries
vectorstore_mvr = Chroma(
    collection_name="multi-modal-rag-mv", embedding_function=OpenAIEmbeddings()
)

# Create retriever
retriever_multi_vector_img = create_multi_vector_retriever(
    vectorstore_mvr,
    image_summaries,
    images_base_64_processed,
)
Generate the image summaries:
image_summaries, images_base_64_processed = generate_img_summaries(images_base_64)
Create the multi-vector retriever:
retriever_multi_vector_img = create_multi_vector_retriever(
    vectorstore_mvr,
    image_summaries,
    images_base_64_processed,
)
Inside the create_multi_vector_retriever function:
Initialize the storage layer:
store = InMemoryStore()
Create the multi-vector retriever:
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=store,
    id_key=id_key,
)
Add documents to the vectorstore and the docstore:
def add_documents(retriever, doc_summaries, doc_contents):
    doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
    summary_docs = [
        Document(page_content=s, metadata={id_key: doc_ids[i]})
        for i, s in enumerate(doc_summaries)
    ]
    retriever.vectorstore.add_documents(summary_docs)
    retriever.docstore.mset(list(zip(doc_ids, doc_contents)))
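The helper above implicitly assumes that doc_summaries and doc_contents have the same length and order: doc_ids is sized from doc_contents, while summary_docs indexes into it by position in doc_summaries. A minimal sketch of a guard one might place in front of it follows. Note that pair_with_ids is a hypothetical name, not part of LangChain, and it returns plain tuples rather than Document objects.

```python
import uuid


def pair_with_ids(doc_summaries, doc_contents):
    """Hypothetical helper: assign one shared UUID per (summary, content) pair.

    Raises ValueError when the two lists differ in length, because a summary
    without content would fail the ID lookup and content without a summary
    could never be found by a search over summaries.
    """
    if len(doc_summaries) != len(doc_contents):
        raise ValueError(
            f"{len(doc_summaries)} summaries vs {len(doc_contents)} contents"
        )
    doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
    # Each tuple is (doc_id, summary, content); position i in every list
    # refers to the same underlying image.
    return list(zip(doc_ids, doc_summaries, doc_contents))


# Placeholder strings stand in for real summaries and base64 images.
pairs = pair_with_ids(["a bar chart", "a cat photo"], ["<b64-1>", "<b64-2>"])
```

With mismatched inputs the original helper would either raise an IndexError (more summaries than contents) or silently store contents that no summary points to (fewer summaries than contents), so failing early is arguably the safer choice.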
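doc_ids = [str(uuid.uuid4()) for _ in doc_contents] and uuid
uuid is a Python standard-library module for generating universally unique identifiers (UUIDs). A UUID is a 128-bit identifier, typically used to identify a unique entity such as a database record. UUIDs are unique for all practical purposes: even when they are generated independently on different systems, the chance of a collision is negligible. uuid.uuid4() generates a random UUID (built from random numbers). The line above generates one identifier per document in doc_contents, and these identifiers are used in the storage and retrieval steps that follow.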
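As a small illustration of what those identifiers look like, the sketch below generates IDs for a few placeholder documents and parses them back. The contents of doc_contents here are made up for the example.

```python
import uuid

doc_contents = ["<image-1>", "<image-2>", "<image-3>"]  # placeholder contents

# One random (version 4) UUID per document, stored as a 36-character string.
doc_ids = [str(uuid.uuid4()) for _ in doc_contents]

for doc_id in doc_ids:
    parsed = uuid.UUID(doc_id)  # the string round-trips into a UUID object
    print(doc_id, parsed.version)  # version is 4 for uuid4()
```

Storing the IDs as strings, as the original code does, keeps them usable both as metadata values and as docstore keys.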
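Using doc_ids to link the vectorstore and the docstore
The vectorstore holds the document summaries (summary_docs); each summary carries a doc_id as part of its metadata. The docstore holds the document contents (doc_contents), here the processed images, each stored under its doc_id. At retrieval time, a query is matched against the summaries in the vectorstore, and the doc_id in each matching summary's metadata is then used to fetch the corresponding full content (the image) from the docstore.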
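The relationship between the two stores can be sketched without LangChain at all. In the sketch below, a list of dicts stands in for the vectorstore, a plain dict stands in for the docstore, and a naive word-overlap score replaces embedding similarity. All of these are assumptions for illustration only, not how Chroma or MultiVectorRetriever are implemented.

```python
import uuid

ID_KEY = "doc_id"

summaries = ["bar chart of quarterly revenue", "photo of a cat on a sofa"]
images = ["<base64-chart>", "<base64-cat>"]  # placeholders for base64 strings

doc_ids = [str(uuid.uuid4()) for _ in images]

# Stand-in "vectorstore": summaries plus metadata carrying the doc_id.
summary_index = [
    {"page_content": s, "metadata": {ID_KEY: doc_ids[i]}}
    for i, s in enumerate(summaries)
]
# Stand-in "docstore": doc_id -> raw content.
docstore = dict(zip(doc_ids, images))


def retrieve(query, k=1):
    """Rank summaries by shared words, then return the linked raw contents."""
    query_words = set(query.lower().split())

    def overlap(entry):
        return len(query_words & set(entry["page_content"].lower().split()))

    # Search happens over summaries only; the raw content is looked up by ID.
    ranked = sorted(summary_index, key=overlap, reverse=True)[:k]
    return [docstore[entry["metadata"][ID_KEY]] for entry in ranked]


result = retrieve("revenue chart")
```

The point of the design is that the searchable text (the summary) and the returned payload (the raw image) live in different stores, so the payload never has to be embedded itself.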
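This flow uses the unique doc_ids to associate the processed images (images_base_64_processed in the code above) with their summaries (image_summaries), and stores and retrieves them through the vectorstore and the docstore. The doc_ids act as the join key, ensuring that at retrieval time each summary is matched to its corresponding full image.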