# PG VectorStore: Usage Notes and Known Issues

### Usage

1. Install dependencies

LangFlow does not currently depend on [pgvector-python](https://github.com/pgvector/pgvector-python), but [LangChain PGVector](https://python.langchain.com/docs/integrations/vectorstores/pgvector) requires it. To build a PG VectorStore client for LangFlow, install the package first:

```bash
poetry add pgvector
```

If you have not used Poetry before, this guide may help: https://blog.kyomind.tw/python-poetry/

2. Start the project

LangFlow wraps its startup commands in a Makefile, so `make backend` is all it takes to build the backend.

```bash
make backend
make frontend # remember to build the frontend too
```

3. Paste the code below into a Custom Component node. It will run into a problem, which is covered in the next section.

```python
from typing import List, Optional

from langflow import CustomComponent
from langchain.vectorstores.pgvector import PGVector
from langchain.schema import Document
from langchain.vectorstores.base import VectorStore
from langchain.embeddings.base import Embeddings


class PostgresqlVectorComponent(CustomComponent):
    """
    A custom component for implementing a Vector Store using PostgreSQL.
    """

    display_name: str = "PGVector"
    description: str = "Implementation of Vector Store using PostgreSQL"
    documentation = "https://python.langchain.com/docs/integrations/vectorstores/pgvector"
    beta = True

    def build_config(self):
        """
        Builds the configuration for the component.

        Returns:
        - dict: A dictionary containing the configuration options for the component.
        """
        return {
            "index_name": {"display_name": "Index Name", "value": "your_index"},
            "code": {"show": True, "display_name": "Code"},
            "documents": {"display_name": "Documents", "is_list": True},
            "embedding": {"display_name": "Embedding"},
            "pg_server_url": {
                "display_name": "PostgreSQL Server Connection String",
                "advanced": False,
            },
            "collection_name": {"display_name": "Table", "advanced": False},
        }

    def build(
        self,
        embedding: Embeddings,
        pg_server_url: str,
        collection_name: str,
        # "documents" is configured with is_list=True, so it arrives as a list.
        documents: Optional[List[Document]] = None,
    ) -> VectorStore:
        """
        Builds the Vector Store or BaseRetriever object.

        Args:
        - embedding (Embeddings): The embeddings to use for the Vector Store.
        - documents (Optional[List[Document]]): The documents to use for the Vector Store.
        - collection_name (str): The name of the PG table.
        - pg_server_url (str): The URL for the PG server.

        Returns:
        - VectorStore: The Vector Store object.
        """
        # from_documents embeds the documents and writes them to the collection,
        # so a documents input must be connected to this node.
        return PGVector.from_documents(
            embedding=embedding,
            documents=documents,
            collection_name=collection_name,
            connection_string=pg_server_url,
        )
```

### The problem

`from langchain.vectorstores.pgvector import PGVector`

The **PGVector** class that LangChain ships here has a bug. If you look at how [PGVector is written in the LangChain source code](https://github.com/langchain-ai/langchain/blob/3bf39ca635ffb95e6fca0244b4a30f6a2e036bed/libs/langchain/langchain/vectorstores/pgvector.py#L130), you will find that one line has been commented out: `self.create_vector_extension()`

```python
def __post_init__(
    self,
) -> None:
    """
    Initialize the store.
    """
    self._conn = self.connect()
    # self.create_vector_extension()
    from langchain.vectorstores._pgvector_data_models import (
        CollectionStore,
        EmbeddingStore,
    )

    self.CollectionStore = CollectionStore
    self.EmbeddingStore = EmbeddingStore
    self.create_tables_if_not_exists()
    self.create_collection()
```

`create_vector_extension` initializes the pgvector extension in PostgreSQL. With the call commented out, the extension is never created, so the `VECTOR` column type used by the ORM models below is not recognized.

As a result, LangFlow reports the following error:

```bash
ValueError: Error building node pgvector(ID:pgvector-15vEN): (psycopg2.errors.UndefinedObject) type "vector" does not exist
LINE 5:  embedding VECTOR,
                   ^

[SQL:
CREATE TABLE langchain_pg_embedding (
    uuid UUID NOT NULL,
    collection_id UUID,
    embedding VECTOR,
    document VARCHAR,
    cmetadata JSON,
    custom_id VARCHAR,
    PRIMARY KEY (uuid),
    FOREIGN KEY(collection_id) REFERENCES langchain_pg_collection (uuid) ON DELETE CASCADE
)
```

### Observations

1. LangChain's current [master branch](https://github.com/langchain-ai/langchain/blob/cf271784fade8dc19879cbdfa172df84e65d1e7c/libs/langchain/langchain/vectorstores/pgvector.py#L139C39-L139C39) has already uncommented that line.
2. The LangChain version that LangFlow 0.5.7 depends on does not include the fix.

| LangFlow version | LangChain version | Fixed |
| -------- | -------- | -------- |
| (not yet released) | v0.0.322 | Yes |
| (not yet released) | v0.0.321 | No |
| 0.5.7 | ^0.0.320 | No |

### Solutions

1. Edit the installed LangChain source yourself and uncomment `self.create_vector_extension()` (temporary workaround).
2. Wait for LangFlow to upgrade its LangChain dependency.
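
A third stopgap, different from the two options listed above, is to create the extension by hand before the component is built, so the installed LangChain source stays untouched. The sketch below is not part of LangFlow or LangChain. `has_pgvector_fix` and `ensure_vector_extension` are hypothetical helper names, the version threshold is taken from the version table in this note, and the SQL statement is what `create_vector_extension` is understood to run. It assumes a DB-API style connection and a database role that is allowed to create extensions.

```python
from typing import Any, Tuple

# Assumption taken from the version table in this note:
# v0.0.322 is the first LangChain release that contains the fix.
FIRST_FIXED: Tuple[int, int, int] = (0, 0, 322)

# Statement that enables the pgvector extension in the current database.
VECTOR_EXTENSION_SQL = "CREATE EXTENSION IF NOT EXISTS vector"


def has_pgvector_fix(langchain_version: str) -> bool:
    """Return True if the version string is v0.0.322 or newer.

    Only plain "X.Y.Z" strings (optionally prefixed with "v") are handled;
    pre-release suffixes are out of scope for this sketch.
    """
    parts = tuple(int(p) for p in langchain_version.lstrip("v").split(".")[:3])
    return parts >= FIRST_FIXED


def ensure_vector_extension(conn: Any) -> None:
    """Create the pgvector extension through a DB-API connection.

    Hypothetical helper: `conn` only needs cursor() and commit().
    """
    cur = conn.cursor()
    try:
        cur.execute(VECTOR_EXTENSION_SQL)
    finally:
        cur.close()
    # Commit only runs if the statement succeeded.
    conn.commit()
```

In practice this could be called once as `ensure_vector_extension(psycopg2.connect(dsn))` whenever `has_pgvector_fix(langchain.__version__)` is false. Note that `dsn` here would be a plain libpq connection string, not the SQLAlchemy-style URL with a `+psycopg2` driver suffix that is passed to `PGVector`.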