# PG VectorStore: Usage Notes and Known Issues
### Usage
1. Install the dependency
LangFlow itself does not currently use [pgvector-python](https://github.com/pgvector/pgvector-python),
but [LangChain PGVector](https://python.langchain.com/docs/integrations/vectorstores/pgvector) requires it.
To build a PG VectorStore client for LangFlow, you therefore have to install the package first:
```bash
poetry add pgvector
```
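If you have never used Poetry, this site is a useful reference: https://blog.kyomind.tw/python-poetry/
2. Start the project
LangFlow wraps the commands for starting the project in its Makefile,
so running `make backend` is all it takes to build the backend: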
```bash
make backend
make frontend  # remember to build the frontend as well
```
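3. Paste the code below into a Custom Component node. Note that it will not work right away; the problem that has to be solved first is described in the next section.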
```python
from typing import List, Optional

from langchain.embeddings.base import Embeddings
from langchain.schema import Document
from langchain.vectorstores.base import VectorStore
from langchain.vectorstores.pgvector import PGVector
from langflow import CustomComponent


class PostgresqlVectorComponent(CustomComponent):
    """
    A custom component for implementing a Vector Store using PostgreSQL.
    """

    display_name: str = "PGVector"
    description: str = "Implementation of Vector Store using PostgreSQL"
    documentation = "https://python.langchain.com/docs/integrations/vectorstores/pgvector"
    beta = True

    def build_config(self):
        """
        Builds the configuration for the component.

        Returns:
        - dict: A dictionary containing the configuration options for the component.
        """
        return {
            "index_name": {"display_name": "Index Name", "value": "your_index"},
            "code": {"show": True, "display_name": "Code"},
            "documents": {"display_name": "Documents", "is_list": True},
            "embedding": {"display_name": "Embedding"},
            "pg_server_url": {
                "display_name": "PostgreSQL Server Connection String",
                "advanced": False,
            },
            "collection_name": {"display_name": "Table", "advanced": False},
        }

    def build(
        self,
        embedding: Embeddings,
        pg_server_url: str,
        collection_name: str,
        # "documents" is declared with is_list=True above, so it is a list of Documents.
        documents: Optional[List[Document]] = None,
    ) -> VectorStore:
        """
        Builds the Vector Store or BaseRetriever object.

        Args:
        - embedding (Embeddings): The embeddings to use for the Vector Store.
        - documents (Optional[List[Document]]): The documents to use for the Vector Store.
        - collection_name (str): The name of the PG table.
        - pg_server_url (str): The URL for the PG server.

        Returns:
        - VectorStore: The Vector Store object.
        """
        return PGVector.from_documents(
            embedding=embedding,
            documents=documents,
            collection_name=collection_name,
            connection_string=pg_server_url,
        )
```
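The `pg_server_url` field above expects a SQLAlchemy-style connection string. The following is only a minimal sketch of how such a string could be assembled. The helper name `build_pg_url` and every credential in it are hypothetical placeholders, and the `psycopg2` driver is assumed because that is the driver that shows up in the error message further down.

```python
from urllib.parse import quote_plus


def build_pg_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL for the psycopg2 driver.

    Hypothetical helper for illustration; it is not part of LangFlow or LangChain.
    """
    # Percent-encode the credentials so characters such as "@" or ":"
    # do not break URL parsing.
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values only; substitute your own server details.
example_url = build_pg_url("langflow", "p@ssword", "localhost", 5432, "vectors")
print(example_url)
```

The resulting string is what would be pasted into the "PostgreSQL Server Connection String" field of the node.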
### The problem:
`from langchain.vectorstores.pgvector import PGVector`
The **PGVector** class on the LangChain side has a problem.
If you look at how [PGVector is written in the LangChain source code](https://github.com/langchain-ai/langchain/blob/3bf39ca635ffb95e6fca0244b4a30f6a2e036bed/libs/langchain/langchain/vectorstores/pgvector.py#L130),
you will find that one line has been commented out: `self.create_vector_extension()`
```python
def __post_init__(
    self,
) -> None:
    """
    Initialize the store.
    """
    self._conn = self.connect()
    # self.create_vector_extension()
    from langchain.vectorstores._pgvector_data_models import (
        CollectionStore,
        EmbeddingStore,
    )

    self.CollectionStore = CollectionStore
    self.EmbeddingStore = EmbeddingStore
    self.create_tables_if_not_exists()
    self.create_collection()
```
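`create_vector_extension` initializes the PostgreSQL extension that provides the vector type.
With that call commented out, the extension is never created, so the ORM's `VECTOR` column type in the statement below is unknown to the database.
As expected, LangFlow then fails with the following error message: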
```bash
ValueError: Error building node pgvector(ID:pgvector-15vEN):
(psycopg2.errors.UndefinedObject) type "vector" does not exist
LINE 5: embedding VECTOR,
^
[SQL:
CREATE TABLE langchain_pg_embedding (
uuid UUID NOT NULL,
collection_id UUID,
embedding VECTOR,
document VARCHAR,
cmetadata JSON,
custom_id VARCHAR,
PRIMARY KEY (uuid),
FOREIGN KEY(collection_id) REFERENCES langchain_pg_collection (uuid) ON DELETE CASCADE
)
```
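To make the missing step concrete: the `vector` type in the error above only exists once the pgvector extension has been created in the target database, which is what the commented-out call is meant to take care of. The sketch below shows that step in isolation. The function name `ensure_vector_extension` is hypothetical, and it assumes an already open DB-API 2.0 connection (for example one from `psycopg2.connect`), a server with pgvector installed, and a role that is allowed to run `CREATE EXTENSION`.

```python
EXTENSION_SQL = "CREATE EXTENSION IF NOT EXISTS vector"


def ensure_vector_extension(conn) -> None:
    """Create the pgvector extension on an open DB-API 2.0 connection.

    Hypothetical helper for illustration; `conn` is assumed to expose
    cursor() and commit(), as psycopg2 connections do.
    """
    cur = conn.cursor()
    try:
        # IF NOT EXISTS makes the statement safe to run repeatedly.
        cur.execute(EXTENSION_SQL)
    finally:
        cur.close()
    conn.commit()


# Usage sketch (assumes psycopg2 and a reachable server; placeholder credentials):
# import psycopg2
# conn = psycopg2.connect("postgresql://user:password@localhost:5432/vectors")
# ensure_vector_extension(conn)
# conn.close()
```

Once the extension exists in the database, the `CREATE TABLE` statement from the error message has a `VECTOR` type to resolve against.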
### Observations
1. The current LangChain [master branch](https://github.com/langchain-ai/langchain/blob/cf271784fade8dc19879cbdfa172df84e65d1e7c/libs/langchain/langchain/vectorstores/pgvector.py#L139C39-L139C39) has already restored the commented-out line.
2. The LangChain version used by LangFlow 0.5.7 does not yet include the fix.

| LangFlow version | LangChain version | Fixed |
| -------- | -------- | -------- |
| (not yet released) | v0.0.322 | Yes |
| (not yet released) | v0.0.321 | No |
| 0.5.7 | ^0.0.320 | No |
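To find out which row of the table applies to a given environment, the installed LangChain version can be compared against v0.0.322. This is a minimal sketch only: `parse_version` and `has_pgvector_fix` are hypothetical helper names, and the threshold is taken directly from the table rather than verified independently.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First LangChain release listed as fixed in the table above.
FIXED_IN = (0, 0, 322)


def parse_version(text: str) -> tuple:
    """Extract the first three numeric components, e.g. 'v0.0.322' -> (0, 0, 322).

    Simplification: pre-release suffixes such as 'rc1' are ignored.
    """
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def has_pgvector_fix(langchain_version: str) -> bool:
    """Return True if the given version is at or above the fixed release."""
    return parse_version(langchain_version) >= FIXED_IN


try:
    installed = version("langchain")
    status = "fixed" if has_pgvector_fix(installed) else "affected"
    print(f"langchain {installed}: {status}")
except PackageNotFoundError:
    print("langchain is not installed in this environment")
```

Run it inside the Poetry environment (for example with `poetry run python`) so that it inspects the same LangChain that LangFlow imports.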
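### Solutions
1. Edit the installed LangChain source yourself and restore the `self.create_vector_extension()` call (temporary workaround)
2. Wait for LangFlow to upgrade its LangChain version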