PGVector integration - Docs by LangChain

An implementation of LangChain vectorstore abstraction using postgres as the backend and utilizing the pgvector extension.

The code lives in an integration package called: langchain-postgres.

Status

This code has been ported over from langchain-community into a dedicated package called langchain-postgres. The following changes have been made:

langchain-postgres works only with psycopg3. Please update your connection strings from postgresql+psycopg2://... to postgresql+psycopg://langchain:langchain@... (yes, the driver name is psycopg, not psycopg3, but it will use psycopg3).
The schema of the embedding store and collection has been changed to make add_documents work correctly with user-specified IDs.
One has to pass an explicit connection object now.
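The connection string change described above can be automated when many configuration files still reference psycopg2. The following is a minimal sketch; the helper name to_psycopg3_url is hypothetical and not part of langchain-postgres.

```python
def to_psycopg3_url(url: str) -> str:
    """Rewrite a PostgreSQL SQLAlchemy URL so it names the psycopg (v3) driver.

    Hypothetical helper for illustration; it only edits the scheme and leaves
    credentials, host, port and database untouched.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a database URL: {url!r}")
    # The scheme looks like "dialect" or "dialect+driver".
    dialect, _, _old_driver = scheme.partition("+")
    if dialect not in ("postgresql", "postgres"):
        raise ValueError(f"not a PostgreSQL URL: {url!r}")
    return f"postgresql+psycopg://{rest}"


old = "postgresql+psycopg2://langchain:langchain@localhost:6024/langchain"
new = to_psycopg3_url(old)
```

Applying the function to a URL that already uses postgresql+psycopg:// returns it unchanged, so it is safe to run more than once.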

Currently, there is no mechanism that supports easy data migration on schema changes. Any schema changes in the vectorstore will require the user to recreate the tables and re-add the documents. If this is a concern, please use a different vectorstore. If not, this implementation should be fine for your use case.
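Recreating the tables and re-adding the documents could look roughly like the sketch below. It assumes the store object exposes drop_tables, create_tables_if_not_exists and create_collection, as we believe PGVector does; check the API reference for your installed version before relying on it. The function name rebuild_store is hypothetical.

```python
from typing import Any, List, Sequence


def rebuild_store(store: Any, docs: Sequence[Any], ids: Sequence[str]) -> List[str]:
    """Drop and recreate the vector store tables, then re-add the documents.

    Assumption: `store` behaves like a PGVector instance. Dropping the tables
    is destructive and is expected to remove every collection stored in them,
    not just the current one, so keep the source documents somewhere safe.
    """
    store.drop_tables()                  # remove tables created under the old schema
    store.create_tables_if_not_exists()  # recreate them with the current schema
    store.create_collection()            # re-register this store's collection
    return store.add_documents(list(docs), ids=list(ids))
```

Because the documents are re-embedded on the way back in, a rebuild costs one embedding call per document.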

Setup

First download the partner package:

pip install -qU langchain-postgres

You can run the following command to spin up a postgres container with the pgvector extension:

docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
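The container takes a moment to start, so a script that connects immediately after docker run may fail. A minimal sketch of a readiness check using only the standard library follows; wait_for_port is a hypothetical helper, and an open port only shows that something is listening, not that postgres has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the published port is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the command above, the call would be wait_for_port("localhost", 6024), since the container's port 5432 is published on host port 6024.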

Credentials

There are no credentials needed to run this notebook; just make sure you have installed the langchain-postgres package and started the postgres container. If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the code below:

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_postgres import PGVector

# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"  # Uses psycopg3!
collection_name = "my_docs"

vector_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
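Hard-coding the connection string works for the local container, but real passwords often contain characters such as @ or / that must be percent-encoded inside a URL. A minimal sketch, assuming a hypothetical helper named build_connection_url:

```python
from urllib.parse import quote


def build_connection_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a postgresql+psycopg URL, percent-encoding the free-form parts.

    Hypothetical helper for illustration. `safe=""` makes quote() encode every
    reserved character, including "/" and "@".
    """
    return (
        "postgresql+psycopg://"
        f"{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Reproduces the connection string used above for the local container.
local_url = build_connection_url("langchain", "langchain", "localhost", 6024, "langchain")
```

The resulting string can be passed as the connection argument in place of the literal.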

Manage vector store

Add items to vector store

Note that adding documents by ID will overwrite any existing documents that match that ID.

from langchain_core.documents import Document

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"id": 3, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"id": 4, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"id": 5, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"id": 6, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"id": 7, "location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"id": 8, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"id": 9, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"id": 10, "location": "community center", "topic": "classes"},
    ),
]

vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
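Because adding by ID overwrites existing entries, deriving the ID from the document itself makes re-ingestion idempotent: loading the same text twice updates one row instead of creating a duplicate. The sketch below is one way to do that; stable_id and ID_NAMESPACE are hypothetical names, and the namespace value is an arbitrary constant chosen for this example.

```python
import uuid

# Arbitrary, fixed namespace for this example; any constant UUID works as long
# as it never changes between ingestion runs.
ID_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def stable_id(page_content: str, source: str = "") -> str:
    """Return a deterministic string ID derived from the document text and source."""
    return str(uuid.uuid5(ID_NAMESPACE, f"{source}\n{page_content}"))


# Usage with the store from this guide (not executed here):
# vector_store.add_documents(docs, ids=[stable_id(d.page_content) for d in docs])
```

The trade-off is that editing a document's text changes its ID, so the old entry has to be deleted explicitly.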

Delete items from vector store

vector_store.delete(ids=["3"])

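Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely wish to query it while running your chain or agent.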

Filtering support

The vectorstore supports a set of filters that can be applied against the metadata fields of the documents.

Operator	Meaning/Category
$eq	Equality (==)
$ne	Inequality (!=)
$lt	Less than (<)
$lte	Less than or equal (<=)
$gt	Greater than (>)
$gte	Greater than or equal (>=)
$in	Special Cased (in)
$nin	Special Cased (not in)
$between	Special Cased (between)
$like	Text (like)
$ilike	Text (case-insensitive like)
$and	Logical (and)
$or	Logical (or)
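To make the operator semantics concrete, here is a small pure-Python model that evaluates a filter dict against one metadata dict. It is an illustration only: the vector store translates filters into SQL rather than running Python, and edge cases such as missing fields may behave differently there. The names matches and COMPARATORS are hypothetical.

```python
import re
from typing import Any, Mapping


def _like_to_regex(pattern: str) -> str:
    # SQL LIKE wildcards: % matches any run of characters, _ matches exactly one.
    return "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )


def _like(value: str, pattern: str, flags: int = 0) -> bool:
    return re.fullmatch(_like_to_regex(pattern), value, re.DOTALL | flags) is not None


COMPARATORS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$lt": lambda v, x: v < x,
    "$lte": lambda v, x: v <= x,
    "$gt": lambda v, x: v > x,
    "$gte": lambda v, x: v >= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
    "$between": lambda v, x: x[0] <= v <= x[1],  # inclusive, like SQL BETWEEN
    "$like": lambda v, x: _like(v, x),
    "$ilike": lambda v, x: _like(v, x, re.IGNORECASE),
}


def matches(metadata: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """Return True if `metadata` satisfies every top-level entry of `flt`."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key not in metadata:
            ok = False  # simplification: a missing field never matches in this model
        elif isinstance(cond, Mapping):
            ok = all(COMPARATORS[op](metadata[key], arg) for op, arg in cond.items())
        else:
            ok = metadata[key] == cond  # a bare value is treated as equality here
        if not ok:
            return False
    return True


duck = {"id": 2, "location": "pond", "topic": "animals"}
in_range = matches(duck, {"id": {"$between": [1, 3]}})
```

A model like this is handy for unit-testing filter dicts before sending them to the database.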

Query directly

Performing a simple similarity search can be done as follows:

results = vector_store.similarity_search(
    "kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]
* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]

If you provide a dict with multiple fields but no operators, the top level is interpreted as a logical AND filter. The two queries below are therefore equivalent:

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
 Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"id": {"$in": [1, 5, 2, 9]}},
            {"location": {"$in": ["pond", "market"]}},
        ]
    },
)

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
 Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
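The equivalence between the implicit and explicit forms can be written down as a small normalization step, which is useful when filters are assembled programmatically. This is a sketch with a hypothetical helper name, to_explicit_and; the vector store does not require it.

```python
from typing import Any, Dict


def to_explicit_and(flt: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a multi-field filter without logical operators as an explicit $and."""
    # Leave single-field filters and filters that already use $and / $or untouched.
    if len(flt) <= 1 or any(key.startswith("$") for key in flt):
        return flt
    return {"$and": [{field: cond} for field, cond in flt.items()]}


implicit = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
explicit = to_explicit_and(implicit)
```

Here explicit has the same shape as the $and filter used in the second query above.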

If you want to execute a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
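How to read the score depends on the distance strategy the store is configured with. As far as we can tell, PGVector uses cosine distance by default and returns that distance as the score, in which case lower values mean closer matches; treat this as an assumption and confirm it in the API reference. Under that assumption, the quantity can be sketched in plain Python:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

If you need a similarity where higher is better, 1 - distance converts a cosine distance back into a cosine similarity.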

For a full list of the different searches you can execute on a PGVector vector store, please refer to the API reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]
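The search_type="mmr" setting selects results by maximal marginal relevance, which trades relevance to the query against redundancy among the results already chosen. The following is a generic sketch of that idea in plain Python, not the library's implementation; mmr_select is a hypothetical name, and lambda_mult mirrors the parameter name LangChain uses for the trade-off weight.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: Vector, docs: Sequence[Vector], k: int, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick up to k document indices by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values penalize documents
    that resemble ones already selected.
    """
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = _cosine(query, docs[i])
            # Redundancy is the highest similarity to any already-selected document.
            redundancy = max((_cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [[2.0, 1.0], [2.0, 1.1], [2.0, -1.5]]  # the second is a near-duplicate of the first
diverse = mmr_select(query_vec, doc_vecs, k=2)
by_relevance = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0)
```

With the balanced default, the near-duplicate is skipped in favour of the less similar third vector; with lambda_mult=1.0 the two most relevant vectors are returned regardless of overlap.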

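Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections: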

API reference

For detailed documentation of all PGVector VectorStore features and configurations, head to the API reference.
