ClickHouse integration - Docs by LangChain

ClickHouse is an open-source database for real-time apps and analytics with full SQL support. ClickHouse supports exact vector search (for example, using distance functions like L2Distance) and approximate vector search using vector similarity indexes (available in ClickHouse 25.8+). For details, see Exact and Approximate Vector Search.
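Exact search is a brute-force scan. The query is compared with every stored vector and the k closest are kept, which in SQL corresponds to ordering by a distance function such as L2Distance and applying LIMIT k. Below is a minimal pure-Python sketch of that idea. It uses made-up three-dimensional vectors and a hypothetical `exact_top_k` helper, which is not part of ClickHouse or LangChain.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query, rows, k):
    """Hypothetical helper: score every row, then keep the k nearest.

    Conceptually similar to:
        SELECT id, L2Distance(vec, query) AS dist FROM tab ORDER BY dist LIMIT k
    """
    scored = [(row_id, l2_distance(vec, query)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = most similar first
    return scored[:k]


# Toy (id, embedding) pairs, made up for illustration.
rows = [
    ("a", [0.0, 0.0, 1.0]),
    ("b", [1.0, 0.5, 0.0]),
    ("c", [0.0, 0.9, 0.1]),
]
print(exact_top_k([0.0, 1.0, 0.0], rows, k=2))
```

Every row is scored, so the cost grows linearly with table size. That cost is what the approximate vector similarity indexes trade some accuracy to avoid.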

This page shows how to use functionality related to the ClickHouse vector store.

Setup

First, set up a local ClickHouse server with Docker:

docker run -d -p 8123:8123 -p 9000:9000 --name langchain-clickhouse-server --ulimit nofile=262144:262144 -e CLICKHOUSE_SKIP_USER_SETUP=1 clickhouse/clickhouse-server:26.2

You’ll need to install langchain-community and clickhouse-connect to use this integration:

pip install -qU langchain-community clickhouse-connect

Credentials

There are no credentials for this notebook; just make sure you have installed the packages as shown above. If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

import getpass
import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

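from langchain_openai import OpenAIEmbeddings

# OpenAIEmbeddings needs the langchain-openai package and an OPENAI_API_KEY environment variable.
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_community.vectorstores import Clickhouse, ClickhouseSettings

# Host and port default to the local server started above; "table" names the table that stores the vectors.
settings = ClickhouseSettings(table="clickhouse_example")
vector_store = Clickhouse(embeddings, config=settings)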

Manage vector store

Once you have created your vector store, you can interact with it by adding and deleting different items.

Add items to vector store

We can add items to our vector store by using the add_documents function.

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

Delete items from vector store

We can delete items from our vector store by ID by using the delete function.

vector_store.delete(ids=[uuids[-1]])
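The `ids` argument takes a list, so a single ID should still be wrapped in one, as in `ids=[uuids[-1]]`. The sketch below shows why. It uses a hypothetical `normalize_ids` helper, which is not part of langchain-community. A bare string is itself iterable, so any code that loops over `ids` would see the 36 characters of the UUID instead of one ID.

```python
from uuid import uuid4


def normalize_ids(ids):
    """Hypothetical helper: always return a list of ID strings."""
    if isinstance(ids, str):
        # Wrap a lone ID; iterating the string itself would yield characters.
        return [ids]
    return list(ids)


doc_id = str(uuid4())
print(len(list(doc_id)))  # 36: iterating a bare UUID string yields characters
print(normalize_ids(doc_id) == [doc_id])  # True: one ID, as intended
```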

Query vector store

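Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while your chain or agent is running.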

Query directly

Similarity search

A simple similarity search can be performed as follows:

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy", k=2
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

Similarity search with score

You can also search with scores:

results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=1)
# Each result is a (Document, score) tuple; ":.3f" prints the score with three decimals.
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
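The score's format spec is easy to get wrong. `:3f` only sets a minimum field width of 3 and keeps Python's default six decimal places, whereas `:.3f` rounds to three decimals. Here is a quick check with a made-up score:

```python
score = 0.876543  # made-up similarity score for illustration

wide = f"{score:3f}"      # minimum width 3, default precision of 6 decimals
rounded = f"{score:.3f}"  # precision of 3 decimals

print(wide)     # 0.876543
print(rounded)  # 0.877
```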

Filtering

You have direct access to the ClickHouse SQL WHERE clause, so you can write filter conditions in standard SQL. NOTE: Beware of SQL injection; this interface must not be called directly by end users. If you customized the column_map in your settings, read the metadata column name from the vector store instead of hard-coding it, and search with a filter like this:

meta = vector_store.metadata_column
results = vector_store.similarity_search_with_relevance_scores(
    "What did I eat for breakfast?",
    k=4,
    where_str=f"{meta}.source = 'tweet'",
)
# Each result is a (Document, score) tuple.
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
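Because `where_str` is inserted into the generated SQL, any value that comes from a user should be validated or escaped first. Below is a minimal sketch built around a hypothetical `build_metadata_filter` helper, which is not part of langchain-community. It accepts only identifier-like metadata keys. It also escapes backslashes and single quotes in the value, which is how ClickHouse string literals are escaped. The metadata column name is assumed to be trusted, for example taken from `vector_store.metadata_column`. An allowlist of permitted values is safer still, so treat this as an illustration rather than a complete defense.

```python
import re

# Hypothetical helpers, not part of langchain-community.
_KEY_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_clickhouse_string(value: str) -> str:
    """Return value as a single-quoted ClickHouse string literal."""
    # Escape backslashes first, then single quotes, so neither can end the literal early.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_metadata_filter(metadata_column: str, key: str, value: str) -> str:
    """Build '<column>.<key> = <literal>'; metadata_column is assumed trusted."""
    if not _KEY_PATTERN.fullmatch(key):
        raise ValueError(f"unsupported metadata key: {key!r}")
    return f"{metadata_column}.{key} = {quote_clickhouse_string(value)}"


print(build_metadata_filter("metadata", "source", "tweet"))
# metadata.source = 'tweet'
print(build_metadata_filter("metadata", "source", "x' OR 1=1 --"))
# metadata.source = 'x\' OR 1=1 --'
```

The resulting string would then be passed as `where_str=...` in place of the hand-written condition.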

Other search methods

There are a variety of other search methods that are not covered in this notebook, such as MMR search or searching by vector.

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains. Here is how to transform your vector store into a retriever and then invoke the retriever with a simple query and filter.

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 1, "score_threshold": 0.5, "where_str": "metadata.source = 'news'"},
)
retriever.invoke("Stealing from the bank is a crime")
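Conceptually, the `similarity_score_threshold` search type takes up to `k` results and keeps only those whose relevance score is at least `score_threshold`. The retriever can therefore return fewer than `k` documents, or none at all. Whether and how the threshold is enforced depends on the vector store's `similarity_search_with_relevance_scores` implementation. The plain-Python sketch below only mimics the intended filtering step on made-up (document, score) pairs. `apply_score_threshold` is a hypothetical helper, not the LangChain implementation.

```python
def apply_score_threshold(scored_docs, k, score_threshold):
    """Hypothetical helper: keep at most k (doc, score) pairs with score >= threshold."""
    # Highest relevance first, then cut to k before applying the threshold.
    top_k = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:k]
    return [doc for doc, score in top_k if score >= score_threshold]


# Made-up relevance scores for illustration.
candidates = [
    ("bank robbery news", 0.62),
    ("stock market news", 0.31),
    ("weather news", 0.12),
]
print(apply_score_threshold(candidates, k=1, score_threshold=0.5))  # ['bank robbery news']
print(apply_score_threshold(candidates, k=3, score_threshold=0.7))  # []
```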

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

For more, check out the complete RAG template using Astra DB.
