Use this file to discover all available pages before exploring further.
ClickHouse is an open-source database for real-time apps and analytics with full SQL support. ClickHouse supports exact vector search (for example, using distance functions like L2Distance) and approximate vector search using vector similarity indexes (available in ClickHouse 25.8+). For details, see Exact and Approximate Vector Search.
This page shows how to use functionality related to the ClickHouse vector store.
There are no credentials for this notebook, just make sure you have installed the packages as shown above.如果您希望获得一流的模型调用自动追踪功能,还可以通过取消注释以下代码来设置 LangSmith API 密钥:
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")os.environ["LANGSMITH_TRACING"] = "true"
from uuid import uuid4from langchain_core.documents import Documentdocument_1 = Document( page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.", metadata={"source": "tweet"},)document_2 = Document( page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.", metadata={"source": "news"},)document_3 = Document( page_content="Building an exciting new project with LangChain - come check it out!", metadata={"source": "tweet"},)document_4 = Document( page_content="Robbers broke into the city bank and stole $1 million in cash.", metadata={"source": "news"},)document_5 = Document( page_content="Wow! That was an amazing movie. I can't wait to see it again.", metadata={"source": "tweet"},)document_6 = Document( page_content="Is the new iPhone worth the price? Read this review to find out.", metadata={"source": "website"},)document_7 = Document( page_content="The top 10 soccer players in the world right now.", metadata={"source": "website"},)document_8 = Document( page_content="LangGraph is the best framework for building stateful, agentic applications!", metadata={"source": "tweet"},)document_9 = Document( page_content="The stock market is down 500 points today due to fears of a recession.", metadata={"source": "news"},)document_10 = Document( page_content="I have a bad feeling I am going to get deleted :(", metadata={"source": "tweet"},)documents = [ document_1, document_2, document_3, document_4, document_5, document_6, document_7, document_8, document_9, document_10,]uuids = [str(uuid4()) for _ in range(len(documents))]vector_store.add_documents(documents=documents, ids=uuids)
results = vector_store.similarity_search( "LangChain provides abstractions to make working with LLMs easy", k=2)for doc in results: print(f"* {doc.page_content} [{doc.metadata}]")
results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=1)for res, score in results: print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")
You can have direct access to ClickHouse SQL where statement. You can write WHERE clause following standard SQL.NOTE: Please be aware of SQL injection, this interface must not be directly called by end-user.If you customized your column_map in your settings, you can search with a filter like this:
meta = vector_store.metadata_columnresults = vector_store.similarity_search_with_relevance_scores( "What did I eat for breakfast?", k=4, where_str=f"{meta}.source = 'tweet'",)for res in results: print(f"* {res.page_content} [{res.metadata}]")
您还可以将向量存储转换为检索器,以便在链中更方便地使用。Here is how to transform your vector store into a retriever and then invoke the retriever with a simple query and filter.
retriever = vector_store.as_retriever( search_type="similarity_score_threshold", search_kwargs={"k": 1, "score_threshold": 0.5, "where_str": "metadata.source = 'news'"},)retriever.invoke("Stealing from the bank is a crime")