This notebook covers how to get started with the Chroma vector store.
Chroma is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0. View the full docs of Chroma at this page, and find the API reference for the LangChain integration at this page.
Chroma Cloud
Chroma Cloud powers serverless vector and full-text search. It’s extremely fast, cost-effective, scalable and painless. Create a DB and try it out in under 30 seconds with $5 of free credits.
You can use the Chroma vector store without any credentials; simply installing the packages above is enough. If you are a Chroma Cloud user, set the CHROMA_TENANT, CHROMA_DATABASE, and CHROMA_API_KEY environment variables. When you install the chromadb package you also get access to the Chroma CLI, which can set these for you. First, log in via the CLI, and then use the connect command:
chroma db connect [db_name] --env-file
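Before connecting to Chroma Cloud, it can help to confirm that the three variables named above are actually set. The following is a minimal sketch using only the standard library; `missing_chroma_cloud_vars` is a hypothetical helper name introduced here for illustration and is not part of Chroma or LangChain.

```python
import os

# Environment variable names taken from the text above.
REQUIRED_VARS = ("CHROMA_TENANT", "CHROMA_DATABASE", "CHROMA_API_KEY")


def missing_chroma_cloud_vars(env=None):
    """Return the required Chroma Cloud variables that are unset or empty.

    Hypothetical helper for illustration; not part of Chroma or LangChain.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_chroma_cloud_vars()
    if missing:
        print("Missing Chroma Cloud settings:", ", ".join(missing))
    else:
        print("All Chroma Cloud settings are present.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise without touching the real environment.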
If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the code below:
import getpass
import os

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
If you have a Chroma server running locally, or you have deployed one yourself, you can connect to it by providing the host argument. For example, you can start a Chroma server locally with chroma run, and then connect to it with host='localhost':
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    host="localhost",
)
For other deployments you can use the port, ssl, and headers arguments to customize your connection.
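As a rough illustration of how those arguments fit together, the sketch below splits a server URL into host, port, ssl, and headers values using only the standard library. `chroma_connection_kwargs`, the example URL, the fallback ports, and the bearer-token header format are all assumptions made for this example; check what your own deployment expects.

```python
from urllib.parse import urlsplit


def chroma_connection_kwargs(url, token=None):
    """Split a server URL into host/port/ssl/headers keyword arguments.

    Hypothetical helper for illustration; not part of Chroma or LangChain.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    use_ssl = parts.scheme == "https"
    # Assumption: fall back to the scheme's conventional port if none is given.
    port = parts.port or (443 if use_ssl else 80)
    kwargs = {"host": parts.hostname, "port": port, "ssl": use_ssl}
    if token is not None:
        # Assumption: the deployment authenticates with a bearer token header.
        kwargs["headers"] = {"Authorization": f"Bearer {token}"}
    return kwargs


# Hypothetical server address, used only to show the shape of the result.
print(chroma_connection_kwargs("https://chroma.example.com:8443", token="secret"))
```

The resulting dictionary could then be unpacked into the constructor alongside collection_name and embedding_function, for example Chroma(collection_name="example_collection", embedding_function=embeddings, **kwargs).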
from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id=1,
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
    id=2,
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
    id=3,
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
    id=4,
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
    id=5,
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
    id=6,
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
    id=7,
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
    id=8,
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
    id=9,
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
    id=10,
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
# Generate one random UUID per document and use these as the stored IDs.
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)
Now that we have added documents to our vector store, we can update existing documents by using the update_document function, or update_documents to update several at once.
updated_document_1 = Document(
    page_content="I had chocolate chip pancakes and fried eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id=1,
)

updated_document_2 = Document(
    page_content="The weather forecast for tomorrow is sunny and warm, with a high of 82 degrees.",
    metadata={"source": "news"},
    id=2,
)

vector_store.update_document(document_id=uuids[0], document=updated_document_1)
# You can also update multiple documents at once
vector_store.update_documents(
    ids=uuids[:2], documents=[updated_document_1, updated_document_2]
)
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": "news"}
)
for res, score in results:
    # Use .3f to print three decimal places (3f would only set a minimum width).
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
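When reading the score, it helps to know whether it is a similarity or a distance. The sketch below assumes the score is a distance, so that lower means closer, and uses squared L2 as an example metric. Both points are assumptions about how the collection is configured rather than statements from this page, so check the API reference for your setup. The vectors are toy values, not real embeddings.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_distance(query, vectors):
    """Return (index, distance) pairs sorted from closest to farthest."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])


# Toy two-dimensional vectors standing in for embeddings.
query = [1.0, 0.0]
vectors = [[0.0, 1.0], [1.0, 0.5], [1.0, 0.0]]

for index, distance in rank_by_distance(query, vectors):
    print(f"* [DIST={distance:.3f}] vector {index}")
```

Under this assumption, the best match is the entry with the smallest score, not the largest.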
results = vector_store.similarity_search_by_vector(
    embedding=embeddings.embed_query("I love green eggs and ham!"), k=1
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
There are a variety of other search methods that are not covered in this notebook, such as MMR search. For a full list of the search abilities available for Chroma check out the API reference.
You can also transform the vector store into a retriever for easier usage in your chains. For more information on the different search types and kwargs you can pass, please visit the Chroma API reference.
retriever = vector_store.as_retriever(
    search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5}
)
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})
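To make the roles of fetch_k and k in an MMR (maximal marginal relevance) search more concrete, here is a minimal pure-Python sketch of the textbook selection step: from a pool of already-fetched candidates (the fetch_k nearest vectors), it picks k that balance relevance to the query against redundancy with earlier picks. This is an illustrative formulation, not necessarily the exact code path LangChain or Chroma uses. The `mmr_select` name, the `lambda_mult` weight and its default, and the toy vectors are assumptions for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Pick up to k candidate indices, trading relevance against redundancy.

    `candidates` stands in for the fetch_k vectors returned by an initial search.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy vectors: the first two are near-duplicates, the third points elsewhere.
query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.11], [0.6, -0.8]]
print(mmr_select(query, candidates, k=2))
```

With lambda_mult=0.5 the second pick skips the near-duplicate in favour of the more distinct vector; setting lambda_mult=1.0 reduces the procedure to plain relevance ranking.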