Chroma integration - Docs by LangChain

This notebook covers how to get started with the Chroma vector store.

Chroma is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0. View the full docs of Chroma at this page, and find the API reference for the LangChain integration at this page.

Chroma Cloud: Chroma Cloud powers serverless vector and full-text search. It's extremely fast, cost-effective, scalable and painless. Create a DB and try it out in under 30 seconds with $5 of free credits.

Setup

To access Chroma vector stores, you'll need to install the langchain-chroma integration package.

pip install -qU "langchain-chroma>=0.1.2"

Credentials

You can use the Chroma vector store without any credentials; installing the package above is enough. If you are a Chroma Cloud user, set the CHROMA_TENANT, CHROMA_DATABASE, and CHROMA_API_KEY environment variables. When you install the chromadb package you also get access to the Chroma CLI, which can set these for you. First, log in via the CLI, and then use the connect command:

chroma db connect [db_name] --env-file
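Before connecting, it can help to confirm that the three variables are actually visible to your Python process. The following is a minimal sketch; the helper name missing_chroma_env is hypothetical and is not part of chromadb or langchain-chroma.

```python
import os

# Variable names taken from the Credentials section above.
REQUIRED_VARS = ("CHROMA_TENANT", "CHROMA_DATABASE", "CHROMA_API_KEY")


def missing_chroma_env(environ=None):
    """Return the required Chroma Cloud variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_chroma_env()
if missing:
    print("Missing Chroma Cloud settings:", ", ".join(missing))
else:
    print("All Chroma Cloud settings are present.")
```

Passing a plain dict as environ makes the check easy to exercise without touching the real environment.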

If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the code below:

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Initialization

Basic initialization

Below is a basic initialization, including the use of a directory to save the data locally.

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Running locally (in-memory)

You can get a Chroma server running in memory by simply instantiating Chroma with a collection name and your embeddings provider:

from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
)

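If you don't need data persistence, this is a great option for experimenting while building your AI application with LangChain.

Running locally (with data persistence)

You can provide the persist_directory argument to save your data across multiple runs of your program: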

from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",
)
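A relative persist_directory such as "./chroma_langchain_db" is resolved against the current working directory, so launching the program from a different folder would point at a different location. As a sketch, you could pin the path down first. The helper resolve_persist_dir and the CHROMA_PERSIST_DIR variable are hypothetical, chosen only for illustration.

```python
import os
from pathlib import Path


def resolve_persist_dir(default="./chroma_langchain_db", environ=None):
    """Return an absolute path for the persistence directory.

    CHROMA_PERSIST_DIR is a hypothetical override used only in this sketch;
    it is not read by Chroma or LangChain themselves.
    """
    env = os.environ if environ is None else environ
    raw = env.get("CHROMA_PERSIST_DIR") or default
    return Path(raw).expanduser().resolve()


persist_dir = resolve_persist_dir()
print(persist_dir)  # pass str(persist_dir) as persist_directory
```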

Connecting to a Chroma server

If you have a Chroma server running locally, or you have deployed one yourself, you can connect to it by providing the host argument. For example, you can start a Chroma server running locally with chroma run, and then connect to it with host='localhost':

from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    host="localhost",
)

For other deployments you can use the port, ssl, and headers arguments to customize your connection.
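As an illustration, the connection arguments mentioned above can be collected in one place and passed to Chroma as keyword arguments. This is a sketch: the helper name server_kwargs, the example hostname, and the header value are hypothetical, and whether your deployment needs an Authorization header depends on how it is configured.

```python
def server_kwargs(host, port=8000, ssl=False, token=None):
    """Build the host/port/ssl/headers keyword arguments for a remote server."""
    kwargs = {"host": host, "port": port, "ssl": ssl}
    if token:
        # Hypothetical bearer-token header; use whatever your server expects.
        kwargs["headers"] = {"Authorization": f"Bearer {token}"}
    return kwargs


kwargs = server_kwargs("chroma.example.com", port=443, ssl=True, token="my-token")
print(kwargs)
# Then, with langchain_chroma installed and `embeddings` defined as above:
# vector_store = Chroma(
#     collection_name="example_collection",
#     embedding_function=embeddings,
#     **kwargs,
# )
```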

Chroma Cloud

Chroma Cloud users can also build with LangChain. Provide your Chroma instance with your Chroma Cloud API key, tenant, and DB name:

import os

from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    chroma_cloud_api_key=os.getenv("CHROMA_API_KEY"),
    tenant=os.getenv("CHROMA_TENANT"),
    database=os.getenv("CHROMA_DATABASE"),
)

Initialization from client

You can also initialize from a Chroma client, which is particularly useful if you want easier access to the underlying database.

Running locally (in-memory)

import chromadb

client = chromadb.Client()

Running locally (with data persistence)

import chromadb

client = chromadb.PersistentClient(path="./chroma_langchain_db")

Connecting to a Chroma server

For example, if you are running a Chroma server locally (using chroma run):

import chromadb

client = chromadb.HttpClient(host="localhost", port=8000, ssl=False)

Chroma Cloud

After setting your CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE, you can simply instantiate:

import chromadb

client = chromadb.CloudClient()

Access your Chroma DB

collection = client.get_or_create_collection("collection_name")
collection.add(ids=["1", "2", "3"], documents=["a", "b", "c"])

Create a Chroma vector store

vector_store_from_client = Chroma(
    client=client,
    collection_name="collection_name",
    embedding_function=embeddings,
)

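Manage vector store

Once you have created your vector store, you can interact with it by adding and deleting different items.

Add items to vector store

We can add items to our vector store by using the add_documents function.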

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id=1,
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
    id=2,
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
    id=3,
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
    id=4,
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
    id=5,
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
    id=6,
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
    id=7,
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
    id=8,
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
    id=9,
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
    id=10,
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)
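The random uuid4 values above change on every run. If you would rather derive the same ID from the same content each time, for example so that a later update or delete can target a document without storing its ID, one option is uuid5. This is a sketch of an alternative, not what the example above does; the helper name stable_id and the namespace choice are assumptions.

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(page_content, source):
    """Derive a deterministic ID from a document's text and source."""
    # The same (source, text) pair always hashes to the same UUID.
    return str(uuid5(NAMESPACE_URL, f"{source}:{page_content}"))


first = stable_id("The top 10 soccer players in the world right now.", "website")
second = stable_id("The top 10 soccer players in the world right now.", "website")
other = stable_id("The top 10 soccer players in the world right now.", "news")

print(first == second)  # True: identical input gives an identical ID
print(first == other)   # False: a different source gives a different ID
```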

Update items in vector store

Now that we have added documents to our vector store, we can update existing documents by using the update_documents function.

updated_document_1 = Document(
    page_content="I had chocolate chip pancakes and fried eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id=1,
)

updated_document_2 = Document(
    page_content="The weather forecast for tomorrow is sunny and warm, with a high of 82 degrees.",
    metadata={"source": "news"},
    id=2,
)

vector_store.update_document(document_id=uuids[0], document=updated_document_1)
# You can also update multiple documents at once
vector_store.update_documents(
    ids=uuids[:2], documents=[updated_document_1, updated_document_2]
)

Delete items from vector store

We can also delete items from our vector store as follows:

vector_store.delete(ids=[uuids[-1]])

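Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while your chain or agent is running.

Query directly

Similarity search

A simple similarity search can be performed as follows: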

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
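To make the role of filter concrete, here is a plain-Python sketch of an equality filter on metadata. It only models the idea that documents whose metadata does not match are excluded from the results; it is not how Chroma implements filtering, and matches_filter is a hypothetical helper.

```python
def matches_filter(metadata, flt):
    """Return True when every key in the filter equals the metadata value."""
    return all(metadata.get(key) == value for key, value in flt.items())


docs = [
    {"text": "Building an exciting new project with LangChain", "metadata": {"source": "tweet"}},
    {"text": "Robbers broke into the city bank", "metadata": {"source": "news"}},
    {"text": "The top 10 soccer players in the world", "metadata": {"source": "website"}},
]

tweets = [d["text"] for d in docs if matches_filter(d["metadata"], {"source": "tweet"})]
print(tweets)  # ['Building an exciting new project with LangChain']
```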

Similarity search with score

If you want to execute a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": "news"}
)
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
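The score returned here is, as far as the langchain-chroma docstring describes it, a distance: lower values mean closer matches. The sketch below computes two common distance measures by hand on made-up three-dimensional vectors. Which measure your collection uses depends on its configuration, so treat the choice here as an assumption.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0, 0.0]  # toy vectors, not real embeddings
near = [0.9, 0.1, 0.0]
far = [0.0, 1.0, 0.0]

print(f"near: l2={squared_l2(query, near):.3f} cos={cosine_distance(query, near):.3f}")
print(f"far:  l2={squared_l2(query, far):.3f} cos={cosine_distance(query, far):.3f}")
```

Under both measures the vector that points in nearly the same direction as the query gets the smaller value.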

Search by vector

You can also search by vector:

results = vector_store.similarity_search_by_vector(
    embedding=embeddings.embed_query("I love green eggs and ham!"), k=1
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
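Conceptually, searching by vector means ranking stored vectors by their distance to the one you pass in. The brute-force sketch below shows that idea on toy two-dimensional vectors; a real vector database uses an index rather than a full scan, and top_k_by_vector is a hypothetical helper.

```python
import heapq


def top_k_by_vector(query_vec, items, k=1):
    """Return the k (distance, text) pairs closest to query_vec.

    items is a list of (text, vector) pairs; distance is squared Euclidean.
    """
    def dist(vec):
        return sum((q - v) ** 2 for q, v in zip(query_vec, vec))

    return heapq.nsmallest(k, ((dist(vec), text) for text, vec in items))


# Toy two-dimensional vectors standing in for real embeddings.
items = [
    ("pancakes and eggs", [0.9, 0.1]),
    ("stock market news", [0.1, 0.9]),
    ("weather forecast", [0.4, 0.6]),
]

results = top_k_by_vector([1.0, 0.0], items, k=2)
for distance, text in results:
    print(f"* {text} (distance={distance:.2f})")
```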

Other search methods

There are a variety of other search methods that are not covered in this notebook, such as MMR search. For a full list of the search abilities available for Chroma check out the API reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains. For more information on the different search types and kwargs you can pass, please visit the Chroma API reference.

retriever = vector_store.as_retriever(
    search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5}
)
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})
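The "mmr" search type stands for maximal marginal relevance: a larger pool of candidates (fetch_k) is fetched first, and k results are then chosen to balance relevance to the query against similarity to results already picked. The pure-Python sketch below illustrates that selection step on toy vectors. It is not the library's implementation, and the lambda_mult weighting of 0.5 is an assumption made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidates, k=1, lambda_mult=0.5):
    """Pick k candidate indices, trading relevance off against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, candidates[i])
            # Highest similarity to anything already chosen (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy vectors: indices 0 and 1 are near-duplicates, index 2 is different.
candidates = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
picked = mmr([1.0, 0.1], candidates, k=2)
print(picked)  # [1, 2]
```

Index 1 is the most relevant candidate, and the near-duplicate at index 0 is then passed over in favor of the more distinct index 2.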

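Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

API reference

For detailed documentation of all Chroma vector store features and configurations, head to the API reference.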
