AsyncCockroachDBVectorStore is an implementation of a LangChain vector store using CockroachDB’s distributed SQL database with native vector support.
This notebook goes over how to use the AsyncCockroachDBVectorStore API.
The code lives in the integration package: langchain-cockroachdb.
CockroachDB is a distributed SQL database that provides:
Native vector support with the VECTOR data type (v24.2+)
Distributed C-SPANN indexes for approximate nearest neighbor (ANN) search (v25.2+)
SERIALIZABLE isolation by default for transaction correctness
Horizontal scalability with automatic sharding and replication
PostgreSQL wire-compatible for easy adoption
Key advantages for vector workloads
Distributed vector indexes: C-SPANN indexes automatically shard across your cluster
Multi-tenancy support: Prefix columns in indexes for efficient tenant isolation
Strong consistency: SERIALIZABLE transactions prevent data anomalies
High availability: Automatic failover with no data loss
Install
Install the integration library, langchain-cockroachdb.
pip install -qU langchain-cockroachdb
CockroachDB cluster
You need a CockroachDB cluster with vector support (v24.2+). Choose one option:
Option 1: CockroachDB Cloud (Recommended)
Sign up at cockroachlabs.cloud
Create a free cluster
Get your connection string from the cluster details page
Option 2: Docker (Development)
docker run -d \
  --name cockroachdb \
  -p 26257:26257 \
  -p 8080:8080 \
  cockroachdb/cockroach:latest \
  start-single-node --insecure
Option 3: Local binary
Download from cockroachlabs.com/docs/releases
cockroach start-single-node --insecure --listen-addr=localhost:26257
Set your connection values
# For CockroachDB Cloud
CONNECTION_STRING = "cockroachdb://user:password@host:26257/database?sslmode=verify-full"
# For local insecure cluster
CONNECTION_STRING = "cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"
TABLE_NAME = "langchain_vectors"
VECTOR_DIMENSION = 1536 # Depends on your embedding model
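Before handing a connection string to the engine, it can help to confirm that it points where you expect. The following is a minimal sketch using only the Python standard library; the helper name describe_connection is hypothetical and is not part of langchain-cockroachdb.

```python
from urllib.parse import parse_qs, urlsplit


def describe_connection(url: str) -> dict:
    """Split a connection URL into the parts worth checking before connecting.

    Hypothetical helper for illustration; not part of langchain-cockroachdb.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        # parse_qs returns lists; take the first value if the key is present.
        "sslmode": query.get("sslmode", [None])[0],
    }


info = describe_connection(
    "cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"
)
print(info)
```

For a cloud cluster, the same check should report sslmode as verify-full rather than disable.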
Initialization
Create a connection engine
The CockroachDBEngine manages a connection pool to your cluster:
from langchain_cockroachdb import CockroachDBEngine
engine = CockroachDBEngine.from_connection_string(
    url=CONNECTION_STRING,
    pool_size=10,        # Connection pool size
    max_overflow=20,     # Additional connections allowed
    pool_pre_ping=True,  # Health check connections
)
Initialize a table
Create a table with the proper schema for vector storage:
await engine.ainit_vectorstore_table(
    table_name=TABLE_NAME,
    vector_dimension=VECTOR_DIMENSION,
)
Optional: Specify a schema name
await engine.ainit_vectorstore_table(
    table_name=TABLE_NAME,
    vector_dimension=VECTOR_DIMENSION,
    schema="my_schema",  # Default: "public"
)
Create an embedding instance
Use any LangChain embeddings model.
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
Initialize the vector store
from langchain_cockroachdb import AsyncCockroachDBVectorStore
vectorstore = AsyncCockroachDBVectorStore(
    engine=engine,
    embeddings=embeddings,
    collection_name=TABLE_NAME,
)
Manage vector store
Add documents
Add documents with metadata:
import uuid
from langchain_core.documents import Document
docs = [
    Document(
        id=str(uuid.uuid4()),
        page_content="CockroachDB is a distributed SQL database",
        metadata={"source": "docs", "category": "database"},
    ),
    Document(
        id=str(uuid.uuid4()),
        page_content="Vector search enables semantic similarity",
        metadata={"source": "docs", "category": "features"},
    ),
]
ids = await vectorstore.aadd_documents(docs)
Add texts
Add text directly without structuring as documents:
texts = ["First text", "Second text", "Third text"]
metadatas = [{"idx": i} for i in range(len(texts))]
ids = [str(uuid.uuid4()) for _ in texts]
ids = await vectorstore.aadd_texts(texts, metadatas=metadatas, ids=ids)
Performance note: CockroachDB’s vector indexes work best with smaller batch sizes. The default batch_size=100 is tuned for vector inserts; larger batches of VECTOR values can degrade insert performance.
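The batching advice above also applies when you stage a large corpus yourself. The sketch below is a minimal illustration, assuming a hypothetical helper named chunked (not part of langchain-cockroachdb) that splits a list into slices of at most 100 items, mirroring the default batch_size mentioned in the note.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int = 100) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements.

    Hypothetical helper for illustration; not part of langchain-cockroachdb.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"text {i}" for i in range(250)]
batch_lengths = [len(batch) for batch in chunked(texts, size=100)]
print(batch_lengths)  # [100, 100, 50]
```

Each slice could then be passed to aadd_texts in turn, so that no single call carries more vectors than intended.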
Delete documents
Delete documents by ID:
await vectorstore.adelete([ids[0], ids[1]])
Query vector store
Similarity search
Search for similar documents using natural language:
query = "distributed database"
docs = await vectorstore.asimilarity_search(query, k=5)
for doc in docs:
    print(f"{doc.page_content[:50]}...")
Similarity search with scores
Get relevance scores with results:
docs_with_scores = await vectorstore.asimilarity_search_with_score(query, k=5)
for doc, score in docs_with_scores:
    print(f"Score: {score:.4f} - {doc.page_content[:50]}...")
Search by vector
Search using a pre-computed embedding vector:
query_vector = await embeddings.aembed_query(query)
docs = await vectorstore.asimilarity_search_by_vector(query_vector, k=5)
Maximum marginal relevance (MMR) search
Retrieve diverse results that balance relevance and diversity:
docs = await vectorstore.amax_marginal_relevance_search(
    query,
    k=5,              # Number of results to return
    fetch_k=20,       # Number of candidates to consider
    lambda_mult=0.5,  # 0 = max diversity, 1 = max relevance
)
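To make the effect of lambda_mult concrete, here is a small pure-Python sketch of the standard maximal marginal relevance selection rule over toy 2-D vectors. It is an illustration of the general technique, not the code used by langchain-cockroachdb; the function names mmr_select and cosine_similarity are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen greedily by MMR.

    Each step scores a candidate as
    lambda_mult * relevance - (1 - lambda_mult) * redundancy,
    where redundancy is its highest similarity to anything already selected.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine_similarity(query_vec, candidates[i])
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
candidates = [
    [0.9, 0.1],    # 0: most relevant
    [0.9, 0.12],   # 1: near-duplicate of 0
    [0.7, -0.7],   # 2: less relevant, but points in a different direction
]

relevance_only = mmr_select(query_vec, candidates, k=2, lambda_mult=1.0)
balanced = mmr_select(query_vec, candidates, k=2, lambda_mult=0.5)
print(relevance_only, balanced)
```

With lambda_mult=1.0 the near-duplicate is picked second because only relevance counts; with lambda_mult=0.5 the redundancy penalty pushes the selection toward the third, more distinct vector.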
Vector indexes
Speed up similarity search with CockroachDB’s C-SPANN vector indexes (requires v25.2+).
What is C-SPANN?
C-SPANN (CockroachDB Space Partition Approximate Nearest Neighbor) is a distributed vector index that:
Automatically shards across your cluster nodes
Provides sub-second query performance at scale
Supports cosine, Euclidean (L2), and inner product distances
Works with prefix columns for multi-tenant architectures
Create a vector index
from langchain_cockroachdb import CSPANNIndex, DistanceStrategy
# Create a cosine distance index (most common)
index = CSPANNIndex(
    distance_strategy=DistanceStrategy.COSINE,
    name="my_vector_index",
)
await vectorstore.aapply_vector_index(index)
Distance strategies
Choose the distance metric that matches your use case:
# Cosine similarity (most common for text embeddings)
CSPANNIndex(distance_strategy=DistanceStrategy.COSINE)
# Euclidean distance (L2)
CSPANNIndex(distance_strategy=DistanceStrategy.EUCLIDEAN)
# Inner product (for normalized vectors)
CSPANNIndex(distance_strategy=DistanceStrategy.INNER_PRODUCT)
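For intuition about how the three metrics differ, the sketch below computes each one by hand on toy 2-D vectors using the textbook definitions. It makes no claim about how CockroachDB scales or signs the scores it returns; the function names are hypothetical.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products; grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity; depends on direction only, not magnitude."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 2.0]  # orthogonal to a
c = [3.0, 0.0]  # same direction as a, three times as long

for name, other in (("b", b), ("c", c)):
    print(
        name,
        cosine_distance(a, other),
        euclidean_distance(a, other),
        inner_product(a, other),
    )
```

Here a and c have zero cosine distance despite a Euclidean distance of 2, which is why cosine is a common choice when embedding magnitude carries no meaning.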
Tune index parameters
Adjust partition sizes for performance:
index = CSPANNIndex(
    distance_strategy=DistanceStrategy.COSINE,
    min_partition_size=16,   # Minimum vectors per partition
    max_partition_size=128,  # Maximum vectors per partition
)
await vectorstore.aapply_vector_index(index)
Query-time tuning
Adjust search parameters at query time:
from langchain_cockroachdb import CSPANNQueryOptions
# Increase beam size for better recall (slower)
query_options = CSPANNQueryOptions(beam_size=200)  # Default: 100
docs = await vectorstore.asimilarity_search(
    query,
    k=10,
    query_options=query_options,
)
Drop an index
Remove a vector index:
index = CSPANNIndex(name="my_vector_index")
await vectorstore.adrop_vector_index(index)
Metadata filtering
Filter similarity searches using metadata fields.
Supported operators
| Operator | Meaning | Example |
| --- | --- | --- |
| $eq | Equality | {"category": "news"} |
| $ne | Not equal | {"category": {"$ne": "spam"}} |
| $gt | Greater than | {"year": {"$gt": 2020}} |
| $gte | Greater than or equal | {"rating": {"$gte": 4.0}} |
| $lt | Less than | {"year": {"$lt": 2023}} |
| $lte | Less than or equal | {"rating": {"$lte": 3.0}} |
| $in | In list | {"category": {"$in": ["news", "blog"]}} |
| $nin | Not in list | {"source": {"$nin": ["spam", "test"]}} |
| $between | Between values | {"year": {"$between": [2020, 2023]}} |
| $like | Pattern match | {"source": {"$like": "wiki%"}} |
| $ilike | Case-insensitive pattern match | {"category": {"$ilike": "%NEWS%"}} |
| $and | Logical AND | {"$and": [{...}, {...}]} |
| $or | Logical OR | {"$or": [{...}, {...}]} |
Filter examples
# Simple equality
docs = await vectorstore.asimilarity_search(
    query,
    filter={"category": "news"},
)
# Numeric comparison
docs = await vectorstore.asimilarity_search(
    query,
    filter={"year": {"$gte": 2020}},
)
# Complex filters
docs = await vectorstore.asimilarity_search(
    query,
    filter={
        "$and": [
            {"category": {"$in": ["news", "blog"]}},
            {"year": {"$gte": 2020}},
            {"rating": {"$gt": 3.5}},
        ]
    },
)
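To check what a filter is expected to match before running it against the database, you can evaluate it against sample metadata locally. The sketch below is a pure-Python approximation of the operator semantics listed above. It is not how langchain-cockroachdb evaluates filters, and several details are assumptions: $between is treated as inclusive, $like follows SQL wildcard rules (% and _), and a missing key is treated as None. The names matches and COMPARATORS are hypothetical.

```python
import re


def _like(value, pattern, ignore_case=False):
    """SQL-style LIKE: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = (re.IGNORECASE | re.DOTALL) if ignore_case else re.DOTALL
    return re.fullmatch(regex, str(value), flags) is not None


# Assumed semantics for each operator; ordering checks fail on missing values.
COMPARATORS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$gt": lambda v, x: v is not None and v > x,
    "$gte": lambda v, x: v is not None and v >= x,
    "$lt": lambda v, x: v is not None and v < x,
    "$lte": lambda v, x: v is not None and v <= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
    "$between": lambda v, x: v is not None and x[0] <= v <= x[1],
    "$like": lambda v, x: v is not None and _like(v, x),
    "$ilike": lambda v, x: v is not None and _like(v, x, ignore_case=True),
}


def matches(metadata, flt):
    """Return True if `metadata` satisfies every clause in `flt`."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            value = metadata.get(key)
            if isinstance(cond, dict):
                if not all(COMPARATORS[op](value, arg) for op, arg in cond.items()):
                    return False
            elif value != cond:  # bare value means implicit equality
                return False
    return True


metadata = {"category": "news", "year": 2021, "rating": 4.2, "source": "wikipedia"}
complex_filter = {
    "$and": [
        {"category": {"$in": ["news", "blog"]}},
        {"year": {"$gte": 2020}},
        {"rating": {"$gt": 3.5}},
    ]
}
print(matches(metadata, complex_filter))
print(matches(metadata, {"source": {"$like": "wiki%"}}))
```

Treat the database as the source of truth: null handling and type coercion in SQL may differ from this local approximation.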
Sync interface
Every async method has a sync equivalent, available through the sync wrapper class:
from langchain_cockroachdb import CockroachDBVectorStore
# Create sync vectorstore
vectorstore = CockroachDBVectorStore(
    engine=engine,
    embeddings=embeddings,
    collection_name=TABLE_NAME,
)
# Use sync methods
docs = vectorstore.similarity_search(query, k=5)
ids = vectorstore.add_documents(docs)
vectorstore.apply_vector_index(index)
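As a rough mental model of the relationship between the two interfaces, the sketch below shows one simple way a sync facade can delegate to async methods. Both classes are hypothetical stand-ins written for this example; how CockroachDBVectorStore actually bridges to its async counterpart is not described here and may differ (for instance, by using a background event loop).

```python
import asyncio


class AsyncStore:
    """Hypothetical stand-in for an async store; not the real class."""

    def __init__(self):
        self._texts = []

    async def aadd_texts(self, texts):
        await asyncio.sleep(0)  # stands in for awaiting the database
        start = len(self._texts)
        self._texts.extend(texts)
        # Return positional IDs as strings, purely for illustration.
        return [str(i) for i in range(start, len(self._texts))]


class SyncStore:
    """Hypothetical sync facade that drives each coroutine to completion."""

    def __init__(self, async_store):
        self._async_store = async_store

    def add_texts(self, texts):
        # asyncio.run starts a fresh event loop, runs the coroutine, then closes it.
        return asyncio.run(self._async_store.aadd_texts(texts))


store = SyncStore(AsyncStore())
first_ids = store.add_texts(["a", "b"])
more_ids = store.add_texts(["c"])
print(first_ids, more_ids)
```

Note that asyncio.run raises an error when called from inside an already running event loop, such as a notebook cell, which is one reason to use the await-based interface there.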
Usage for retrieval-augmented generation (RAG)
For implementing RAG with CockroachDB as your vector store, see the LangChain RAG tutorial. The CockroachDB vector store can be used in place of any other vector store in those patterns.
Drop the vector store table. ⚠️ This operation cannot be undone:
await engine.adrop_table(TABLE_NAME)
API reference
For detailed documentation of all features and configurations:
Additional resources