This page provides a quickstart for using Apache Cassandra® as a Vector Store.
Cassandra is a NoSQL, row-oriented, highly scalable and highly available database. Starting with version 5.0, the database ships with vector search capabilities.
Note: in addition to access to the database, an OpenAI API key is required to run the full example.
Note: depending on your LangChain setup, you may need to install or upgrade other dependencies needed for this demo (specifically, recent versions of datasets, openai, pypdf and tiktoken are required, along with langchain-community).
The Vector Store integration shown in this page can be used with Cassandra as well as other derived databases, such as Astra DB, which use the CQL (Cassandra Query Language) protocol.
DataStax Astra DB is a managed serverless database built on Cassandra, offering the same interface and strengths.
Depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object.
You first need to create a cassandra.cluster.Session object, as described in the Cassandra driver documentation. The details vary (e.g. with network settings and authentication), but this might be something like:
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
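You can now set the session, along with your desired keyspace name, as a global CassIO parameter by calling cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE), where CASSANDRA_KEYSPACE is a variable holding the name of your keyspace.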
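Then create the vector store, using OpenAI embeddings:
from langchain_community.vectorstores import Cassandra
from langchain_openai import OpenAIEmbeddings  # from the langchain-openai package

embe = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment

vstore = Cassandra(
    embedding=embe,
    table_name="cassandra_vector_demo",
    # session=None, keyspace=None  # Uncomment on older versions of LangChain
)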
Note: you can also pass your session and keyspace directly as parameters when creating the vector store. The global cassio.init setting, however, comes in handy if your application uses Cassandra in several ways (for instance, for the vector store, chat memory and LLM response caching), as it lets you centralize credential and database connection management in one place.
When Document objects are added (for instance with add_documents), their metadata dictionaries, created from the source data, are stored as part of each Document. Add some more entries, this time with add_texts:
texts = ["I think, therefore I am.", "To the things themselves!"]
metadatas = [{"author": "descartes"}, {"author": "husserl"}]
ids = ["desc_01", "huss_xy"]
inserted_ids_2 = vstore.add_texts(texts=texts, metadatas=metadatas, ids=ids)
print(f"\nInserted {len(inserted_ids_2)} documents.")
Note: you may want to speed up add_texts and add_documents by increasing the concurrency level for these bulk operations; see the methods' batch_size parameter for details. The best-performing value depends on your network and on the client machine's specifications.
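The effect of a batch_size-style setting can be sketched with the standard library alone. This is an illustration, not the LangChain implementation: hypothetical rows are written by a hypothetical insert_one callable, and each batch of at most batch_size writes is submitted concurrently and fully awaited before the next batch starts, so a larger batch_size means more writes in flight at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def insert_in_batches(rows, insert_one, batch_size=16):
    """Write rows in consecutive batches of at most batch_size concurrent calls.

    Each batch is submitted in full, then awaited before the next one starts,
    so batch_size bounds how many writes are in flight at any moment.
    """
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            futures = [pool.submit(insert_one, row) for row in batch]
            # Collect results in submission order; .result() re-raises failures.
            results.extend(future.result() for future in futures)
    return results


# Hypothetical stand-in for one database write, instrumented to record
# the peak number of concurrent calls.
_lock = threading.Lock()
_in_flight = 0
max_in_flight = 0


def fake_insert(row):
    global _in_flight, max_in_flight
    with _lock:
        _in_flight += 1
        max_in_flight = max(max_in_flight, _in_flight)
    time.sleep(0.01)  # simulated network round trip
    with _lock:
        _in_flight -= 1
    return row["id"]


rows = [{"id": f"doc_{i}"} for i in range(40)]
ids = insert_in_batches(rows, fake_insert, batch_size=8)
print(len(ids), max_in_flight)
```

With the vector store itself you only pass the parameter, for example vstore.add_texts(texts=texts, metadatas=metadatas, ids=ids, batch_size=32), where 32 is an arbitrary value to tune against your own setup.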
This section demonstrates metadata filtering and getting the similarity scores back:
results = vstore.similarity_search("Our life is what we make of it", k=3)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
results_filtered = vstore.similarity_search(
    "Our life is what we make of it",
    k=3,
    filter={"author": "plato"},
)
for res in results_filtered:
    print(f"* {res.page_content} [{res.metadata}]")
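Conceptually, a filter such as {"author": "plato"} restricts the search to entries whose metadata has every listed key set to the given value, and the k closest matches are then taken among those entries only. The store evaluates this on the database side; the plain-Python sketch below only illustrates the matching rule for simple equality filters like the one above, using the two entries added with add_texts, and is not the library's implementation.

```python
def matches_filter(metadata, metadata_filter):
    """True when every key in the filter is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )


def apply_filter(entries, metadata_filter):
    """Keep only the (text, metadata) pairs that satisfy the filter."""
    return [(text, md) for text, md in entries if matches_filter(md, metadata_filter)]


# The two entries inserted with add_texts, as plain Python data.
entries = [
    ("I think, therefore I am.", {"author": "descartes"}),
    ("To the things themselves!", {"author": "husserl"}),
]

print(apply_filter(entries, {"author": "husserl"}))
# [('To the things themselves!', {'author': 'husserl'})]
print(apply_filter(entries, {"author": "plato"}))
# []
```

An empty filter matches every entry, which corresponds to an unfiltered search.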
results = vstore.similarity_search_with_score("Our life is what we make of it", k=3)
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
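Conceptually, similarity_search_with_score returns the stored entries whose vectors are most similar to the query embedding, best first, each paired with its score. The sketch below assumes cosine similarity and uses hypothetical toy 3-dimensional vectors in place of real embeddings; the metric a given table uses, and any rescaling of the returned score, may differ, so the numbers are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_with_scores(query_vec, items, k):
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real ones come from the embedding model.
items = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.8, 0.6, 0.0]),
    ("doc_c", [0.0, 0.0, 1.0]),
]
results = top_k_with_scores([1.0, 0.0, 0.0], items, k=2)
for text, score in results:
    print(f"* [SIM={score:.3f}] {text}")
# * [SIM=1.000] doc_a
# * [SIM=0.800] doc_b
```

Higher scores mean closer matches, which is why the list is sorted in descending order before keeping the top k.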
results = vstore.max_marginal_relevance_search(
    "Our life is what we make of it",
    k=3,
    filter={"author": "aristotle"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
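Maximal marginal relevance (MMR) trades relevance against diversity: after the best match is picked, each further pick is penalized for being similar to results already chosen. LangChain's MMR search generally fetches a larger candidate pool (its fetch_k parameter) and re-ranks it this way, weighting the two terms with lambda_mult (0.5 by default). The self-contained sketch below shows only the greedy selection step on hypothetical 2-dimensional vectors, one of which is an exact duplicate; it is not the library's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Greedy MMR: return the indices of k candidates, in selection order.

    Each step maximizes
        lambda_mult * sim(query, c) - (1 - lambda_mult) * max(sim(c, s) for s selected)
    """
    selected = []
    remaining = list(range(len(candidates)))

    def mmr_score(i):
        relevance = cosine(query_vec, candidates[i])
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,  # nothing selected yet: no redundancy penalty
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)  # ties go to the earlier index
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [0.96, 0.28],  # 0: very relevant
    [0.96, 0.28],  # 1: exact duplicate of 0
    [0.8, -0.6],   # 2: a bit less relevant, but different
]
print(mmr_select(query, candidates, k=2))                   # [0, 2]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With lambda_mult=1.0 the redundancy term vanishes and the selection reduces to plain similarity ranking, so the duplicate is returned; at 0.5 the dissimilar candidate wins the second slot.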
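from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Retrieve the 3 most similar entries for each question
retriever = vstore.as_retriever(search_kwargs={"k": 3})

philo_template = """You are a philosopher that draws inspiration from great thinkers of the past
to craft well-thought answers to user questions. Use the provided context as the basis
for your answers and do not make up new reasoning paths - just mix-and-match what you are given.
Your answers must be concise and to the point, and refrain from answering about other topics than philosophy.

CONTEXT:
{context}

QUESTION: {question}

YOUR ANSWER:"""

philo_prompt = ChatPromptTemplate.from_template(philo_template)

llm = ChatOpenAI()

# The question feeds the retriever (for {context}) and passes through unchanged (for {question})
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | philo_prompt
    | llm
    | StrOutputParser()
)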
chain.invoke("How does Russell elaborate on Peirce's idea of the security blanket?")
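Stripped of LangChain, the chain's data flow looks roughly like this: the input question is sent both to the retriever, whose results fill {context}, and, unchanged via RunnablePassthrough, to {question}; the filled prompt then goes to the LLM, and StrOutputParser turns the reply into a string. The sketch below stops before the LLM call, uses a hypothetical word-overlap retriever instead of vector search and a shortened template, and joins plain texts, whereas in the chain above the retriever returns Document objects that the prompt renders as text.

```python
def retrieve(question, corpus, k=2):
    """Hypothetical toy retriever: rank texts by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda text: len(q_words & set(text.lower().split())),
        reverse=True,  # sorted() stays stable, so ties keep corpus order
    )
    return ranked[:k]


TEMPLATE = "CONTEXT:\n{context}\n\nQUESTION: {question}\n\nYOUR ANSWER:"


def build_prompt(question, corpus, k=2):
    # The same input question feeds the retriever (the "context" branch) ...
    context = "\n".join(retrieve(question, corpus, k))
    # ... and passes through unchanged into the {question} slot.
    return TEMPLATE.format(context=context, question=question)


corpus = ["I think, therefore I am.", "To the things themselves!"]
prompt = build_prompt("What are the things themselves?", corpus, k=1)
print(prompt)
```

The printed prompt contains the Husserl quote as context, followed by the unchanged question and the "YOUR ANSWER:" cue.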
To clean up, retrieve the Session object from CassIO and use it to run a CQL DROP TABLE statement on the demo table, cassandra_vector_demo. (You will lose the data you stored in it.)
For extended quickstarts and additional usage examples of the LangChain Cassandra vector store, see the CassIO documentation.
Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.