Create a new Pinecone account, or sign into your existing one, and create an API key to use in this notebook.
```python
import os
from getpass import getpass

from pinecone import Pinecone

# get API key at app.pinecone.io
os.environ["PINECONE_API_KEY"] = os.getenv("PINECONE_API_KEY") or getpass(
    "Enter your Pinecone API key: "
)

# initialize client
pc = Pinecone()
```
```python
from uuid import uuid4

from langchain_core.documents import Document

documents = [
    Document(
        page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
        metadata={"source": "social"},
    ),
    Document(
        page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
        metadata={"source": "news"},
    ),
    Document(
        page_content="Building an exciting new project with LangChain - come check it out!",
        metadata={"source": "social"},
    ),
    Document(
        page_content="Robbers broke into the city bank and stole $1 million in cash.",
        metadata={"source": "news"},
    ),
    Document(
        page_content="Wow! That was an amazing movie. I can't wait to see it again.",
        metadata={"source": "social"},
    ),
    Document(
        page_content="Is the new iPhone worth the price? Read this review to find out.",
        metadata={"source": "website"},
    ),
    Document(
        page_content="The top 10 soccer players in the world right now.",
        metadata={"source": "website"},
    ),
    Document(
        page_content="LangGraph is the best framework for building stateful, agentic applications!",
        metadata={"source": "social"},
    ),
    Document(
        page_content="The stock market is down 500 points today due to fears of a recession.",
        metadata={"source": "news"},
    ),
    Document(
        page_content="I have a bad feeling I am going to get deleted :(",
        metadata={"source": "social"},
    ),
]

uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)
```
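The cell above assigns random uuid4 IDs, so running it a second time stores every document again under fresh IDs. Since a Pinecone upsert overwrites a record that has the same ID, one way to make re-runs idempotent is to derive each ID from the document itself. The sketch below is only an illustration under that assumption: stable_id is a hypothetical helper, not part of LangChain or Pinecone, and hashing the source together with the text is one design choice among several.

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(page_content: str, source: str) -> str:
    """Derive a repeatable ID from a document's text and source (hypothetical helper)."""
    # uuid5 is a name-based UUID: the same namespace and name always give the same ID.
    return str(uuid5(NAMESPACE_URL, f"{source}:{page_content}"))


first = stable_id("The top 10 soccer players in the world right now.", "website")
again = stable_id("The top 10 soccer players in the world right now.", "website")
other = stable_id(
    "The stock market is down 500 points today due to fears of a recession.", "news"
)

print(first == again)  # same input, same ID
print(first == other)  # different input, different ID
```

With a helper like this, the ids argument could be built as [stable_id(d.page_content, d.metadata["source"]) for d in documents] instead of the random uuids list.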
Once we have loaded our documents into the vector store, we can begin querying. There are various methods for doing this in LangChain.

First, we'll see how to perform a simple vector search by querying our vector_store directly via the similarity_search method:
```python
results = vector_store.similarity_search("I'm building a new LangChain project!", k=3)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
```
```
* Building an exciting new project with LangChain - come check it out! [{'source': 'social'}]
* Building an exciting new project with LangChain - come check it out! [{'source': 'social'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'social'}]
```
We can also add metadata filtering to our query to limit our search based on various criteria. Let’s try a simple filter to limit our search to include only records with source=="social":
```python
results = vector_store.similarity_search(
    "I'm building a new LangChain project!",
    k=3,
    filter={"source": "social"},
)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
```
```
* Building an exciting new project with LangChain - come check it out! [{'source': 'social'}]
* Building an exciting new project with LangChain - come check it out! [{'source': 'social'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'social'}]
```
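Comparing the two result sets shown here, the records are identical: the top three unfiltered matches already come from the "social" source. The filter makes a difference when a record from another source, such as "website", would otherwise rank in the top k. In the filtered query, such records are excluded.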
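To make the filter semantics concrete, here is a minimal pure-Python sketch of how a metadata filter selects records. It handles the plain equality form used above plus the $eq, $ne, and $in operators from Pinecone's filter syntax. The matches function is a hypothetical stand-in written for illustration, not Pinecone's implementation, and it ignores edge cases such as records that lack the field.

```python
from typing import Any


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in flt (illustrative only)."""
    for field, condition in flt.items():
        value = metadata.get(field)
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # {"source": "social"} is shorthand for $eq
        for op, operand in condition.items():
            if op not in ("$eq", "$ne", "$in"):
                raise ValueError(f"unsupported operator: {op}")
            if op == "$eq" and value != operand:
                return False
            if op == "$ne" and value == operand:
                return False
            if op == "$in" and value not in operand:
                return False
    return True


records = [{"source": "social"}, {"source": "news"}, {"source": "website"}]

# Same shape as the filter passed to similarity_search above.
print([r["source"] for r in records if matches(r, {"source": "social"})])
# Match either of two sources.
print([r["source"] for r in records if matches(r, {"source": {"$in": ["social", "news"]}})])
# Exclude one source.
print([r["source"] for r in records if matches(r, {"source": {"$ne": "news"}})])
```

In the real vector store the filter is applied on the server side, so only matching records compete for the k result slots.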
We can also search while returning the similarity score. The results come back as a list of (document, score) tuples, where document is a LangChain Document object containing our text content and metadata.
```python
results = vector_store.similarity_search_with_score(
    "I'm building a new LangChain project!", k=3, filter={"source": "social"}
)

for doc, score in results:
    print(f"[SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
```
```
[SIM=12.959961] Building an exciting new project with LangChain - come check it out! [{'source': 'social'}]
[SIM=12.959961] Building an exciting new project with LangChain - come check it out! [{'source': 'social'}]
[SIM=1.942383] LangGraph is the best framework for building stateful, agentic applications! [{'source': 'social'}]
```
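The scores above are well above 1. That would be consistent with an unnormalized metric such as dot product, although this section does not state which metric the index uses, so treat that as an assumption. The sketch below uses made-up three-dimensional vectors to show the general point: a dot-product score grows with vector length, while cosine similarity depends only on direction.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product: unbounded, grows with the magnitude of either vector."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 2.0, 0.0]
doc = [2.0, 4.0, 0.0]          # same direction as the query, twice the length
scaled = [x * 3 for x in doc]  # same direction again, six times the length

print(dot(query, doc), cosine(query, doc))
print(dot(query, scaled), cosine(query, scaled))
```

Both documents point in the same direction as the query, so their cosine similarity is the same, while the dot product triples along with the vector length. Raw scores are therefore best compared only within one index and one metric.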
We can now query our retriever using the invoke method:
```python
retriever.invoke(
    input="I'm building a new LangChain project!", filter={"source": "social"}
)
```
```
/usr/local/lib/python3.11/dist-packages/langchain_core/vectorstores/base.py:1082: UserWarning: Relevance scores must be between 0 and 1, got [(Document(id='093fd11f-c85b-4c83-83f0-117df64ff442', metadata={'source': 'social'}, page_content='Building an exciting new project with LangChain - come check it out!'), 6.97998045), (Document(id='54f8f645-9f77-4aab-b9fa-709fd91ae3b3', metadata={'source': 'social'}, page_content='Building an exciting new project with LangChain - come check it out!'), 6.97998045), (Document(id='f9f82811-187c-4b25-85b5-7a42b4da3bff', metadata={'source': 'social'}, page_content='LangGraph is the best framework for building stateful, agentic applications!'), 1.471191405)]
  self.vectorstore.similarity_search_with_relevance_scores(
```
```
[Document(id='093fd11f-c85b-4c83-83f0-117df64ff442', metadata={'source': 'social'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='54f8f645-9f77-4aab-b9fa-709fd91ae3b3', metadata={'source': 'social'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='f9f82811-187c-4b25-85b5-7a42b4da3bff', metadata={'source': 'social'}, page_content='LangGraph is the best framework for building stateful, agentic applications!')]
```
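The UserWarning in the output says the relevance scores were expected to lie between 0 and 1, and all three reported scores exceed 1. The sketch below mirrors that range condition on the scores copied from the warning, then shows one possible workaround: applying a cutoff on the raw scores on the client side. Both helpers are hypothetical, not LangChain code, and the floor of 5.0 is an arbitrary value chosen only for illustration.

```python
def out_of_range(scored: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return the (text, score) pairs whose score falls outside [0, 1]."""
    return [(text, score) for text, score in scored if not 0.0 <= score <= 1.0]


def keep_at_least(scored: list[tuple[str, float]], floor: float) -> list[tuple[str, float]]:
    """Keep only the pairs whose raw score is at least floor (hypothetical helper)."""
    return [(text, score) for text, score in scored if score >= floor]


# Texts and scores copied from the warning message above.
scored = [
    ("Building an exciting new project with LangChain - come check it out!", 6.97998045),
    ("Building an exciting new project with LangChain - come check it out!", 6.97998045),
    ("LangGraph is the best framework for building stateful, agentic applications!", 1.471191405),
]

print(len(out_of_range(scored)))         # how many scores trigger the range condition
print(len(keep_at_least(scored, 5.0)))   # how many survive the illustrative cutoff
```

A cutoff like this has to be calibrated against the raw scores the index actually produces, since a threshold chosen for a 0 to 1 scale does not carry over to unbounded scores.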