This guide will help you get started with OpenAI embedding models using LangChain. For detailed documentation on OpenAIEmbeddings features and configuration options, please refer to the API reference.

Overview

Integration details

Setup

To access OpenAI embedding models, you need to create an OpenAI account, get an API key, and install the langchain-openai integration package.

Credentials

Head to platform.openai.com to sign up for OpenAI and generate an API key. Once you've done this, set the OPENAI_API_KEY environment variable:
import getpass
import os

if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
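If you route requests through a proxy or service emulator, you can set the base URL with an environment variable instead of passing the base_url parameter. Resolution order (first match wins):
  1. An explicit base_url (or openai_api_base) keyword argument.
  2. OPENAI_API_BASE, read by LangChain at initialization.
  3. OPENAI_BASE_URL, read by the underlying openai SDK client.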
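The precedence in the list above can be sketched as a small helper. This is only an illustration of the documented order: `resolve_base_url` is a hypothetical function, not part of langchain-openai, and in practice the lookup is split between LangChain and the openai SDK rather than living in one place.

```python
import os
from typing import Mapping, Optional


def resolve_base_url(
    base_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the base URL that wins under the documented precedence.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # 1. An explicit keyword argument always wins.
    if base_url:
        return base_url
    # 2. OPENAI_API_BASE is checked next (read by LangChain at init).
    # 3. OPENAI_BASE_URL is the final fallback (read by the openai SDK).
    for name in ("OPENAI_API_BASE", "OPENAI_BASE_URL"):
        value = env.get(name)
        if value:
            return value
    # Nothing set: the client falls back to its default endpoint.
    return None


# Placeholder URLs, chosen only to make the precedence visible.
demo_env = {
    "OPENAI_API_BASE": "http://localhost:8080/v1",
    "OPENAI_BASE_URL": "http://localhost:9090/v1",
}
print(resolve_base_url(env=demo_env))  # http://localhost:8080/v1
print(resolve_base_url("http://proxy.internal/v1", env=demo_env))  # http://proxy.internal/v1
```

Passing the environment in as a mapping keeps the sketch easy to test without touching the real process environment.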
To enable automated tracing of your model calls, set your LangSmith API key:
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")

Installation

The LangChain OpenAI integration lives in the langchain-openai package:
pip install -qU langchain-openai

Instantiation

Now we can instantiate our model object and generate embeddings:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    # With the `text-embedding-3` class
    # of models, you can specify the size
    # of the embeddings you want returned.
    # dimensions=1024
)
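To give a feel for what a smaller embedding size means, the sketch below shortens a vector by hand: keep the leading values, then rescale to unit length so cosine comparisons still behave. This is an illustration under stated assumptions, not a description of what the API does internally when `dimensions` is set; `shorten_embedding` is a hypothetical helper and the input is a toy vector.

```python
import math
from typing import List


def shorten_embedding(vector: List[float], dimensions: int) -> List[float]:
    """Keep the first `dimensions` values and rescale to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return truncated
    return [x / norm for x in truncated]


full = [3.0, 4.0, 12.0]  # toy vector, not real model output
short = shorten_embedding(full, 2)
print(short)  # [0.6, 0.8]
```

When the model supports it, passing `dimensions` at instantiation is simpler than post-processing vectors yourself.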
Azure OpenAI v1 API support: As of langchain-openai>=1.0.1, OpenAIEmbeddings can be used directly with Azure OpenAI endpoints using the new v1 API, including support for Microsoft Entra ID authentication. See the Using with Azure OpenAI section below for details.

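Indexing and retrieval

Embedding models are often used in retrieval-augmented generation (RAG) flows, both for indexing data and for retrieving it later. For more detailed instructions, see our RAG tutorials. The following shows how to use the embeddings object to index and retrieve data. In this example, we will index and retrieve a sample document in the InMemoryVectorStore.
# Create a vector store with a sample text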
from langchain_core.vectorstores import InMemoryVectorStore

text = "LangChain is the framework for building context-aware reasoning applications"

vectorstore = InMemoryVectorStore.from_texts(
    [text],
    embedding=embeddings,
)

# Use the vectorstore as a retriever
retriever = vectorstore.as_retriever()

# Retrieve the most similar text
retrieved_documents = retriever.invoke("What is LangChain?")

# show the retrieved document's content
retrieved_documents[0].page_content
'LangChain is the framework for building context-aware reasoning applications'

Direct usage

Under the hood, the vectorstore and retriever implementations are calling embeddings.embed_documents(...) and embeddings.embed_query(...) to create embeddings for the text(s) used in from_texts and retrieval invoke operations, respectively. You can directly call these methods to get embeddings for your own use cases.
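The division of labor between the two methods can be sketched without any network calls. `ToyEmbeddings` below is a hypothetical stand-in that only mimics the `embed_documents` / `embed_query` method names; its word-count vectors and the cosine ranking are assumptions for illustration, not the actual implementation of the vector store or of OpenAIEmbeddings.

```python
import math
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in exposing the two methods named above."""

    vocabulary = ["langchain", "langgraph", "framework", "library"]

    def _embed(self, text: str) -> List[float]:
        # Count vocabulary words; a real model returns dense learned vectors.
        lowered = text.lower()
        return [float(lowered.count(word)) for word in self.vocabulary]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Called once per batch of texts at indexing time.
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # Called once per query at retrieval time.
        return self._embed(text)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


docs = [
    "LangChain is the framework for building context-aware reasoning applications",
    "LangGraph is a library for building stateful, multi-actor applications with LLMs",
]
toy = ToyEmbeddings()
doc_vectors = toy.embed_documents(docs)               # indexing step
query_vector = toy.embed_query("What is LangChain?")  # retrieval step
scores = [cosine(query_vector, v) for v in doc_vectors]
best = docs[scores.index(max(scores))]
print(best)
```

With these toy vectors, the query shares only the "langchain" dimension with the first document, so that document ranks highest.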

Embed single texts

You can embed single texts or documents with embed_query:
single_vector = embeddings.embed_query(text)
print(str(single_vector)[:100])  # Show the first 100 characters of the vector
[-0.019276829436421394, 0.0037708976306021214, -0.03294256329536438, 0.0037671267054975033, 0.008175

Embed multiple texts

You can embed multiple texts with embed_documents:
text2 = (
    "LangGraph is a library for building stateful, multi-actor applications with LLMs"
)
two_vectors = embeddings.embed_documents([text, text2])
for vector in two_vectors:
    print(str(vector)[:100])  # Show the first 100 characters of the vector
[-0.019260549917817116, 0.0037612367887049913, -0.03291035071015358, 0.003757466096431017, 0.0082049
[-0.010181212797760963, 0.023419594392180443, -0.04215526953339577, -0.001532090245746076, -0.023573

Using with Azure OpenAI

Azure OpenAI v1 API support: As of langchain-openai>=1.0.1, OpenAIEmbeddings can be used directly with Azure OpenAI endpoints using the new v1 API. This provides a unified way to use OpenAI embeddings whether hosted on OpenAI or Azure. For the traditional Azure-specific implementation, continue to use AzureOpenAIEmbeddings.

Using Azure OpenAI v1 API with API Key

To use OpenAIEmbeddings with Azure OpenAI, set the base_url to your Azure endpoint with /openai/v1/ appended:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",  # Your Azure deployment name
    base_url="https://{your-resource-name}.openai.azure.com/openai/v1/",
    api_key="your-azure-api-key"
)

# Use as normal
vector = embeddings.embed_query("Hello world")
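If the resource name comes from configuration, the endpoint string can be assembled programmatically. A minimal sketch, assuming the URL pattern shown above; `azure_v1_base_url` is a hypothetical helper and "my-resource" is a placeholder name.

```python
def azure_v1_base_url(resource_name: str) -> str:
    """Build the v1 base URL for an Azure OpenAI resource (hypothetical helper)."""
    name = resource_name.strip()
    if not name:
        raise ValueError("resource_name must not be empty")
    # The trailing /openai/v1/ segment is what the v1 API expects here.
    return f"https://{name}.openai.azure.com/openai/v1/"


print(azure_v1_base_url("my-resource"))
# https://my-resource.openai.azure.com/openai/v1/
```

The result can be passed as the base_url argument in place of the hard-coded string.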

Using Azure OpenAI with Microsoft Entra ID

The v1 API adds native support for Microsoft Entra ID authentication with automatic token refresh. Pass a token provider callable to the api_key parameter:
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import OpenAIEmbeddings

# Create a token provider that handles automatic refresh
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",  # Your Azure deployment name
    base_url="https://{your-resource-name}.openai.azure.com/openai/v1/",
    api_key=token_provider  # Callable that handles token refresh
)

# Use as normal
vectors = embeddings.embed_documents(["Hello", "World"])
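Conceptually, a token provider is just a zero-argument callable that returns a currently valid token string, refreshing it when needed. The sketch below shows that idea with a simple expiry-based cache. All names here (`make_caching_token_provider`, `fake_fetch`) are hypothetical, and azure-identity ships its own implementation that should be used in practice.

```python
import time
from typing import Callable, Tuple


def make_caching_token_provider(
    fetch_token: Callable[[], Tuple[str, float]],
    refresh_margin: float = 60.0,
    clock: Callable[[], float] = time.time,
) -> Callable[[], str]:
    """Wrap `fetch_token` so a new token is fetched only when near expiry.

    `fetch_token` returns (token, expires_at_epoch_seconds).
    """
    cached_token = ""
    expires_at = 0.0

    def provider() -> str:
        nonlocal cached_token, expires_at
        # Refresh slightly early so a token never expires mid-request.
        if clock() >= expires_at - refresh_margin:
            cached_token, expires_at = fetch_token()
        return cached_token

    return provider


calls = []


def fake_fetch() -> Tuple[str, float]:
    # Stand-in for a real identity service call.
    calls.append(1)
    return f"token-{len(calls)}", time.time() + 3600


provider = make_caching_token_provider(fake_fetch)
print(provider(), provider())  # token-1 token-1
```

Because the embeddings client calls the provider whenever it needs a credential, refresh happens transparently to the calling code.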
Installation requirements: To use Microsoft Entra ID authentication, install the Azure Identity library:
pip install azure-identity
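You can also pass a token provider callable to the api_key parameter when using asynchronous functions. In that case, import DefaultAzureCredential and get_bearer_token_provider from azure.identity.aio:
from azure.identity.aio import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import OpenAIEmbeddings

# Async credential; the provider built from it returns an awaitable token
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(
    credential,
    "https://cognitiveservices.azure.com/.default"
)

embeddings_async = OpenAIEmbeddings(
    model="text-embedding-3-large",  # Your Azure deployment name
    base_url="https://{your-resource-name}.openai.azure.com/openai/v1/",
    api_key=token_provider  # Async callable that handles token refresh
)

# Use async methods when using an async callable
vectors = await embeddings_async.aembed_documents(["Hello", "World"])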

When using an async callable for the API key, you must use the async methods (aembed_query, aembed_documents). Sync methods will raise an error.
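The reason for this restriction is that a coroutine cannot be awaited from synchronous code. The sketch below illustrates the constraint with two hypothetical resolver functions; it is not the library's actual code, and the exception type is chosen arbitrarily for the example.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Union

ApiKey = Union[str, Callable[[], str], Callable[[], Awaitable[str]]]


def resolve_key_sync(api_key: ApiKey) -> str:
    """Resolve an API key from a synchronous code path."""
    if isinstance(api_key, str):
        return api_key
    if inspect.iscoroutinefunction(api_key):
        # A coroutine function cannot be awaited from sync code.
        raise TypeError("async api_key callable requires the async methods")
    return api_key()


async def resolve_key_async(api_key: ApiKey) -> str:
    """Resolve an API key from an async code path; accepts both kinds of callable."""
    if isinstance(api_key, str):
        return api_key
    result = api_key()
    if inspect.isawaitable(result):
        result = await result
    return result


async def async_provider() -> str:
    return "demo-token"  # placeholder value


print(asyncio.run(resolve_key_async(async_provider)))  # demo-token
try:
    resolve_key_sync(async_provider)
except TypeError as exc:
    print(f"sync path failed: {exc}")
```

The async path can handle a plain string, a sync callable, or an async callable, while the sync path has no way to run the coroutine and so has to fail.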

API reference

For detailed documentation on OpenAIEmbeddings features and configuration options, please refer to the API reference.