Nebius Token Factory provides access to high-quality embedding models through a unified interface. The Nebius embedding models convert text into numerical vectors that capture semantic meaning, making them useful for applications such as semantic search, clustering, and recommendations.

Overview

The NebiusEmbeddings class provides access to Nebius Token Factory’s embedding models through LangChain. These embeddings can be used for semantic search, document similarity, and other NLP tasks that require vector representations of text.

Integration details

  • Provider: Nebius Token Factory
  • Model Type: Text embedding models
  • Primary Use Case: Generate vector representations of text for semantic similarity and retrieval
  • Currently Highlighted Model: Qwen/Qwen3-Embedding-8B
  • Embedding Dimensions: 4,096 (for Qwen/Qwen3-Embedding-8B)
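At 4,096 dimensions, storage grows quickly with corpus size. The sketch below does the arithmetic, assuming each dimension is stored as an uncompressed 4-byte float32 value with no index overhead (compressed or quantized indexes use less). `index_size_bytes` is a hypothetical helper for illustration only.

```python
# Rough storage estimate for uncompressed embeddings.
# Assumption: one 4-byte float32 per dimension, no index overhead.
DIMENSIONS = 4096  # Qwen/Qwen3-Embedding-8B
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dims: int = DIMENSIONS) -> int:
    """Hypothetical helper: raw bytes needed for num_vectors embeddings."""
    return num_vectors * dims * BYTES_PER_FLOAT32


per_vector = index_size_bytes(1)  # 16,384 bytes (16 KiB)
one_million = index_size_bytes(1_000_000)  # 16,384,000,000 bytes
print(per_vector, f"{one_million / 2**30:.1f} GiB")  # 16384 15.3 GiB
```

In other words, a million documents embedded with this model need roughly 15 GiB of raw vector storage before any index structures are added.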

Setup

Installation

The Nebius integration can be installed via pip:
pip install -U langchain-nebius

Credentials

Nebius requires an API key, which can be passed as the api_key initialization parameter or set as the NEBIUS_API_KEY environment variable. You can obtain an API key by creating an account on Nebius Token Factory.
import getpass
import os

# Prompt for the API key if it is not already set as an environment variable
if "NEBIUS_API_KEY" not in os.environ:
    os.environ["NEBIUS_API_KEY"] = getpass.getpass("Enter your Nebius API key: ")

Instantiation

The NebiusEmbeddings class can be instantiated with optional parameters for the API key and model name:
from langchain_nebius import NebiusEmbeddings

# Initialize the embeddings model
embeddings = NebiusEmbeddings(
    # api_key="YOUR_API_KEY",  # You can pass the API key directly
    model="Qwen/Qwen3-Embedding-8B"  # The default embedding model
)

Available models

The list of supported models is available on the Nebius Token Factory Models page.

Indexing and retrieval

Embedding models are often used in retrieval-augmented generation (RAG) flows, both for indexing data and for retrieving it later. The following example demonstrates how to use NebiusEmbeddings with a vector store for document retrieval.
# FAISS requires the faiss-cpu (or faiss-gpu) package in addition to langchain-community
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# Prepare documents
docs = [
    Document(
        page_content="Machine learning algorithms build mathematical models based on sample data"
    ),
    Document(page_content="Deep learning uses neural networks with many layers"),
    Document(page_content="Climate change is a major global environmental challenge"),
    Document(
        page_content="Neural networks are inspired by the human brain's structure"
    ),
]

# Create vector store
vector_store = FAISS.from_documents(docs, embeddings)

# Perform similarity search
query = "How does the brain influence AI?"
results = vector_store.similarity_search(query, k=2)

print("Search results for query:", query)
for i, doc in enumerate(results):
    print(f"Result {i + 1}: {doc.page_content}")
Search results for query: How does the brain influence AI?
Result 1: Neural networks are inspired by the human brain's structure
Result 2: Deep learning uses neural networks with many layers
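Conceptually, a vector store embeds each document once at indexing time, then embeds the query and ranks the stored documents by vector similarity. The sketch below illustrates that idea in plain Python so it runs offline. `ToyVectorStore` and `toy_embed` are hypothetical stand-ins: the "embedding" is just a word count over a tiny vocabulary instead of NebiusEmbeddings, and the search is a linear scan by cosine similarity. Real stores such as FAISS use optimized indexes and may use a different distance metric.

```python
import math
import re
from typing import Callable, List, Tuple

# Hypothetical toy embedding: counts of a few vocabulary words.
# It stands in for NebiusEmbeddings so the example runs offline.
VOCAB = ["neural", "networks", "brain", "learning", "climate", "data"]


def toy_embed(text: str) -> List[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare against
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: embeds at indexing time, scans at query time."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self.embed_fn = embed_fn
        self.items: List[Tuple[str, List[float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        for text in texts:
            self.items.append((text, self.embed_fn(text)))

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        query_vec = self.embed_fn(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vec, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(toy_embed)
store.add_texts([
    "Machine learning algorithms build mathematical models based on sample data",
    "Deep learning uses neural networks with many layers",
    "Climate change is a major global environmental challenge",
    "Neural networks are inspired by the human brain's structure",
])
results = store.similarity_search("How do neural networks relate to the brain?", k=2)
print(results)
```

The design choice to embed documents once in add_texts and only the query in similarity_search mirrors why indexing is the expensive step and querying is cheap.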

Using with InMemoryVectorStore

You can also use the InMemoryVectorStore for lightweight applications:
from langchain_core.vectorstores import InMemoryVectorStore

# Create a sample text
text = "LangChain is a framework for developing applications powered by language models"

# Create a vector store
vectorstore = InMemoryVectorStore.from_texts(
    [text],
    embedding=embeddings,
)

# Use as a retriever
retriever = vectorstore.as_retriever()

# Retrieve similar documents
docs = retriever.invoke("What is LangChain?")
print(f"Retrieved document: {docs[0].page_content}")
Retrieved document: LangChain is a framework for developing applications powered by language models

Direct usage

You can directly use the NebiusEmbeddings class to generate embeddings for text without using a vector store.

Embedding a single text

You can use the embed_query method to embed a single piece of text:
query = "What is machine learning?"
query_embedding = embeddings.embed_query(query)

# Check the embedding dimension
print(f"Embedding dimension: {len(query_embedding)}")
print(f"First few values: {query_embedding[:5]}")
Embedding dimension: 4096
First few values: [0.007419586181640625, 0.002246856689453125, 0.00193023681640625, -0.0066070556640625, -0.0179901123046875]

Embedding multiple texts

You can embed multiple texts at once using the embed_documents method:
documents = [
    "Machine learning is a branch of artificial intelligence",
    "Deep learning is a subfield of machine learning",
    "Natural language processing deals with interactions between computers and human language",
]

document_embeddings = embeddings.embed_documents(documents)

# Check the results
print(f"Number of document embeddings: {len(document_embeddings)}")
print(f"Each embedding has {len(document_embeddings[0])} dimensions")
Number of document embeddings: 3
Each embedding has 4096 dimensions
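For large corpora you may want to control how many texts go into each embed_documents call, for example to stay under a request-size limit or to checkpoint progress. Whether NebiusEmbeddings already splits large inputs internally is not covered here. The helper below, `embed_in_batches`, is a hypothetical sketch that works with any callable shaped like embed_documents; the default batch size of 64 is an arbitrary assumption, and a fake embedder is used so the example runs offline.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    embed_documents: Callable[[List[str]], List[Vector]],
    texts: Sequence[str],
    batch_size: int = 64,
) -> List[Vector]:
    """Hypothetical helper: embed fixed-size slices and keep input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start : start + batch_size])
        vectors.extend(embed_documents(batch))
    return vectors


# Fake embedder (stand-in for embeddings.embed_documents) that records batch sizes
calls: List[int] = []


def fake_embed_documents(batch: List[str]) -> List[Vector]:
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


texts = [f"doc {i}" for i in range(10)]
vectors = embed_in_batches(fake_embed_documents, texts, batch_size=4)
print(calls)  # [4, 4, 2]
```

With the real model, you would pass the bound method instead, e.g. embed_in_batches(embeddings.embed_documents, documents, batch_size=2).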

Async support

NebiusEmbeddings supports async operations:
import asyncio


async def generate_embeddings_async():
    # Embed a single query
    query_result = await embeddings.aembed_query("What is the capital of France?")
    print(f"Async query embedding dimension: {len(query_result)}")

    # Embed multiple documents
    docs = [
        "Paris is the capital of France",
        "Berlin is the capital of Germany",
        "Rome is the capital of Italy",
    ]
    docs_result = await embeddings.aembed_documents(docs)
    print(f"Async document embeddings count: {len(docs_result)}")


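# Top-level await works in Jupyter notebooks; in a regular script, use
# asyncio.run(generate_embeddings_async()) instead
await generate_embeddings_async()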
Async query embedding dimension: 4096
Async document embeddings count: 3
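The async methods become most useful when several independent requests run concurrently. The sketch below fans out aembed_query calls with asyncio.gather, which returns results in input order, and caps in-flight requests with an asyncio.Semaphore. `FakeAsyncEmbeddings` is a hypothetical stand-in that only mimics the aembed_query method name so the example runs offline, and max_concurrency=4 is an arbitrary assumption rather than a documented Nebius limit.

```python
import asyncio
from typing import List


class FakeAsyncEmbeddings:
    """Hypothetical stand-in exposing aembed_query, like NebiusEmbeddings."""

    async def aembed_query(self, text: str) -> List[float]:
        await asyncio.sleep(0.01)  # simulate network latency
        return [float(len(text))]


async def embed_queries_concurrently(
    embedder, queries: List[str], max_concurrency: int = 4
) -> List[List[float]]:
    # The semaphore limits how many requests are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(query: str) -> List[float]:
        async with semaphore:
            return await embedder.aembed_query(query)

    # gather runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(embed_one(q) for q in queries))


queries = ["What is the capital of France?", "What is the capital of Italy?"]
async_results = asyncio.run(embed_queries_concurrently(FakeAsyncEmbeddings(), queries))
print(len(async_results))  # 2
```

With the real model, pass embeddings instead of FakeAsyncEmbeddings(); in a notebook, use await embed_queries_concurrently(embeddings, queries) rather than asyncio.run.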

Document similarity example

import numpy as np
from scipy.spatial.distance import cosine

# Create some documents
documents = [
    "Machine learning algorithms build mathematical models based on sample data",
    "Deep learning uses neural networks with many layers",
    "Climate change is a major global environmental challenge",
    "Neural networks are inspired by the human brain's structure",
]

# Embed the documents
embeddings_list = embeddings.embed_documents(documents)


# scipy's cosine() returns cosine distance, so similarity = 1 - distance
def calculate_similarity(embedding1, embedding2):
    return 1 - cosine(embedding1, embedding2)


# Print similarity matrix
print("Document Similarity Matrix:")
for i, emb_i in enumerate(embeddings_list):
    similarities = []
    for j, emb_j in enumerate(embeddings_list):
        similarity = calculate_similarity(emb_i, emb_j)
        similarities.append(f"{similarity:.4f}")
    print(f"Document {i + 1}: {similarities}")
Document Similarity Matrix:
Document 1: ['1.0000', '0.8282', '0.5811', '0.7985']
Document 2: ['0.8282', '1.0000', '0.5897', '0.8315']
Document 3: ['0.5811', '0.5897', '1.0000', '0.5918']
Document 4: ['0.7985', '0.8315', '0.5918', '1.0000']
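scipy.spatial.distance.cosine returns the cosine distance, which is why calculate_similarity subtracts it from 1. For two embedding vectors the resulting score is the standard cosine similarity:

```latex
\operatorname{sim}(\mathbf{u}, \mathbf{v})
  = 1 - d_{\cos}(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert_2 \, \lVert \mathbf{v} \rVert_2}
```

Because the formula is symmetric in u and v and gives 1 when a vector is compared with itself (up to floating-point rounding), the matrix above has 1.0000 on its diagonal and mirrored off-diagonal entries. If the vectors are L2-normalized, the denominator is 1 and the score reduces to the dot product.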

API reference

For more details about the Nebius Token Factory API, visit the Nebius Token Factory documentation.