This notebook provides a quick-start overview of the UnDatasIO document loader. UnDatasIO loads and parses several document formats, including PDF, PNG, JPG, JPEG, and JFIF, through its secure cloud API, with support for document lazy loading and native async. The parsed output is ready for generative AI workflows such as RAG. For detailed documentation on all features and configurations, refer to the official API reference.

Overview

Loader features

Source | Document Lazy Loading | Native Async Support
UnDatasIOLoader | Yes | Yes

Setup

Credentials

UnDatasIO requires an API token. Generate a free token at undatas.io and set it in the cell below:
import getpass
import os

if "UNDATASIO_TOKEN" not in os.environ:
    os.environ["UNDATASIO_TOKEN"] = getpass.getpass(
        "Enter your UnDatasIO API token: "
    )

Installation

Normal installation

The following package is required to run the rest of this notebook.
# Install package, compatible with API partitioning
pip install langchain-undatasio

Initialization

The UnDatasIOLoader supports single-file upload & parsing via the UnDatasIO cloud API.
from langchain_undatasio import UnDatasIOLoader

loader = UnDatasIOLoader(
    token=os.environ["UNDATASIO_TOKEN"],
    file_path="demo.pdf"
)
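
Because the loader uploads a single file to the cloud API, it can help to check the file extension locally before constructing it. The following is a minimal sketch, not part of the langchain-undatasio package: the helper names `is_supported` and `require_supported` are hypothetical, and the suffix set simply mirrors the formats listed in the overview rather than anything queried from the API.

```python
from pathlib import Path

# Assumption: this set mirrors the formats named in the overview
# (PDF, PNG, JPG, JPEG, JFIF); it is not read from the UnDatasIO API.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".jfif"}


def is_supported(file_path):
    """Return True if the file extension is one of the listed formats."""
    return Path(file_path).suffix.lower() in SUPPORTED_SUFFIXES


def require_supported(file_path):
    """Return file_path unchanged, or raise ValueError for an unlisted format."""
    if not is_supported(file_path):
        raise ValueError(f"Unsupported file type: {file_path}")
    return file_path


# Hypothetical usage: validate the path before passing it to the loader.
checked_path = require_supported("demo.pdf")
```

The returned `checked_path` could then be passed as `file_path` when constructing `UnDatasIOLoader`, so an unsupported file fails locally instead of after an upload attempt.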

Load

docs = loader.load()
docs[0]
Document(
    metadata={'source': 'demo.pdf', 'task_id': 't1', 'file_id': 'f1'},
    page_content='Growing a Tail: Increasing Output Diversity in Large Language Models\n\nAuthors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*\n\nAffiliations:\n\n1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.\n\n2Faculty of Computer Science, Technion – I'
)
print(docs[0].page_content[:300])
Growing a Tail: Increasing Output Diversity in Large Language Models

Authors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*

Affiliations:

1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.

2Faculty of Computer Science, Technion – I

Lazy loading

UnDatasIOLoader supports lazy loading for memory-efficient iteration.
pages = []
for doc in loader.lazy_load():
    pages.append(doc)

pages[0]
Document(
    metadata={'source': 'demo.pdf', 'task_id': 't1', 'file_id': 'f1'},
    page_content='Growing a Tail: Increasing Output Diversity in Large Language Models\n\nAuthors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*\n\nAffiliations:\n\n1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.\n\n2Faculty of Computer Science, Technion – I'
)
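
Lazy loading pays off when iteration stops early, since documents that are never requested are never produced. The sketch below illustrates this with a stand-in `FakeLoader` class so that it runs without an API token. `FakeLoader`, `take_first`, and `take_first_async` are hypothetical names. The `alazy_load()` method is an assumption that UnDatasIOLoader follows the async generator interface of LangChain's `BaseLoader`; this has not been verified against the package.

```python
import asyncio
from itertools import islice


class FakeLoader:
    """Stand-in for UnDatasIOLoader (hypothetical) with the same iteration methods."""

    def __init__(self, pages):
        self.pages = pages
        self.yielded = 0  # counts how many documents were actually produced

    def lazy_load(self):
        for page in self.pages:
            self.yielded += 1
            yield page

    async def alazy_load(self):
        # Assumption: mirrors the async generator exposed by LangChain's BaseLoader.
        for page in self.pages:
            self.yielded += 1
            yield page


def take_first(loader, n):
    """Return at most n documents, pulling no more than n from the iterator."""
    return list(islice(loader.lazy_load(), n))


async def take_first_async(loader, n):
    """Async counterpart: stop iterating once n documents are collected."""
    docs = []
    if n <= 0:
        return docs
    async for doc in loader.alazy_load():
        docs.append(doc)
        if len(docs) >= n:
            break
    return docs


sync_loader = FakeLoader(["page 1", "page 2", "page 3"])
first_two = take_first(sync_loader, 2)

async_loader = FakeLoader(["page 1", "page 2", "page 3"])
first_async = asyncio.run(take_first_async(async_loader, 1))
```

With a real loader, the same helpers would accept the `loader` object created earlier, provided it exposes `lazy_load()` and, for the async variant, `alazy_load()`.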

See also