This guide provides a quick overview for getting started with the UnstructuredXMLLoader document loader. The UnstructuredXMLLoader loads .xml files; the page content of each resulting document is the text extracted from the XML tags.

Overview

Integration details

Class | Package | Local | Serializable | JS support
UnstructuredXMLLoader | langchain_community | | |

Loader features

Source | Document Lazy Loading | Native Async Support
UnstructuredXMLLoader | |

Setup

To access the UnstructuredXMLLoader document loader, you'll need to install the langchain-community integration package.

Credentials

No credentials are needed to use the UnstructuredXMLLoader.

To enable automated tracing of your model calls, set your LangSmith API key:
import getpass
import os

os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
os.environ["LANGSMITH_TRACING"] = "true"

Installation

Install langchain_community:
pip install -qU langchain_community

Initialization

Now we can instantiate the loader object and load documents:
from langchain_community.document_loaders import UnstructuredXMLLoader

loader = UnstructuredXMLLoader(
    "./example_data/factbook.xml",
)

Load

docs = loader.load()
docs[0]
Document(metadata={'source': './example_data/factbook.xml'}, page_content='United States\n\nWashington, DC\n\nJoe Biden\n\nBaseball\n\nCanada\n\nOttawa\n\nJustin Trudeau\n\nHockey\n\nFrance\n\nParis\n\nEmmanuel Macron\n\nSoccer\n\nTrinidad & Tobado\n\nPort of Spain\n\nKeith Rowley\n\nTrack & Field')
print(docs[0].metadata)
{'source': './example_data/factbook.xml'}
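
To see why the page content above is a run of tag values separated by blank lines, here is a minimal sketch using only Python's standard library. It is not how UnstructuredXMLLoader works internally; it only illustrates the idea of "text extracted from the XML tags". The XML string and the `extract_text` helper are hypothetical: the real factbook.xml is not shown in this guide and may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, shaped to resemble the output above; the real
# factbook.xml is not shown in this guide and may differ.
SAMPLE_XML = """
<factbook>
  <country>
    <name>United States</name>
    <capital>Washington, DC</capital>
    <leader>Joe Biden</leader>
    <sport>Baseball</sport>
  </country>
  <country>
    <name>Canada</name>
    <capital>Ottawa</capital>
    <leader>Justin Trudeau</leader>
    <sport>Hockey</sport>
  </country>
</factbook>
"""


def extract_text(xml_string: str) -> str:
    """Join the non-empty text of every element with blank lines."""
    root = ET.fromstring(xml_string.strip())
    pieces = []
    for element in root.iter():
        # Container elements only hold whitespace, so they are skipped.
        text = (element.text or "").strip()
        if text:
            pieces.append(text)
    return "\n\n".join(pieces)


page_content = extract_text(SAMPLE_XML)
print(page_content)
```

The tag names are discarded and only their text survives, which matches the shape of the page_content shown above.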

Lazy load

page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)

        page = []
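
One thing to watch with the loop above: any documents left in `page` when the iterator runs out are never processed, because the paged operation only fires once a page is full. The sketch below shows one way to handle that with a small generator. The `paged` helper is a hypothetical name, not part of langchain_community, and the string generator is a stand-in for `loader.lazy_load()` so the example runs without any files.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paged(items: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Yield lists of up to page_size items, including a final partial page."""
    page: List[T] = []
    for item in items:
        page.append(item)
        if len(page) >= page_size:
            yield page
            page = []
    if page:
        # Flush whatever is left once the source is exhausted.
        yield page


# Stand-in for loader.lazy_load(): any iterator of documents works here.
fake_docs = (f"doc-{i}" for i in range(25))
page_sizes = [len(page) for page in paged(fake_docs, page_size=10)]
print(page_sizes)  # [10, 10, 5]
```

With a real loader, the call would presumably look like `for page in paged(loader.lazy_load(), 10): ...`, with the paged operation (for example an index upsert) performed on each yielded list.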

API reference

For detailed documentation of all UnstructuredXMLLoader features and configurations, head to the API reference.