Compatibility: Only available on Node.js.
This notebook provides a quick overview for getting started with the UnstructuredLoader document loader. For detailed documentation of all UnstructuredLoader features and configurations, head to the API reference.

Overview

Integration details

Class | Package | Compatibility | Local | PY support
UnstructuredLoader | @langchain/community | Node-only | |

Setup

To access the UnstructuredLoader document loader, you need to install the @langchain/community integration package, create an Unstructured account, and get an API key.

Local

You can run Unstructured locally on your computer using Docker. To do so, you need to have Docker installed.
docker run -p 8000:8000 -d --rm --name unstructured-api downloads.unstructured.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0
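
Once the container is running, the loader has to be pointed at it rather than at the hosted API. The sketch below builds an options object for that case. It assumes the loader accepts an `apiUrl` option and that the local server answers on the `/general/v0/general` route; `localUnstructuredOptions` is a hypothetical helper written for illustration, not part of @langchain/community.

```typescript
// Hypothetical helper: builds loader options that target a local Unstructured container.
// Assumption: the container serves partition requests at /general/v0/general.
interface LocalUnstructuredOptions {
  apiUrl: string;
}

function localUnstructuredOptions(
  port: number = 8000, // matches `-p 8000:8000` in the docker command above
  host: string = "localhost"
): LocalUnstructuredOptions {
  // Reject ports outside the valid TCP range before building the URL.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid port: ${port}`);
  }
  return { apiUrl: `http://${host}:${port}/general/v0/general` };
}

const localOptions = localUnstructuredOptions();
console.log(localOptions.apiUrl);
// Assumed usage: new UnstructuredLoader("path/to/file", localOptions)
```

If you map the container to a different host port, pass that port so the URL and the `docker run` mapping stay in sync.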

Credentials

Head to unstructured.io to sign up for Unstructured and generate an API key. Once you've done this, set the UNSTRUCTURED_API_KEY environment variable:
export UNSTRUCTURED_API_KEY="your-api-key"
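
A missing key usually surfaces only as a failed request, so it can help to check for it before loading anything. The following is a minimal sketch of such a check. `requireUnstructuredApiKey` is a hypothetical helper, and passing the result as an `apiKey` option to the loader is an assumption about the constructor, not something this page confirms.

```typescript
// Hypothetical helper: reads the API key from an environment-like record and fails early.
type Env = Record<string, string | undefined>;

function requireUnstructuredApiKey(env: Env): string {
  // Treat a whitespace-only value the same as a missing one.
  const key = env["UNSTRUCTURED_API_KEY"]?.trim();
  if (!key) {
    throw new Error("UNSTRUCTURED_API_KEY is not set; export it before loading documents.");
  }
  return key;
}

// Example with an in-memory record; in Node.js you would pass process.env instead.
const apiKey = requireUnstructuredApiKey({ UNSTRUCTURED_API_KEY: "your-api-key" });
console.log(apiKey.length > 0);
// Assumed usage: new UnstructuredLoader("path/to/file", { apiKey })
```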

Installation

The LangChain UnstructuredLoader integration lives in the @langchain/community package:
npm install @langchain/community @langchain/core

Instantiation

Now we can instantiate the loader and load documents:
import { UnstructuredLoader } from "@langchain/community/document_loaders/fs/unstructured"

const loader = new UnstructuredLoader("../../../../../../examples/src/document_loaders/example_data/notion.mdx")

Load

const docs = await loader.load()
docs[0]
Document {
  pageContent: '# Testing the notion markdownloader',
  metadata: {
    filename: 'notion.mdx',
    languages: [ 'eng' ],
    filetype: 'text/plain',
    category: 'NarrativeText'
  },
  id: undefined
}
console.log(docs[0].metadata)
{
  filename: 'notion.mdx',
  languages: [ 'eng' ],
  filetype: 'text/plain',
  category: 'NarrativeText'
}
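
Each returned document carries a `category` in its metadata, which makes it easy to keep only certain element types. The sketch below shows one way to do that. `LoadedDoc` is a minimal stand-in for the Document shape printed above, `filterByCategory` is a hypothetical helper, and the sample array is hand-written rather than real loader output.

```typescript
// Minimal stand-in for the Document shape printed above (pageContent plus metadata).
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Keep only elements whose metadata.category matches one of the requested categories.
function filterByCategory(docs: LoadedDoc[], categories: string[]): LoadedDoc[] {
  const wanted = new Set(categories);
  return docs.filter((doc) => {
    const category = doc.metadata["category"];
    return typeof category === "string" && wanted.has(category);
  });
}

// Hand-written sample data shaped like the loader output; not real loader results.
const sampleDocs: LoadedDoc[] = [
  { pageContent: "# Testing the notion markdownloader", metadata: { category: "NarrativeText" } },
  { pageContent: "A heading", metadata: { category: "Title" } },
];

const narrative = filterByCategory(sampleDocs, ["NarrativeText"]);
console.log(narrative.length); // 1
```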

Directories

You can also load all of the files in a directory using UnstructuredDirectoryLoader, which inherits from DirectoryLoader:
import { UnstructuredDirectoryLoader } from "@langchain/community/document_loaders/fs/unstructured";

const directoryLoader = new UnstructuredDirectoryLoader(
  "../../../../../../examples/src/document_loaders/example_data/",
  {}
);
const directoryDocs = await directoryLoader.load();
console.log("directoryDocs.length: ", directoryDocs.length);
console.log(directoryDocs[0])

Unknown file type: Star_Wars_The_Clone_Wars_S06E07_Crisis_at_the_Heart.srt
Unknown file type: test.mp3
directoryDocs.length:  247
Document {
  pageContent: 'Bitcoin: A Peer-to-Peer Electronic Cash System',
  metadata: {
    filetype: 'application/pdf',
    languages: [ 'eng' ],
    page_number: 1,
    filename: 'bitcoin.pdf',
    category: 'Title'
  },
  id: undefined
}
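
Because a directory load returns one flat array of elements across all files, a per-file tally is a quick way to see where the elements came from. This is a rough sketch under that reading of the output: `DirDoc` is a stand-in type, `countByFilename` is a hypothetical helper, and the sample data is invented for illustration.

```typescript
// Minimal stand-in for the Document shape shown in the output above.
interface DirDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Tally how many elements each source file contributed, keyed by metadata.filename.
function countByFilename(docs: DirDoc[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const doc of docs) {
    const name = doc.metadata["filename"];
    const key = typeof name === "string" ? name : "(unknown)";
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Invented sample data shaped like the directory loader output.
const sampleDirDocs: DirDoc[] = [
  { pageContent: "Bitcoin: A Peer-to-Peer Electronic Cash System", metadata: { filename: "bitcoin.pdf" } },
  { pageContent: "Abstract.", metadata: { filename: "bitcoin.pdf" } },
  { pageContent: "# Testing the notion markdownloader", metadata: { filename: "notion.mdx" } },
];

const perFile = countByFilename(sampleDirDocs);
console.log(perFile["bitcoin.pdf"]); // 2
```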

API reference

For detailed documentation of all UnstructuredLoader features and configurations, head to the API reference.