This notebook provides a quick overview for getting started with the FireCrawlLoader document loader. For detailed documentation of all FireCrawlLoader features and configurations, head to the API reference.

Overview

Integration details

| Class | Package | Local | Serializable | PY support |
| --- | --- | --- | --- | --- |
| FireCrawlLoader | @langchain/community | 🟠 (see details below) | beta | |

Loader features

| Source | Web Loader | Node Envs Only |
| --- | --- | --- |
| FireCrawlLoader | | |

FireCrawl crawls and converts any website into LLM-ready data. It crawls all accessible subpages and gives you clean markdown and metadata for each one. No sitemap is required. FireCrawl handles complex tasks such as reverse proxies, caching, rate limits, and content blocked by JavaScript. It is built by the mendable.ai team. This guide shows how to scrape and crawl entire websites and load them using the FireCrawlLoader in LangChain.

Setup

To access the FireCrawlLoader document loader, you need to install the @langchain/community integration and the @mendable/firecrawl-js@0.0.36 package. Then create a FireCrawl account and get an API key.

Credentials

Sign up and get your free FireCrawl API key to start. FireCrawl offers 300 free credits to get you started, and it's open source in case you want to self-host. Once you have a key, set the FIRECRAWL_API_KEY environment variable:
export FIRECRAWL_API_KEY="your-api-key"
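The loader's apiKey option is optional and falls back to FIRECRAWL_API_KEY. If you want to fail fast before constructing the loader, here is a minimal sketch of that fallback order: an explicit key first, then the environment variable. The helper resolveFirecrawlApiKey is hypothetical and not part of @langchain/community. It takes the environment as a parameter so it runs on its own.

```typescript
// Hypothetical helper (not a LangChain API): sketches the documented fallback
// where an explicit apiKey wins and FIRECRAWL_API_KEY is used otherwise.
type Env = Record<string, string | undefined>;

function resolveFirecrawlApiKey(explicitKey: string | undefined, env: Env): string {
  // Prefer a non-empty key passed directly to the loader.
  if (explicitKey && explicitKey.trim() !== "") {
    return explicitKey;
  }
  // Otherwise fall back to the environment variable set above.
  const fromEnv = env["FIRECRAWL_API_KEY"];
  if (fromEnv && fromEnv.trim() !== "") {
    return fromEnv;
  }
  // Neither source is available: fail early with a readable message.
  throw new Error("No Firecrawl API key: pass apiKey or set FIRECRAWL_API_KEY.");
}
```

In Node you would call it as resolveFirecrawlApiKey(undefined, process.env).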
If you want to get automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# export LANGSMITH_TRACING="true"
# export LANGSMITH_API_KEY="your-api-key"

Installation

The LangChain FireCrawlLoader integration lives in the @langchain/community package:
npm install @langchain/community @langchain/core @mendable/firecrawl-js@0.0.36

Instantiation

Here's an example of how to use the FireCrawlLoader to load web pages.

Firecrawl offers 3 modes: scrape, crawl, and map. In scrape mode, Firecrawl scrapes only the page you provide. In crawl mode, Firecrawl crawls the entire website. In map mode, Firecrawl returns semantic links related to the website.

The formats parameter (scrapeOptions.formats in crawl mode) lets you choose from "markdown", "html", or "rawHtml". However, each loaded Document contains content in only one format, chosen in this order of priority: markdown, then html, and finally rawHtml.

Now we can instantiate our loader object and load documents:
import "@mendable/firecrawl-js";
import { FireCrawlLoader } from "@langchain/community/document_loaders/web/firecrawl";

const loader = new FireCrawlLoader({
  url: "https://firecrawl.dev", // The URL to scrape
  apiKey: "...", // Optional, defaults to `FIRECRAWL_API_KEY` in your env.
  mode: "scrape", // The mode to run the crawler in: "scrape" for a single URL or "crawl" for all accessible subpages
  params: {
    // Optional parameters based on the Firecrawl API docs
    // For API documentation, visit https://docs.firecrawl.dev
  },
});
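As described above, the formats selection lives in a different place depending on the mode: at the top level of params for scrape, and under scrapeOptions for crawl. The sketch below builds that fragment for each mode. The helper buildFormatParams and the SketchParams type are hypothetical. Omitting formats in map mode is an assumption, based only on map returning links rather than page content. Check the exact parameter layout against the Firecrawl API docs for the package version you install.

```typescript
// Sketch only: helper and types are hypothetical, written from the description
// above (formats for scrape mode, scrapeOptions.formats for crawl mode).
type FirecrawlMode = "scrape" | "crawl" | "map";
type OutputFormat = "markdown" | "html" | "rawHtml";

interface SketchParams {
  formats?: OutputFormat[];
  scrapeOptions?: { formats: OutputFormat[] };
}

function buildFormatParams(mode: FirecrawlMode, formats: OutputFormat[]): SketchParams {
  switch (mode) {
    case "scrape":
      // Scrape mode: formats sit at the top level of params.
      return { formats };
    case "crawl":
      // Crawl mode: formats are nested under scrapeOptions.
      return { scrapeOptions: { formats } };
    case "map":
      // Assumption: map mode returns links, so no formats are sent.
      return {};
  }
}
```

You could then pass, for example, params: buildFormatParams("crawl", ["markdown"]) to a loader created with mode: "crawl".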

Load

const docs = await loader.load()
docs[0]
Document {
  pageContent: "Introducing [Smart Crawl!](https://www.firecrawl.dev/smart-crawl)\n" +
    " Join the waitlist to turn any web"... 18721 more characters,
  metadata: {
    title: "Home - Firecrawl",
    description: "Firecrawl crawls and converts any website into clean markdown.",
    keywords: "Firecrawl,Markdown,Data,Mendable,LangChain",
    robots: "follow, index",
    ogTitle: "Firecrawl",
    ogDescription: "Turn any website into LLM-ready data.",
    ogUrl: "https://www.firecrawl.dev/",
    ogImage: "https://www.firecrawl.dev/og.png?123",
    ogLocaleAlternate: [],
    ogSiteName: "Firecrawl",
    sourceURL: "https://firecrawl.dev",
    pageStatusCode: 500
  },
  id: undefined
}
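The pageContent above is markdown. When several formats are requested, the loader keeps only one body, preferring markdown, then html, then rawHtml. Here is a minimal sketch of that priority rule. The RawResult shape and the pickPageContent name are assumptions for illustration, not the firecrawl-js or LangChain type definitions. Treating an empty string as "not present" is also an assumption.

```typescript
// Hypothetical shape of one Firecrawl result (field names assumed for illustration).
interface RawResult {
  markdown?: string;
  html?: string;
  rawHtml?: string;
}

// Return the single body a loaded Document would carry: the first non-empty
// value in the order markdown, html, rawHtml, or "" if none is available.
function pickPageContent(result: RawResult): string {
  return result.markdown || result.html || result.rawHtml || "";
}
```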
console.log(docs[0].metadata)
{
  title: "Home - Firecrawl",
  description: "Firecrawl crawls and converts any website into clean markdown.",
  keywords: "Firecrawl,Markdown,Data,Mendable,LangChain",
  robots: "follow, index",
  ogTitle: "Firecrawl",
  ogDescription: "Turn any website into LLM-ready data.",
  ogUrl: "https://www.firecrawl.dev/",
  ogImage: "https://www.firecrawl.dev/og.png?123",
  ogLocaleAlternate: [],
  ogSiteName: "Firecrawl",
  sourceURL: "https://firecrawl.dev",
  pageStatusCode: 500
}
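The sample metadata reports pageStatusCode: 500 even though pageContent was returned, so you may want to check this field before indexing the results. The sketch below splits loaded documents into ordinary pages and suspect pages. LoadedDoc is a hypothetical minimal stand-in for LangChain's Document, declared locally so the snippet runs on its own. The 400 cutoff and the choice to treat a missing status as suspect are assumptions.

```typescript
// Minimal local stand-in for a LangChain Document (only the fields used here).
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Split documents by metadata.pageStatusCode: numeric codes below 400 count as ok;
// codes of 400 or more, or a missing/non-numeric code, are flagged as suspect.
function partitionByStatus(docs: LoadedDoc[]): { ok: LoadedDoc[]; suspect: LoadedDoc[] } {
  const ok: LoadedDoc[] = [];
  const suspect: LoadedDoc[] = [];
  for (const doc of docs) {
    const status = doc.metadata["pageStatusCode"];
    if (typeof status === "number" && status < 400) {
      ok.push(doc);
    } else {
      suspect.push(doc);
    }
  }
  return { ok, suspect };
}
```

With the docs loaded above, const { ok, suspect } = partitionByStatus(docs); would place the sample page in suspect because of its 500 status. Whether to drop or re-fetch those pages is up to you.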

Additional parameters

For params, you can pass any of the parameters described in the Firecrawl documentation.

API reference

For detailed documentation of all FireCrawlLoader features and configurations, head to the API reference.