This guide covers how to use WebBaseLoader to load all text from HTML webpages into a document format that we can use downstream. For more custom webpage-loading logic, look at child class examples such as IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader.
If you don’t want to worry about website crawling, bypassing JS-blocking sites, and data cleaning, consider using FireCrawlLoader or the faster option SpiderLoader.
Overview
Integration details
| Class | Package | Local | Serializable | JS support |
|---|---|---|---|---|
| WebBaseLoader | langchain-community | ✅ | ❌ | ❌ |
Loader features
| Source | Document Lazy Loading | Native Async Support |
|---|---|---|
| WebBaseLoader | ✅ | ✅ |
Setup
Credentials
WebBaseLoader does not require any credentials.
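Installation
To use the WebBaseLoader you first need to install the langchain-community Python package, along with beautifulsoup4, which the loader uses for parsing (for example, `pip install -qU langchain-community beautifulsoup4`).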
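Initialization
Now we can instantiate the loader object and load documents. To bypass SSL verification errors during fetching, you can set the verify option: loader.requests_kwargs = {'verify': False}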
Initialization with multiple pages
You can also pass in a list of pages to load from.
Load
Load multiple urls concurrently
You can speed up the scraping process by scraping and parsing multiple URLs concurrently. Concurrent requests are rate-limited, defaulting to 2 per second. If you aren’t concerned about being a good citizen, or you control the server you are scraping and don’t care about load, you can change the requests_per_second parameter to raise the limit. Note that while this will speed up scraping, it may cause the server to block you. Be careful!
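Loading an XML file, or using a different BeautifulSoup parser
You can switch the BeautifulSoup parser by setting the loader's default_parser attribute, for example to "xml". You can also look at SitemapLoader for an example of how to load a sitemap file, which is an example of using this feature.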
Lazy load
You can use lazy loading to only load one page at a time in order to minimize memory requirements.
Async
Using proxies
Sometimes you might need to use proxies to get around IP blocks. You can pass in a dictionary of proxies to the loader (and requests underneath) to use them.
API reference
For detailed documentation of all WebBaseLoader features and configurations, head to the API reference.

