AgentQL’s document loader extracts structured data from any web page using an AgentQL query. The same query can be used across multiple languages and web pages, and it keeps working as pages change over time.

Overview

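AgentQLLoader requires the following two parameters:
  • url: The URL of the web page to extract data from.
  • query: The AgentQL query to execute.
The following parameters are optional: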
  • api_key: Your AgentQL API key from dev.agentql.com. Optional.
  • timeout: The number of seconds to wait for a request before timing out. Defaults to 900.
  • is_stealth_mode_enabled: Whether to enable experimental anti-bot evasion strategies. This feature may not work for all websites at all times. Data extraction may take longer to complete with this mode enabled. Defaults to False.
  • wait_for: The number of seconds to wait for the page to load before extracting data. Defaults to 0.
  • is_scroll_to_bottom_enabled: Whether to scroll to bottom of the page before extracting data. Defaults to False.
  • mode: "standard" uses deep data analysis, while "fast" trades some depth of analysis for speed and is adequate for most use cases. Learn more about the modes in this guide. Defaults to "fast".
  • is_screenshot_enabled: Whether to take a screenshot before extracting data. Returned in ‘metadata’ as a Base64 string. Defaults to False.
AgentQLLoader is implemented with AgentQL’s REST API.
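
The optional parameters above can be gathered into one settings dictionary and passed as keyword arguments. The sketch below assumes the constructor accepts them under exactly the names listed; `OPTIONAL_DEFAULTS`, `loader_kwargs`, and `make_loader` are hypothetical helpers introduced for illustration and are not part of langchain-agentql.

```python
# Documented defaults for the optional AgentQLLoader parameters listed above.
# api_key is left out on purpose so the AGENTQL_API_KEY environment variable is used.
OPTIONAL_DEFAULTS = {
    "timeout": 900,
    "is_stealth_mode_enabled": False,
    "wait_for": 0,
    "is_scroll_to_bottom_enabled": False,
    "mode": "fast",
    "is_screenshot_enabled": False,
}


def loader_kwargs(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(OPTIONAL_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown AgentQLLoader option(s): {sorted(unknown)}")
    return {**OPTIONAL_DEFAULTS, **overrides}


def make_loader(url, query, **overrides):
    """Hypothetical helper: build an AgentQLLoader from the merged settings.

    Assumes the constructor takes the keyword names used in the list above.
    """
    # Imported lazily so loader_kwargs() works without the package installed.
    from langchain_agentql.document_loaders import AgentQLLoader

    return AgentQLLoader(url=url, query=query, **loader_kwargs(**overrides))
```

For example, `loader_kwargs(mode="standard", wait_for=5)` keeps every other default and changes only those two settings, which makes a misspelled option name fail early instead of being passed through.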

Integration details

Class | Package | Local | Serializable | JS support
AgentQLLoader | langchain-agentql | | |

Loader features

Source | Document Lazy Loading | Native Async Support
AgentQLLoader | |

Setup

To use the AgentQL Document Loader, you need to configure the AGENTQL_API_KEY environment variable or pass the api_key parameter. You can acquire an API key from our Dev Portal (dev.agentql.com).

Installation

Install langchain-agentql:
pip install -qU langchain-agentql

Set credentials

import os

os.environ["AGENTQL_API_KEY"] = "YOUR_AGENTQL_API_KEY"
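
Writing the key into the source as above is fine for a quick test. In a shared notebook, prompting for it is a common alternative. A minimal sketch using only the standard library, where `ensure_agentql_key` is a hypothetical helper name:

```python
import os
from getpass import getpass


def ensure_agentql_key(prompt=getpass):
    """Set AGENTQL_API_KEY from a prompt if it is not already set, then return it."""
    if not os.environ.get("AGENTQL_API_KEY"):
        # getpass hides the typed key so it does not end up in notebook output.
        os.environ["AGENTQL_API_KEY"] = prompt("Enter your AgentQL API key: ")
    return os.environ["AGENTQL_API_KEY"]
```

The `prompt` argument exists only so the function can be exercised without a terminal; calling `ensure_agentql_key()` with no arguments prompts interactively.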

Initialization

Next, instantiate the loader object:
from langchain_agentql.document_loaders import AgentQLLoader

loader = AgentQLLoader(
    url="https://www.agentql.com/blog",
    query="""
    {
        posts[] {
            title
            url
            date
            author
        }
    }
    """,
    is_scroll_to_bottom_enabled=True,
)

Load

docs = loader.load()
docs[0]
Document(metadata={'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}, page_content="{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQL’s REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}")
print(docs[0].metadata)
{'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}
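
When is_screenshot_enabled is set to True, the 'screenshot' entry of the metadata holds a Base64 string instead of None. A sketch of decoding it with the standard library follows. `screenshot_bytes` is a hypothetical helper, and because the image format of the decoded bytes is not stated here, the sketch does not assume one.

```python
import base64


def screenshot_bytes(metadata):
    """Return the decoded screenshot bytes, or None if no screenshot was captured."""
    encoded = metadata.get("screenshot")
    if not encoded:
        return None
    return base64.b64decode(encoded)


# Usage (requires a loader created with is_screenshot_enabled=True):
# data = screenshot_bytes(docs[0].metadata)
```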

Lazy load

AgentQLLoader currently only loads one Document at a time. Therefore, load() and lazy_load() behave the same:
pages = [doc for doc in loader.lazy_load()]
pages
[Document(metadata={'request_id': '06273abd-b2ef-4e15-b0ec-901cba7b4825', 'generated_query': None, 'screenshot': None}, page_content="{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQL’s REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}")]
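
In the outputs shown above, page_content is the string form of a Python dictionary rather than JSON (note the single quotes). Assuming that format holds, `ast.literal_eval` can turn it back into a dictionary. The sketch below uses a hypothetical `iter_posts` helper and a 'posts' key that matches the example query; it accepts any iterable of objects with a page_content attribute.

```python
import ast


def iter_posts(docs):
    """Yield each post dictionary from documents whose page_content is a dict repr."""
    for doc in docs:
        # literal_eval only evaluates Python literals, so it does not execute code.
        data = ast.literal_eval(doc.page_content)
        yield from data.get("posts", [])


# Usage (requires a configured loader):
# for post in iter_posts(loader.lazy_load()):
#     print(post["title"], post["url"])
```

Because `iter_posts` is a generator, it consumes `lazy_load()` one Document at a time, which will carry over unchanged if the loader later yields more than one Document.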

API reference

For more information on how to use this integration, see the git repo or the LangChain integration documentation.