构建 SQL 智能体 - Docs by LangChain

概述

在本教程中，你将学习如何使用 LangChain 智能体构建一个能够回答 SQL 数据库相关问题的智能体。在高层次上，智能体将：

从数据库获取可用的表和模式

决定哪些表与问题相关

获取相关表的模式

根据问题和模式信息生成查询

使用 LLM 双重检查查询是否有常见错误

执行查询并返回结果

纠正数据库引擎暴露的错误，直到查询成功

根据结果生成响应

构建 SQL 数据库的 Q&A 系统需要执行模型生成的 SQL 查询。这样做存在固有风险。确保你的数据库连接权限始终尽可能地限制在智能体需要的范围内。这将减轻（但不能消除）构建模型驱动系统的风险。

概念

我们将涵盖以下概念：

用于从 SQL 数据库读取的工具
LangChain 智能体
人机协作流程

设置

安装

pip install langchain  langgraph  langchain-community

LangSmith

设置 LangSmith 来检查链或智能体内部发生了什么。然后设置以下环境变量：

export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="..."

1. 选择 LLM

选择一个支持工具调用的模型：

👉 Read the OpenAI chat model integration docs

pip install -U "langchain[openai]"

import os
from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("gpt-5.4")

👉 Read the Anthropic chat model integration docs

pip install -U "langchain[anthropic]"

import os
from langchain.chat_models import init_chat_model

os.environ["ANTHROPIC_API_KEY"] = "sk-..."

model = init_chat_model("claude-sonnet-4-6")

👉 Read the Azure chat model integration docs

pip install -U "langchain[openai]"

import os
from langchain.chat_models import init_chat_model

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."
os.environ["OPENAI_API_VERSION"] = "2025-03-01-preview"

model = init_chat_model(
    "azure_openai:gpt-5.4",
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
)

👉 Read the Google GenAI chat model integration docs

pip install -U "langchain[google-genai]"

import os
from langchain.chat_models import init_chat_model

os.environ["GOOGLE_API_KEY"] = "..."

model = init_chat_model("google_genai:gemini-2.5-flash-lite")

👉 Read the AWS Bedrock chat model integration docs

pip install -U "langchain[aws]"

from langchain.chat_models import init_chat_model

# Follow the steps here to configure your credentials:
# https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html

model = init_chat_model(
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_provider="bedrock_converse",
)

👉 Read the HuggingFace chat model integration docs

pip install -U "langchain[huggingface]"

import os
from langchain.chat_models import init_chat_model

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."

model = init_chat_model(
    "microsoft/Phi-3-mini-4k-instruct",
    model_provider="huggingface",
    temperature=0.7,
    max_tokens=1024,
)

👉 Read the OpenRouter chat model integration docs

pip install -U "langchain-openrouter"

import os
from langchain.chat_models import init_chat_model

os.environ["OPENROUTER_API_KEY"] = "sk-..."

model = init_chat_model(
    "auto",
    model_provider="openrouter",
)

以下示例中显示的输出使用了 OpenAI。

2. 配置数据库

你将为本教程创建一个 SQLite 数据库。SQLite 是一个轻量级数据库，易于设置和使用。我们将加载 chinook 数据库，这是一个代表数字媒体商店的示例数据库。为方便起见，我们已将数据库（Chinook.db）托管在公共 GCS 存储桶中。

import requests, pathlib

url = "https://storage.googleapis.com/benchmarks-artifacts/chinook/Chinook.db"
local_path = pathlib.Path("Chinook.db")

if local_path.exists():
    print(f"{local_path} already exists, skipping download.")
else:
    response = requests.get(url)
    if response.status_code == 200:
        local_path.write_bytes(response.content)
        print(f"File downloaded and saved as {local_path}")
    else:
        print(f"Failed to download the file. Status code: {response.status_code}")

我们将使用 langchain_community 包中提供的便捷 SQL 数据库包装器来与数据库交互。该包装器提供了一个简单的接口来执行 SQL 查询和获取结果：

from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///Chinook.db")

print(f"Dialect: {db.dialect}")
print(f"Available tables: {db.get_usable_table_names()}")
print(f'Sample output: {db.run("SELECT * FROM Artist LIMIT 5;")}')

Dialect: sqlite
Available tables: ['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']
Sample output: [(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains')]

3. 添加数据库交互工具

使用 langchain_community 包中提供的 SQLDatabase 包装器来与数据库交互。该包装器提供了一个简单的接口来执行 SQL 查询和获取结果：

from langchain_community.agent_toolkits import SQLDatabaseToolkit

toolkit = SQLDatabaseToolkit(db=db, llm=model)

tools = toolkit.get_tools()

for tool in tools:
    print(f"{tool.name}: {tool.description}\n")

4. 使用 `create_agent`

使用 create_agent 构建一个 ReAct 智能体，代码量最少。智能体将解释请求并生成 SQL 命令，工具将执行这些命令。如果命令有错误，错误消息会返回给模型。模型可以检查原始请求和新的错误消息并生成新命令。这种向模型提供反馈（在本例中是错误消息）的模式非常强大。使用描述性的系统提示词初始化智能体以自定义其行为：

system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.

You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.

You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.

To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.

Then you should query the schema of the most relevant tables.
""".format(
    dialect=db.dialect,
    top_k=5,
)

现在，使用模型、工具和提示词创建智能体：

from langchain.agents import create_agent


agent = create_agent(
    model,
    tools,
    system_prompt=system_prompt,
)

5. 运行智能体

在示例查询上运行智能体并观察其行为：

question = "Which genre on average has the longest tracks?"

for step in agent.stream(
    {"messages": [{"role": "user", "content": question}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

智能体正确地编写了查询，检查了查询，并运行它来生成最终响应。

你可以在 LangSmith 追踪中检查上述运行的所有方面，包括采取的步骤、调用的工具、LLM 看到的提示词等。

（可选）使用 Studio

Studio 提供了一个”客户端”循环以及记忆，因此你可以将其作为聊天界面运行并查询数据库。

在 Studio 中运行你的智能体

除了之前提到的包之外，你还需要：

pip install -U langgraph-cli[inmem]>=0.4.0

在你将运行的目录中，你需要一个包含以下内容的 langgraph.json 文件：

{
  "dependencies": ["."],
  "graphs": {
      "agent": "./sql_agent.py:agent",
      "graph": "./sql_agent_langgraph.py:graph"
  },
  "env": ".env"
}

创建一个 sql_agent.py 文件并插入以下内容：

#sql_agent.py for studio
import pathlib

from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
import requests


# 初始化 LLM
model = init_chat_model("gpt-5.4")

# 获取数据库，存储在本地
url = "https://storage.googleapis.com/benchmarks-artifacts/chinook/Chinook.db"
local_path = pathlib.Path("Chinook.db")

if local_path.exists():
    print(f"{local_path} already exists, skipping download.")
else:
    response = requests.get(url)
    if response.status_code == 200:
        local_path.write_bytes(response.content)
        print(f"File downloaded and saved as {local_path}")
    else:
        print(f"Failed to download the file. Status code: {response.status_code}")

db = SQLDatabase.from_uri("sqlite:///Chinook.db")

# 创建工具
toolkit = SQLDatabaseToolkit(db=db, llm=model)

tools = toolkit.get_tools()

for tool in tools:
    print(f"{tool.name}: {tool.description}\n")

# 使用 create_agent
system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.

You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.

You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.

To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.

Then you should query the schema of the most relevant tables.
""".format(
    dialect=db.dialect,
    top_k=5,
)

agent = create_agent(
    model,
    tools,
    system_prompt=system_prompt,
)

6. 实现人机协作审查

在执行之前检查智能体的 SQL 查询以防止意外操作或低效查询是明智的。 LangChain 智能体提供内置的人机协作中间件支持，为智能体工具调用添加监督。让我们配置智能体在调用 sql_db_query 工具时暂停以进行人工审查：

from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware 
from langgraph.checkpoint.memory import InMemorySaver 


agent = create_agent(
    model,
    tools,
    system_prompt=system_prompt,
    middleware=[
        HumanInTheLoopMiddleware(
            interrupt_on={"sql_db_query": True},
            description_prefix="Tool execution pending approval",
        ),
    ],
    checkpointer=InMemorySaver(),
)

我们为智能体添加了检查点以允许暂停和恢复执行。有关此内容以及可用的中间件配置，请参阅人机协作指南。

运行智能体时，它现在会在执行 sql_db_query 工具之前暂停以进行审查。我们可以使用 Command 恢复执行，在本例中接受查询。有关详情，请参阅人机协作指南。

后续步骤

如需更深度的自定义，请查看这个教程，了解如何直接使用 LangGraph 原语实现 SQL 智能体。

连接这些文档到 Claude、VSCode 等工具，通过 MCP 获取实时答案。

在 GitHub 上编辑此页面或提交 issue。

Documentation Index

​概述

​概念

​设置

​安装

​LangSmith

​1. 选择 LLM

​2. 配置数据库

​3. 添加数据库交互工具

​4. 使用 create_agent

​5. 运行智能体

​（可选）使用 Studio

​6. 实现人机协作审查

​后续步骤

概述

概念

设置

安装

LangSmith

1. 选择 LLM

2. 配置数据库

3. 添加数据库交互工具

4. 使用 `create_agent`

5. 运行智能体

（可选）使用 Studio

6. 实现人机协作审查

后续步骤