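ReAct Agents with LangChain and Gemini: Combining Reasoning and Acting

Large language models (LLMs) have advanced rapidly in recent years. One pattern that has stood out is ReAct (Reason + Act). Rather than only generating a reply, a ReAct-style model reasons about the task, performs external actions such as a search or a calculation, feeds the results back into further reasoning, and finally assembles a complete answer. This article walks through building a ReAct agent from scratch with the LangChain framework and Google's Gemini model, letting the model call custom tools based on its own reasoning.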

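ReAct: Giving Language Models the Ability to Think, Act, and Observe

The core of the ReAct pattern is to mimic how people solve problems. Instead of producing an answer in one shot, the LLM runs a feedback loop: it thinks (Reason), acts (Act), observes the result (Observe), and then keeps reasoning from what it observed until it reaches a solution. This loop makes the LLM more adaptable and better at multi-step problems.

For example, suppose we ask a conventional single-pass LLM setup: "Who directed the highest-grossing science-fiction film of 2023?" It might issue one search for "highest-grossing science-fiction film of 2023" and try to extract the director from the results. If the results do not mention the director, or are inaccurate, the LLM may give a wrong answer.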

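A ReAct agent handling the same question first thinks about which steps the problem requires. An illustrative trace:

Thought: First determine which science-fiction film had the highest box office in 2023.
Action: Use the search tool to look up "highest-grossing science-fiction film of 2023".
Observation: The search results show that Avatar: The Way of Water is the highest-grossing science-fiction film.
Thought: Now determine who directed Avatar: The Way of Water.
Action: Use the search tool to look up "Avatar: The Way of Water director".
Observation: The search results show that James Cameron directed Avatar: The Way of Water.
Thought: I have now found the answer.
Action: Return the answer: "Avatar: The Way of Water was directed by James Cameron."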

Through this think-act-observe loop, a ReAct agent can solve problems more reliably, even when the information is not available in a single lookup.
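The loop can be sketched in plain Python without any framework. This is a minimal illustration, not LangChain's implementation: `scripted_model` is a hypothetical stand-in for an LLM, and `search` is a hypothetical tool backed by a two-entry lookup table.

```python
from typing import Callable, Dict


def search(query: str) -> str:
    """Hypothetical tool: a tiny lookup table instead of a real search API."""
    facts = {
        "director of Inception": "Christopher Nolan",
        "birth year of Christopher Nolan": "1970",
    }
    return facts.get(query, "no result")


TOOLS: Dict[str, Callable[[str], str]] = {"search": search}


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: chooses the next step from the number of observations."""
    observations = transcript.count("Observation:")
    if observations == 0:
        return "Thought: I need the director first.\nAction: search\nAction Input: director of Inception"
    if observations == 1:
        return "Thought: Now I need the birth year.\nAction: search\nAction Input: birth year of Christopher Nolan"
    return "Thought: I now know the final answer.\nFinal Answer: 1970"


def react_loop(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # Think: the model proposes the next step
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        # Act: read the tool name and its input from the model output
        lines = step.splitlines()
        action = next(line for line in lines if line.startswith("Action:")).split(":", 1)[1].strip()
        action_input = next(line for line in lines if line.startswith("Action Input:")).split(":", 1)[1].strip()
        observation = TOOLS[action](action_input)
        # Observe: append the result so the next model call can reason over it
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached"


answer = react_loop("In what year was the director of Inception born?", scripted_model)
```

The `max_steps` cap is a design choice worth keeping in any real agent: without it, a model that never emits a final answer would loop indefinitely.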

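LangChain: A Framework for Building ReAct Agents

LangChain is a framework for building LLM-based applications. It provides modular components such as models, prompt templates, chains, agents, and callbacks that help developers assemble LLM applications quickly. LangChain greatly simplifies building a ReAct agent: developers can focus on defining the model's behavior and its tools instead of implementing the reasoning loop from the ground up.

LangChain ships a predefined agent type, AgentType.REACT_DOCSTORE, that reasons and acts against an existing document store. To understand the ReAct pattern better and to allow more flexible customization, however, this article builds a ReAct agent from scratch.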

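Gemini: The Model Driving the ReAct Agent

Google's Gemini is a family of multimodal large language models designed to understand and generate several kinds of data, including text, images, and audio. Its reasoning and generation capabilities make it a good fit for driving a ReAct agent.

Gemini handles context, logical reasoning, and coherent text generation well. That helps with the Thought step of the ReAct loop and with producing clear, accurate action instructions.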

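Custom Tools: Extending What the ReAct Agent Can Do

For a ReAct agent to carry out specific tasks, we need to define custom tools. A tool can talk to an external API or service to fetch data or perform an operation.

This article uses two custom tools:

movie_plot: Given a movie title, returns a plot summary of the movie.
character_count: Counts the number of characters in a given sentence.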

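These tools are only examples. In a real application you can define whatever tools you need, for example:

Weather tool: looks up the forecast for a given location.
Calculator tool: performs arithmetic.
Database query tool: retrieves data from a database.
Code execution tool: runs code snippets.

Custom tools connect the ReAct agent to the outside world, so it can handle more complex and more varied tasks.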
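As one illustration of the calculator idea, here is a minimal sketch written as a plain function that takes a string and returns a string, which is the shape an agent tool usually has. The function name `calculator` and the set of supported operators are assumptions made for this example. It walks the parsed expression tree instead of calling `eval`, so input such as a function call is rejected.

```python
import ast
import operator
from typing import Union

# Supported operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

Number = Union[int, float]


def _evaluate(node: ast.AST) -> Number:
    """Recursively evaluate a restricted arithmetic expression tree."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculator(expression: str) -> str:
    """Hypothetical calculator tool: returns the result, or an error message, as text."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return f"Error: {exc}"
```

Returning an error string rather than raising lets the agent see the failure as an observation and try a different input on the next step.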

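Implementing the ReAct Loop: Code and Walkthrough

The following shows how to implement a ReAct loop with LangChain and the Gemini model.

First, install the required libraries:

pip install langchain langchain-google-genai google-generativeai

Next, define the custom tools: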

from typing import Type

from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool


class MoviePlotInput(BaseModel):
    movie_name: str = Field(description="The name of the movie to get the plot for.")


class CharacterCountInput(BaseModel):
    sentence: str = Field(description="The sentence to count characters in.")


class MoviePlotTool(BaseTool):
    # Explicit annotations: some newer LangChain releases reject un-annotated field overrides
    name: str = "movie_plot"
    description: str = "Useful for getting the plot of a movie. Input should be the movie name."
    args_schema: Type[BaseModel] = MoviePlotInput

    def _run(self, movie_name: str) -> str:
        # Placeholder lookup; replace with your actual movie plot retrieval logic
        if movie_name == "The Matrix":
            return "A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers."
        elif movie_name == "Inception":
            return "A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a CEO."
        else:
            return "Plot not found."

    async def _arun(self, movie_name: str) -> str:
        raise NotImplementedError("This tool does not support asynchronous execution")


class CharacterCountTool(BaseTool):
    name: str = "character_count"
    description: str = "Useful for counting the number of characters in a sentence. Input should be a sentence."
    args_schema: Type[BaseModel] = CharacterCountInput

    def _run(self, sentence: str) -> str:
        # len() counts every character, including spaces and punctuation
        return f"The sentence '{sentence}' has {len(sentence)} characters."

    async def _arun(self, sentence: str) -> str:
        raise NotImplementedError("This tool does not support asynchronous execution")


tools = [MoviePlotTool(), CharacterCountTool()]

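This code defines two tools, MoviePlotTool and CharacterCountTool, which fetch a movie plot and count characters respectively. The _run method holds each tool's core logic. In a real application, replace the placeholder code with calls to a real API or service.

Next, define the ReAct prompt template: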

from langchain.prompts import PromptTemplate

template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

prompt = PromptTemplate.from_template(template)

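This prompt template lays out the rules the ReAct agent follows. It tells the agent which tools are available and how to use them to reason and act. The {tools} placeholder is replaced with the tool descriptions, {tool_names} with the list of tool names, {input} with the user's question, and {agent_scratchpad} with the record of the agent's reasoning so far.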
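To make the substitution concrete, the sketch below fills a shortened version of such a template with plain `str.format`. It does not use LangChain; the `render_prompt` helper, the shortened template text, and the sample step are assumptions made for illustration.

```python
from typing import List, Tuple

# Shortened template with the same four placeholders
TEMPLATE = """Answer the question. You have access to the following tools:

{tools}

Action: the action to take, should be one of [{tool_names}]

Question: {input}
Thought:{agent_scratchpad}"""

# (name, description) pairs mirroring the two tools in this article
tool_specs = [
    ("movie_plot", "Useful for getting the plot of a movie."),
    ("character_count", "Useful for counting the characters in a sentence."),
]

# (model output, tool observation) pairs from earlier iterations; sample values
steps = [
    (" I should look up the plot.\nAction: movie_plot\nAction Input: Tenet", "Plot not found."),
]


def render_prompt(question: str, specs: List[Tuple[str, str]], history: List[Tuple[str, str]]) -> str:
    tools = "\n".join(f"{name}: {desc}" for name, desc in specs)
    tool_names = ", ".join(name for name, _ in specs)
    # Each finished step contributes its text, the observation, and a fresh "Thought:" cue
    scratchpad = "".join(f"{log}\nObservation: {obs}\nThought:" for log, obs in history)
    return TEMPLATE.format(
        tools=tools, tool_names=tool_names, input=question, agent_scratchpad=scratchpad
    )


print(render_prompt("What is the plot of Tenet?", tool_specs, steps))
```

On the first iteration the history is empty, so the prompt simply ends with "Thought:" and the model starts reasoning from there.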

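Then configure the Gemini model:

import os

import google.generativeai as genai
# GoogleGenerativeAI is provided by the langchain-google-genai package
from langchain_google_genai import GoogleGenerativeAI

# Replace with your actual Google API key
os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = GoogleGenerativeAI(model="gemini-pro", temperature=0)

This code uses GoogleGenerativeAI from the langchain-google-genai package (installed with pip install langchain-google-genai) to plug the Gemini model into LangChain. Replace "YOUR_API_KEY" with your own Google API key. The temperature parameter controls how random the model's output is; setting it to 0 makes the answers more deterministic.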

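Finally, create the ReAct agent:

import re
from typing import List, Union

from langchain.agents import AgentExecutor, AgentOutputParser, LLMSingleActionAgent
from langchain.chains import LLMChain
from langchain.prompts import StringPromptTemplate
from langchain.schema import AgentAction, AgentFinish, OutputParserException
from langchain.tools import BaseTool


class ReActPromptTemplate(StringPromptTemplate):
    """Fills {tools}, {tool_names}, and {agent_scratchpad} on every call."""
    template: str
    tools: List[BaseTool]

    def format(self, **kwargs) -> str:
        # LLMSingleActionAgent passes prior steps as (AgentAction, observation) pairs
        steps = kwargs.pop("intermediate_steps")
        kwargs["agent_scratchpad"] = "".join(
            f"{action.log}\nObservation: {observation}\nThought: " for action, observation in steps
        )
        kwargs["tools"] = "\n".join(f"{tool.name}: {tool.description}" for tool in self.tools)
        kwargs["tool_names"] = ", ".join(tool.name for tool in self.tools)
        return self.template.format(**kwargs)


class CustomOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # The agent is done once the model emits a final answer
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # By convention, return_values is a dict with a single `output` key
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )

        # Capture the tool name up to the next line, then everything after "Action Input:"
        match = re.search(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", llm_output, re.DOTALL)
        if match is None:
            raise OutputParserException(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2).strip().strip('"')
        return AgentAction(tool=action, tool_input=action_input, log=llm_output)


output_parser = CustomOutputParser()

# Wrap the template text so the agent's intermediate steps are rendered into the prompt
agent_prompt = ReActPromptTemplate(
    template=prompt.template,
    tools=tools,
    input_variables=["input", "intermediate_steps"],
)
llm_chain = LLMChain(llm=model, prompt=agent_prompt)

tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["\nObservation:"],
    allowed_tools=tool_names,
)

agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)

This code creates an LLMSingleActionAgent and runs it through an AgentExecutor. LLMSingleActionAgent takes an LLMChain rather than a bare model and prompt, and it hands the chain the steps taken so far as intermediate_steps, so ReActPromptTemplate wraps the template text and renders those steps into {agent_scratchpad}. The agent performs one action per step, which gives finer control over its behavior. AgentExecutor coordinates the interaction between the agent and the tools, and verbose=True prints the agent's reasoning for debugging. CustomOutputParser parses the model output and extracts the action and its input. The stop sequence "\nObservation:" cuts generation off before the model can invent an observation of its own.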
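The parsing step can be tried in isolation, without LangChain or a model. The sketch below is a standalone illustration: the `Action` and `Finish` tuples, the `parse_step` function, and the sample outputs are hypothetical names and values chosen for this example.

```python
import re
from typing import NamedTuple, Union


class Action(NamedTuple):
    tool: str
    tool_input: str


class Finish(NamedTuple):
    output: str


# The tool name runs to the end of its line; the input is everything after "Action Input:"
ACTION_PATTERN = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)


def parse_step(llm_output: str) -> Union[Action, Finish]:
    if "Final Answer:" in llm_output:
        return Finish(llm_output.split("Final Answer:")[-1].strip())
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {llm_output!r}")
    return Action(match.group(1).strip(), match.group(2).strip().strip('"'))


step = parse_step(' I need the plot.\nAction: movie_plot\nAction Input: "The Matrix"')
done = parse_step(" I now know the final answer\nFinal Answer: 132 characters")
print(step)
print(done)
```

One regex detail matters here: a lazy group such as `(.*?)` captures an empty string when nothing follows it in the pattern, which is why the pattern continues on to the "Action Input" label instead of stopping after the tool name.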

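Now we can test the ReAct agent:

question = "What is the plot of The Matrix, and how many characters are in the first sentence of that plot?"
# invoke() returns a dict; the final answer is under the "output" key
result = agent_executor.invoke({"input": question})
print(result["output"])

This code asks the ReAct agent: "What is the plot of The Matrix, and how many characters are in the first sentence of that plot?" The agent reasons about the question, calls the appropriate tools to gather information, and then gives its final answer.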

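Summary and Outlook: The Future of ReAct

This article covered how the ReAct pattern works and how to implement it, and showed how to build a ReAct agent with the LangChain framework and the Gemini model. ReAct offers a practical way to build more capable and more flexible LLM applications.

As LLM technology continues to develop, the ReAct pattern will find use in more areas. For example, it can be used to build:

Automated customer service bots: understand the customer's question and call the appropriate tools to resolve it.
Data analysis assistants: analyze data and generate reports according to the user's needs.
Stronger code generation tools: produce more complex and more reliable code from a user's description.

ReAct has considerable potential and is likely to become an important part of LLM application development. By combining reasoning, acting, and observing, we can build more capable LLM applications that deliver more convenience and value. As open-source tooling and stronger models continue to appear, the pattern should only become more practical.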
