Title: Custom Function Calling for LLMs with llama.cpp: A Practical Guide to Extending Local LLMs

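Introduction: Extending a large language model (LLM) through function calling, which connects the model to external code, has become a key technique for making LLMs more useful. This article shows how to design and integrate custom functions for an LLM served locally with llama.cpp. It explains what function calling is, why it matters for local development, and how to structure a solution that works for many kinds of local functions, such as database queries, file handling, and math operations. Using a worked example and reusable Python code, it builds an end-to-end solution and closes with practices for integrating functions safely and efficiently.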

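Body:

1. Function Calling: The Core of Extending LLM Capabilities

Function calling lets a model produce structured output that asks for a predefined external function to be invoked. Instead of replying only in natural language, the model emits a JSON payload containing a function name and its arguments. llama.cpp uses chat templates and prompt engineering to reproduce the function-calling behavior of the OpenAI API, so a local model can also trigger external programs.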

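For example, when a user asks "What is the weather in Atlanta right now?", a model that has been trained or prompted for function calling does not try to answer directly. It produces JSON such as: {"name": "get_current_weather", "arguments": {"location": "Atlanta"}}. The payload names the function to call, get_current_weather, and supplies its location argument, which hands the request over to an external program.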
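To make the payload concrete, here is a minimal sketch of how application code might parse it and dispatch to a handler. The get_current_weather stub and the handlers dictionary are assumptions made for illustration; a real handler would query a weather service.

```python
import json

def get_current_weather(location: str) -> str:
    # Hypothetical stand-in; a real implementation would query a weather service.
    return f"(stub) weather for {location}"

# The JSON text the model produced instead of a natural-language answer
model_output = '{"name": "get_current_weather", "arguments": {"location": "Atlanta"}}'

call = json.loads(model_output)

# Map function names to Python callables, then invoke the requested one
handlers = {"get_current_weather": get_current_weather}
result = handlers[call["name"]](**call["arguments"])
print(result)
```

The model never runs any code itself; it only names a function, and the application decides whether and how to execute it.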

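2. llama.cpp: The Foundation for Local LLM Function Calling

llama.cpp is a high-performance C/C++ library for running LLMs locally. It has evolved steadily and now supports OpenAI-style function calling, which lets developers reproduce part of the OpenAI API's capabilities in a local environment. Its architecture allows it to generate and parse JSON function-call requests without depending on an external API.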

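This support relies on a template system that defines the format of the model's output in advance. For example, a system prompt can show the model which functions are available and how to use them. When a user's question requires one of those functions, the model produces structured JSON output instead of answering directly. This makes the LLM considerably more flexible and practical.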
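The idea of showing functions to the model through a system prompt can be sketched as follows. This is not llama.cpp's actual chat template; build_system_prompt and the prompt wording are assumptions chosen only to illustrate the principle.

```python
import json

def build_system_prompt(functions: list) -> str:
    """Render function definitions into a plain-text system prompt (illustrative only)."""
    lines = [
        "You can call the following functions.",
        'To call one, reply with JSON only: {"name": <function name>, "arguments": {...}}',
        "",
    ]
    for fn in functions:
        lines.append(f"- {fn['name']}: {fn['description']}")
        lines.append(f"  parameters: {json.dumps(fn['parameters'])}")
    return "\n".join(lines)

# Hypothetical definition of the weather function used earlier in the article
weather_tool = {
    "name": "get_current_weather",
    "description": "Get the current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

prompt = build_system_prompt([weather_tool])
print(prompt)
```

A real chat template does the same job with the exact tokens and layout the model was trained on, which is why matching the template to the model matters.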

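3. Architecture: Building a Function-Calling Mechanism with llama.cpp

A function-calling mechanism built on llama.cpp requires the LLM and the application code to work together.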

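Define the available functions (tools): First decide which functions the LLM may call. For each one, provide a name, a description, and a parameter signature. The signature is usually written in JSON Schema or a similar format and states which parameters the function accepts, their types, and whether each is required. For example, get_current_weather takes location (a string) and unit (an optional string).
Prompt the model (system message and template): Declare the functions to the model through a specially formatted message, such as a system prompt. With llama.cpp you can use a chat template that embeds the function definitions, or pass the function list as part of a chat completion request through the OpenAI-compatible API. The definitions are added to the model's context (for example as a hidden system message), so the model knows which functions exist and how to call them. Models fine-tuned for this purpose, such as Functionary, can return the function call as JSON directly. Models without such fine-tuning can still be used, but they may be less efficient, produce less reliable output, or need more detailed prompts.
User query and model decision: When the user asks a question, pass the user's message together with the function definitions to the model through llama.cpp. The model decides whether to answer from its own knowledge or to call a function. For example, asked to "compute the square root of 2 to 10 decimal places", the model can call math_calculate instead of attempting the high-precision arithmetic itself.
Capture the function call: llama.cpp returns the model's response. A regular reply is natural-language text. If the model initiates a function call, the response instead carries structured JSON with the function name and arguments. An OpenAI-compatible response looks like this: {"role": "assistant", "function_call": {"name": "math_calculate", "arguments": "{ \"expression\": \"round(sqrt(2), 10)\" }"}}. Note that arguments is itself a JSON-encoded string and must be decoded separately.
Execute the custom function: The application code is responsible for connecting the function name to real code. A registry or dictionary can map each function name (a string) to a Python function. In the example above, math_calculate corresponds to the Python function math_calculate(expression: str). The code extracts the arguments JSON from the model output, deserializes it into the right data types, and calls the matching Python function. Validate the arguments before the call, because the model may supply unexpected or malformed values.
Pass the return value back to the model: After the function finishes, its return value has to be reported to the model so that it can continue the conversation and produce the final reply. In the OpenAI scheme this is done with a message whose role is "function". The message contains the role "function", the function name, and the function's return value (usually serialized as a string). For example, if math_calculate("round(sqrt(2), 10)") returns 1.4142135624, construct the message {"role": "function", "name": "math_calculate", "content": "1.4142135624"}. Send this message to the model together with all earlier messages (the system message, the user message, and the assistant's function-call request).
Model returns the final reply: The model uses the function's return value to answer the user. Knowing that math_calculate returned 1.4142135624, it can reply: "The square root of 2, rounded to 10 decimal places, is 1.4142135624."
Loop for multi-step calls (if needed): In more complex cases the model may make several calls in sequence, or hold a conversation that needs repeated computation. The architecture handles this with a loop that keeps feeding the conversation back to the model until it returns a regular reply instead of another function call.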

These steps close the loop locally: user -> model -> [function call? if yes -> execute function -> return result to model] -> final answer -> user.
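The loop described above can be sketched without a real model. In this minimal example, fake_model is a stand-in that requests one function call and then answers; fake_model, add, REGISTRY, and run_conversation are all hypothetical names introduced for illustration.

```python
import json

def fake_model(messages):
    """Stand-in for the LLM: requests a function once, then answers in plain text."""
    last = messages[-1]
    if last["role"] == "function":
        return {"role": "assistant", "content": f"The result is {last['content']}."}
    return {
        "role": "assistant",
        "content": None,
        "function_call": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
    }

def add(a, b):
    return a + b

REGISTRY = {"add": add}

def run_conversation(model, messages, max_steps=5):
    """Feed the conversation to the model until it stops requesting function calls."""
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)  # keep the assistant turn in the history
        call = reply.get("function_call")
        if not call:
            return reply["content"]  # regular reply: the loop ends
        func = REGISTRY.get(call["name"])
        if func is None:
            result = f"Error: unknown function {call['name']}"
        else:
            try:
                result = func(**json.loads(call["arguments"]))
            except (TypeError, ValueError) as exc:  # bad arguments or invalid JSON
                result = f"Error: {exc}"
        messages.append({"role": "function", "name": call["name"], "content": str(result)})
    return "Error: too many function calls."

answer = run_conversation(fake_model, [{"role": "user", "content": "What is 2 + 3?"}])
print(answer)
```

The max_steps limit is a design choice worth keeping in a real system: it stops a model that keeps requesting calls from looping forever.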

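4. Step-by-Step Implementation of Custom Function Calling in Python

This section walks through a practical example of setting up custom function calling with llama.cpp in Python. It uses the llama-cpp-python library, which provides a high-level interface to llama.cpp models.

Set up the LLM: First, make sure you have a model file for llama.cpp and the Python bindings installed (pip install llama-cpp-python), and that the model supports function calling. For example, use a Functionary v2 GGUF model (designed for function calling) or a general instruction-tuned model with the chatml-function-calling template. Load the model with the matching chat format: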

from llama_cpp import Llama

# Load the model with a chat format that supports function calls
llm = Llama(
    model_path="path/to/your/model.gguf",
    # Use a format matching your model's capabilities
    chat_format="functionary-v2",   # for Functionary model (ensure to provide HF tokenizer if needed)
    # chat_format="chatml-function-calling",  # generic fallback for others
    n_threads=8,  # for example, adjust for your CPU
)

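In the code above, chat_format="functionary-v2" tells the library to use the template for the Functionary v2 model family. For other models, use chat_format="chatml-function-calling" instead. The Llama class loads the model into memory. Adjust n_threads or the GPU settings (for example n_gpu_layers) to improve performance.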

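Define the function registry: Build the custom functions the LLM may call, along with their metadata. This example creates two functions: calculate(expression) for math and read_file(path) for reading file contents. Each function's parameters get a JSON schema that is passed to the LLM.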

import json
import math
import re

# Evaluate a mathematical expression and return the result as a string.
def calculate(expression: str) -> str:
    try:
        # Warning: eval is risky even with builtins removed. The whitelist below is a
        # minimal safeguard for this demo; a production version should parse the
        # expression and allow only known operations. Note that ast.literal_eval
        # parses literals only and cannot evaluate arithmetic such as "2 * 3".
        allowed_names = {"sqrt": math.sqrt}
        # Allow only digits, arithmetic operators, parentheses, dots, commas,
        # spaces, and the whole word 'sqrt'. Adjust the regex for your needs.
        if not re.fullmatch(r'(?:sqrt|[0-9+\-*/()., ])*', expression):
            return "Error: Expression contains invalid characters."
        # Reject '**' so that inputs like 9**9**9 cannot exhaust CPU or memory.
        if "**" in expression:
            return "Error: Exponentiation is not supported."
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

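import os

# Read the contents of a file from an allowed directory.
ALLOWED_DIR = os.path.realpath("allowed_files")

def read_file(path: str) -> str:
    try:
        # Resolve symlinks and '..' first, so "allowed_files/../secret.txt" is rejected.
        # A plain startswith("allowed_files/") check would let that path through.
        real_path = os.path.realpath(path)
        if os.path.commonpath([ALLOWED_DIR, real_path]) != ALLOWED_DIR:
            return "Error: Access to this path is not allowed."
        with open(real_path, 'r', encoding='utf-8') as f:
            data = f.read(1000)  # read only first 1000 characters to limit output
        return data
    except Exception as e:
        return f"Error: {e}"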

# Map function names to implementations
function_registry = {
    "calculate": calculate,
    "read_file": read_file
}

# Define the function schemas for the LLM (OpenAI-style):
functions = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use for arithmetic or math operations.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "The math expression to evaluate"}
            },
            "required": ["expression"]
        }
    },
    {
        "name": "read_file",
        "description": "Read contents of a text file from the allowed_files directory.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the text file (within allowed_files directory)."}
            },
            "required": ["path"]
        }
    }]

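Prompt the model and detect function calls: Start an interactive loop (or a single query) in which the model may rely on these functions to answer the user. Build the conversation step by step. For each user input, pass the conversation to llm.create_chat_completion together with the function definitions, and let the model decide whether to reply directly or trigger a function.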

# Start a conversation list with a system prompt to instruct the assistant behavior
messages = [
    {"role": "system", "content": "You are a helpful assistant. You have access to the following functions: calculate(expression) for math calculations, and read_file(path) to read file contents. Only use them when necessary. If you use a function, respond with a function call in JSON format."}
]

# A single user query (wrap this part in a loop for interactive use)
user_query = "What is the square root of 2 rounded to 10 decimal places?"
messages.append({"role": "user", "content": user_query})

# Send to the model with function support
response = llm.create_chat_completion(
    messages=messages,
    functions=functions,          # provide the function definitions
    function_call="auto",         # let the model decide if it should call a function
    temperature=0.1,              # fairly low temperature for deterministic behavior
)

# Extract the assistant's reply
assistant_reply = response['choices'][0]['message']
print("Assistant reply:", assistant_reply)

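Execute the function and continue the conversation: Next comes the dispatch logic. Check whether the response contains a function_call. If it does, extract the function name and arguments from the model's reply, run the corresponding Python function, and append the result to the message list for the model's next turn. Finally, call the model again to obtain an answer that incorporates the function's return value.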

import json

# .get() also covers replies where the key is present but set to None
function_call = assistant_reply.get('function_call')
if function_call:
    func_name = function_call['name']
    args_str = function_call['arguments']
    try:
        args = json.loads(args_str) if args_str else {}
    except json.JSONDecodeError:
        args = None  # malformed JSON from the model

    # Look up and call the actual function
    if func_name not in function_registry:
        result = f"Error: Function {func_name} not found."
    elif not isinstance(args, dict):
        result = "Error: Arguments must be a JSON object."
    else:
        try:
            result = function_registry[func_name](**args)
        except TypeError as e:
            # Raised when the model omits a required argument or invents an extra one
            result = f"Error: {e}"

    # Keep the assistant's function-call request in the history, then append
    # the function result as a message for the model
    messages.append(assistant_reply)
    messages.append({
        "role": "function",
        "name": func_name,
        "content": result if isinstance(result, str) else json.dumps(result)
    })

    # Now ask the model again, to get the final answer using the function result
    follow_up = llm.create_chat_completion(messages=messages)
    final_reply = follow_up['choices'][0]['message']
    answer_text = final_reply.get('content', '')
    print("Assistant (final answer):", answer_text)
else:
    # No function call: the first reply already contains the answer
    print("Assistant (final answer):", assistant_reply.get('content', ''))
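Because the model may supply unexpected arguments, it can help to check them against the declared schema before dispatching. The following is a minimal sketch, not a full JSON Schema validator: validate_args is a hypothetical helper that handles only flat parameters with scalar types.

```python
def validate_args(args, schema):
    """Return a list of problems; an empty list means the arguments look usable."""
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    # Only flat, scalar JSON Schema types are handled in this sketch.
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    properties = schema.get("properties", {})
    problems = [
        f"missing required parameter: {name}"
        for name in schema.get("required", [])
        if name not in args
    ]
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unexpected parameter: {name}")
            continue
        expected = type_map.get(properties[name].get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"parameter {name} should be of type {properties[name]['type']}")
    return problems

# Same shape as the "parameters" entry of the calculate function
calculate_schema = {
    "type": "object",
    "properties": {"expression": {"type": "string"}},
    "required": ["expression"],
}

print(validate_args({"expression": "sqrt(2)"}, calculate_schema))  # []
print(validate_args({"expr": "sqrt(2)"}, calculate_schema))
```

When problems are found, one option is to send them back to the model as the function result, which gives it a chance to correct the call on the next turn.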

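5. Security and Efficiency Considerations

Security and efficiency both deserve attention when integrating custom functions.

Security:
- Validate input arguments strictly to prevent code injection. For example, restrict the math operations that calculate can perform so that it cannot run arbitrary code, and restrict read_file to a specific directory and a maximum read size.
- Avoid eval, which carries security risks. ast.literal_eval is safe but only parses literals, so arithmetic expressions need a dedicated parser that allows only known operations.
- Apply fine-grained permission control to function calls so that the model can invoke only authorized functions.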
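One way to avoid eval for math expressions is to parse the input with Python's ast module and evaluate only whitelisted node types. The sketch below is one possible approach under that assumption; safe_eval and its whitelists are hypothetical and should be extended or narrowed to fit the operations you actually need.

```python
import ast
import math
import operator

# Whitelists: anything not listed here is rejected
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "round": round}

def safe_eval(expression: str):
    """Evaluate basic arithmetic by walking the AST; raise ValueError on anything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[_eval(arg) for arg in node.args])
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))

print(safe_eval("1 + 2 * 3"))  # 7
print(safe_eval("round(sqrt(2), 10)"))
```

Exponentiation is deliberately left out of the whitelist so that inputs such as 9**9**9 are rejected. Callers should still catch ValueError, SyntaxError, and ZeroDivisionError and turn them into an error string for the model.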
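Efficiency:
- Keep function execution fast and avoid blocking for long periods.
- Cache function results to avoid repeating the same computation.
- Use multithreading or asynchronous operations to improve concurrency.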
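Two of these points can be sketched with the standard library: functools.lru_cache for caching and a thread pool with a timeout so that a slow function cannot block the conversation indefinitely. slow_lookup and call_with_timeout are hypothetical names used only for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from functools import lru_cache

@lru_cache(maxsize=128)
def slow_lookup(key: str) -> str:
    """Hypothetical expensive function; repeated calls with the same key hit the cache."""
    time.sleep(0.05)  # simulate slow work
    return key.upper()

_executor = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(func, timeout_s, *args, **kwargs):
    """Run func in a worker thread and give up waiting after timeout_s seconds."""
    future = _executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread keeps running; only the caller stops waiting for it.
        return "Error: function timed out."

print(call_with_timeout(slow_lookup, 1.0, "abc"))   # computed, then cached
print(call_with_timeout(slow_lookup, 1.0, "abc"))   # served from the cache
print(call_with_timeout(time.sleep, 0.01, 0.2))     # exceeds the timeout
```

Caching suits only functions whose result depends solely on their arguments. A function like read_file can return stale data if the file changes, so it is usually better left uncached.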

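Conclusion: The Future of Extending Local LLMs

Function calling through llama.cpp opens up many possibilities for local LLM applications. Integrating custom functions lets an LLM interact with external data and systems and so complete more complex tasks. Combined with safe and efficient practices, the technique substantially raises the practical value of an LLM and enables applications such as local knowledge-base question answering, workflow automation assistants, and AI applications that remain secure and under the developer's control. Following llama.cpp's ongoing development and exploring new function-calling patterns will help keep these solutions current and improve the experience for users.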
