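Chapter 18: Performance Optimization and Deployment at Scale

Difficulty: ⭐⭐⭐⭐ Advanced | Estimated reading time: 25 minutes | Prerequisites: Chapter 2, [Chapter 8](08-单 Gateway 多 Agent 配置与管理.md)

As OpenClaw grows from a personal tool into team or enterprise infrastructure, performance optimization and deployment at scale become central concerns. This chapter covers token consumption analysis, model selection, caching, concurrent task management, multi-node deployment, monitoring and alerting, and cost control, so that you can build an efficient, stable, and scalable OpenClaw production environment.

18.1 Token Consumption Analysis

How Tokens Are Counted

Every interaction between OpenClaw and a large language model consumes tokens. Understanding how they are counted is the basis for cost optimization.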

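Component	Description	Typical share
System prompt	Configuration files such as SOUL.md and AGENTS.md	15-25%
Memory context	MEMORY.md plus any loaded memory files	10-20%
Tool definitions	Descriptions of the tools registered by Skills	10-15%
Conversation history	Multi-turn dialogue of the current session	30-40%
User input	The current message and its attachments	5-10%
Model output	The Agent's replies and tool calls	10-20%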

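NOTE

Total token consumption is the sum of input tokens and output tokens. Output tokens usually cost several times more than input tokens (4 to 5 times for the priced models in the table in section 18.2), so limiting output length is an effective way to save money.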
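To make the input/output asymmetry concrete, here is a minimal sketch that prices a single call. It uses the GPT-4o prices from the table in section 18.2; the token counts are hypothetical and only illustrate the arithmetic.

```python
# Prices in USD per 1M tokens, taken from the GPT-4o row in section 18.2.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call; input and output are billed separately."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical call: 8,000 input tokens and 2,000 output tokens.
baseline = call_cost(8_000, 2_000)
# The same call with the reply capped at half the length.
shorter = call_cost(8_000, 1_000)

print(f"baseline: ${baseline:.4f}, shorter reply: ${shorter:.4f}")
# baseline: $0.0400, shorter reply: $0.0300
```

In this example the output is only 20% of the tokens but half of the cost, so halving the reply length cuts the bill by a quarter.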

Distribution Analysis

Use the following command to analyze how your token consumption is distributed:

# Summarize token consumption over the last 7 days
python3 -c "
import sys, json
from datetime import datetime, timedelta, timezone

cutoff = datetime.now(timezone.utc) - timedelta(days=7)
total_in, total_out = 0, 0
for line in sys.stdin:
    try:
        entry = json.loads(line)
        ts = datetime.fromisoformat(entry.get('timestamp', '').replace('Z', '+00:00'))
    except (ValueError, AttributeError):
        continue  # skip malformed lines instead of hiding every error
    if ts.tzinfo is None:
        ts = ts.astimezone()  # treat naive timestamps as local time
    tokens = entry.get('tokens')
    if ts > cutoff and isinstance(tokens, dict):
        total_in += tokens.get('input', 0)
        total_out += tokens.get('output', 0)
print(f'Input tokens: {total_in:,}')
print(f'Output tokens: {total_out:,}')
print(f'Total: {total_in + total_out:,}')
print(f'Input/output ratio: {total_in/max(total_out,1):.1f}:1')
" < ~/.openclaw/logs/config-audit.jsonl

Token Statistics Tool

Create a script that monitors token consumption:

#!/bin/bash
# token-stats.sh - token consumption statistics
# Usage: ./token-stats.sh [days]

DAYS=${1:-7}
LOG_FILE="$HOME/.openclaw/logs/config-audit.jsonl"

echo "📊 OpenClaw token consumption (last ${DAYS} days)"
echo "============================================"

# Per-day totals. "python3 -" reads the script from the heredoc and receives
# DAYS and LOG_FILE as sys.argv[1] and sys.argv[2].
python3 - "$DAYS" "$LOG_FILE" << 'EOF'
import sys, json
from datetime import datetime, timedelta, timezone
from collections import defaultdict

days = int(sys.argv[1])
log_file = sys.argv[2]
cutoff = datetime.now(timezone.utc) - timedelta(days=days)
daily = defaultdict(lambda: {"input": 0, "output": 0, "calls": 0})

with open(log_file, "r", encoding="utf-8") as f:
    for line in f:
        try:
            entry = json.loads(line)
            ts = datetime.fromisoformat(entry.get("timestamp", "").replace("Z", "+00:00"))
        except (ValueError, AttributeError):
            continue  # skip malformed lines
        if ts.tzinfo is None:
            ts = ts.astimezone()  # treat naive timestamps as local time
        tokens = entry.get("tokens")
        if ts > cutoff and isinstance(tokens, dict):
            day = ts.strftime("%Y-%m-%d")
            daily[day]["input"] += tokens.get("input", 0)
            daily[day]["output"] += tokens.get("output", 0)
            daily[day]["calls"] += 1

print(f"{'Date':<14} {'Input tokens':>12} {'Output tokens':>13} {'Calls':>8} {'Avg tokens/call':>15}")
print("-" * 66)
for day in sorted(daily.keys()):
    d = daily[day]
    total = d["input"] + d["output"]
    avg = total // max(d["calls"], 1)
    print(f"{day:<14} {d['input']:>12,} {d['output']:>13,} {d['calls']:>8} {avg:>15,}")
EOF

TIP

Schedule this script as a daily Cron job and combine it with Feishu notifications to get an automated daily cost report.

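18.2 Model Selection Strategy

Cost vs. Quality vs. Speed

Models differ significantly in cost, quality, and speed. Matching the model to the task type is the first step in performance optimization.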

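Model	Input price ($/1M tokens)	Output price ($/1M tokens)	Response speed	Reasoning quality	Typical use
GPT-4o	2.50	10.00	⭐⭐⭐ Fast	⭐⭐⭐⭐⭐	Complex reasoning, code generation, creative tasks
GPT-4o-mini	0.15	0.60	⭐⭐⭐⭐⭐ Very fast	⭐⭐⭐	Simple Q&A, classification, format conversion
Claude 3.5 Sonnet	3.00	15.00	⭐⭐⭐ Fast	⭐⭐⭐⭐⭐	Long-document analysis, code review
Claude 3.5 Haiku	0.80	4.00	⭐⭐⭐⭐⭐ Very fast	⭐⭐⭐	Summary extraction, quick classification
DeepSeek-V3	0.27	1.10	⭐⭐⭐⭐	⭐⭐⭐⭐	Chinese-language tasks, general conversation
Qwen-Max	Free/low cost	Free/low cost	⭐⭐⭐	⭐⭐⭐⭐	Chinese-language scenarios on a tight budget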

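WARNING

Model prices change frequently; the figures above are indicative only. Check each vendor's current pricing page before relying on them.

Dynamic Model Routing

OpenClaw can select a model automatically based on task complexity: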

{
  "model_routing": {
    "enabled": true,
    "default_model": "gpt-4o-mini",
    "rules": [
      {
        "name": "complex_reasoning",
        "condition": {
          "task_type": ["code_generation", "analysis", "planning"],
          "estimated_complexity": "high"
        },
        "model": "gpt-4o"
      },
      {
        "name": "simple_tasks",
        "condition": {
          "task_type": ["classification", "extraction", "formatting"],
          "estimated_complexity": "low"
        },
        "model": "gpt-4o-mini"
      },
      {
        "name": "chinese_content",
        "condition": {
          "language": "zh",
          "task_type": ["writing", "translation"]
        },
        "model": "deepseek-v3"
      },
      {
        "name": "long_document",
        "condition": {
          "input_tokens_gt": 50000
        },
        "model": "claude-3-5-sonnet"
      }
    ]
  }
}
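The routing rules above can be read as a first-match lookup. The sketch below is one possible interpretation, not OpenClaw's actual router: the `matches` and `pick_model` helpers, the task dictionary shape, and the "first matching rule wins" order are all assumptions made for illustration.

```python
from typing import Any

DEFAULT_MODEL = "gpt-4o-mini"

# Rules mirror the JSON above. A list-valued condition matches when the
# task's value is one of the listed options (assumed semantics).
RULES = [
    {"condition": {"task_type": ["code_generation", "analysis", "planning"],
                   "estimated_complexity": "high"},
     "model": "gpt-4o"},
    {"condition": {"task_type": ["classification", "extraction", "formatting"],
                   "estimated_complexity": "low"},
     "model": "gpt-4o-mini"},
    {"condition": {"language": "zh", "task_type": ["writing", "translation"]},
     "model": "deepseek-v3"},
    {"condition": {"input_tokens_gt": 50000},
     "model": "claude-3-5-sonnet"},
]


def matches(condition: dict[str, Any], task: dict[str, Any]) -> bool:
    """Return True when every key in the condition is satisfied by the task."""
    for key, expected in condition.items():
        if key == "input_tokens_gt":
            if task.get("input_tokens", 0) <= expected:
                return False
        elif isinstance(expected, list):
            if task.get(key) not in expected:
                return False
        elif task.get(key) != expected:
            return False
    return True


def pick_model(task: dict[str, Any]) -> str:
    """Return the model of the first matching rule, else the default model."""
    for rule in RULES:
        if matches(rule["condition"], task):
            return rule["model"]
    return DEFAULT_MODEL


print(pick_model({"task_type": "analysis", "estimated_complexity": "high"}))
print(pick_model({"task_type": "summarize", "input_tokens": 80000}))
print(pick_model({"task_type": "chat"}))
```

Because the first match wins in this sketch, rule order matters: a long Chinese translation job would be routed to deepseek-v3 before the long_document rule is ever checked.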

Model Fallback Strategy

When the primary model is unavailable or times out, requests automatically fall back to a backup model:

{
  "model_fallback": {
    "primary": "gpt-4o",
    "fallback_chain": ["claude-3-5-sonnet", "deepseek-v3", "gpt-4o-mini"],
    "timeout_ms": 30000,
    "max_retries": 2,
    "retry_delay_ms": 1000
  }
}

Request ──▶ GPT-4o ──timeout──▶ Claude 3.5 Sonnet ──failure──▶ DeepSeek-V3 ──failure──▶ GPT-4o-mini
              │                                                                              │
              ▼ success                                                                      ▼ success
        return result                                                                  return result
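The fallback chain can be sketched as a nested loop over models and attempts. This is a simplified illustration rather than OpenClaw's implementation: `call_with_fallback` and `fake_client` are hypothetical names, `max_retries: 2` is read as "one initial attempt plus two retries per model", and the `retry_delay_ms` pause is omitted to keep the example short.

```python
from typing import Callable

CHAIN = ["gpt-4o", "claude-3-5-sonnet", "deepseek-v3", "gpt-4o-mini"]
MAX_RETRIES = 2


def call_with_fallback(prompt: str,
                       call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model in order, retrying a failing model before moving on."""
    last_error = None
    for model in CHAIN:
        for _attempt in range(1 + MAX_RETRIES):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:  # a real client would catch timeout/API errors only
                last_error = exc
    raise RuntimeError("all models in the fallback chain failed") from last_error


def fake_client(model: str, prompt: str) -> str:
    """Stub client for the demo: the first two models always time out."""
    if model in ("gpt-4o", "claude-3-5-sonnet"):
        raise TimeoutError(f"{model} timed out")
    return f"[{model}] reply to: {prompt}"


used, reply = call_with_fallback("hello", fake_client)
print(used)   # deepseek-v3
print(reply)  # [deepseek-v3] reply to: hello
```

Returning the model name alongside the reply makes it easy to log how often requests end up on a fallback model.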

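18.3 Caching

Request Cache

For repeated or similar requests, OpenClaw supports request-level caching so that the API is not called again for the same work: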

{
  "cache": {
    "request_cache": {
      "enabled": true,
      "backend": "file",
      "directory": "~/.openclaw/cache/requests",
      "ttl_seconds": 3600,
      "max_size_mb": 500,
      "hash_strategy": "semantic",
      "similarity_threshold": 0.95
    }
  }
}

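Cache strategy	Description	Hit rate	Typical use
`exact`	Matches the request text exactly	Low (10-20%)	Scenarios that require high precision
`semantic`	Matches on semantic similarity	Medium (40-60%)	General Q&A and retrieval
`template`	Matches on template parameters	High (60-80%)	Formatting and conversion tasks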
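As an illustration of the simplest strategy, the sketch below implements an in-memory `exact` cache with a time-to-live. It is a minimal sketch under stated assumptions: the `ExactCache` class, the SHA-256 key over model plus prompt, and the injectable clock are illustrative choices, not OpenClaw's file-backed implementation.

```python
import hashlib
import time

TTL_SECONDS = 3600  # mirrors ttl_seconds in the request_cache config


class ExactCache:
    """In-memory exact-match cache with per-entry expiry."""

    def __init__(self, ttl=TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock   # injectable so expiry can be tested without waiting
        self._store = {}      # key -> (expires_at, response)

    @staticmethod
    def key(model, prompt):
        """Hash the model name and the exact prompt text into a cache key."""
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, model, prompt):
        """Return the cached response, or None on a miss or an expired entry."""
        k = self.key(model, prompt)
        hit = self._store.get(k)
        if hit is None:
            return None
        expires_at, response = hit
        if self._clock() >= expires_at:
            del self._store[k]
            return None
        return response

    def put(self, model, prompt, response):
        """Store a response that expires ttl seconds from now."""
        self._store[self.key(model, prompt)] = (self._clock() + self._ttl, response)


now = [0.0]
cache = ExactCache(clock=lambda: now[0])
cache.put("gpt-4o-mini", "What is 2+2?", "4")
print(cache.get("gpt-4o-mini", "What is 2+2?"))  # hit
print(cache.get("gpt-4o-mini", "what is 2+2?"))  # miss: exact matching is case-sensitive
now[0] = 3601.0
print(cache.get("gpt-4o-mini", "What is 2+2?"))  # miss: the entry has expired
```

The case-sensitive miss shows why exact matching has the lowest hit rate in the table: any change in wording produces a different key.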

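Cache maintenance commands:

# Check cache size and entry count
du -sh ~/.openclaw/cache/requests/
find ~/.openclaw/cache/requests/ -type f | wc -l

# Count cache hits and misses recorded in the log
grep -c "cache_hit" ~/.openclaw/logs/config-audit.jsonl
grep -c "cache_miss" ~/.openclaw/logs/config-audit.jsonl

# Remove stale cache files. -mmin +1440 matches files older than 24 hours
# (-mtime +1 would only match files at least 2 days old).
find ~/.openclaw/cache/requests/ -type f -mmin +1440 -delete
echo "Removed cache files older than 1 day"

# Empty the cache completely
rm -rf ~/.openclaw/cache/requests/*
echo "Cache cleared"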

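Knowledge Base Cache

Knowledge base files are parsed and indexed the first time they are loaded; later accesses read directly from the cache: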

{
  "cache": {
    "knowledge_cache": {
      "enabled": true,
      "index_format": "embeddings",
      "rebuild_on_change": true,
      "watch_paths": [
        "~/.openclaw/workspace/memory/",
        "~/.openclaw/workspace/skills/"
      ],
      "preload_on_startup": true
    }
  }
}

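TIP

With preload_on_startup enabled, the Agent starts slightly slower (about 2-5 seconds), but the first conversation responds noticeably faster. This suits production environments that are sensitive to response time.

Session Context Cache

Long conversations accumulate a large amount of context. A sliding window combined with summary compression reduces token consumption: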

{
  "session": {
    "context_management": {
      "max_context_tokens": 32000,
      "sliding_window": {
        "enabled": true,
        "keep_recent_messages": 20,
        "summarize_older": true
      },
      "compression": {
        "enabled": true,
        "trigger_threshold": 0.8,
        "target_ratio": 0.5,
        "preserve_system_prompt": true,
        "preserve_tool_results": true
      }
    }
  }
}

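Conversation message flow:
┌─────────────────────────────────────────────────────────────────────────────────┐
│ [System prompt] [Memory] [Summary: messages 1-30] [Message 31] ... [Message 50] │
│ ◀──────── kept ────────▶ ◀───── compressed ─────▶ ◀─────── kept in full ───────▶ │
└─────────────────────────────────────────────────────────────────────────────────┘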
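The message flow above can be sketched in a few lines. This is an illustrative sketch only: `build_context` is a hypothetical helper, and `summarize` is a placeholder where a real system would ask a model to compress the older messages.

```python
KEEP_RECENT = 20  # mirrors keep_recent_messages in the config


def summarize(messages):
    """Placeholder summary; a real implementation would call a model here."""
    return f"[summary of {len(messages)} earlier messages]"


def build_context(system_prompt, memory, history, keep_recent=KEEP_RECENT):
    """Keep the prompt and memory, summarize older messages, keep recent ones in full."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    context = [system_prompt, memory]
    if older:
        context.append(summarize(older))
    return context + recent


history = [f"message {i}" for i in range(1, 51)]
ctx = build_context("SYSTEM", "MEMORY", history)
print(len(ctx))          # 23: prompt + memory + summary + 20 recent messages
print(ctx[2])            # [summary of 30 earlier messages]
print(ctx[3], ctx[-1])   # message 31 message 50
```

With 50 messages and a window of 20, messages 1-30 collapse into one summary entry, matching the diagram.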

18.4 Concurrent Task Management

Task Queue Architecture

When several tasks arrive at once, OpenClaw uses a task queue to process them in order:

                         ┌──────────────────┐
   Feishu message ──────▶│                  │
   Cron job       ──────▶│    Task queue    │──▶ Worker 1 ──▶ Agent 1
   API call       ──────▶│ (priority order) │──▶ Worker 2 ──▶ Agent 2
   Manual trigger ──────▶│                  │──▶ Worker 3 ──▶ Agent 3
                         └──────────────────┘

Check the current queue status:

# List the pending delivery queue
ls -la ~/.openclaw/delivery-queue/
echo "---"
echo "Pending tasks: $(find ~/.openclaw/delivery-queue/ -maxdepth 1 -name '*.json' | wc -l)"
echo "Failed tasks: $(find ~/.openclaw/delivery-queue/failed/ -name '*.json' 2>/dev/null | wc -l)"

# Show details for each queued task
for f in ~/.openclaw/delivery-queue/*.json; do
  [ -f "$f" ] || continue
  echo "Task: $(basename "$f")"
  # Pass the path as an argument instead of interpolating it into the Python source
  python3 -c 'import json, sys; d = json.load(open(sys.argv[1])); print("  Source:", d.get("source", "unknown")); print("  Time:", d.get("timestamp", "N/A"))' "$f"
  echo "---"
done

Rate Limit Configuration

To stay within API quotas, configure both global and per-model rate limits:

{
  "rate_limiting": {
    "global": {
      "requests_per_minute": 60,
      "tokens_per_minute": 200000,
      "concurrent_requests": 5
    },
    "per_model": {
      "gpt-4o": {
        "requests_per_minute": 20,
        "tokens_per_minute": 100000
      },
      "gpt-4o-mini": {
        "requests_per_minute": 100,
        "tokens_per_minute": 500000
      },
      "deepseek-v3": {
        "requests_per_minute": 30,
        "tokens_per_minute": 150000
      }
    },
    "backoff": {
      "strategy": "exponential",
      "initial_delay_ms": 1000,
      "max_delay_ms": 60000,
      "multiplier": 2.0
    }
  }
}

WARNING

Exceeding an API quota causes requests to be rejected (HTTP 429). Set your limits to about 80% of the API quota to leave headroom.
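The backoff settings and the 80% rule reduce to simple arithmetic. The sketch below computes the retry delays implied by the `backoff` block and a safe limit for a given quota; `backoff_delays` and `safe_limit` are hypothetical helper names, and production code would usually add random jitter to the delays.

```python
def backoff_delays(initial_ms=1000, max_ms=60000, multiplier=2.0, attempts=8):
    """Return the capped exponential delay (in ms) before each retry."""
    delays = []
    delay = float(initial_ms)
    for _ in range(attempts):
        delays.append(int(min(delay, max_ms)))
        delay *= multiplier
    return delays


def safe_limit(quota, percent=80):
    """Return the rate limit to configure: a percentage of the vendor quota."""
    return quota * percent // 100


print(backoff_delays())
# [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000]
print(safe_limit(500))  # 400
```

With a multiplier of 2.0 the delay reaches the 60-second cap on the seventh retry and stays there.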

Priority Scheduling

Tasks from different sources can be assigned different priorities:

{
  "task_priority": {
    "levels": {
      "critical": 0,
      "high": 1,
      "normal": 2,
      "low": 3,
      "background": 4
    },
    "source_mapping": {
      "feishu_direct_message": "high",
      "feishu_group_mention": "normal",
      "cron_job": "low",
      "api_call": "normal",
      "manual_trigger": "high"
    },
    "preemption": {
      "enabled": true,
      "allow_preempt_levels": ["critical"],
      "preempt_threshold": 2
    }
  }
}

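Priority	Level	Example sources	Maximum wait
Critical	0	System alerts, security events	Handled immediately
High	1	Direct messages, manual triggers	< 10 seconds
Normal	2	Group @mentions, API calls	< 30 seconds
Low	3	Cron jobs, batch processing	< 5 minutes
Background	4	Memory consolidation, cache warm-up	Handled when idle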
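A priority queue of this kind can be sketched with the standard library's `heapq`. This is a minimal illustration, not OpenClaw's scheduler: the `TaskQueue` class is hypothetical, unknown sources default to `normal` by assumption, and preemption is left out.

```python
import heapq
import itertools

LEVELS = {"critical": 0, "high": 1, "normal": 2, "low": 3, "background": 4}
SOURCE_MAPPING = {
    "feishu_direct_message": "high",
    "feishu_group_mention": "normal",
    "cron_job": "low",
    "api_call": "normal",
    "manual_trigger": "high",
}


class TaskQueue:
    """Min-heap ordered by priority level, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a level

    def push(self, source, payload):
        """Queue a task; unknown sources are treated as normal priority."""
        level = LEVELS[SOURCE_MAPPING.get(source, "normal")]
        heapq.heappush(self._heap, (level, next(self._seq), payload))

    def pop(self):
        """Remove and return the payload with the lowest level value."""
        _level, _seq, payload = heapq.heappop(self._heap)
        return payload

    def __len__(self):
        return len(self._heap)


queue = TaskQueue()
queue.push("cron_job", "nightly report")
queue.push("feishu_group_mention", "group question")
queue.push("feishu_direct_message", "direct message")
queue.push("manual_trigger", "manual rerun")

order = [queue.pop() for _ in range(len(queue))]
print(order)
# ['direct message', 'manual rerun', 'group question', 'nightly report']
```

The sequence counter matters: without it, two tasks at the same level would be compared by payload, which breaks arrival order and fails for payloads that cannot be compared.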

18.5 Large-Scale Deployment Architecture

Multi-Node Gateway

When a single Gateway cannot handle the load, deploy a multi-node architecture:

                          ┌────────────────┐
                     ┌───▶│ Gateway node 1 │──▶ Agent Pool 1
                     │    └────────────────┘
┌───────────────┐    │    ┌────────────────┐
│     Nginx     │────┼───▶│ Gateway node 2 │──▶ Agent Pool 2
│ load balancer │    │    └────────────────┘
└───────────────┘    │    ┌────────────────┐
                     └───▶│ Gateway node 3 │──▶ Agent Pool 3
                          └────────────────┘
                                   │
                           ┌───────▼────────┐
                           │ Shared storage │
                           │    (NFS/S3)    │
                           └────────────────┘

Nginx Load Balancing

The following is an example Nginx configuration for production:

# /etc/nginx/conf.d/openclaw-gateway.conf

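upstream openclaw_backend {
    # Session affinity by client IP. A load-balancing method must be
    # declared before the keepalive directive.
    ip_hash;

    # Server weights bias the distribution across nodes
    server 10.0.1.10:3000 weight=3;
    server 10.0.1.11:3000 weight=3;
    server 10.0.1.12:3000 weight=2;

    # Idle upstream connections cached per worker (this is not a health check)
    keepalive 32;
}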

server {
    listen 443 ssl http2;
    server_name openclaw.example.com;

    ssl_certificate /etc/ssl/certs/openclaw.crt;
    ssl_certificate_key /etc/ssl/private/openclaw.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Request body size limit (file uploads)
    client_max_body_size 50m;

    # Timeouts (Agent processing can be slow)
    proxy_connect_timeout 10s;
    proxy_read_timeout 300s;
    proxy_send_timeout 60s;

    # WebSocket support
    location /ws {
        proxy_pass http://openclaw_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

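    # API routes
    location /api/ {
        proxy_pass http://openclaw_backend;
        # Upstream keepalive needs HTTP/1.1 and an empty Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Rate limit: 10 requests per second, with bursts of up to 20
        limit_req zone=api_limit burst=20 nodelay;
    }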

    # Static files
    location /static/ {
        alias /var/www/openclaw/static/;
        expires 7d;
        add_header Cache-Control "public, immutable";
    }

    # Health check endpoint
    location /health {
        proxy_pass http://openclaw_backend;
        access_log off;
    }
}

# Rate-limit zone definition (http context)
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

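NOTE

proxy_read_timeout is set to 300 seconds because complex Agent tasks such as code generation or multi-step reasoning can take a long time. Adjust it for your workload.

Docker Compose Orchestration

A Docker Compose configuration for quickly bringing up a multi-node OpenClaw cluster. Inside this Compose network, the Nginx upstream should point at the service names (gateway-1:3000 and so on) rather than the 10.0.1.x addresses used in the Nginx example above: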

# docker-compose.prod.yml
version: "3.8"

services:
  # Nginx load balancer
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
      - "80:80"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./nginx/ssl:/etc/ssl:ro
    depends_on:
      - gateway-1
      - gateway-2
      - gateway-3
    restart: always
    networks:
      - openclaw-net

  # Gateway node 1
  gateway-1:
    image: openclaw/gateway:latest
    environment:
      - NODE_ID=gateway-1
      - SHARED_STORAGE=/data/shared
      - LOG_LEVEL=info
      - MAX_AGENTS=10
    volumes:
      - shared-data:/data/shared
      - ./config/openclaw.json:/root/.openclaw/openclaw.json:ro
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          cpus: "1.0"
          memory: 2G
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - openclaw-net

  # Gateway node 2
  gateway-2:
    image: openclaw/gateway:latest
    environment:
      - NODE_ID=gateway-2
      - SHARED_STORAGE=/data/shared
      - LOG_LEVEL=info
      - MAX_AGENTS=10
    volumes:
      - shared-data:/data/shared
      - ./config/openclaw.json:/root/.openclaw/openclaw.json:ro
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          cpus: "1.0"
          memory: 2G
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - openclaw-net

  # Gateway node 3
  gateway-3:
    image: openclaw/gateway:latest
    environment:
      - NODE_ID=gateway-3
      - SHARED_STORAGE=/data/shared
      - LOG_LEVEL=info
      - MAX_AGENTS=10
    volumes:
      - shared-data:/data/shared
      - ./config/openclaw.json:/root/.openclaw/openclaw.json:ro
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          cpus: "1.0"
          memory: 2G
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - openclaw-net

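  # Prometheus monitoring
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      # Mount the alert rules referenced by rule_files in prometheus.yml
      - ./monitoring/alert_rules.yml:/etc/prometheus/alert_rules.yml:ro
      - prometheus-data:/prometheus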
    restart: always
    networks:
      - openclaw-net

  # Grafana dashboards
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
    depends_on:
      - prometheus
    restart: always
    networks:
      - openclaw-net

volumes:
  shared-data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=10.0.1.100,rw,nfsvers=4
      device: ":/exports/openclaw"
  prometheus-data:
  grafana-data:

networks:
  openclaw-net:
    driver: bridge

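Startup commands:

# Start all services
docker compose -f docker-compose.prod.yml up -d

# Check the status of each node
docker compose -f docker-compose.prod.yml ps

# Follow the logs of a specific node
docker compose -f docker-compose.prod.yml logs -f gateway-1

# Horizontal scaling: --scale works on a single service name, and this file
# declares gateway-1..3 as separate services. To add capacity, copy a gateway
# block (for example as gateway-4), add it to the Nginx upstream, then run:
docker compose -f docker-compose.prod.yml up -d gateway-4

# Rolling update, one node at a time
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d --no-deps gateway-1
docker compose -f docker-compose.prod.yml up -d --no-deps gateway-2
docker compose -f docker-compose.prod.yml up -d --no-deps gateway-3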

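18.6 Monitoring and Alerting

Prometheus Metrics Collection

Configure Prometheus to scrape metrics from OpenClaw. The alertmanager, nginx-exporter, and node-exporter targets below are not defined in the Compose file above and must be deployed separately: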

# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: "openclaw-gateway"
    static_configs:
      - targets:
          - "gateway-1:3000"
          - "gateway-2:3000"
          - "gateway-3:3000"
    metrics_path: "/metrics"
    scrape_interval: 10s

  - job_name: "nginx"
    static_configs:
      - targets: ["nginx-exporter:9113"]

  - job_name: "node"
    static_configs:
      - targets:
          - "node-exporter-1:9100"
          - "node-exporter-2:9100"
          - "node-exporter-3:9100"

Core metrics exposed by the OpenClaw Gateway:

Metric	Type	Description
`openclaw_requests_total`	Counter	Total number of requests
`openclaw_request_duration_seconds`	Histogram	Request processing time
`openclaw_tokens_consumed_total`	Counter	Total tokens consumed
`openclaw_active_agents`	Gauge	Number of currently active Agents
`openclaw_queue_depth`	Gauge	Task queue depth
`openclaw_cache_hit_ratio`	Gauge	Cache hit ratio
`openclaw_model_errors_total`	Counter	Number of model call errors
`openclaw_memory_files_total`	Gauge	Total number of memory files

Grafana Dashboard Configuration

JSON configuration for a dedicated OpenClaw dashboard:

{
  "dashboard": {
    "title": "OpenClaw Runtime Monitoring",
    "panels": [
      {
        "title": "Request QPS",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(openclaw_requests_total[5m])",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "title": "Token consumption trend",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(openclaw_tokens_consumed_total[1h])",
            "legendFormat": "{{model}}"
          }
        ]
      },
      {
        "title": "P99 response time",
        "type": "stat",
        "targets": [
          {
            "expr": "histogram_quantile(0.99, rate(openclaw_request_duration_seconds_bucket[5m]))"
          }
        ]
      },
      {
        "title": "Queue depth",
        "type": "gauge",
        "targets": [
          {
            "expr": "openclaw_queue_depth"
          }
        ],
        "thresholds": [
          { "value": 0, "color": "green" },
          { "value": 10, "color": "yellow" },
          { "value": 50, "color": "red" }
        ]
      }
    ]
  }
}

Alert Rule Definitions

# monitoring/alert_rules.yml
groups:
  - name: openclaw_alerts
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: rate(openclaw_model_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "OpenClaw model call error rate is high"
          description: "Error rate {{ $value }}/s has exceeded the 0.1/s threshold for 5 minutes"

      # Queue backlog
      - alert: QueueBacklog
        expr: openclaw_queue_depth > 50
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Task queue backlog"
          description: "Queue depth is {{ $value }}, above the threshold of 50 for 10 minutes"

      # Abnormal token consumption
      - alert: TokenBudgetWarning
        expr: sum(increase(openclaw_tokens_consumed_total[1h])) > 500000
        labels:
          severity: warning
        annotations:
          summary: "Token consumption is too fast"
          description: "{{ $value }} tokens consumed in the past hour, approaching the budget limit"

      # High response time
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(openclaw_request_duration_seconds_bucket[5m])) > 60
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 response time is too high"
          description: "P95 response time {{ $value }}s exceeds the 60s threshold"

      # Node offline
      - alert: GatewayDown
        expr: up{job="openclaw-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Gateway node offline"
          description: "Node {{ $labels.instance }} has been offline for more than 1 minute"

Day-to-day monitoring commands:

# Check that Prometheus is scraping its targets
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | head -30

# Query a metric manually
curl -s 'http://localhost:9090/api/v1/query?query=openclaw_active_agents' | python3 -m json.tool

# Check alert status
curl -s http://localhost:9090/api/v1/alerts | python3 -m json.tool

# List Grafana dashboards (the search API returns dashboards for type=dash-db)
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  'http://localhost:3001/api/search?type=dash-db' | python3 -m json.tool

18.7 Cost Control

Token Budget Management

Set token budget limits for different uses:

{
  "budget": {
    "enabled": true,
    "period": "monthly",
    "limits": {
      "total_tokens": 10000000,
      "total_cost_usd": 50.00,
      "per_agent": {
        "main": { "tokens": 5000000, "cost_usd": 30.00 },
        "assistant": { "tokens": 3000000, "cost_usd": 15.00 },
        "monitor": { "tokens": 2000000, "cost_usd": 5.00 }
      }
    },
    "actions_on_limit": {
      "80_percent": "notify",
      "95_percent": "downgrade_model",
      "100_percent": "pause_non_critical"
    },
    "notification": {
      "channel": "feishu",
      "recipients": ["admin_group"]
    }
  }
}
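The `actions_on_limit` thresholds amount to a lookup from budget usage to an action. The sketch below shows that mapping; `budget_action` is a hypothetical helper, and comparing in integer percent is a design choice made here to avoid floating-point surprises right at a threshold.

```python
# (percent of budget, action), checked from the highest threshold down.
THRESHOLDS = [
    (100, "pause_non_critical"),
    (95, "downgrade_model"),
    (80, "notify"),
]


def budget_action(used_tokens, limit_tokens):
    """Return the action for the highest threshold reached, or None below 80%."""
    for percent, action in THRESHOLDS:
        if used_tokens * 100 >= percent * limit_tokens:
            return action
    return None


LIMIT = 10_000_000  # total_tokens from the config above
for used in (7_000_000, 8_500_000, 9_500_000, 10_000_000):
    print(f"{used:>10,} -> {budget_action(used, LIMIT)}")
```

Checking thresholds from highest to lowest ensures that usage at 97% triggers `downgrade_model` rather than the weaker `notify` action.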

Usage Dashboard

Create a cost-tracking script:

#!/usr/bin/env python3
"""openclaw-cost-tracker.py - monthly cost tracking tool"""

import calendar
import json
import os
from datetime import datetime
from collections import defaultdict

# Model pricing in USD per million tokens (indicative; check vendor pricing pages)
PRICING = {
    "gpt-4o":              {"input": 2.50,  "output": 10.00},
    "gpt-4o-mini":         {"input": 0.15,  "output": 0.60},
    "claude-3-5-sonnet":   {"input": 3.00,  "output": 15.00},
    "claude-3-5-haiku":    {"input": 0.80,  "output": 4.00},
    "deepseek-v3":         {"input": 0.27,  "output": 1.10},
}
# Placeholder rate applied to models that are missing from PRICING
DEFAULT_PRICING = {"input": 1.0, "output": 3.0}

def calculate_cost(model, input_tokens, output_tokens):
    """Return the cost of a single call in USD."""
    pricing = PRICING.get(model, DEFAULT_PRICING)
    return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000

def generate_report(log_path):
    """Print a cost report for the current calendar month."""
    now = datetime.now()
    current_month = now.strftime("%Y-%m")
    stats = defaultdict(lambda: {"input": 0, "output": 0, "calls": 0, "cost": 0.0})

    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except ValueError:
                continue  # skip malformed lines
            tokens = entry.get("tokens") if isinstance(entry, dict) else None
            if not isinstance(tokens, dict):
                continue  # not a model-call record
            if not str(entry.get("timestamp", "")).startswith(current_month):
                continue
            model = entry.get("model", "unknown")
            inp = tokens.get("input", 0)
            out = tokens.get("output", 0)
            stats[model]["input"] += inp
            stats[model]["output"] += out
            stats[model]["calls"] += 1
            stats[model]["cost"] += calculate_cost(model, inp, out)

    # Print the report, most expensive model first
    print(f"\n📊 OpenClaw monthly cost report ({current_month})")
    print("=" * 72)
    print(f"{'Model':<22} {'Calls':>8} {'Input tokens':>14} {'Output tokens':>14} {'Cost ($)':>10}")
    print("-" * 72)

    total_cost = 0.0
    for model, data in sorted(stats.items(), key=lambda x: -x[1]["cost"]):
        print(f"{model:<22} {data['calls']:>8,} {data['input']:>14,} {data['output']:>14,} {data['cost']:>10.2f}")
        total_cost += data["cost"]

    print("-" * 72)
    print(f"{'Total':<22} {'':>8} {'':>14} {'':>14} {total_cost:>10.2f}")

    # Linear projection: scale spend so far to the real length of this month
    days_in_month = calendar.monthrange(now.year, now.month)[1]
    projected = total_cost * days_in_month / now.day
    print(f"\n💡 Projected month-end cost: ${projected:.2f}")

if __name__ == "__main__":
    log_path = os.path.expanduser("~/.openclaw/logs/config-audit.jsonl")
    generate_report(log_path)

# Run the cost tracker
python3 openclaw-cost-tracker.py

# Sample output (illustrative figures, as if run on 2026-03-06):
# 📊 OpenClaw monthly cost report (2026-03)
# ========================================================================
# Model                     Calls   Input tokens  Output tokens   Cost ($)
# ------------------------------------------------------------------------
# gpt-4o                      245      1,234,567        456,789       7.65
# claude-3-5-sonnet            89        567,890        234,567       5.22
# gpt-4o-mini               1,234      2,345,678        987,654       0.94
# deepseek-v3                 432        890,123        345,678       0.62
# ------------------------------------------------------------------------
# Total                                                              14.44
#
# 💡 Projected month-end cost: $74.61

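Cost Optimization Strategies

Strategy	Expected savings	Implementation difficulty	Description
Model downgrade (small models for simple tasks)	40-60%	⭐ Low	Use mini models for classification and formatting tasks
Request caching	20-40%	⭐⭐ Medium	Cache the results of repeated queries
Prompt trimming	10-20%	⭐⭐ Medium	Compress the system prompt and memory context
Context sliding window	15-25%	⭐ Low	Limit the length of conversation history
Batch processing	10-30%	⭐⭐⭐ High	Merge several small tasks into one call
Local models	70-90%	⭐⭐⭐⭐ Very high	Deploy a local inference engine such as Ollama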

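TIP

The fastest win is model downgrade: route the roughly 80% of tasks that are simple to GPT-4o-mini or DeepSeek-V3 and reserve GPT-4o or Claude for the remaining 20% of complex tasks. Combined with request caching, this can cut total cost by more than 50%.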
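The effect of an 80/20 split can be checked with a quick calculation. The sketch below uses the GPT-4o and GPT-4o-mini prices from section 18.2; the workload (10,000 calls of 4,000 input and 1,000 output tokens each) is hypothetical, and it assumes simple and complex tasks use the same number of tokens, which real traffic rarely does.

```python
# (input, output) prices in USD per 1M tokens, from section 18.2.
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}


def monthly_cost(calls, input_tokens, output_tokens, model):
    """Return the USD cost of `calls` identical requests on one model."""
    price_in, price_out = PRICES[model]
    return calls * (input_tokens * price_in + output_tokens * price_out) / 1_000_000


CALLS, IN_TOK, OUT_TOK = 10_000, 4_000, 1_000  # hypothetical workload

all_large = monthly_cost(CALLS, IN_TOK, OUT_TOK, "gpt-4o")
routed = (monthly_cost(CALLS * 80 // 100, IN_TOK, OUT_TOK, "gpt-4o-mini")
          + monthly_cost(CALLS * 20 // 100, IN_TOK, OUT_TOK, "gpt-4o"))
savings = 1 - routed / all_large

print(f"all GPT-4o: ${all_large:.2f}")
print(f"80/20 routing: ${routed:.2f}")
print(f"savings: {savings:.1%}")
```

Under these assumptions the routed cost is about a quarter of the all-GPT-4o cost. Actual savings depend on how token volume is distributed between simple and complex tasks.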

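Caveats and Common Mistakes

Common mistakes and pitfalls in performance optimization:

Common mistake	Consequence	Correct approach
Optimizing blindly without analyzing the bottleneck	Effort goes into non-critical paths with little gain	Diagnose with `openclaw doctor` first and confirm the bottleneck
Memory limit set too low	The Agent crashes frequently with OOM errors	Set a reasonable limit with `openclaw config set`
No resource isolation across multiple Agents	Resource contention slows everything down	Monitor resource allocation with `openclaw gateway status`

# Common performance diagnostics commands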
openclaw doctor
openclaw config set gateway.maxMemory 512
openclaw gateway status
openclaw gateway restart

Hands-On Exercises

Exercise 1: Token Consumption Analysis

Check the size of your OpenClaw log file:

wc -l ~/.openclaw/logs/config-audit.jsonl
du -sh ~/.openclaw/logs/config-audit.jsonl

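Total up token consumption for the last 3 days and identify the operations that consume the most.
Compute the input/output token ratio and assess whether there is room to optimize.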

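Exercise 2: Cache Configuration

mkdir -p ~/.openclaw/cache/requests

Observe how quickly the cache directory grows and design a sensible cleanup policy.
Write a script that computes the cache hit rate and prints a report.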

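Exercise 3: Simulating a Multi-Node Deployment

Use the Docker Compose configuration from this chapter to create a local test cluster.
Verify that Nginx load balancing works:

# Send several requests and check that each one succeeds.
# The Nginx config in this chapter listens on 443 with TLS; -k accepts a self-signed test certificate.
# ip_hash pins one client IP to one node, so comment it out temporarily to see requests spread out.
# To see which node answered, one option is to add "add_header X-Upstream $upstream_addr always;"
# to the /health location and print the response headers with "curl -D -".
for i in $(seq 1 10); do
  curl -sk -o /dev/null -w "request $i: status=%{http_code}\n" \
    https://localhost/health
done

Simulate a node failure (stop one of the Gateways) and verify that requests are automatically forwarded to the other nodes.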

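Exercise 4: Cost Tracking

Run the openclaw-cost-tracker.py script to generate your monthly report.
Use the report to draw up a budget plan: decide on a monthly cap and enter it in the budget configuration.
Set up Feishu notifications so that an alert fires automatically when consumption reaches 80% of the budget.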

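Frequently Asked Questions (FAQ)

Q: Does this chapter require prior knowledge?

A: Complete the earlier chapters first so that you understand OpenClaw's basic concepts and installation.

Q: What should I do if a command fails?

A: Check that OpenClaw is installed correctly and run openclaw --version to confirm the version. If the problem persists, see the troubleshooting chapter or open a GitHub Issue.

Q: How can I get more help?

A: Through the following channels:

OpenClaw GitHub Issues
ClawHub community discussions
The FAQ page of the official documentation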

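Chapter Summary

Token analysis: Understand what makes up token consumption (system prompt, memory, conversation history, input and output) and quantify the distribution with the statistics tools.
Model selection: Route models dynamically by task complexity, using small models for simple tasks and large models for complex ones, with a fallback strategy to maintain availability.
Caching: Request caching (exact, semantic, or template matching), knowledge base caching, and a context sliding window together reduce API calls and token usage.
Concurrency management: A task queue, rate limiting, and priority scheduling ensure that high-priority tasks are handled first and API quotas are not exceeded.
Multi-node deployment: Nginx load balancing with Docker Compose orchestration supports horizontal scaling and rolling updates.
Monitoring and alerting: Prometheus collects the core metrics, Grafana visualizes them, and alert rules cover error rate, queue backlog, token consumption, latency, and node availability.
Cost control: Token budget management, usage tracking, and six optimization strategies give fine-grained control over AI call costs.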
