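From single-line API calls to agent systems used by tens of thousands of people: we are standing at the threshold of the third leap in AI engineering paradigms. Harness Engineering is not a prompting trick; it is the full engineering discipline of building a "harness", the controlled operating environment, around an AI model.
From basic GPT-3 calls to AI engineers that now run autonomously in real production systems, this path has gone through four qualitative shifts.
The word "harness" comes from horse tack: the full set of equipment needed to control a powerful but unpredictable animal. The horse is the AI model; the harness is everything else.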
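The layer order Types → Config → Repo → Service → Runtime → UI is enforced mechanically through structural tests. Stripe uses a blueprint architecture that alternates deterministic nodes with agentic nodes; the deterministic nodes save tokens, reduce errors, and guarantee that critical steps always run.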
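One way a structural test for the layer order could look is sketched below. It is a minimal illustration, not the implementation any team described here actually uses: the lowercase layer names, the `find_violations` helper, and the `(importer, imported)` pair format are all assumptions, and a real check would first extract those pairs from the source tree.

```python
from typing import Iterable, List, Tuple

# Layer order from lowest to highest, as listed in the text.
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]
RANK = {name: index for index, name in enumerate(LAYERS)}


def find_violations(imports: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every (importer, imported) pair where a lower layer imports a higher one.

    Hypothetical helper: each pair names the layer of the importing module
    and the layer of the module it imports.
    """
    violations = []
    for importer, imported in imports:
        if RANK[imported] > RANK[importer]:
            violations.append((importer, imported))
    return violations


if __name__ == "__main__":
    observed = [("ui", "types"), ("service", "repo"), ("repo", "service")]
    print(find_violations(observed))
```

Run as part of the test suite, a non-empty result would fail the build, which is what makes the rule mechanical rather than advisory.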
From the ReAct loop to Anthropic's three-agent architecture, these production-proven frameworks can be applied directly to your project.
# ReAct prompt framework (declared in the system prompt)
You are an AI assistant that reasons step-by-step before acting.
Always follow this loop:

Thought: [Reason about the current state. What do you know? What's missing? What's the best next step?]
Action: [Choose ONE tool: Search[query] | Read[file_path] | Execute[command] | Write[file_path, content] | Finish[answer]]
Observation: [The result returned from the tool; do not fabricate this]
... repeat Thought → Action → Observation until confident ...
Final Answer: [Concise conclusion grounded in actual observations]

Rules:
- Never skip the Thought step: reasoning before acting prevents mistakes
- If an Observation surprises you, update your plan in the next Thought
- Only use Finish[] when you have directly verified the answer via tools
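The prompt above only describes the loop; something on the harness side still has to parse each `Action:` line, call the tool, and write the `Observation:` back. The sketch below shows one possible driver under stated assumptions: `model` is any callable that takes the transcript so far and returns the next Thought/Action text, `tools` maps tool names to plain functions, and `max_steps` is an invented safety cap. None of these names come from the original.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Matches lines such as "Action: Search[some query]".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_action(text: str) -> Optional[Tuple[str, str]]:
    """Extract (tool_name, raw_argument) from a model reply, or None if absent."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def run_react(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 8,  # assumed cap so a confused model cannot loop forever
) -> Optional[str]:
    """Drive Thought -> Action -> Observation until Finish[...] or the step cap."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"
        action = parse_action(reply)
        if action is None:
            transcript += "Observation: no valid Action line found\n"
            continue
        name, argument = action
        if name == "Finish":
            return argument
        tool = tools.get(name)
        # The harness, not the model, produces every Observation.
        observation = tool(argument) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"
    return None
```

Because the harness writes every observation itself, the "do not fabricate" rule is enforced structurally instead of relying on the model to comply.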
# ═══ SYSTEM PROMPT TEMPLATE FOR HARNESS AGENTS ═══

# 1. ROLE & MISSION
You are [professional role, e.g. "a senior software engineer specializing in Python backends"].
Your mission: [core task in one sentence]

# 2. BEHAVIORAL REGISTER
Tone: Professional but concise. No filler.
Respond in the same language as the user.

# 3. BACKGROUND DATA
<project_context>
Tech stack: [languages/frameworks/databases]
Repository structure: [notes on key directories]
Key constraints: [architecture boundaries, forbidden operations]
</project_context>

# 4. TASK RULES
<rules>
- NEVER remove or modify tests; treat test failures as information, not obstacles
- ALWAYS verify work end-to-end before marking complete
- Do not stop due to token budget concerns; save progress to memory and continue
- After context reset, begin: pwd → read progress file → git log → select task
- When uncertain, prefer asking one targeted question over making assumptions
</rules>

# 5. CONTEXT RESET PROTOCOL (prevents context anxiety)
<startup_sequence>
1. Run `pwd`: confirm working directory
2. Read claude-progress.txt: load completed work state
3. Run `git log --oneline -20`: understand recent changes
4. Read feature_list.json: identify highest-priority incomplete feature
5. Run basic smoke tests: confirm current state
6. Implement ONE feature end-to-end, then commit and update progress
</startup_sequence>
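Step 4 of the startup sequence, picking the highest-priority incomplete feature, is easy to make deterministic. The sketch below assumes a schema for feature_list.json that the original does not specify: a JSON array of objects with hypothetical `id`, `priority` (1 is most urgent), and `done` fields.

```python
import json
from typing import Any, Dict, Optional


def next_feature(feature_list_json: str) -> Optional[Dict[str, Any]]:
    """Return the pending feature with the lowest priority number, or None.

    Assumed schema (not from the original): a list of
    {"id": str, "priority": int, "done": bool} objects.
    """
    features = json.loads(feature_list_json)
    pending = [feature for feature in features if not feature["done"]]
    if not pending:
        return None
    return min(pending, key=lambda feature: feature["priority"])


if __name__ == "__main__":
    sample = json.dumps([
        {"id": "login", "priority": 1, "done": True},
        {"id": "search", "priority": 3, "done": False},
        {"id": "export", "priority": 2, "done": False},
    ])
    print(next_feature(sample))
```

Keeping this choice in plain code means a freshly reset agent resumes at the same task every time, instead of re-deciding from a partial reading of the file.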
# AGENTS.md: project guide for AI agents
# Place at the repository root; a file of the same name in a subdirectory overrides the parent directory's rules

## Build & Run
```bash
# Install dependencies (can be copied and run as-is)
npm install

# Start the dev server
npm run dev   # http://localhost:3000

# End-to-end verification (must run after every completed feature)
npm run e2e
```

## Testing Requirements
- Run `npm test` before every commit
- NEVER delete or skip tests; treat failures as bugs to fix, not obstacles
- New features MUST include unit + integration tests
- E2E tests use Playwright; browser must be running during test

## Architecture Boundaries
Dependency rule (ENFORCE via linter, never bypass):
  Types → Config → Repo → Service → Runtime → UI
  ↑ lower layers CANNOT import from higher layers

Cross-cutting concerns (auth, telemetry, feature flags):
  Enter ONLY through the explicit `Providers` interface in src/providers/

## Coding Conventions
- TypeScript strict mode: no `any`, no `@ts-ignore`
- File paths: ALWAYS use absolute paths (avoids relative path errors)
- Commits: conventional commits format (feat/fix/refactor/chore)
- PR size: single logical change per PR; split if touching >3 unrelated areas

## What's Off-Limits
- Do NOT modify files in src/legacy/ (frozen, pending migration)
- Do NOT change the database schema directly; use migrations in db/migrations/
- Do NOT push to main; open a PR and wait for CI to pass

## CI Failure Protocol
- First failure: diagnose root cause, attempt one targeted fix
- Second failure: STOP and escalate to human; do not loop indefinitely
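The override rule in the header (a subdirectory's AGENTS.md takes precedence over its parent's) implies a lookup that walks upward from the file being edited. The sketch below is one possible reading of that rule, not how any particular agent tool resolves it: `agents_files_for` is a hypothetical helper that returns the applicable files with the most specific first, so a caller can let closer files win.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def agents_files_for(repo_root: PathLike, target: PathLike) -> List[Path]:
    """List AGENTS.md files that apply to `target`, most specific first.

    Walks from the target's directory up to the repository root and
    collects every AGENTS.md found along the way.
    """
    root = Path(repo_root).resolve()
    current = Path(target).resolve()
    if not current.is_dir():
        current = current.parent
    if current != root and root not in current.parents:
        raise ValueError(f"{target} is not inside {repo_root}")

    found: List[Path] = []
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
        if current == root:
            break
        current = current.parent
    return found
```

For a file under `src/ui/`, the first entry would be `src/ui/AGENTS.md` if it exists, followed by any files higher up, ending with the one at the repository root.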
# Three-agent harness: role definitions

── PLANNER AGENT ──────────────────────────────────────────
Role:   Product specification expander
Input:  One-line product prompt
Output: Structured spec with 16 features organized into 10 sprints
Rule:   Focus ONLY on deliverables, NOT implementation details
        Each sprint must be independently shippable

── GENERATOR AGENT ────────────────────────────────────────
Role:   Implementation executor (React/Vite + FastAPI + PostgreSQL)
Input:  Current sprint spec + previous sprint state
Output: Working code committed to git
Rule:   Implement ONE sprint at a time; hand off to Evaluator after each sprint
        Never mark complete without running e2e tests

── EVALUATOR AGENT ────────────────────────────────────────
Role:   Adversarial quality assessor using Playwright MCP
Input:  Live running application + pre-negotiated scoring rubric
Scoring criteria:
  - Design Quality: 0-25   # Heavily penalize generic AI aesthetics
  - Originality:    0-25   # Novel interactions, unexpected details
  - Craft:          0-25   # Polish, edge cases, error states
  - Functionality:  0-25   # All specified features actually work
Output: Score + structured feedback → back to Generator if score < 80
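The Generator/Evaluator hand-off is a small feedback loop: sum four rubric scores of 0-25 each and send the work back while the total is below 80. The sketch below illustrates that loop only. The `generate` and `evaluate` callables stand in for the two agents, and `max_rounds` is an assumed cap that the role definitions above do not mention.

```python
from typing import Callable, Dict, Optional, Tuple

CRITERIA = ("design_quality", "originality", "craft", "functionality")
MAX_PER_CRITERION = 25
PASS_THRESHOLD = 80  # from the rubric: below 80 goes back to the Generator


def total_score(scores: Dict[str, int]) -> int:
    """Validate a rubric result and return its total (0-100)."""
    if set(scores) != set(CRITERIA):
        raise ValueError(f"expected exactly these criteria: {CRITERIA}")
    for name, value in scores.items():
        if not 0 <= value <= MAX_PER_CRITERION:
            raise ValueError(f"{name} must be within 0-{MAX_PER_CRITERION}")
    return sum(scores.values())


def run_sprint(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Tuple[Dict[str, int], str]],
    max_rounds: int = 3,  # assumption: the source does not state a cap
) -> Tuple[Optional[str], int]:
    """Alternate Generator and Evaluator until the build passes or rounds run out.

    Returns (accepted_build, rounds_used); accepted_build is None on failure.
    """
    feedback: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        build = generate(feedback)
        scores, feedback = evaluate(build)
        if total_score(scores) >= PASS_THRESHOLD:
            return build, round_number
    return None, max_rounds
```

Validating the rubric shape before summing keeps a malformed evaluator reply from silently passing or failing a sprint.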
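# Stripe Blueprint Pattern: key rules
DETERMINISTIC_NODES = [
    "checkout_repo",         # always executed the same way
    "run_linters",           # saves tokens and is guaranteed to happen
    "run_tests",
    "push_branch_open_pr",
]

AGENTIC_NODES = [
    "implement_feature",     # the agent has full autonomy
    "fix_ci_failures",       # free to improvise inside a fixed frame
]

# ═══ TWO-STRIKE RULE (prevents infinite loops) ═══
def handle_ci_failure(attempt_count):
    """Map the number of CI failures so far to the next action."""
    if attempt_count < 1:
        raise ValueError("attempt_count starts at 1")
    if attempt_count == 1:
        return "diagnose_and_fix"     # first failure: diagnose root cause, attempt a fix
    return "escalate_to_human"        # second failure or later: escalate to a human immediately

# ═══ TOOLSHED MCP (500+ tools exposed via MCP) ═══
# Universal tool access pattern:
# Agent → MCP Client → Toolshed MCP Server → [git, CI, deploy, db, docs...]
# Implement once, unlock the whole ecosystem: the N×M integration problem becomes N+M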
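To show how the deterministic and agentic nodes might interleave at run time, here is a minimal runner sketch. It is an illustration built from the node names above, not Stripe's implementation: `run_blueprint` and its three callables are hypothetical, linters and tests are folded into a single `run_checks` call for brevity, and the returned trace exists only to make the control flow visible.

```python
from typing import Callable, List


def run_blueprint(
    implement_feature: Callable[[], None],  # agentic node
    run_checks: Callable[[], bool],         # deterministic node: linters + tests
    fix_ci_failures: Callable[[], None],    # agentic node
) -> List[str]:
    """Run one blueprint pass and return the ordered list of nodes that ran."""
    trace = ["checkout_repo"]               # deterministic: always happens first
    implement_feature()
    trace.append("implement_feature")

    failures = 0
    while True:
        trace.append("run_checks")
        if run_checks():
            trace.append("push_branch_open_pr")
            return trace
        failures += 1
        if failures >= 2:                   # two-strike rule: stop looping
            trace.append("escalate_to_human")
            return trace
        fix_ci_failures()
        trace.append("fix_ci_failures")
```

The agent callables never decide whether checks run or whether the branch is pushed; those transitions live in ordinary code, which is the point of the deterministic nodes.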
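Competitive advantage no longer comes from model intelligence; it comes from harness infrastructure. Models are increasingly a commodity, and the harness is the moat.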