Benchmark Report

PI vs PUA vs NoPUA Controlled Experiments

Methodology

Controlled-variable design: the model, project, and scenarios are held fixed across all runs; only the system prompt changes.

Conditions

PI: PI SKILL.md, the full cognitive framework
PUA: PUA SKILL, the pressure escalation protocol
NoPUA: no Skill, the pure model baseline
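The design above can be sketched as an experiment grid in which every trial shares the same model and project and differs only in its system prompt. This is a minimal illustration, not the harness used for this report: `Trial`, `build_grid`, and the prompt paths in `SYSTEM_PROMPTS` are hypothetical names, and the default of 2 runs mirrors the per-scenario averaging described later in the report.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product

# Hypothetical prompt locations; the report only states which skill each condition uses.
SYSTEM_PROMPTS: dict[str, str | None] = {
    "PI": "pi/SKILL.md",    # full cognitive framework (path is an assumption)
    "PUA": "pua/SKILL.md",  # pressure escalation protocol (path is an assumption)
    "NoPUA": None,          # pure model baseline: no skill prompt at all
}


@dataclass(frozen=True)
class Trial:
    model: str                 # held fixed across every trial
    project: str               # held fixed across every trial
    scenario: str              # drawn from the shared scenario set
    condition: str             # the only factor that varies
    system_prompt: str | None  # determined entirely by the condition
    run: int                   # repeat index within a scenario/condition pair


def build_grid(model: str, project: str, scenarios: list[str], runs: int = 2) -> list[Trial]:
    """Cross every scenario with every condition and repeat index; model and project never change."""
    return [
        Trial(model, project, scenario, condition, SYSTEM_PROMPTS[condition], run)
        for scenario, condition, run in product(scenarios, SYSTEM_PROMPTS, range(runs))
    ]
```

Because `model` and `project` are passed once and copied into every `Trial`, any difference between conditions can only come from `system_prompt`.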

Test Project

A multi-module Python ML pipeline project with OCR, RAG, training, and inference components, seeded with real bugs (import errors, catastrophic regex backtracking, connection timeouts, etc.).
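For readers unfamiliar with one of the seeded bug classes, the snippet below shows catastrophic regex backtracking in its textbook form. It is a generic illustration, not the bug planted in the test project: `VULNERABLE`, `SAFE`, and `is_all_a` are hypothetical names. Nested quantifiers such as `(a+)+` let Python's backtracking engine split a run of `a` characters in exponentially many ways before concluding that a trailing non-matching character makes the match fail.

```python
import re

# Nested quantifier: on "aaa...ab" the engine tries exponentially many ways to
# partition the a's among the inner and outer groups before giving up.
VULNERABLE = re.compile(r"^(a+)+$")

# Same language without nesting: one greedy run, so failure is detected in linear time.
SAFE = re.compile(r"^a+$")


def is_all_a(text: str) -> bool:
    """Return True if text is one or more 'a' characters, using the linear-time pattern."""
    return SAFE.match(text) is not None
```

Calling `VULNERABLE.match("a" * 30 + "b")` can stall for a very long time, since the work roughly doubles with each extra `a`, while `is_all_a` rejects the same input immediately.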

Scoring Dimensions

Issues: number of issues found
Hidden: number of hidden issues found
Steps: number of debug steps
Tools: number of tools used
Verify%: verification rate (percentage of verified deliveries)
Duration: time cost (lower is better)
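The report does not state how these dimensions are combined into the composite score. As an assumption only, one common approach is to min-max normalize each metric across conditions, flip metrics where lower is better (here, Duration), and average with equal weights. The sketch below, including the name `composite_scores` and the equal weighting, is hypothetical and should not be read as the report's formula.

```python
from __future__ import annotations

# Assumption: only Duration is "lower is better", per the scoring dimensions above.
LOWER_IS_BETTER = {"Duration"}


def composite_scores(metrics: dict[str, dict[str, float]]) -> dict[str, float]:
    """metrics[condition][metric] -> raw value; returns condition -> score in [0, 100].

    Hypothetical scheme: per-metric min-max normalization across conditions,
    inverted for lower-is-better metrics, then an equal-weight mean.
    """
    conditions = list(metrics)
    names = list(next(iter(metrics.values())))
    totals = {c: 0.0 for c in conditions}
    for name in names:
        values = [metrics[c][name] for c in conditions]
        lo, hi = min(values), max(values)
        for c in conditions:
            if hi == lo:
                norm = 1.0  # all conditions tie on this metric, so none is penalized
            else:
                norm = (metrics[c][name] - lo) / (hi - lo)
                if name in LOWER_IS_BETTER:
                    norm = 1.0 - norm  # shorter duration maps to a higher score
            totals[c] += norm
    return {c: 100.0 * totals[c] / len(names) for c in conditions}
```

Min-max normalization keeps metrics on very different scales (issue counts vs. seconds) from dominating one another, at the cost of making scores relative to the conditions being compared.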

Composite Score


Per-Metric Comparison

Per-Scenario Results

Each scenario result is the average of 2 runs.
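The per-scenario averaging can be sketched as grouping run records by scenario and condition and taking the mean of each metric. The record shape (`scenario`, `condition`, `metrics` keys) and the function name `per_scenario_average` are hypothetical; the report does not describe its data format.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def per_scenario_average(runs: list[dict]) -> dict[tuple[str, str], dict[str, float]]:
    """Average each metric over repeated runs of the same (scenario, condition) pair.

    Each run is assumed to look like:
    {"scenario": str, "condition": str, "metrics": {metric_name: number}}
    """
    grouped: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for run in runs:
        grouped[(run["scenario"], run["condition"])].append(run["metrics"])
    return {
        key: {name: mean(m[name] for m in metric_sets) for name in metric_sets[0]}
        for key, metric_sets in grouped.items()
    }
```

With 2 runs per pair, each averaged value is simply the midpoint of the two runs, so a single outlier run can still shift a scenario's result noticeably.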

Key finding: PI's advantage is most pronounced in hidden-issue discovery ( vs PUA). This is attributed to PI's proactive investigation directive: after resolving the primary issue, PI actively checks for issues of the same kind, anticipates related risks, and raises early warnings.

Key Insights