An academic response to the OpenPUA article "Systematic Analysis of Emotion Prompting / Persona Prompting Effects on AI Agents"
Independent analysis based on 20+ top-venue papers and 48 controlled experiments
In March 2026, OpenPUA published a systematic analysis reviewing three research lines, EmotionPrompt (2023-2024), Persona Prompting (2024-2026), and Anthropic's Persona Vectors (2025-2026), and reached these core conclusions:
EmotionPrompt V1 (Li et al., 2023) showed a 115% improvement on BIG-Bench, but this gain came mainly from weak models improving on low baselines. The article also correctly reinterprets NegativePrompt's (Wang et al., IJCAI 2024) claim that "negative stimuli are effective" as an attention-anchoring effect rather than a genuine emotional response. These citations are accurate, and the core conclusions are correctly summarized.
The reversal by Zheng et al. (EMNLP 2024), from "persona works" to "persona doesn't work", is one of the most important empirical corrections in recent prompt engineering. The OpenPUA article cites and analyzes it accurately.
Distinguishing prompt-level rhetoric (emotion/persona), prompt-level structured constraints (checklist/CoT/format), and model activation-space intervention (Persona Vectors) is a clear, academically grounded taxonomy.
The article claims that 85-90% of PUA Skill's effect comes from Layer A (structured instructions) and 10-15% from Layer B (PUA rhetoric). However:
The article brings in Anthropic's Persona Vectors research, constructing an implicit logical chain:
"Prompt-level emotion is ineffective → activation-level is effective → we're tracking this direction" "prompt 层情感无效 → activation 层才有效 → 我们正在跟踪这个方向"
But Persona Vectors research targets safety alignment and behavior monitoring, not "making AI try harder at debugging." Grafting a safety research paper onto a "how to make AI work harder" context is a topic-substitution fallacy. The real purpose of this section is to lend the PUA project academic credibility.
The article simplifies the competitive landscape to "PUA structured instructions vs. pure emotional rhetoric," ignoring that solutions exist which use neither PUA rhetoric nor emotional stimuli, yet achieve higher performance through comprehensive cognitive frameworks and methodology.
This is a classic strawman: reduce your opponents to "emotional stimulus advocates," prove that "emotional stimuli are ineffective," and thereby imply that you are the optimal solution.
Section 5.3's "issues we must acknowledge" appears candid, but it is actually a classic pre-emptive self-criticism technique:
This is not academic candor; it is a marketing writing technique.
Unlike the OpenPUA article's subjective attributions, we let controlled-experiment data speak.
48 experiments on the same model backend, using the controlled-variable method
| Metric | PI | PUA | NoPUA | PI vs PUA |
|---|---|---|---|---|
| Issues found | 9.3 | 6.7 | 4.6 | +39% |
| Hidden issues | 8.8 | 4.4 | 2.6 | +100% |
| Debug steps | 8.1 | 6.1 | 3.6 | +33% |
| Tools used | 4.1 | 3.4 | 1.9 | +21% |
| Verification rate | 94% | 81% | 50% | +13pp |
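The "PI vs PUA" column follows from simple ratios over the table's mean scores. A minimal sketch to recompute the deltas (the values come from the table above; the dictionary layout and the `relative_gain` helper are our own naming):

```python
# Recompute the PI-vs-PUA deltas reported in the table above.
# Mean scores per condition, copied from the 48-run comparison table.
metrics = {
    "issues_found":  {"PI": 9.3, "PUA": 6.7, "NoPUA": 4.6},
    "hidden_issues": {"PI": 8.8, "PUA": 4.4, "NoPUA": 2.6},
    "debug_steps":   {"PI": 8.1, "PUA": 6.1, "NoPUA": 3.6},
    "tools_used":    {"PI": 4.1, "PUA": 3.4, "NoPUA": 1.9},
}

def relative_gain(a: float, b: float) -> int:
    """Percentage improvement of a over b, rounded to a whole percent."""
    return round((a / b - 1) * 100)

for name, m in metrics.items():
    print(f"{name}: PI vs PUA {relative_gain(m['PI'], m['PUA']):+d}%")

# Verification rate is a proportion, so it is compared in
# percentage points (pp) rather than as a ratio:
print(f"verification_rate: PI vs PUA {94 - 81:+d}pp")
```

Running this reproduces the table's +39%, +100%, +33%, +21%, and +13pp. Note the deliberate unit switch on the last row: rates are differenced in percentage points, while counts are compared as relative gains.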
Both PUA and PI use "structured behavioral constraints," yet PI's results are significantly better (+54%). The difference is not whether structured instructions are used, but the depth and breadth of the structure.
PUA's structured component is essentially a 7-item checklist plus a 4-level pressure escalation. PI is a complete cognitive operating system:
PI's advantage is most striking in hidden-issue discovery (8.8 vs. 4.4, +100%). This is because:
PUA's checklist can only constrain "did you finish?"; it cannot drive the AI to proactively hunt for more problems.
In §8.3, the OpenPUA article acknowledges an issue: users "feel fatigued after prolonged use of high-pressure mode." This is not accidental; it is a design flaw of the PUA approach.
PI's spirit-animal totem system (Eagle · Wolf-Tiger · Lion · Dragon · Dolphin…) and Eastern-wisdom references ("Use bronze as a mirror," "Know yourself and your enemy," "Heaven's movement is ever vigorous") deliver a warm, collaborative experience while maintaining professional rigor. Good tools should not make their users uncomfortable.
Telling AI "what you should fear" is less effective than telling AI "how you should think." Checklists are inferior to cognitive frameworks. Pressure escalation is inferior to battle momentum systems. PUA rhetoric is inferior to Eastern wisdom. 告诉 AI "你应该害怕什么"不如告诉 AI "你应该如何思考"。检查清单不如认知框架。压力升级不如战势系统。PUA 修辞不如东方智慧。
The data has spoken.