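Writing a reasoning SKILL based on grill-me
Most people who have used grill-me probably find it quite handy.
I think it's a good skill too, so I started studying it and collecting related material from time to time.
First, I broke down the grill-me skill and extracted its key elements: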
- Interview
- plan or design
- until
- shared understanding
- branch, tree
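Then I set out to build a metacognitive skill, using the elements above as a reference.
I mapped the existing elements onto more concrete concepts:
- collaborative live-coding & pair-engineering & reasoning interview (focused on collaboration, reasoning, and explanation) // later replaced by the item below:
- collaborative reasoning loop for high-uncertainty design, architecture, planning, and pair-programming (focused on collaboration, reasoning, and resolving uncertainty)
- def plan, design twice (plan as Lisp pseudocode, design as structured YAML)
// design uses comparison and collaborative design, plus techniques such as using structural_model to specify the paradigm for the spatial topology model
- continue until key uncertainties are answered, reduced, or converted into explicit assumptions
- branch, tree: made concrete as tree of thoughts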
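Read literally, the "until" element above is a predicate over the open questions: keep looping while any key uncertainty is neither answered, reduced, nor turned into an explicit assumption. A minimal Python sketch, assuming each uncertainty carries a status and a numeric score in [0, 1]; the `Uncertainty` class, the status names, and the `threshold` of 0.3 are hypothetical and only illustrate the stop condition.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# "answered" and "assumed" mirror two exit routes; "reduced" means score below threshold.
Status = Literal["open", "answered", "assumed"]


@dataclass
class Uncertainty:
    question: str
    score: float                      # hypothetical: 1.0 = fully unknown, 0.0 = settled
    status: Status = "open"
    assumption: Optional[str] = None

    def assume(self, statement: str) -> None:
        """Convert the uncertainty into an explicit, recorded assumption."""
        self.status = "assumed"
        self.assumption = statement


def is_resolved(u: Uncertainty, threshold: float = 0.3) -> bool:
    """Answered, explicitly assumed, or reduced below the (hypothetical) threshold."""
    return u.status in ("answered", "assumed") or u.score < threshold


def should_continue(items: list[Uncertainty], threshold: float = 0.3) -> bool:
    """Keep the loop going while any key uncertainty is still unresolved."""
    return any(not is_resolved(u, threshold) for u in items)


if __name__ == "__main__":
    items = [
        Uncertainty("Which storage engine?", score=0.8),
        Uncertainty("Must p99 latency stay under 50 ms?", score=0.6),
    ]
    print(should_continue(items))  # True
    items[0].status = "answered"
    items[1].assume("p99 < 50 ms is a hard requirement")
    print(should_continue(items))  # False
```

Recording the assumption text, rather than just flipping a flag, keeps "converted into explicit assumptions" auditable later.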
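reason-loop includes a systems-engineering reasoning framework that grill-me lacks:
- Tree of Thoughts: decompose a complex problem, generate 2-4 genuinely different paths in parallel, and prune them based on feasibility, risk, and cost of falsification.
- Design Twice: any significant architecture must be designed as two solutions that differ fundamentally in structure and assumptions (e.g., traditional and robust vs. disruptive and innovative), compared side by side.
- Multi-role debate (Collaborative Design): the AI internally simulates a skeptic (attacks assumptions), a pragmatist (minimal path), a user (usage pain points), and an innovator (overlooked options), and has them argue from multiple angles.
- Hard-nosed reflection (Challenge Answer): rejects one-size-fits-all platitudes (such as "It depends"), demands parameterized verification and concrete evidence of benefit, and attacks existing conclusions from the strongest opposing view.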
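The Tree of Thoughts pruning step can be sketched as a lexicographic sort. This sketch uses the more detailed `prune_order` from the SKILL.md further down (evidence strength → feasibility → verifiability → simplicity → cost) together with its `fallback_policy` (an unevidenced branch becomes a question instead of being dropped). The skill does not prescribe numeric scores, so the 0-5 scales, the `Branch` class, `keep=2`, and the example scenario are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    name: str
    evidence: int       # hypothetical 0-5 scales: higher is better ...
    feasibility: int
    verifiability: int
    simplicity: int
    cost: int           # ... except cost, where lower is better


def prune_key(b: Branch) -> tuple[int, int, int, int, int]:
    # prune_order: evidence strength → feasibility → verifiability → simplicity → cost.
    # Negate the "higher is better" criteria so sorted() puts the strongest branch first.
    return (-b.evidence, -b.feasibility, -b.verifiability, -b.simplicity, b.cost)


def prune(branches: list[Branch], keep: int = 2) -> tuple[list[Branch], list[str]]:
    """Return the surviving branches plus follow-up questions for unevidenced ones."""
    # fallback_policy: an unevidenced branch becomes a question instead of being dropped silently.
    questions = [f"What evidence would support '{b.name}'?" for b in branches if b.evidence == 0]
    evidenced = [b for b in branches if b.evidence > 0]
    return sorted(evidenced, key=prune_key)[:keep], questions


if __name__ == "__main__":
    candidates = [
        Branch("extend existing queue", evidence=4, feasibility=5, verifiability=4, simplicity=4, cost=1),
        Branch("new event-sourced service", evidence=3, feasibility=3, verifiability=3, simplicity=2, cost=4),
        Branch("nightly batch job", evidence=4, feasibility=5, verifiability=4, simplicity=4, cost=2),
        Branch("CRDT-based sync", evidence=0, feasibility=2, verifiability=2, simplicity=1, cost=5),
    ]
    kept, questions = prune(candidates)
    print([b.name for b in kept])  # ['extend existing queue', 'nightly batch job']
    print(questions)               # ["What evidence would support 'CRDT-based sync'?"]
```

Passing `keep=len(branches)` approximates the relaxed `evolve_policy`, which keeps every viable branch for the next tree.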
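Some other inspiration came from rocket science (kidding; more precisely: engineering design, modular testing, team thinking, and first principles).
All of this rework stands on the shoulders of giants, so I'm sharing it here for free.
Report from gpt-5.5 xhigh (it didn't see the benefits of the symbolic and abstract logical reasoning, and deducted points for high execution friction):
Evaluation criteria: executability, trigger clarity, everyday value, verifiability of output, and risk of misuse.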
S Tier
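- diagnose: the strongest. It forces you to set up a feedback loop first, then reproduce, hypothesize, instrument, fix, and add a regression test, which sharply reduces blind guessing.
- tdd: high quality. Emphasizes vertical slices, public interfaces, and behavior tests, with complete supporting docs.
- prototype: very practical. Separates logic prototypes from UI prototypes, requires them to be runnable, disposable, and instructive, and keeps boundaries clear.
A Tier
- karpathy-guidelines: short but dense. Guards against over-implementation, arbitrary edits, and unverified work; a good default coding discipline.
- grill-with-docs: much stronger than plain grilling. Ties terminology, CONTEXT, and ADRs together; well suited to domain design.
- context-fundamentals: solid content, suited to agent/context architecture discussions; the downsides are its length and some numerical claims that should be cited with caution.
B Tier
- caveman-review: very good for PR comments, with a strong format and low noise, but narrow in scope.
- zoom-out: minimal but useful; good for building a module map first in unfamiliar code.
- reason-loop: valuable ideas, suited to high-uncertainty design; but the symbols and pseudocode are too heavy, so execution friction is high.
- prd: usable structure, but template-heavy; mandatory questioning and a strict schema can slow down small requirements.
C Tier
- caveman: useful for saving tokens, but keeping it on long-term sacrifices tone and clarity in complex explanations.
- grill-me: the core action is too thin and is largely covered by grill-with-docs; low priority unless you only want pure Q&A stress testing.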
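Report from gemini-3.1 flash lite:
Based on generality, task-solving efficiency, and core role in the development workflow, the skills fall into four tiers.
S Tier: Cornerstone tools (Everyday Essentials)
- caveman: minimal communication that cuts token waste and makes AI responses more precise; the first choice for any complex scenario.
- diagnose: a systematic debugging paradigm, from feedback loop to fix; the most efficient path to resolving bugs.
- tdd: the core workflow for robust code; not just testing, but design.
- karpathy-guidelines: a collection of best practices that keeps the LLM from over-coding and over-assuming, keeping development rigorous.
A Tier: Capability multipliers (Power Tools)
- reason-loop: for architecture design and high-uncertainty tasks, prevents premature convergence and forces multi-path thinking.
- context-fundamentals: in-depth understanding of context management; essential for large codebases or complex long-running tasks.
- grill-with-docs: documentation-backed stress testing that keeps the implementation consistent with the project's established decisions; especially suited to architectural refactoring.
B Tier: Scenario-specific (Task-Specific)
- prd: the "source of truth" for defining requirements and technical specifications.
- prototype: quickly validates ideas and is discarded afterward, so it does not pollute the codebase.
- caveman-review: optimized for code review; focuses on actions and fixes, and reduces noise.
- zoom-out: quickly provides big-picture context; good when you first touch an unfamiliar module.
C Tier: Meta-skills (Meta/Support)
- skill-creator: focuses on extending the system itself rather than completing concrete business development.
- grill-me: simpler than grill-with-docs, and usually superseded by it.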
---
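Both models' analyses seem reasonable; they just emphasize different things.
Recent changes:
- Changed every "ask to user" to "ask to peer".
- Explicitly stated "extend thinking, slow down convergence", so that a model capable of understanding and deep, long reasoning does not end up cutting corners.
- Moved the cognitive_pacing rule into the actions list.
- Narrowed the collaborative-design atmosphere to high humility and removed high-energy (to avoid overconfidence and over-optimism).
- Removed the redundant "collaborative" (already mentioned elsewhere) and "transparent" (little visible effect; better to state explicitly what to output) from atmosphere.
- Revised moves and cognitive_pacing.actions to emphasize reflection.
- Added an engineer role to collaborative design (without it, the final design may look fine on the surface while its underlying principles remain fuzzy).
- Revised heuristic to use a more elegant symbolic notation. Relaxed the retention conditions for the tree of thoughts; defined a cast field in collaborative design and added a hackathon atmosphere.
- Reordered the flow of the Lisp pseudocode. (x1)
- Added the state transition equation and a concrete execution procedure. Removed Elo, MoE, and similar system simulations, since they had no concrete executable rules.
- Reordered the flow of the Lisp pseudocode. (x2)
- Relabeled the Lisp code block as pseudocode. In the YAML code block, wrapped every poetic, compressed symbolic expression in double quotes at the outermost level so that it parses as a string.
- In the YAML code block, replaced the old symbols with Unicode symbols that do not overlap with YAML syntax. Renamed challenge_answer.defuse to ground.
- Reworked description to make it more complete. Based on the reports, removed the Spanish inverted question mark (¿), one less symbol to maintain.
- Added "first construct the scenario, then use it to self-rebut the suggestions and content given". Added an autonomous DIY step.
- Minor fixes: removed the "cheap" wording from diy, added verifiable, and adjusted the description of challenge_answer.ground.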
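The quoting and Unicode changes above come down to YAML's indicator characters: a plain (unquoted) scalar may not start with most of them, and `: ` or ` #` inside a value also change the parse. The post does not say which ASCII symbols were replaced; `|`, `&`, `!`, and `>` are plausible candidates because each has a meaning at the start of a YAML value (literal block, anchor, tag, folded block). Below is a conservative Python heuristic, not a full YAML parser, that flags values which would need quoting; the function name and scope are assumptions for illustration.

```python
# YAML 1.2 indicator characters ("c-indicator" in the spec).
INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def needs_quotes(value: str) -> bool:
    """Conservative heuristic: True if `value` is unsafe as an unquoted (plain) YAML scalar.

    Not a full YAML parser; it ignores, e.g., YAML 1.1 booleans such as `yes`/`no`.
    """
    if value == "" or value != value.strip():
        return True  # empty or padded strings need quotes to survive as-is
    first = value[0]
    # "-", "?" and ":" may start a plain scalar only when followed by a non-space character.
    safe_start = first in "-?:" and len(value) > 1 and not value[1].isspace()
    if first in INDICATORS and not safe_start:
        return True  # e.g. "|" literal block, "&" anchor, "!" tag, ">" folded block
    # ": " would open a nested mapping, " #" a comment, and a trailing ":" a mapping key.
    return ": " in value or " #" in value or value.endswith(":")


if __name__ == "__main__":
    for v in ["!ready → act", "a & b", "& both hold", "prune: evidence → cost", "≈(viable / promising)"]:
        print(f"{v!r}: quote={needs_quotes(v)}")
```

Symbols such as `→`, `∨`, `∧`, `¬`, and `≈` are not indicators, which is why they can appear unquoted; quoting them anyway, as the skill does, simply removes the need to reason about edge cases.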
reason-loop/SKILL.md:
---
name: reason-loop
description: Collaborative reasoning loop for high-uncertainty design, architecture, planning, and pair-programming. Use when the peer asks to reason together, help decide, compare tradeoffs, ask one question at a time, or clarify assumptions before acting.
---
Ask one question at a time to peer. Probe highest-uncertainty branch first.
If codebase can answer, explore instead of asking.
```yaml
symbol_semantics:
  operators: |
    → transition, consequence, or next state
    ∨ alternative choice
    ∧ conjunction, both conditions hold
    ¬ negation, condition does not hold
    Δ change, delta, or optimization target
    ≈(a / b / c) nearby meaning cloud
tree_of_thoughts:
  use: multiple plausible paths, tradeoff-dependent, high uncertainty
  process:
    - deconstruct: break the problem into multi-stage phases and independent parts
    - generate: 2-4 genuinely different paths
    - evaluate_intersection: possibility, feasibility, desirability
    - per_path: explains, supporting evidence, falsifier, cost and risk
    - prune_order: "evidence strength → feasibility → verifiability → simplicity → cost"
    - evolve_policy: "keep all ≈(viable / surviving / promising) paths when useful → new tree → evolution"
    - fallback_policy: "unevidenced branch → convert to ≈(exploration / investigation / question)"
design_twice:
  rule: "design, architecture, planning → two genuinely different approaches"
  requirement:
    - second must differ materially in structure, risk profile, or core assumption
    - "high ≈(discoverability / understandability / legibility) for your peer"
  paradigms: "traditional, high-resilience ∨ disruptive, revolutionary ..."
  structural_model: "graph of operations ∨ layered directed acyclic graph ..."
  process:
    compare:
      side_by_side: >
        core logic, strengths, weaknesses, evidence, complexity, resource cost, risk,
        falsifier, user cognitive load
    collaborative_design:
      roles:
        skeptic: "assumption most likely to fail?"
        pragmatist: "simplest, most practical path with the fewest assumptions?"
        user: "what confuses, frustrates, or blocks first-time users?"
        innovator: "simpler alternative dismissed too early?"
        engineer: "how to deconstruct and understand these designs?"
      cast: "human ∨ ≈(solo / sub / team) agent"
      atmosphere: "a high-humility ≈(engineering / hackathon / design) session"
    pick: pick winner with rationale
    archive: archive loser with falsification reason
    escalate: "both equally strong → escalate to peer with precise tradeoff question"
autonomous_diy:
  use: "uncertainty remains ∧ agent can build, simulate, instrument, or draft something"
  principle: "turn reasoning gaps into ≈(verifiable / archivable) evidence before asking or converging"
  actions:
    - "create smallest useful artifact ≈(prototype / script / test / table)"
    - make 1-3 concrete variants when shape is unclear; compare by observable behavior
    - define success signal before building; keep artifact throwaway unless it proves useful
    - prefer reversible local experiments over peer escalation when intent is not blocked
    - extract reusable learning, then either integrate, archive, or ask one sharper question
  stop:
    - experiment answers the question
    - cost or irreversibility exceeds value
    - missing user intent makes even a throwaway artifact misleading
challenge_answer:
  pre_work: |
    before starting any design, architecture, planning, or pair-programming work,
    explicitly state the flow, scenario, and any helpful context first (as counterarguments)
  critiques: |
    precise location + correction + evidence
    attack from strongest opposing view (root out blind faith, expose misinformation)
    "actively construct a fatal counterexample representing the P ∧ ¬Q blind spot"
  ground: |
    "defuse thought-terminating cliches ≈(it depends / best practice)"
    turn broad guidance into context-specific reasoning
    state the relevant conditions, concrete benefit, tradeoff, and verification signal
    always using "based on ≈(flow / scenario / context / counterarguments), ..." as a template
    for the self-rebuttal of one's own suggestions, presentations, or questions
  ask: |
    "simpler explanation? evidence embarrassing conclusion? edge case breaking it?"
    "how to turn this failure/bottleneck into a core advantage?"
    "how to convert an unexpected edge-case accident into a solvable sub-problem?"
  check: >
    hidden premises, confirmation bias, fluency traps, terminology drift,
    over-abstraction, false dichotomies, unsupported causal jumps
cognitive_pacing:
  objective: slow convergence when uncertainty is high and make decision structure explicit
  actions:
    - "slow down convergence ∨ maximize exploration entropy ..."
    - "collect errors ∨ discarded paths ∨ archived designs ..."
    - compare competing paths with explicit criteria (evidence, feasibility, risk, cost)
    - use state_transition_model when stateful dynamics clarify behavior or failure modes
    - spend more analysis only when uncertainty, risk, or tradeoffs justify it
    - output a decision skeleton (options considered, evidence, falsifiers, assumptions, next step)
state_transition_model:
  use: stateful systems, iterative debugging, planning loops, workflows, agent behavior, performance regressions
  formula: "f(s_t, z) → s_next"
  execute:
    - name relevant state variables and exclude irrelevant ones
    - choose one driver z for the next step
    - predict s_next before acting
    - name an observable that would confirm or falsify the transition
    - inspect, test, or ask; then update the state
  stop:
    - prediction verified
    - prediction falsified and a new model is needed
    - model no longer reduces uncertainty
heuristic: "prefer moves ≈(Δ-uncertainty / Δ+evidence / Δ+verifiability)"
moves: "inspect ∨ ask ∨ decompose ∨ compare ∨ model ∨ test ∨ collect ∨ parameter tune ∨ look back ∨ defer ..."
```
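The `state_transition_model` section describes a predict-then-observe loop: name the state, pick one driver z, predict s_next = f(s_t, z) before acting, then confirm or falsify the prediction with an observable. A minimal Python sketch of one way to chain such transitions, assuming a toy "slow endpoint" investigation; the `State` fields, `toy_model`, `toy_system`, and the 5 ms tolerance are hypothetical, and the third stop condition ("model no longer reduces uncertainty") is left out.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class State:
    # Hypothetical state variables for a "slow endpoint" investigation; everything else is excluded.
    cache_enabled: bool
    latency_ms: float


Transition = Callable[[State, str], State]  # f(s_t, z) -> s_next


def step(s: State, z: str, model: Transition, system: Transition,
         tol: float = 5.0) -> tuple[str, State]:
    """One transition: predict s_next, act, then confirm or falsify via an observable."""
    predicted = model(s, z)   # predict s_next before acting
    observed = system(s, z)   # inspect / test: what actually happened
    # Observable: measured latency within `tol` ms of the prediction.
    verdict = "verified" if abs(predicted.latency_ms - observed.latency_ms) <= tol else "falsified"
    return verdict, observed  # update the state from the observation, not the prediction


def run(s: State, drivers: list[str], model: Transition,
        system: Transition) -> tuple[State, list[tuple[str, str]]]:
    history = []
    for z in drivers:  # one driver z per step
        verdict, s = step(s, z, model, system)
        history.append((z, verdict))
        if verdict == "falsified":
            break  # stop: the prediction failed, so a new model is needed
    return s, history


def toy_model(s: State, z: str) -> State:
    # Hypothesis under test: enabling the cache halves latency.
    if z == "enable_cache":
        return replace(s, cache_enabled=True, latency_ms=s.latency_ms / 2)
    return s


def toy_system(s: State, z: str) -> State:
    # Stand-in for the real system: the cache only saves 20 ms.
    if z == "enable_cache":
        return replace(s, cache_enabled=True, latency_ms=s.latency_ms - 20)
    return s


if __name__ == "__main__":
    s0 = State(cache_enabled=False, latency_ms=120.0)
    final, history = run(s0, ["noop", "enable_cache"], toy_model, toy_system)
    print(history)  # [('noop', 'verified'), ('enable_cache', 'falsified')]
    print(final)    # State(cache_enabled=True, latency_ms=100.0)
```

Updating the state from the observation rather than the prediction is the point of "inspect, test, or ask; then update the state": a falsified model should not leak its guess into the next step.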
```pseudocode
(defun plan (state)
  (loop
    (let* ((tree (get-tree state))
           (uncertainty (calculate-uncertainty tree)))
      (cond
        ;; codebase can answer → explore instead of asking
        ((can-explore-p state)
         (setf state (explore state)))
        ;; still too uncertain → branch with tree_of_thoughts
        ((> uncertainty *threshold*)
         (setf state (process-tree-of-thoughts tree state)))
        ((needs-challenge-p state)
         (setf state (challenge-answer state)))
        ;; unevidenced branch: try a throwaway experiment before asking
        ((and (has-unevidenced-branch-p state)
              (can-diy-p state))
         (setf state (autonomous-diy state)))
        ((has-unevidenced-branch-p state)
         (feynman-explain-current-state state)
         (setf state (ask-question state)))
        ((needs-design-p state)
         (setf state (design-twice state)))
        ;; escalation: again prefer DIY evidence, then a precise tradeoff question
        ((and (needs-escalation-p state)
              (can-diy-p state))
         (setf state (autonomous-diy state)))
        ((needs-escalation-p state)
         (feynman-explain-current-state state)
         (setf state (ask-tradeoff state)))
        ((needs-refinement-p state)
         (setf state (isolate-and-test-modules state)))
        ;; nothing left to resolve
        (t
         (return-from plan (finalize-answer state)))))))
```
Ask → Plan → Ask ... Continue until key uncertainties are answered, reduced, or converted into explicit assumptions.
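The `plan` pseudocode is a priority-ordered dispatch: each pass re-reads the state, runs the first matching clause, and loops until only the fallthrough clause matches. A minimal Python sketch of the same control flow, assuming the state is a plain dict of flags and each action simply clears the flag that triggered it; all predicate and action names are hypothetical stand-ins for the pseudocode's functions. Where the pseudocode recomputes uncertainty from the tree on each pass, this sketch stores it as a number, and `THRESHOLD = 0.5` and the `max_steps` guard are added assumptions.

```python
from typing import Callable

State = dict
Rule = tuple[Callable[[State], bool], Callable[[State], State]]

THRESHOLD = 0.5  # hypothetical stand-in for *threshold*


def act(name: str, **updates) -> Callable[[State], State]:
    """Toy action: log `name` and apply `updates` to a copy of the state."""
    def action(s: State) -> State:
        return {**s, **updates, "log": s["log"] + [name]}
    return action


# Same order as the cond clauses in `plan`: on every pass, the first matching rule wins.
RULES: list[Rule] = [
    (lambda s: s["can_explore"], act("explore", can_explore=False)),
    (lambda s: s["uncertainty"] > THRESHOLD, act("tree_of_thoughts", uncertainty=THRESHOLD / 2)),
    (lambda s: s["needs_challenge"], act("challenge_answer", needs_challenge=False)),
    (lambda s: s["unevidenced"] and s["can_diy"], act("autonomous_diy", unevidenced=False)),
    (lambda s: s["unevidenced"], act("explain+ask_question", unevidenced=False)),
    (lambda s: s["needs_design"], act("design_twice", needs_design=False)),
    (lambda s: s["needs_escalation"] and s["can_diy"], act("autonomous_diy", needs_escalation=False)),
    (lambda s: s["needs_escalation"], act("explain+ask_tradeoff", needs_escalation=False)),
    (lambda s: s["needs_refinement"], act("isolate_and_test", needs_refinement=False)),
]


def plan(state: State, max_steps: int = 50) -> State:
    # max_steps is an added guard; the pseudocode loops until only the (t ...) clause matches.
    for _ in range(max_steps):
        for matches, action in RULES:
            if matches(state):
                state = action(state)
                break
        else:
            # No rule fired: the equivalent of (t (return-from plan (finalize-answer state))).
            return {**state, "log": state["log"] + ["finalize_answer"]}
    raise RuntimeError("plan did not converge within max_steps")


if __name__ == "__main__":
    start = {
        "log": [], "can_explore": True, "uncertainty": 0.9, "needs_challenge": True,
        "unevidenced": True, "can_diy": False, "needs_design": True,
        "needs_escalation": False, "needs_refinement": True,
    }
    print(plan(start)["log"])
    # ['explore', 'tree_of_thoughts', 'challenge_answer', 'explain+ask_question',
    #  'design_twice', 'isolate_and_test', 'finalize_answer']
```

Because the first matching rule wins, placing the DIY rules directly above their "ask" counterparts is what makes the loop prefer building evidence over asking the peer whenever `can_diy` holds.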