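Writing a reasoning SKILL with grill-me as the reference

Most people who have used grill-me would probably agree that it is quite handy.

I think the skill is good too, so I started researching it and collecting related material from time to time.

First, I broke the grill-me skill down into its elements: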
- Interview
- plan or design
- until
- shared understanding
- branch, tree

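Then I set out to build a metacognitive skill, using the elements above as the reference.

Each existing element is mapped to a more specific concept:
- collaborative live-coding & pair-engineering & reasoning interview (emphasis on collaboration, reasoning, and explanation) // later replaced by the following line:
- collaborative reasoning loop for high-uncertainty design, architecture, planning, and pair-programming (emphasis on collaboration, reasoning, and resolving uncertainty)
- def plan, design twice (plan as Lisp pseudocode, design as structured YAML)
// design uses comparison and collaborative-design methods, plus techniques such as specifying the spatial-topology paradigm through structural_model
- continue until key uncertainties are answered, reduced, or converted into explicit assumptions
- branch, tree is pinned down as tree of thoughts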

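reason-loop includes a systems-engineering reasoning framework that grill-me does not have:
- Tree of Thoughts: deconstruct a complex problem, generate 2 to 4 genuinely different paths at once, and prune them by feasibility, risk, and cost of falsification.
- Design Twice: any significant architecture must be designed as two approaches that differ fundamentally in structure and assumptions (for example, traditional and robust vs. disruptive and innovative), then compared side by side.
- Collaborative Design (multi-role debate): the AI internally simulates a skeptic (attacks assumptions), a pragmatist (simplest path), a user (usage pain points), and an innovator (overlooked alternatives), and lets them argue from multiple angles.
- Challenge Answer (hard-nosed reflection): reject catch-all answers (such as "It depends"), require parameterized verification and proof of concrete benefit, and attack existing conclusions from the strongest opposing view.

Some inspiration also came from rocket science (just kidding; more precisely: engineering design, modular testing, team thinking, and first principles).

All of this rework stands on the shoulders of giants, so I am sharing it here for free.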
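
To make the pruning step of the Tree of Thoughts item more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the `Path` fields, the 0 to 1 scores, the `min_feasibility` cutoff, and the ranking order are hypothetical and are not taken from the skill file, which leaves the scoring to the model's judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One candidate path in the thought tree (all fields are illustrative)."""
    name: str
    feasibility: float         # 0..1, higher is better
    risk: float                # 0..1, lower is better
    falsification_cost: float  # 0..1, lower means cheaper to disprove


def prune(paths, keep=2, min_feasibility=0.3):
    """Drop infeasible paths, then keep the `keep` best by a simple ranking."""
    viable = [p for p in paths if p.feasibility >= min_feasibility]
    ranked = sorted(
        viable,
        # Most feasible first; ties broken by lower risk, then cheaper falsification.
        key=lambda p: (-p.feasibility, p.risk, p.falsification_cost),
    )
    return ranked[:keep]


# Hypothetical candidates; the skill asks for 2 to 4 genuinely different paths.
candidates = [
    Path("rewrite-from-scratch", feasibility=0.2, risk=0.9, falsification_cost=0.8),
    Path("incremental-refactor", feasibility=0.8, risk=0.3, falsification_cost=0.2),
    Path("wrap-with-adapter", feasibility=0.8, risk=0.2, falsification_cost=0.4),
    Path("buy-off-the-shelf", feasibility=0.5, risk=0.5, falsification_cost=0.1),
]

survivors = prune(candidates)
print([p.name for p in survivors])  # → ['wrap-with-adapter', 'incremental-refactor']
```

In practice the scores would come from evidence gathered during the session rather than hard-coded numbers; the point is only that pruning uses explicit, comparable criteria.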

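Report from gpt-5.5 xhigh (it did not pick up on the strengths of symbolic and abstract logical reasoning, and deducted points for high execution friction):
Criteria: executability, trigger clarity, everyday value, whether the output is verifiable, and misuse risk.
S Tier
  - diagnose: the strongest. Forces a feedback loop first, then reproduce, hypothesize, instrument, fix, and regression-test, which significantly cuts down on guessing.
  - tdd: high quality. Emphasizes vertical slices, public interfaces, and behavior tests, with complete supporting documentation.
  - prototype: very practical. Separates logic prototypes from UI prototypes and requires them to be runnable, disposable, and instructive, with clear boundaries.
A Tier
  - karpathy-guidelines: short but dense. Guards against over-implementation, careless edits, and unverified changes; suitable as a default coding discipline.
  - grill-with-docs: much stronger than plain grilling. Ties terminology, CONTEXT, and ADRs together; well suited to domain design.
  - context-fundamentals: solid content, suitable for agent/context architecture discussions; the drawbacks are that it runs long and that some numeric claims should be cited with caution.
B Tier
  - caveman-review: works well for PR comments, with strong format and low noise, but narrow in scope.
  - zoom-out: minimal but useful; good for building a module map first in unfamiliar code.
  - reason-loop: the ideas are valuable and fit high-uncertainty design, but the symbols and pseudocode are too heavy and execution friction is high.
  - prd: the structure is usable but template-heavy; forced questions and a strict schema tend to slow down small requirements.
C Tier
  - caveman: useful for saving tokens, but keeping it on persistently sacrifices tone and the clarity of complex explanations.
  - grill-me: the core action is too thin and is largely covered by grill-with-docs; low priority unless you only want a pure Q&A stress test.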

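Report from gemini-3.1 flash lite:
Based on generality, task-solving efficiency, and core role in the development workflow, the skills are divided into four tiers.
S Tier: Cornerstone Tools (Everyday Essentials)
- caveman: minimal communication that reduces token waste and improves the precision of AI responses; first choice for any complex scenario.
- diagnose: a systematic debugging paradigm, from feedback loop to fix; the most efficient path to resolving bugs.
- tdd: the core workflow for robust code; not just testing, but design.
- karpathy-guidelines: a collection of best practices that keeps the LLM from over-coding and over-assuming, and keeps development rigorous.
A Tier: Capability Multipliers (Power Tools)
- reason-loop: for architecture design and high-uncertainty tasks, prevents premature convergence and forces multi-path thinking.
- context-fundamentals: a deep understanding of context management; essential for large codebases or complex long-running tasks.
- grill-with-docs: stress testing grounded in docs, ensuring the implementation stays consistent with the project's established decisions; extremely well suited to architecture refactoring.
B Tier: Scenario-Specific (Task-Specific)
- prd: the "source of truth" that defines requirements and technical specs.
- prototype: validate ideas quickly and discard them afterward, avoiding pollution of the codebase.
- caveman-review: optimized for code review, focused on action and fixes, with less noise.
- zoom-out: quickly obtains the big-picture context; good when first touching an unfamiliar module.
C Tier: Meta-Skills (Meta/Support)
- skill-creator: focused on extending the system itself rather than on concrete business development.
- grill-me: simpler than grill-with-docs and usually superseded by it.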

---

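Both models' analyses seem to make some sense; they just emphasize different things.

Recent changes:
- Changed every "ask to user" to "ask to peer".
- Stated explicitly: "extend thinking, slow down convergence". This keeps a model that is capable of understanding and of deep, long reasoning from taking shortcuts in the end.
- Moved the cognitive_pacing rule into the actions list.
- Narrowed the collaborative-design atmosphere to high humility. Removed high-energy (to avoid overconfidence and over-optimism).
- Removed two redundant terms from atmosphere: collaborative (already mentioned elsewhere) and transparent (little visible effect; explicit instructions about what to output work better).
- Revised moves and cognitive_pacing.actions to emphasize reflection.
- Added an engineer role to collaborative design (without it, the final design may look fine on the surface while staying vague about the underlying mechanics).
- Revised heuristic to use a cleaner symbolic notation. Relaxed the retention condition for the thought tree; defined the cast field in collaborative design and added the hackathon atmosphere.
- Adjusted the flow order of the Lisp pseudocode (first pass).
- Added the state transition equation and a concrete execution procedure. Removed system simulations such as elo and moe because they had no concrete executable rules.
- Adjusted the flow order of the Lisp pseudocode (second pass).
- Relabeled the Lisp code block as pseudocode. Adjusted the YAML code block: every poetic compression symbol expression is now wrapped in double quotes at the outermost level so that it parses as a string.
- In the YAML code block, replaced the old symbols with Unicode symbols that do not clash with YAML syntax. Renamed challenge_answer.defuse to ground.
- Revised and completed the description. Following the report, removed the Spanish inverted question mark, which means one less symbol to maintain.
- Added "build the scenario first, then use it to rebut one's own suggestions and content". Added the autonomous DIY stage.
- Minor fix: removed the "cheap" wording from diy and added verifiable. Adjusted the description of challenge_answer.ground.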
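
The state transition equation mentioned in the change list (`f(s_t, z) → s_next` in the skill file) comes with an execution recipe: predict the next state before acting, name an observable, then inspect and update. A minimal Python sketch of one such step might look like the following. The retry-counter scenario and the `step`, `model`, and `observe` names are hypothetical, made up here purely for illustration.

```python
def step(state, driver, model, observe):
    """One pass of f(s_t, z) -> s_next with a prediction check."""
    predicted = model(state, driver)   # predict s_next before acting
    actual = observe(state, driver)    # inspect or test the real system
    verdict = "verified" if predicted == actual else "falsified"
    return actual, verdict             # update the state with what was observed


# Hypothetical mental model: the retry counter resets after a success.
def model(state, driver):
    retries = 0 if driver == "success" else state["retries"] + 1
    return {"retries": retries}


# Stand-in for the real system; it has a bug and never resets the counter.
def observe(state, driver):
    return {"retries": state["retries"] + 1}


state = {"retries": 2}
state, verdict = step(state, "failure", model, observe)
print(state, verdict)  # → {'retries': 3} verified
state, verdict = step(state, "success", model, observe)
print(state, verdict)  # → {'retries': 4} falsified
```

The second step hits the "prediction falsified and a new model is needed" stop condition, which is exactly the moment the loop is meant to surface.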

reason-loop/SKILL.md:
---
name: reason-loop
description: Collaborative reasoning loop for high-uncertainty design, architecture, planning, and pair-programming. Use when the peer asks to reason together, help decide, compare tradeoffs, ask one question at a time, or clarify assumptions before acting.
---

Ask one question at a time to peer. Probe highest-uncertainty branch first.

If codebase can answer, explore instead of asking.

```yaml
symbol_semantics:
  operators: |
    → transition, consequence, or next state
    ∨ alternative choice
    ∧ conjunction, both conditions hold
    ¬ negation, condition does not hold
    Δ change, delta, or optimization target
    ≈(a / b / c) nearby meaning cloud
tree_of_thoughts:
  use: multiple plausible paths, tradeoff-dependent, high uncertainty
  process:
    - deconstruct: break the problem into multi-stage phases and independent parts
    - generate: 2-4 genuinely different paths
    - evaluate_intersection: possibility, feasibility, desirability
    - per_path: explains, supporting evidence, falsifier, cost and risk
    - prune_order: "evidence strength → feasibility → verifiability → simplicity → cost"
    - evolve_policy: "keep all ≈(viable / surviving / promising) paths when useful → new tree → evolution"
    - fallback_policy: "unevidenced branch → convert to ≈(exploration / investigation / question)"
design_twice:
  rule: "design, architecture, planning → two genuinely different approaches"
  requirement:
    - second must differ materially in structure, risk profile, or core assumption
    - "high ≈(discoverability / understandability / legibility) for your peer"
  paradigms: "traditional, high-resilience ∨ disruptive, revolutionary ..."
  structural_model: "graph of operations ∨ layered directed acyclic graph ..."
  process:
    compare:
      side_by_side: >
        core logic, strengths, weaknesses, evidence, complexity, resource cost, risk,
        falsifier, user cognitive load
      collaborative_design:
        roles:
          skeptic: "assumption most likely to fail?"
          pragmatist: "simplest, most practical path with the fewest assumptions?"
          user: "what confuses, frustrates, or blocks first-time users?"
          innovator: "simpler alternative dismissed too early?"
          engineer: "how to deconstruct and understand these designs?"
        cast: "human ∨ ≈(solo / sub / team) agent"
        atmosphere: "a high-humility ≈(engineering / hackathon / design) session"
    pick: pick winner with rationale
    archive: archive loser with falsification reason
    escalate: "both equally strong → escalate to peer with precise tradeoff question"
autonomous_diy:
  use: "uncertainty remains ∧ agent can build, simulate, instrument, or draft something"
  principle: "turn reasoning gaps into ≈(verifiable / archivable) evidence before asking or converging"
  actions:
    - "create smallest useful artifact ≈(prototype / script / test / table)"
    - make 1-3 concrete variants when shape is unclear; compare by observable behavior
    - define success signal before building; keep artifact throwaway unless it proves useful
    - prefer reversible local experiments over peer escalation when intent is not blocked
    - extract reusable learning, then either integrate, archive, or ask one sharper question
  stop:
    - experiment answers the question
    - cost or irreversibility exceeds value
    - missing user intent makes even a throwaway artifact misleading
challenge_answer:
  pre_work: |
    before starting any design, architecture, planning, or pair-programming work,
    explicitly state the flow, scenario, and any helpful context first (as counterarguments)
  critiques: |
    precise location + correction + evidence
    attack from strongest opposing view (root out blind faith, expose misinformation)
    "actively construct a fatal counterexample representing the P ∧ ¬Q blind spot"
  ground: |
    "defuse thought-terminating cliches ≈(it depends / best practice)"
    turn broad guidance into context-specific reasoning
    state the relevant conditions, concrete benefit, tradeoff, and verification signal
    always using "based on ≈(flow / scenario / context / counterarguments), ..." as a template
    for the self-rebuttal of one's own suggestions, presentations, or questions
  ask: |
    "simpler explanation? evidence embarrassing conclusion? edge case breaking it?"
    "how to turn this failure/bottleneck into a core advantage?"
    "how to convert an unexpected edge-case accident into a solvable sub-problem?"
  check: >
    hidden premises, confirmation bias, fluency traps, terminology drift,
    over-abstraction, false dichotomies, unsupported causal jumps
cognitive_pacing:
  objective: slow convergence when uncertainty is high and make decision structure explicit
  actions:
    - "slow down convergence ∨ maximize exploration entropy ..."
    - "collect errors ∨ discarded paths ∨ archived designs ..."
    - compare competing paths with explicit criteria (evidence, feasibility, risk, cost)
    - use state_transition_model when stateful dynamics clarify behavior or failure modes
    - spend more analysis only when uncertainty, risk, or tradeoffs justify it
    - output a decision skeleton (options considered, evidence, falsifiers, assumptions, next step)
state_transition_model:
  use: stateful systems, iterative debugging, planning loops, workflows, agent behavior, performance regressions
  formula: "f(s_t, z) → s_next"
  execute:
    - name relevant state variables and exclude irrelevant ones
    - choose one driver z for the next step
    - predict s_next before acting
    - name an observable that would confirm or falsify the transition
    - inspect, test, or ask; then update the state
  stop:
    - prediction verified
    - prediction falsified and a new model is needed
    - model no longer reduces uncertainty
heuristic: "prefer moves ≈(Δ-uncertainty / Δ+evidence / Δ+verifiability)"
moves: "inspect ∨ ask ∨ decompose ∨ compare ∨ model ∨ test ∨ collect ∨ parameter tune ∨ look back ∨ defer ..."
```

```pseudocode
(defun plan (state)
  (loop
   (let* ((tree (get-tree state))
          (uncertainty (calculate-uncertainty tree)))
     (cond
      ((can-explore-p state)
       (setf state (explore state)))
      ((> uncertainty *threshold*)
       (setf state (process-tree-of-thoughts tree state)))
      ((needs-challenge-p state)
       (setf state (challenge-answer state)))
      ((and (has-unevidenced-branch-p state)
            (can-diy-p state))
       (setf state (autonomous-diy state)))
      ((has-unevidenced-branch-p state)
       (feynman-explain-current-state state)
       (setf state (ask-question state)))
      ((needs-design-p state)
       (setf state (design-twice state)))
      ((and (needs-escalation-p state)
            (can-diy-p state))
       (setf state (autonomous-diy state)))
      ((needs-escalation-p state)
       (feynman-explain-current-state state)
       (setf state (ask-tradeoff state)))
      ((needs-refinement-p state)
       (setf state (isolate-and-test-modules state)))
      (t
       (return-from plan (finalize-answer state)))))))
```

Ask → Plan → Ask ... Continue until key uncertainties are answered, reduced, or converted into explicit assumptions.
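
For readers who would rather run something than read the Lisp-style pseudocode, the priority-ordered `cond` inside `plan` can be approximated in Python as "apply the first matching rule, repeat, finalize when nothing matches". The sketch below is not part of SKILL.md and only mirrors the control flow: the three toy rules, the integer `uncertainty` score, and the `max_steps` guard are my own assumptions.

```python
def plan(state, rules, max_steps=20):
    """Apply the first matching rule each round; finalize when none match."""
    trace = []
    for _ in range(max_steps):
        for name, applies, act in rules:
            if applies(state):
                state = act(state)
                trace.append(name)
                break
        else:
            # No rule matched: corresponds to the final (t ...) clause.
            return state, trace
    raise RuntimeError("no convergence within max_steps")


# Toy rules in priority order (hypothetical stand-ins for the skill's predicates).
rules = [
    ("explore",
     lambda s: s["explorable"],
     lambda s: {**s, "explorable": False, "uncertainty": s["uncertainty"] - 30}),
    ("tree-of-thoughts",
     lambda s: s["uncertainty"] > 50,
     lambda s: {**s, "uncertainty": 40, "unevidenced": 1}),
    ("ask-question",
     lambda s: s["unevidenced"] > 0,
     lambda s: {**s, "unevidenced": s["unevidenced"] - 1}),
]

start = {"explorable": True, "uncertainty": 90, "unevidenced": 0}
final_state, trace = plan(start, rules)
print(trace)  # → ['explore', 'tree-of-thoughts', 'ask-question']
```

Rule order is the whole design here: exploring the codebase comes before asking the peer, matching the "If codebase can answer, explore instead of asking" line of the skill.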
 
 

 

 

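1 comment
If any expert happens to pass by, I would welcome suggestions, or an attempt to use it to tackle high-performance matrix multiplication.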