
🤖 SkillOpt: Skills That Train Themselves
What if AI agents could improve their own procedures without modifying their weights?
That’s exactly what SkillOpt, a Microsoft Research project, proposes. Instead of fine-tuning the model, SkillOpt optimizes a text document, called a “skill”, that tells the agent how to solve tasks. 🧠
How does the loop work?
- 🔄 Rollout → The agent executes tasks with the current skill and records results.
- 🔍 Reflect → An optimizer model analyzes successes and failures.
- ✏️ Edit → Edits (add, delete, replace) are proposed under a bounded budget.
- ✅ Gate → Changes are accepted only if they improve held-out validation performance.
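To make the four steps concrete, here is a minimal Python sketch of a rollout, reflect, edit, gate loop. It is not SkillOpt's actual code or API: every name in it (`Edit`, `apply_edits`, `optimize_skill`, `toy_agent`, `toy_optimizer`) is hypothetical, the "agent" and "optimizer model" are deterministic toy stand-ins rather than LLM calls, and the skill is simply a list of text lines.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Edit:
    op: str          # "add", "delete", or "replace"
    index: int       # line position the edit targets
    text: str = ""   # new line content for "add" / "replace"


def apply_edits(skill: List[str], edits: List[Edit]) -> List[str]:
    """Return a new skill with the edits applied in order."""
    lines = list(skill)
    for e in edits:
        if e.op == "add":
            lines.insert(e.index, e.text)
        elif e.op == "delete":
            del lines[e.index]
        elif e.op == "replace":
            lines[e.index] = e.text
        else:
            raise ValueError(f"unknown edit op: {e.op}")
    return lines


def optimize_skill(
    skill: List[str],
    train_tasks: List[str],
    val_tasks: List[str],
    run_agent: Callable[[List[str], str], bool],
    propose_edits: Callable[[List[str], List[Tuple[str, bool]]], List[Edit]],
    rounds: int = 3,
    edit_budget: int = 2,
) -> Tuple[List[str], float]:
    def score(s: List[str], tasks: List[str]) -> float:
        return sum(run_agent(s, t) for t in tasks) / len(tasks)

    best_val = score(skill, val_tasks)
    for _ in range(rounds):
        # Rollout: run the agent with the current skill and record outcomes.
        results = [(t, run_agent(skill, t)) for t in train_tasks]
        # Reflect + Edit: the optimizer proposes edits, capped by the budget.
        edits = propose_edits(skill, results)[:edit_budget]
        if not edits:
            break
        candidate = apply_edits(skill, edits)
        # Gate: keep the candidate only if held-out validation improves.
        cand_val = score(candidate, val_tasks)
        if cand_val > best_val:
            skill, best_val = candidate, cand_val
    return skill, best_val


# Toy stand-ins (assumptions for illustration only, not LLM calls).
def toy_agent(skill: List[str], task: str) -> bool:
    # The "agent" succeeds if some skill line mentions the task keyword.
    return any(task in line for line in skill)


def toy_optimizer(skill: List[str], results: List[Tuple[str, bool]]) -> List[Edit]:
    # "Reflection": propose one appended line per failed task.
    failed = [task for task, ok in results if not ok]
    return [Edit("add", len(skill) + i, f"Handle {t} carefully.")
            for i, t in enumerate(failed)]


final_skill, final_val = optimize_skill(
    skill=["Read the task carefully."],
    train_tasks=["dates", "units", "rounding"],
    val_tasks=["units", "dates"],
    run_agent=toy_agent,
    propose_edits=toy_optimizer,
)
```

In this toy run, the first round's two budgeted edits are accepted because they raise the validation score, while the later "rounding" edit is rejected: it would help on the training tasks but does not improve the held-out set. That is the gate doing its job, and it also hints at why the choice of validation set matters so much.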
📊 Real Results
The results are impressive:
- GPT-5.5: +23.5% on average across 6 benchmarks
- GPT-5.4-nano: +24.9%
- The exported skill transfers across models and harnesses without retraining
❓ Some questions I’m asking myself
- What is the real computational cost of the optimization process in SkillOpt?
- Where do the training and validation datasets required for the method come from?
- To what extent does an optimized skill generalize when the task or domain changes?
- How much does the final performance depend on the power of the optimizer model?
- How sensitive is the process to the choice of the validation set?
- What guarantees of stability exist during the skill’s self‑editing process?
💡 Explanation in a nutshell
Imagine you have a chef (the AI agent) and a recipe (the skill). Instead of modifying the chef’s abilities, SkillOpt automatically improves the recipe: it tests variants, discards the ones that fail, and keeps the ones that work. The result is an improved recipe that any chef can follow.
The question is: how expensive could this methodology be?
More information at the link 👇
Also published on LinkedIn.

