Researchers from multiple universities have developed a framework that allows artificial intelligence agents to update and rewrite their own operational skills on the fly, without modifying the underlying large language model they run on.
The system, called Memento-Skills, addresses a persistent bottleneck in deploying AI agents at scale: retraining model weights or manually rebuilding skills each time an agent's capabilities need updating is expensive, data-intensive, and difficult to sustain in production environments.
Rather than altering the core model, Memento-Skills stores skills as structured markdown files containing specifications, specialised prompts, and executable code, which the system can modify independently of the frozen model beneath it.
The framework operates through what the researchers describe as a Read-Write Reflective Learning loop: a skill router selects the most relevant skills for a given task, the agent executes them and receives feedback on the outcome, and an orchestrating layer then rewrites existing skill files or creates new ones when failures occur.
The approach effectively treats memory updates as a form of policy iteration, a technique borrowed from reinforcement learning in which an agent progressively refines its decision-making strategy.
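The loop can be sketched in a few lines of Python. This is an illustrative toy, not the released code: the class and function names are invented here, the router is a naive word-overlap heuristic standing in for the real skill router, and the rewrite step simply appends feedback text where the actual system would prompt the frozen model to revise the skill file.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """Stand-in for one markdown skill file: spec, prompt, and code as text."""
    name: str
    spec: str    # natural-language specification
    prompt: str  # specialised prompt
    code: str    # executable snippet stored in the file

class SkillStore:
    """In-memory stand-in for the directory of skill files."""
    def __init__(self, seeds):
        self.skills = {s.name: s for s in seeds}

    def route(self, task: str) -> Skill:
        # Toy router: pick the skill whose spec shares the most words with the task.
        words = set(task.lower().split())
        return max(self.skills.values(),
                   key=lambda s: len(words & set(s.spec.lower().split())))

    def rewrite(self, skill: Skill, feedback: str) -> None:
        # "Write" step: amend the skill file. A real system would call the
        # frozen LLM here to regenerate the spec, prompt, or code.
        skill.spec += " " + feedback
        self.skills[skill.name] = skill

def reflective_step(store: SkillStore, task: str, execute) -> bool:
    skill = store.route(task)            # Read: select the relevant skill
    ok, feedback = execute(skill, task)  # Act: run it, collect outcome feedback
    if not ok:
        store.rewrite(skill, feedback)   # Write: revise the skill file on failure
    return ok
```

Because every update lands in the skill files rather than the model weights, the base model stays frozen while the skill library, like the seed set that grew from five skills to hundreds in testing, evolves around it.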
In benchmark testing using Gemini-3.1-Flash as the frozen base model, Memento-Skills lifted accuracy on the GAIA benchmark, a standard test for general AI agent capability, to 66% from a static baseline of 52.3%, a gain of 13.7 percentage points.
Performance on Humanity's Last Exam, a more demanding evaluation designed to test the limits of frontier AI, improved to 38.7% from 17.9%.
End-to-end task success reached 80% using the framework's custom skill router, compared with 50% for a standard retrieval method known as BM25.
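BM25, the baseline the custom router was measured against, is a lexical scoring function that ranks documents (here, skill files) by term frequency weighted against document length and corpus-wide rarity. A minimal self-contained Okapi BM25 scorer, written from the standard formula rather than taken from the paper's code, looks like this:

```python
import math

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency of each term across the corpus.
    df: dict[str, int] = {}
    for doc in tokenized:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            tf = doc.count(term)
            if tf == 0:
                continue
            # Smoothed inverse document frequency: rare terms weigh more.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates (k1) and is normalised by length (b).
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores
```

A purely lexical ranker like this can only match a task description against words that already appear in a skill file, which is one plausible reason a router tuned to the skill format outperformed it in the reported tests.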
The system began with five seed skills in testing and expanded autonomously to 41 skills on GAIA and 235 on Humanity's Last Exam.
Jun Wang, a co-author of the paper published on arXiv, the academic preprint server, said the framework adds a continuous-learning capability to agent tooling already available on the market.
The team has released the code publicly via GitHub but urged enterprises to proceed carefully.
Wang cautioned that the framework is best suited to structured workflows and that robust governance is essential, stressing that reliable self-improvement requires a well-designed evaluation system capable of assessing performance and providing consistent guidance.
The recap
- Researchers released Memento-Skills, a framework that lets agents rewrite their own skills without retraining the frozen base model.
- Improved GAIA accuracy to 66.0%, a 13.7 percentage-point gain.
- Code released on GitHub; researchers recommend workflow-aligned enterprise deployment.