Microsoft Research Asia unveils Agent Lightning to add reinforcement learning to AI agents without code changes

Microsoft Research Asia in Shanghai has introduced Agent Lightning, an open-source framework designed to let developers apply reinforcement learning to artificial intelligence agents without rewriting existing code.

The research team said the framework separates task execution from model training, allowing reinforcement learning to be layered on top of existing agent systems rather than built directly into their logic. The approach is intended to lower the technical barrier to deploying adaptive agents in production environments.

Agent Lightning works by recording agent execution as sequences of states and actions. Each large language model call is treated as a discrete action, while each transition captures the model’s input, output and an associated reward. These records are then used for reinforcement learning training, enabling models to improve based on task-level outcomes rather than isolated prompt responses.
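As a rough illustration of that recording model, the sketch below stores each model call as one transition holding its input, output and a reward slot that is filled in once the task finishes. The class names `Transition` and `Trajectory` are hypothetical and are not taken from Agent Lightning's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transition:
    """One LLM call: the input it saw (state), the text it produced (action) and its reward."""
    prompt: str
    response: str
    reward: Optional[float] = None  # unknown until the task outcome is available


@dataclass
class Trajectory:
    """All LLM calls made while the agent completed a single task (hypothetical structure)."""
    task_id: str
    transitions: List[Transition] = field(default_factory=list)

    def record(self, prompt: str, response: str) -> None:
        # Each model call becomes one discrete action in the sequence.
        self.transitions.append(Transition(prompt, response))


traj = Trajectory("task-1")
traj.record("Question: what is 2 + 3? Plan:", "Call calculator(2 + 3)")
traj.record("Tool result: 5. Final answer:", "5")
print(len(traj.transitions))  # 2
```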

At the centre of the framework is LightningRL, an algorithm that assigns credit after a task has been completed. Instead of rewarding individual model calls in real time, the algorithm distributes rewards retrospectively across the large language model requests involved in completing the task. Because each model call then carries its own reward and can be treated as an independent training sample, the design is compatible with existing single-step reinforcement learning methods such as Proximal Policy Optimisation and Group Relative Policy Optimisation.
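A minimal sketch of retrospective credit assignment follows, assuming the simplest possible rule: every model call in a task receives the same value derived from the task's final reward. LightningRL's exact rule may differ. The example also applies a group-relative normalisation in the style of Group Relative Policy Optimisation, comparing several attempts (rollouts) at the same task. The function names and sample numbers are illustrative only.

```python
from statistics import mean, pstdev
from typing import List


def assign_credit(num_calls: int, value: float) -> List[float]:
    # Assumed rule: every LLM call in the task gets the same task-level value.
    return [value] * num_calls


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # GRPO-style normalisation: score each rollout relative to its group of rollouts.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of the same task: how many LLM calls each made, and whether it succeeded.
rollout_call_counts = [3, 2, 4, 1]
rollout_rewards = [1.0, 0.0, 1.0, 0.0]

advantages = group_relative_advantages(rollout_rewards)
samples: List[float] = []
for n_calls, adv in zip(rollout_call_counts, advantages):
    # Each call becomes a single-step training sample carrying its rollout's advantage.
    samples.extend(assign_credit(n_calls, adv))

print(len(samples))  # 10
```

Because every call ends up with a scalar of its own, a standard single-step policy-gradient update can consume these samples without modelling the full multi-turn interaction.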

Agent Lightning functions as middleware and is built around three main components. An agent runner distributes workloads and collects execution results. An algorithm component handles model training and hosts large language models. A third element, LightningStore, manages the exchange and storage of execution traces, rewards and training data.
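The division of labour between the three components can be pictured as a shared store that both sides read from and write to. The in-memory class below is a hypothetical stand-in for LightningStore, written only to show the data flow (tasks out to the runner, traces and rewards back to the trainer). It does not reproduce the real component's interface.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Span:
    task_id: str
    prompt: str
    response: str


class InMemoryStore:
    """Hypothetical hub between the agent runner and the training algorithm."""

    def __init__(self) -> None:
        self.pending_tasks: Deque[str] = deque()
        self.spans: Dict[str, List[Span]] = {}
        self.rewards: Dict[str, float] = {}

    # Algorithm side: publish work for runners.
    def enqueue_task(self, task_id: str) -> None:
        self.pending_tasks.append(task_id)

    # Runner side: claim a task, then report execution traces and the task reward.
    def dequeue_task(self) -> Optional[str]:
        return self.pending_tasks.popleft() if self.pending_tasks else None

    def add_span(self, span: Span) -> None:
        self.spans.setdefault(span.task_id, []).append(span)

    def set_reward(self, task_id: str, reward: float) -> None:
        self.rewards[task_id] = reward

    # Algorithm side: collect finished tasks as (traces, reward) training data.
    def completed(self) -> List[Tuple[List[Span], float]]:
        return [(self.spans.get(t, []), r) for t, r in self.rewards.items()]


store = InMemoryStore()
store.enqueue_task("t1")
task = store.dequeue_task()
store.add_span(Span(task, "Q: capital of France?", "Paris"))
store.set_reward(task, 1.0)
batch = store.completed()
print(len(batch), batch[0][1])  # 1 1.0
```

Routing everything through one store is what lets the runner and the trainer live in separate processes or on separate machines: neither needs to call the other directly.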

The researchers said the system is designed so that the agent runner and training components can operate on different hardware. This separation allows developers to keep existing agent code unchanged apart from pointing model calls at the Agent Lightning application programming interface, introducing reinforcement learning without rewriting the agent's logic.
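The "no code changes" idea can be sketched as swapping the model endpoint an agent already calls for one that forwards requests and logs them. In the example below, `my_agent` stands for untouched agent logic, while `TracingLLM` and `fake_model` are hypothetical placeholders for an Agent Lightning-style endpoint and the model hosted on the training side. They are not the framework's real API.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]


def my_agent(question: str, llm: LLM) -> str:
    # Existing agent logic: it only knows it has been given some LLM callable.
    plan = llm(f"Plan how to answer: {question}")
    return llm(f"{plan}\nNow answer: {question}")


class TracingLLM:
    """Hypothetical drop-in endpoint: forwards each call to the model and records it."""

    def __init__(self, backend: LLM) -> None:
        self.backend = backend
        self.trace: List[Tuple[str, str]] = []

    def __call__(self, prompt: str) -> str:
        response = self.backend(prompt)
        self.trace.append((prompt, response))  # input/output pair kept for training
        return response


def fake_model(prompt: str) -> str:
    # Stand-in for the model served by the training component.
    return "Paris" if "Now answer" in prompt else "Recall the capital."


traced = TracingLLM(fake_model)
answer = my_agent("What is the capital of France?", traced)
print(answer, len(traced.trace))  # Paris 2
```

Only the object passed in as `llm` changes; `my_agent` itself is identical whether or not training is switched on.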


Microsoft Research Asia tested the framework across three use cases: text-to-SQL tasks built with LangChain, retrieval-augmented generation built with the OpenAI Agents software development kit, and mathematical question answering with tool use implemented in AutoGen. The team reported consistent performance improvements across all three scenarios when reinforcement learning was applied through Agent Lightning.

The researchers said future development will focus on expanding the framework to include automatic prompt optimisation and support for additional reinforcement learning algorithms. They added that the goal is to make reinforcement learning a practical and modular component of real-world agent systems, rather than a technique confined to experimental settings.