Microsoft Research Asia unveils Agent Lightning to add reinforcement learning to AI agents without code changes

Agent Lightning is an open-source framework that lets developers add reinforcement learning to AI agents without rewriting their code.

by Defused News Writer

Microsoft Research Asia in Shanghai has introduced Agent Lightning, an open-source framework designed to let developers apply reinforcement learning to artificial intelligence agents without rewriting existing code.

The research team said the framework separates task execution from model training, allowing reinforcement learning to be layered on top of existing agent systems rather than built directly into their logic. The approach is intended to lower the technical barrier to deploying adaptive agents in production environments.

Agent Lightning works by recording agent execution as sequences of states and actions. Each large language model call is treated as a discrete action, while each transition captures the model’s input, output and an associated reward. These records are then used for reinforcement learning training, enabling models to improve based on task-level outcomes rather than isolated prompt responses.
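The record format described above can be pictured with a minimal Python sketch. The class and field names here (`Transition`, `Trajectory`, `model_input`, `model_output`) are illustrative assumptions, not identifiers taken from Agent Lightning itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transition:
    """One LLM call: the input it saw, the output it produced, and a reward."""
    model_input: str
    model_output: str
    reward: Optional[float] = None  # unknown until the task has been scored


@dataclass
class Trajectory:
    """All LLM calls made while the agent worked on a single task."""
    task_id: str
    transitions: List[Transition] = field(default_factory=list)

    def record(self, model_input: str, model_output: str) -> None:
        # Each model call becomes one discrete action in the sequence.
        self.transitions.append(Transition(model_input, model_output))


# Hypothetical two-call task: the agent plans a tool call, then answers.
traj = Trajectory(task_id="demo-1")
traj.record("Question: 2 + 2?", "Call calculator with 2 + 2")
traj.record("Calculator returned 4", "The answer is 4")
```

Rewards start out empty in this sketch because, as the article goes on to describe, they are filled in only once the task has finished.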

At the centre of the framework is LightningRL, an algorithm that assigns credit after a task has been completed. Instead of rewarding individual model calls in real time, the algorithm distributes rewards retrospectively across the large language model requests involved in completing the task. This design allows the use of single-step reinforcement learning methods such as Proximal Policy Optimisation and Group Relative Policy Optimisation.
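A rough sketch of retrospective credit assignment follows. It assumes the simplest possible rule, in which every model call in a task receives the same final reward; the article does not say how LightningRL actually weights individual calls, so this rule and the function names are assumptions. The second function shows the group-relative normalisation commonly associated with Group Relative Policy Optimisation, applied to several attempts at the same task.

```python
from typing import List, Tuple

Sample = Tuple[str, str, float]  # (model input, model output, reward)


def assign_credit(calls: List[Tuple[str, str]], final_reward: float) -> List[Sample]:
    """Copy the task-level reward onto every LLM call made during the task.

    Assumption: equal credit per call. Each returned sample can then be
    treated as an independent single-step training example.
    """
    return [(inp, out, final_reward) for inp, out in calls]


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalise rewards within a group of attempts at the same task."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# One finished task with two model calls and a final score of 1.0.
calls = [("Question: 2 + 2?", "use calculator"), ("Calculator: 4", "Answer: 4")]
samples = assign_credit(calls, final_reward=1.0)

# Four attempts at the same task, two successful and two not.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Successful attempts end up with positive advantages and failed ones with negative advantages, which is the signal a single-step policy-gradient update would consume.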

Agent Lightning functions as middleware and is built around three main components. An agent runner distributes workloads and collects execution results. An algorithm component handles model training and hosts large language models. A third element, LightningStore, manages the exchange and storage of execution traces, rewards and training data.
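The division of labour between the three components might look something like the sketch below. Everything in it is a hypothetical, in-memory stand-in: `TraceStore`, `run_tasks` and `training_step` are invented names, and the "training" step only averages rewards rather than updating a model.

```python
from typing import Callable, Dict, List


class TraceStore:
    """Stand-in for the store: holds traces until the training side reads them."""

    def __init__(self) -> None:
        self._traces: List[Dict] = []

    def put(self, trace: Dict) -> None:
        self._traces.append(trace)

    def drain(self) -> List[Dict]:
        # Hand over everything collected so far and start a fresh buffer.
        traces, self._traces = self._traces, []
        return traces


def run_tasks(agent: Callable[[str], str], score: Callable[[str, str], float],
              tasks: List[str], store: TraceStore) -> None:
    """Runner side: execute each task and push the scored result to the store."""
    for task in tasks:
        output = agent(task)
        store.put({"task": task, "output": output, "reward": score(task, output)})


def training_step(store: TraceStore) -> float:
    """Algorithm side: consume a batch of traces.

    A real trainer would update model weights here; this sketch only
    reports the mean reward of the batch.
    """
    batch = store.drain()
    return sum(t["reward"] for t in batch) / len(batch) if batch else 0.0


store = TraceStore()
run_tasks(agent=str.upper,
          score=lambda task, out: float(out == task.upper()),
          tasks=["a", "b"], store=store)
mean_reward = training_step(store)
```

Because the runner and the training step only communicate through the store, either side could in principle be moved to a different process or machine without the other noticing.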

The researchers said the system is designed so that the agent runner and training components can operate on different hardware. This separation allows developers to keep existing agent logic unchanged: the only adjustment is routing model calls through the Agent Lightning application programming interface (API), so reinforcement learning is introduced without altering agent behaviour.
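The idea of observing model calls without touching agent logic can be illustrated with a small wrapper. This is an in-process sketch under stated assumptions, not the Agent Lightning API: `traced`, `agent` and `fake_llm` are hypothetical names, and `fake_llm` stands in for a real model endpoint.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]


def traced(llm: LLM, log: List[Tuple[str, str]]) -> LLM:
    """Wrap an LLM callable so every call is logged, without changing its result."""
    def wrapper(prompt: str) -> str:
        response = llm(prompt)
        log.append((prompt, response))
        return response
    return wrapper


def agent(question: str, llm: LLM) -> str:
    """Existing agent logic: it only knows it was handed something callable."""
    plan = llm(f"Plan: {question}")
    return llm(f"Answer using plan: {plan}")


def fake_llm(prompt: str) -> str:
    # Placeholder for a real model endpoint; it simply reverses the prompt.
    return prompt[::-1]


log: List[Tuple[str, str]] = []
plain = agent("2 + 2?", fake_llm)
observed = agent("2 + 2?", traced(fake_llm, log))
```

The agent function is identical in both runs and returns the same answer; only the second run leaves behind a log of calls that a training component could later use.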

Microsoft Research Asia tested the framework across three use cases: text-to-SQL tasks built with LangChain, retrieval-augmented generation using the OpenAI Agents SDK, and mathematical question answering with tool use implemented via AutoGen. The team reported consistent performance improvements across all three scenarios when reinforcement learning was applied through Agent Lightning.

The researchers said future development will focus on expanding the framework to include automatic prompt optimisation and support for additional reinforcement learning algorithms. They added that the goal is to make reinforcement learning a practical and modular component of real-world agent systems, rather than a technique confined to experimental settings.

The Recap

  • Agent Lightning enables reinforcement learning (RL) for AI agents without code rewrites.
  • The framework standardises agent runs into state-action transition records.
  • Researchers plan automatic prompt optimisation and additional RL algorithms.