Microsoft is adding Critique and Council to Researcher, Microsoft 365 Copilot’s deep research agent for work, to boost accuracy, analytical breadth and citation quality in complex research tasks. The company said in a blog post that the features introduce multi-model orchestration across generation and evaluation roles within Researcher.
Critique separates generation from review: one model plans and drafts the response, and a second model validates claims and refines structure. According to the blog post, the system uses models from frontier labs, including Anthropic and OpenAI, and applies rubric-based review to enforce evidence grounding and report completeness.
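To make the generate-then-review split concrete, here is a minimal Python sketch of the pattern. It is an illustration only, not Microsoft's implementation: the `Review` type, the `run_critique` function, the rubric strings and the two stand-in functions are all hypothetical, and no real model API is called.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical rubric, named after the two goals the blog post mentions.
RUBRIC = ["evidence grounding", "report completeness"]


@dataclass
class Review:
    issues: List[str]  # rubric items the draft failed
    revised: str       # the reviewer's refined version of the draft


def run_critique(task: str,
                 generate: Callable[[str], str],
                 review: Callable[[str, List[str]], Review]) -> Review:
    """Draft with one model, then validate and refine with a second one."""
    draft = generate(task)
    return review(draft, RUBRIC)


# Stand-in "models" so the sketch runs without any external service.
def fake_generate(task: str) -> str:
    return f"Report on {task}. Claim A."


def fake_review(draft: str, rubric: List[str]) -> Review:
    issues = []
    # Toy check: treat a draft with no "[source]" marker as ungrounded.
    if "evidence grounding" in rubric and "[source]" not in draft:
        issues.append("evidence grounding")
    revised = draft + " [source]" if issues else draft
    return Review(issues, revised)


result = run_critique("market sizing", fake_generate, fake_review)
print(result.issues)
print(result.revised)
```

The point of the design is that the reviewer never writes the first draft, so its rubric checks act as an independent pass over the generator's output.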
"Critique is a new multi model deep research system designed for complex research tasks," Microsoft said in a blog post. The company reported performance on the DRACO (Deep Research Accuracy, Completeness, and Objectivity) benchmark: Researcher with Critique improves the aggregated score by +7.0 points (SEM ±1.90), a +13.88% lead over Perplexity Deep Research (Claude Opus 4.6 model).
Council runs Anthropic and OpenAI models side by side to produce independent reports, then uses a judge model to distill agreements, divergences and unique insights. Microsoft says its evaluations used OpenAI’s GPT-5.2 as the LLM judge and found statistically significant gains in Breadth and Depth (+3.33), Presentation Quality (+3.04) and Factual Accuracy (+2.58) (paired t-test, p < 0.0001), with domain-level improvements in 8 of 10 DRACO domains.
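The Council pattern, two independent reports followed by a judge, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Microsoft's code: each report is reduced to a hypothetical topic-to-claim dictionary, the two "models" are stand-in functions, and the judge is a deterministic comparison rather than an LLM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Report = Dict[str, str]  # topic -> claim; a simplification of a full report


def run_council(task: str,
                model_a: Callable[[str], Report],
                model_b: Callable[[str], Report],
                judge: Callable[[Report, Report], Dict[str, List[str]]]
                ) -> Dict[str, List[str]]:
    """Run two models independently, then hand both reports to a judge."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(model_a, task)
        future_b = pool.submit(model_b, task)
        report_a, report_b = future_a.result(), future_b.result()
    return judge(report_a, report_b)


def toy_judge(a: Report, b: Report) -> Dict[str, List[str]]:
    """Stand-in judge: compares claims topic by topic."""
    shared = a.keys() & b.keys()
    return {
        "agreements": sorted(t for t in shared if a[t] == b[t]),
        "divergences": sorted(t for t in shared if a[t] != b[t]),
        "unique_a": sorted(a.keys() - b.keys()),
        "unique_b": sorted(b.keys() - a.keys()),
    }


# Stand-in "models" returning fixed, made-up findings.
def fake_model_a(task: str) -> Report:
    return {"pricing": "rising", "demand": "flat", "regulation": "pending"}


def fake_model_b(task: str) -> Report:
    return {"pricing": "rising", "demand": "growing", "supply": "tight"}


summary = run_council("market outlook", fake_model_a, fake_model_b, toy_judge)
print(summary)
```

Because neither model sees the other's output before the judge step, disagreements surface as explicit divergences instead of being smoothed over in a single draft.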
Microsoft says Critique will be the default experience in Researcher when Auto is selected in the model picker, while Council is available under Model Council. Both features are broadly available in the Frontier program and aimed at improving deep research workflows inside Copilot.
The recap
- Microsoft adds Critique multi-model system to Researcher experience
- Critique raises the DRACO aggregated score by 7.0 points, a 13.88% lead over Perplexity Deep Research
- Critique and Council are broadly available in the Frontier program