
NVIDIA to offer GPU fleet monitoring

NVIDIA is developing an opt-in service that gives data center operators visibility into the health and performance of large AI GPU fleets.

by Defused News Writer

Updated February 25, 2026


NVIDIA is developing an opt-in software service to let data center operators monitor the health and performance of large AI GPU fleets without changing device configurations.

In an announcement, the company said the customer-installed service will collect and report GPU usage, configuration and error metrics. It will include an open-source client software agent, part of NVIDIA's support for open, transparent tooling.

According to the chip maker, operators will be able to track spikes in power usage to remain within energy budgets and maximize performance per watt.


They will also be able to monitor utilization, memory bandwidth and interconnect health; detect hotspots and airflow issues; confirm consistent software configurations; and spot errors and anomalies.
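As a rough illustration of what tracking power spikes against an energy budget could look like, the sketch below flags nodes whose peak power draw exceeds a fixed budget. It is not NVIDIA's agent or API: the `PowerSample` record, the `nodes_over_budget` helper, the node names and the 700-watt budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PowerSample:
    """One hypothetical power reading for a node (not an NVIDIA schema)."""
    node: str
    watts: float


def nodes_over_budget(samples, budget_watts):
    """Return the peak power per node for nodes whose peak exceeds the budget."""
    peaks = {}
    for s in samples:
        # Keep the highest reading seen so far for each node.
        peaks[s.node] = max(peaks.get(s.node, 0.0), s.watts)
    return {node: peak for node, peak in peaks.items() if peak > budget_watts}


# Made-up readings for illustration only.
samples = [
    PowerSample("node-a", 610.0),
    PowerSample("node-a", 742.5),
    PowerSample("node-b", 580.0),
]
over = nodes_over_budget(samples, budget_watts=700.0)
print(over)
```

Here only `node-a` is reported, because one of its readings rises above the assumed budget.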

The client agent will stream node-level GPU telemetry to a portal hosted on NVIDIA NGC, where a dashboard will display utilization across the whole fleet or by compute zone, the company said. The software is read-only: it reports telemetry and cannot modify GPU configurations or underlying operations. Customers will also be able to generate fleet reports.
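To make the fleet-wide versus per-zone view concrete, here is a minimal sketch that rolls node-level utilization readings up by compute zone. The record fields (`node`, `zone`, `gpu_util_pct`), the zone names and the `utilization_by_zone` helper are assumptions for illustration, not the format the NVIDIA agent or the NGC portal uses.

```python
from collections import defaultdict
from statistics import mean


def utilization_by_zone(records):
    """Average the hypothetical gpu_util_pct field per compute zone."""
    by_zone = defaultdict(list)
    for r in records:
        by_zone[r["zone"]].append(r["gpu_util_pct"])
    return {zone: mean(values) for zone, values in by_zone.items()}


# Made-up node-level readings; field names are assumptions.
records = [
    {"node": "n1", "zone": "zone-east", "gpu_util_pct": 80.0},
    {"node": "n2", "zone": "zone-east", "gpu_util_pct": 60.0},
    {"node": "n3", "zone": "zone-west", "gpu_util_pct": 90.0},
]

per_zone = utilization_by_zone(records)              # per-zone view
fleet = mean(r["gpu_util_pct"] for r in records)     # fleet-wide view
print(per_zone, round(fleet, 1))
```

The sketch only reads and summarizes data, mirroring the read-only nature of the telemetry described above.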