AI hardware explained: GPUs, specialised chips, and why supply matters

The single most important takeaway for the global economy is that artificial intelligence (AI) is no longer a software story but a physical one.

by Mr Moonlight

The ability of a nation or a corporation to participate in the next industrial revolution is now dictated by its access to a specific and scarce class of high-end silicon. This hardware is the only medium through which the complex mathematics of modern machine learning can be translated into reality. Without it, the most sophisticated algorithms remain purely theoretical.

How it works: the silicon engine

To understand why a single company like Nvidia has seen its valuation rival the gross domestic product (GDP) of entire nations, one must understand what happens inside a data centre.

At the base of the stack is the graphics processing unit (GPU). Originally designed to render pixels in video games, these chips are uniquely suited to AI because they excel at parallel processing.

While a standard computer chip, the central processing unit (CPU), is like a brilliant scholar who solves one complex problem at a time, a GPU is like a stadium of thousands of primary school students performing simple addition simultaneously.

For AI, which requires billions of simple mathematical operations to happen at once, the stadium of students is far more efficient.
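
To make the stadium analogy concrete, consider the arithmetic an AI model actually performs: multiplying enormous grids of numbers, where every cell is a simple multiply-and-add that does not depend on any other. The short sketch below is purely illustrative and runs on an ordinary CPU using the NumPy library, but it shows why this kind of work splits naturally into independent pieces that a GPU can hand out to its thousands of cores at once.

```python
import numpy as np

# Two 1,000 x 1,000 grids of random numbers, standing in for a model's
# weights and its input data.
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

# Multiplying them requires roughly a billion simple multiply-and-add
# operations, but none of them depends on the result of any other. That
# independence is what lets a GPU give each small piece of the job to a
# different core and run them all at once.
c = a @ b

print(c.shape)  # (1000, 1000)
```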

Specialisation has moved beyond the GPU. Google uses the tensor processing unit (TPU), an application-specific integrated circuit (ASIC) designed specifically for its own AI models.

More recently, we have seen the rise of the neural processing unit (NPU), which is a smaller version of this technology designed to fit inside a smartphone or laptop to handle AI tasks locally rather than in the cloud.

However, even the fastest processor is useless if it cannot be fed data quickly enough. This is where high-bandwidth memory (HBM) becomes critical. Standard memory is too slow to keep up with an AI chip, so engineers stack memory dies vertically and place the stack alongside the processor on the same package, shortening the physical distance data must travel.
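
A rough, back-of-the-envelope calculation shows the scale of the problem. The figures in the sketch below are assumptions chosen purely for illustration, not the specification of any real chip, but they make clear why feeding the processor is as hard a problem as building it.

```python
# Back-of-envelope sketch of the "memory wall", using made-up but
# plausible-scale numbers (not the spec of any real chip).
ops_per_second = 1_000e12   # 1,000 trillion simple operations per second
bytes_per_operand = 2       # 16-bit numbers, common in AI workloads
operands_per_op = 2         # each multiply-and-add reads two values

# If every value had to come fresh from memory, the chip would need this
# much bandwidth just to stay busy:
bytes_needed = ops_per_second * operands_per_op * bytes_per_operand
print(f"{bytes_needed / 1e12:,.0f} TB/s")   # 4,000 TB/s

# Even the fastest HBM stacks deliver on the order of a few TB/s, which is
# why chips reuse data in on-chip caches and why memory bandwidth, not raw
# compute, is so often the limiting factor.
```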

These processors are then joined together by interconnects, the ultra-fast digital motorways that allow thousands of individual chips to talk to each other as if they were one giant, planet-sized computer.

The great split: training vs inference

The hardware requirements change depending on what the AI is doing.

Training is the schooling phase. This is when an AI model like ChatGPT is fed the entire internet to learn patterns. It is incredibly hardware-intensive, requiring tens of thousands of GPUs (often called a cluster) running at full power for months. This is where the world’s supply of high-end chips is currently being swallowed up.

Inference is the working phase. This occurs every time you ask an AI a question. The model uses what it learned during training to generate an answer. While inference requires less raw power than training, it must happen instantly. As AI becomes integrated into every app and device, the sheer volume of inference tasks is expected to eventually exceed the demand for training.
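
The difference is easier to see in miniature. The toy sketch below "trains" a single-number model and then uses it to answer one question; the model is invented purely for illustration, but the shape of the work is the same at data-centre scale: training loops over the data again and again for a long time, while inference is one cheap pass per request.

```python
# Illustrative sketch only: a toy "model" with a single learned number,
# to contrast the two phases of AI workloads.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs and targets (y = 2x)
weight = 0.0                                  # the single parameter we learn

# Training phase: many passes over the data, nudging the weight each time.
# At real scale this is the part that occupies tens of thousands of GPUs
# for months.
for epoch in range(1000):
    for x, target in data:
        prediction = weight * x
        error = prediction - target
        weight -= 0.01 * error * x            # gradient-descent step

# Inference phase: one quick pass per question, using the frozen weight.
print(round(weight * 5.0, 2))                 # ~10.0, the learned answer for x = 5
```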

Bottlenecks: why supply is the battleground

The AI sector currently faces three major structural bottlenecks that are causing ripples in capital markets and international diplomacy.

The first is the memory wall and HBM shortages. Manufacturing high-bandwidth memory is notoriously difficult, with high failure rates during production.

Analysts at IDC and McKinsey have warned that the shift towards HBM is cannibalising the supply of standard memory for PCs and smartphones, leading to a global shortage often nicknamed "RAMpocalypse".

By early 2026, prices for basic memory modules will have doubled in some regions as manufacturers like SK Hynix and Samsung prioritise AI contracts. Without this memory, even the fastest GPUs sit idle.

The second bottleneck involves export controls and the new geopolitics of silicon. As of January 2026, the US government has transitioned to a rigorous annual licensing framework for major chipmakers operating in China.

These rules target not just raw speed but interconnect speed, effectively preventing rivals from building the massive clusters needed to train world-leading models. Washington’s move to yearly approvals has created significant uncertainty for global giants like TSMC and Samsung, while Beijing has retaliated by restricting the export of critical minerals essential for high-end electronics.

The third challenge is the power crunch. A single AI data centre can consume as much power as a small city. This has led to a massive surge in capital expenditure as tech giants like Microsoft and Google spend hundreds of billions of pounds on infrastructure.

Nvidia’s latest Blackwell chips, for instance, draw roughly 1 kilowatt (kW) per processor, nearly a 50 per cent increase over the previous generation. This is forcing operators to move away from standard racks towards specialised cooling systems, with some 2026 facilities projected to draw over 1 gigawatt (GW) of electricity.
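
Some rough arithmetic, using the round figures quoted above rather than any particular facility's plans, shows why electricity has become the binding constraint.

```python
# Illustrative back-of-envelope sums based on the round numbers above.
facility_power_watts = 1e9    # a 1 GW data centre
chip_power_watts = 1_000      # roughly 1 kW per high-end AI processor

# Ignoring cooling, networking and other overheads (which in practice take
# a large share of a site's power budget), the absolute ceiling is:
max_chips = facility_power_watts / chip_power_watts
print(f"{max_chips:,.0f} chips")   # 1,000,000 chips

# One gigawatt of continuous demand is roughly the output of a large power
# station, which is why operators now negotiate with utilities and grid
# regulators as much as with chip suppliers.
```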

What to watch: a checklist for chip claims

When a company or startup announces a breakthrough in AI hardware, readers can evaluate the news by asking five key questions.

First, is the chip designed for training or inference? A chip that is good at one is rarely the best at the other. Second, consider the memory bandwidth. Does the chip have enough HBM to keep the processor busy, or will it be waiting for data?

Third, investigate the interconnect. Can this chip be linked with 50,000 others? If it cannot scale into a cluster, it cannot train a foundation model. Fourth, look at the software layer. Most AI researchers use Nvidia’s CUDA software; a new chip with no software support is like a car with no roads to drive on. Finally, assess the power efficiency. In a world of limited electricity, the number of "tokens per watt" is often more important than raw speed.
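
That last metric is worth unpacking with invented numbers. "Tokens per watt" is shorthand for how much useful output (tokens are the word fragments a model produces) a chip delivers for each unit of power it draws, often measured as tokens per second per watt. The figures below are hypothetical and exist only to show how the comparison works.

```python
# Hypothetical comparison of two chips on efficiency rather than raw speed.
# All numbers are invented for illustration.
chips = [
    ("Chip A", 12_000, 1_000),   # tokens per second, power draw in watts
    ("Chip B", 9_000, 500),
]

for name, tokens_per_second, watts in chips:
    print(f"{name}: {tokens_per_second / watts:.0f} tokens per second per watt")

# Chip A: 12 tokens per second per watt
# Chip B: 18 tokens per second per watt
# Chip A is faster outright, but in a power-limited data centre Chip B
# delivers 50 per cent more output for the same electricity.
```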

Glossary of terms

GPU (Graphics Processing Unit): A chip designed for parallel processing, now the industry standard for AI.

TPU (Tensor Processing Unit): A custom AI chip designed by Google for its own cloud services.

NPU (Neural Processing Unit): A specialised processor for AI tasks on consumer devices like phones.

HBM (High-Bandwidth Memory): Fast, vertically stacked memory that feeds the AI processor.

Interconnect: High-speed links that connect multiple chips into a single computing cluster.

Cluster: A group of interconnected GPUs working together as a single supercomputer.
