Challengers Are Coming for Nvidia’s Crown

Matthew S. Smith

from IEEE Spectrum on 2024-09-16 14:00


It's hard to overstate Nvidia's AI dominance. Founded in 1993, Nvidia first made its mark in the then-new field of graphics processing units (GPUs) for personal computers. But it's the company's AI chips, not PC graphics hardware, that vaulted Nvidia into the ranks of the world's most valuable companies. It turns out that Nvidia's GPUs are also excellent for AI. As a result, its stock is more than 15 times as valuable as it was at the start of 2020; revenues have ballooned from roughly US $12 billion in its 2019 fiscal year to $60 billion in 2024; and the AI powerhouse's leading-edge chips are as scarce, and desired, as water in a desert.

"Access to GPUs has become so much of a worry for AI researchers, that the researchers think about this on a day-to-day basis. Because otherwise they can't have fun, even if they have the best model," says Jennifer Prendki, head of AI data at Google DeepMind. Prendki is less reliant on Nvidia than most, as Google has its own homespun AI infrastructure. But other tech giants, like Microsoft and Amazon, are among Nvidia's biggest customers, and continue to buy its GPUs as quickly as they're produced. Exactly who gets them and why is the subject of an antitrust investigation by the U.S. Department of Justice, according to press reports.

Nvidia's AI dominance, like the explosion of machine learning itself, is a recent turn of events. But it's rooted in the company's decades-long effort to establish GPUs as general computing hardware that's useful for many tasks besides rendering graphics. That effort spans not only the company's GPU architecture, which evolved to include "tensor cores" adept at accelerating AI workloads, but also, critically, its software platform, called CUDA, which helps developers take advantage of the hardware.

"They made sure every computer-science major coming out of university is trained up and knows how to program CUDA," says Matt Kimball, principal data-center analyst at Moor Insights & Strategy. "They provide the tooling and the training, and they spend a lot of money on research."

Released in 2006, CUDA helps developers use an Nvidia GPU's many cores. That has proved essential for accelerating highly parallelized compute tasks, including modern generative AI. Nvidia's success in building the CUDA ecosystem makes its hardware the path of least resistance for AI development. Nvidia chips might be in short supply, but the only thing more difficult to find than AI hardware is experienced AI developers, and many of them are familiar with CUDA.

That gives Nvidia a deep, broad moat with which to defend its business, but that doesn't mean it lacks competitors ready to storm the castle, and their tactics vary widely. While decades-old companies like Advanced Micro Devices (AMD) and Intel are looking to use their own GPUs to rival Nvidia, upstarts like Cerebras and SambaNova have developed radical chip architectures that drastically improve the efficiency of generative AI training and inference. These are the competitors most likely to challenge Nvidia.

Nvidia's Armory

Chart: While Nvidia has several types of GPUs deployed, the big guns found in data centers are the H100 and H200. As soon as the end of 2024, they will be joined by the B200, which nearly quadruples the H100's performance on a per-GPU basis. Sources: Nvidia, MLPerf Inference v4.1 results for Llama 2 70B

AMD: The other GPU maker

Pro: AMD GPUs are convincing Nvidia alternatives

Con: Software ecosystem can't rival Nvidia's CUDA

AMD has battled Nvidia in the graphics-chip arena for nearly two decades. It's been, at times, a lopsided fight. When it comes to graphics, AMD's GPUs have rarely beaten Nvidia's in sales or mindshare. Still, AMD's hardware has its strengths. The company's broad GPU portfolio extends from integrated graphics for laptops to AI-focused data-center GPUs with over 150 billion transistors. The company was also an early supporter and adopter of high-bandwidth memory (HBM), a form of memory that's now essential to the world's most advanced GPUs.

"If you look at the hardware...it stacks up favorably" to Nvidia, says Kimball, referring to AMD's Instinct MI325X, a competitor of Nvidia's H100. "AMD did a fantastic job laying that chip out."

The MI325X, slated to launch by the end of 2024, has over 150 billion transistors and 288 gigabytes of high-bandwidth memory, though real-world results remain to be seen. Its predecessor, the MI300X, earned praise from Microsoft, which deploys AMD hardware, including the MI300X, to handle some GPT-3.5 and GPT-4 services. Meta and Dell have also deployed the MI300X, and Meta used the chips in parts of the development of its latest large language model, Llama 3.1.

There's still a hurdle for AMD to leap: software. AMD offers an open-source platform, ROCm, to help developers program its GPUs, but it's less popular than CUDA. AMD is aware of this weakness, and in July 2024 it agreed to buy Europe's largest private AI lab, Silo AI, which has experience doing large-scale AI training using ROCm and AMD hardware. AMD also plans to purchase ZT Systems, a company with expertise in data-center infrastructure, to help it serve customers looking to deploy its hardware at scale. Building a rival to CUDA is no small feat, but AMD is certainly trying.

Intel: Software success

Pro: Gaudi 3 AI accelerator shows strong performance

Con: Next big AI chip doesn't arrive until late 2025

Intel's challenge is the opposite of AMD's.

While Intel lacks an exact match for Nvidia's CUDA and AMD's ROCm, in 2018 it launched an open-source, unified programming platform, oneAPI. Unlike CUDA and ROCm, oneAPI spans multiple categories of hardware, including CPUs, GPUs, and FPGAs, so it can help developers accelerate AI tasks (and many others) on any Intel hardware. "Intel's got a heck of a software ecosystem it can turn on pretty easily," says Kimball.

Hardware, on the other hand, is a weakness, at least when compared to Nvidia and AMD. Intel's Gaudi AI accelerators, the fruit of Intel's 2019 acquisition of AI hardware startup Habana Labs, have made headway, and the latest, Gaudi 3, offers performance that's competitive with Nvidia's H100.

However, it's unclear precisely what Intel's next hardware release will look like, which has caused some concern. "Gaudi 3 is very capable," says Patrick Moorhead, founder of Moor Insights & Strategy. "But as of July 2024 there is no Gaudi 4," he says.

Intel instead plans to pivot to an ambitious chip, code-named Falcon Shores, with a tile-based modular architecture that combines Intel x86 CPU cores and Xe GPU cores; the latter are part of Intel's recent push into graphics hardware. Intel has yet to reveal details about Falcon Shores' architecture and performance, though, and it's not slated for release until late 2025.

Cerebras: Bigger is better

Pro: Wafer-scale chips offer strong performance and memory per chip

Con: Applications are niche due to size and cost

Make no mistake: AMD and Intel are by far the most credible challengers to Nvidia. They share a history of designing successful chips and building programming platforms to go alongside them. But among the smaller, less proven players, one stands out: Cerebras.

The company, which specializes in AI for supercomputers, made waves in 2019 with the Wafer Scale Engine, a gigantic, wafer-size piece of silicon packed with 1.2 trillion transistors. The most recent iteration, Wafer Scale Engine 3, ups the ante to 4 trillion transistors. For comparison, Nvidia's largest and newest GPU, the B200, has "just" 208 billion transistors. The computer built around this wafer-scale monster, Cerebras's CS-3, is at the heart of the Condor Galaxy 3, which will be an 8-exaflop AI supercomputer made up of 64 CS-3s. G42, an Abu Dhabi-based conglomerate that hopes to train tomorrow's leading-edge large language models, will own the system.
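To put those figures in proportion, the short Python sketch below works through the arithmetic using only the numbers quoted above. It assumes the 8-exaflop total for Condor Galaxy 3 is a peak figure split evenly across its 64 CS-3 systems; the per-system result is therefore a rough derived estimate, not a Cerebras specification.

```python
# Transistor counts quoted in the article
WSE_1_TRANSISTORS = 1.2e12  # Wafer Scale Engine (2019)
WSE_3_TRANSISTORS = 4e12    # Wafer Scale Engine 3
B200_TRANSISTORS = 208e9    # Nvidia B200

# Condor Galaxy 3 figures quoted in the article
CG3_EXAFLOPS = 8
CG3_NUM_CS3 = 64


def ratio(a: float, b: float) -> float:
    """Return how many times larger a is than b."""
    return a / b


wse3_vs_b200 = ratio(WSE_3_TRANSISTORS, B200_TRANSISTORS)
wse3_vs_wse1 = ratio(WSE_3_TRANSISTORS, WSE_1_TRANSISTORS)

# 1 exaflop = 1,000 petaflops; assumes the total divides evenly across systems
petaflops_per_cs3 = CG3_EXAFLOPS * 1000 / CG3_NUM_CS3

print(f"WSE-3 vs B200 transistors: {wse3_vs_b200:.1f}x")      # 19.2x
print(f"WSE-3 vs original WSE: {wse3_vs_wse1:.1f}x")          # 3.3x
print(f"Peak compute per CS-3: {petaflops_per_cs3:.0f} petaflops")  # 125
```

In other words, a single WSE-3 carries roughly 19 times as many transistors as a B200, and each CS-3 would contribute on the order of 125 petaflops to the full system.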

"It's a little more niche, not as general purpose," says Stacy Rasgon, senior analyst at Bernstein Research. "Not everyone is going to buy [these computers]. But they've got customers, like the [United States] Department of Defense, and [the Condor Galaxy 3] supercomputer."

Cerebras's WSE-3 isn't going to challenge Nvidia, AMD, or Intel hardware in most situations; it's too large, too costly, and too specialized. But it could give Cerebras a unique edge in supercomputers, because no other company designs chips on the scale of the WSE.

SambaNova: A transformer for transformers

Pro: Configurable architecture helps developers squeeze efficiency from AI models

Con: Hardware still has to prove relevance to mass market

SambaNova, founded in 2017, is another chip-design company tackling AI training with an unconventional chip architecture. Its flagship, the SN40L, has what the company calls a "reconfigurable dataflow architecture" composed of tiles of memory and compute resources. The links between these tiles can be altered on the fly to facilitate the quick movement of data for large neural networks.

Prendki believes such customizable silicon could prove useful for training large language models, because AI developers can optimize the hardware for different models. No other company offers that capability, she says.

SambaNova is also scoring wins with SambaFlow, the software stack used alongside the SN40L. "At the infrastructure level, SambaNova is doing a good job with the platform," says Moorhead. SambaFlow can analyze machine learning models and help developers reconfigure the SN40L to accelerate the model's performance. SambaNova still has a lot to prove, but its customers include SoftBank and Analog Devices.

Groq: Form for function

Pro: Excellent AI inference performance

Con: Application currently limited to inference

Yet another company with a unique spin on AI hardware is Groq. Its approach focuses on tightly pairing memory and compute resources to increase the speed at which a large language model can respond to prompts.

"Their architecture is very memory based. The memory is tightly coupled to the processor. You need more nodes, but the price per token and the performance is nuts," says Moorhead. The "token" is the basic unit of data a model processes; in an LLM, it's typically a word or portion of a word. Groq's performance is even more impressive, he says, given that its chip, called the Language Processing Unit Inference Engine, is made using GlobalFoundries' 14-nanometer technology, several generations behind the TSMC process used to make the Nvidia H100.

In July, Groq posted a demonstration of its chip's inference speed, which can exceed 1,250 tokens per second running Meta's 8-billion-parameter Llama 3 LLM. That beats even SambaNova's demo, which can exceed 1,000 tokens per second.
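What those throughput figures mean for a user can be sketched with a little arithmetic. The Python snippet below treats both quoted rates as steady streaming speeds and assumes a hypothetical 500-token response; the two demos were not necessarily run under identical conditions, so this is only a rough illustration.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average time between generated tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_for_response(tokens_per_second: float, response_tokens: int) -> float:
    """Time to stream a response of the given length at a steady rate."""
    return response_tokens / tokens_per_second


GROQ_TPS = 1250        # lower bound quoted for Groq's Llama 3 8B demo
SAMBANOVA_TPS = 1000   # lower bound quoted for SambaNova's demo
RESPONSE_TOKENS = 500  # hypothetical response length, chosen for illustration

for name, tps in [("Groq", GROQ_TPS), ("SambaNova", SAMBANOVA_TPS)]:
    print(f"{name}: {ms_per_token(tps):.2f} ms/token, "
          f"{seconds_for_response(tps, RESPONSE_TOKENS):.2f} s for {RESPONSE_TOKENS} tokens")

# Relative throughput advantage of the faster demo
speedup = GROQ_TPS / SAMBANOVA_TPS
print(f"Groq throughput advantage: {(speedup - 1) * 100:.0f}%")
```

At these rates, Groq's demo emits a token roughly every 0.8 millisecond versus 1 millisecond for SambaNova's, a throughput edge of about 25 percent.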

Qualcomm: Power is everything

Pro: Broad range of chips with AI capabilities

Con: Lacks large, leading-edge chips for AI training

Qualcomm, well known for the Snapdragon system-on-a-chip that powers popular Android phones like the Samsung Galaxy S24 Ultra and OnePlus 12, is a giant that can stand toe-to-toe with AMD, Intel, and Nvidia.

But unlike those peers, the company is focusing its AI strategy more on AI inference and energy efficiency for specific tasks. Anton Lokhmotov, a founding member of the AI benchmarking organization MLCommons and CEO of Krai, a company that specializes in AI optimization, says Qualcomm has significantly improved the inference of the Qualcomm Cloud AI 100 servers in an important benchmark test. "The servers' performance increased from 180 to 240 samples-per-watt in ResNet-50, an image-classification benchmark, using essentially the same server hardware," Lokhmotov notes.
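The size of that gain is easy to check. The Python sketch below assumes "samples per watt" means samples per second per watt, which is equivalent to samples per joule, so its reciprocal gives the energy spent on each classified image. That interpretation is an assumption; the benchmark's exact metric definition is not given here.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


def millijoules_per_sample(samples_per_joule: float) -> float:
    """Energy per sample, assuming (samples/s)/W == samples per joule."""
    return 1000.0 / samples_per_joule


OLD, NEW = 180, 240  # ResNet-50 efficiency figures quoted by Lokhmotov

gain = percent_change(OLD, NEW)
energy_old = millijoules_per_sample(OLD)
energy_new = millijoules_per_sample(NEW)
energy_saved = -percent_change(energy_old, energy_new)

print(f"Efficiency gain: {gain:.1f}%")                                 # 33.3%
print(f"Energy per sample: {energy_old:.2f} mJ -> {energy_new:.2f} mJ")  # 5.56 -> 4.17
print(f"Energy saved per sample: {energy_saved:.0f}%")                 # 25%
```

Read this way, a one-third improvement in efficiency means each image costs about a quarter less energy to classify on the same hardware.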

Efficient AI inference is also a boon on devices that need to handle AI tasks locally without reaching out to the cloud, says Lokhmotov. Case in point: Microsoft's Copilot Plus PCs. Microsoft and Qualcomm partnered with laptop makers, including Dell, HP, and Lenovo, and the first Copilot Plus laptops with Qualcomm chips hit store shelves in July. Qualcomm also has a strong presence in smartphones and tablets, where its Snapdragon chips power devices from Samsung, OnePlus, and Motorola, among others.

Qualcomm is an important player in AI for driver assist and self-driving platforms, too. In early 2024, Hyundai's Mobis division announced a partnership to use the Snapdragon Ride platform, a rival to Nvidia's Drive platform, for advanced driver-assist systems.

The Hyperscalers: Custom brains for brawn

Pro: Vertical integration focuses design

Con: Hyperscalers may prioritize their own needs and uses first

Hyperscalers, the cloud-computing giants that deploy hardware at vast scale, are synonymous with Big Tech. Amazon, Apple, Google, Meta, and Microsoft all want to deploy AI hardware as quickly as possible, both for their own use and for their cloud-computing customers. To accelerate that, they're all designing chips in-house.

Google began investing in AI processors much earlier than its competitors: The search giant's Tensor Processing Units, first announced in 2015, now power most of its AI infrastructure. The sixth generation of TPUs, Trillium, was announced in May and is part of Google's AI Hypercomputer, a cloud-based service for companies looking to handle AI tasks.

Prendki says Google's TPUs give the company an advantage in pursuing AI opportunities. "I'm lucky that I don't have to think too hard about where I get my chips," she says. Access to TPUs doesn't entirely eliminate the supply crunch, though, as different Google divisions still need to share resources.

And Google is no longer alone. Amazon has two in-house chips, Trainium and Inferentia, for training and inference, respectively. Microsoft has Maia, Meta has MTIA, and Apple is supposedly developing silicon to handle AI tasks in its cloud infrastructure.

None of these compete directly with Nvidia, as hyperscalers don't sell hardware to customers. But they do sell access to their hardware through cloud services, like Google's AI Hypercomputer, Amazon's AWS, and Microsoft's Azure. In many cases, hyperscalers offer services running on their own in-house hardware as an option right alongside services running on hardware from Nvidia, AMD, and Intel; Microsoft is thought to be Nvidia's largest customer.

Illustration: David Plunkert

Chinese chips: An opaque future

Another category of competitor is born not of technical needs but of geopolitical realities. The United States has imposed restrictions on the export of AI hardware that prevent chipmakers from selling their latest, most capable chips to Chinese companies. In response, Chinese companies are designing homegrown AI chips.

Huawei is a leader. The company's Ascend 910B AI accelerator, designed as an alternative to Nvidia's H100, is in production at Semiconductor Manufacturing International Corp. (SMIC), a Shanghai-based foundry partially owned by the Chinese government. However, yield issues at SMIC have reportedly constrained supply. Huawei is also selling an "AI-in-a-box" solution, meant for Chinese companies looking to build their own AI infrastructure on-premises.

To get around the U.S. export-control rules, Chinese industry could turn to alternative technologies. For example, Chinese researchers have made headway in photonic chips that use light, instead of electric charge, to perform calculations. "The advantage of a beam of light is you can cross one [beam with] another," says Prendki. "So it reduces constraints you'd normally have on a silicon chip, where you can't cross paths. You can make the circuits more complex, for less money." It's still very early days for photonic chips, but Chinese investment in the area could accelerate their development.

Room for more

It's clear that Nvidia has no shortage of competitors. It's equally clear that none of them will challenge, never mind defeat, Nvidia in the next few years. Everyone interviewed for this article agreed that Nvidia's dominance is currently unparalleled, but that doesn't mean it will crowd out competitors forever.

"Listen, the market wants choice," says Moorhead. "I can't imagine AMD not having 10 or 20 percent market share, Intel the same, if we go to 2026. Typically, the market likes three, and there we have three reasonable competitors." Kimball says the hyperscalers, meanwhile, could challenge Nvidia as they transition more AI services to in-house hardware.

And then there are the wild cards. Cerebras, SambaNova, and Groq are the leaders in a very long list of startups looking to nibble away at Nvidia with novel solutions. They're joined by dozens of others, including d-Matrix, Untether, Tenstorrent, and Etched, all pinning their hopes on new chip architectures optimized for generative AI. It's likely many of these startups will falter, but perhaps the next Nvidia will emerge from the survivors.
