Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was

from www.theregister.com - Articles on 2026-06-04 13:00 (#7634Y)

COMPUTEX 2026 Intel offered new insights into its next-gen datacenter GPU, codenamed Crescent Island. Alongside supporting enterprise AI deployments, the GPU could fill the void left by Nvidia's Rubin CPX GPUs, which were seemingly shelved late last year following Nvidia's acquisition of Groq.
As datacenter GPUs go, Intel's Crescent Island is certainly an odd duck. It'll ship in a PCIe form factor when most high-end GPUs are now using socketed designs. It also won't use HBM or even GDDR memory. Instead, Intel has opted for LPDDR5x memory - the same kind used in high-end notebooks and smartphones - and quite a bit of it too.
Crescent Island will be offered with up to 480 GB of memory, significantly more than you'll find on Nvidia's flagship GPUs, which currently top out at 288 GB. LPDDR5x is also cheap, at least relative to HBM or GDDR, which should help keep prices down despite strain on the global semiconductor supply chain, which has seen memory prices surge by more than 3x since last year.
The one thing that LPDDR5x isn't is fast. Intel hasn't shared bandwidth figures just yet but, assuming a large 1024-bit memory bus, we're looking at around 1.2 TB/s. Crescent Island's actual bandwidth will depend heavily on how wide the memory bus ends up being, but for reference, Nvidia and AMD's latest GPUs are pushing 20 TB/s.
How quickly a GPU can churn out tokens is largely determined by how fast its memory is, making bandwidth a major bottleneck. Or at least that was the case. Over the past year, we've seen a shift toward disaggregated compute architectures, which break inference into two phases: prefill and decode.
Prefill is the compute-heavy phase of the pipeline. If you've ever used an AI chatbot, you've experienced prefill as the wait between submitting a prompt and the model starting to respond. The faster the compute, the shorter the wait.
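The roughly 1.2 TB/s estimate for a 1024-bit LPDDR5x bus can be reproduced with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the 9,600 MT/s per-pin data rate is an assumption (Intel has not disclosed memory speeds), and the 35 GB model is a hypothetical example (roughly a 70-billion-parameter model at 4 bits per weight). The decode ceiling is a crude upper bound that assumes every generated token streams all weights from memory once at batch size 1, ignoring KV cache traffic, batching, and compute limits.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Peak memory bandwidth in GB/s.

    bus_width_bits: total memory bus width in bits.
    data_rate_mtps: per-pin data rate in megatransfers per second.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mtps * 1e6 / 1e9


def decode_ceiling_tokens_per_s(bandwidth_gbs: float, weight_gb: float) -> float:
    """Rough upper bound on single-stream decode rate.

    Assumes each token requires reading all model weights once,
    so tokens/s cannot exceed bandwidth divided by model size.
    """
    if weight_gb <= 0:
        raise ValueError("weight_gb must be positive")
    return bandwidth_gbs / weight_gb


# ASSUMPTION: 9,600 MT/s LPDDR5x; the 1024-bit bus is the article's own guess.
lpddr_gbs = peak_bandwidth_gbs(bus_width_bits=1024, data_rate_mtps=9600)

# The article's reference point for the latest HBM-equipped GPUs: 20 TB/s.
hbm_gbs = 20_000.0

# HYPOTHETICAL model footprint: ~70B parameters at 4 bits per weight.
model_gb = 35.0

print(f"LPDDR5x estimate: {lpddr_gbs / 1000:.2f} TB/s")
print(f"Decode ceiling on LPDDR5x: {decode_ceiling_tokens_per_s(lpddr_gbs, model_gb):.0f} tokens/s")
print(f"Decode ceiling at 20 TB/s: {decode_ceiling_tokens_per_s(hbm_gbs, model_gb):.0f} tokens/s")
```

Under these assumptions the decode ceiling scales linearly with bandwidth, which is consistent with the article's point that a part like this looks better matched to compute-bound prefill than to token generation.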
While prefill operations still consume a large quantity of memory, they're mostly compute bound, which means you can get away with using slower GDDR or LPDDR memory rather than pricey HBM. This was the idea behind Nvidia's Rubin CPX when it was announced late last summer. The accelerator promised 128 GB of GDDR7 memory and up to 30 petaFLOPS of NVFP4 performance.
For heavy workloads that required ingesting massive quantities of tokens - code assistants, for example - prefill operations would be offloaded to CPX accelerators, while token generation would continue to run on Nvidia's HBM4-equipped Vera Rubin Superchips. With AI agents rapidly driving up the number of input tokens, the architecture made a lot of sense.
Yet by March, Nvidia had shelved the idea in order to prioritize its new Groq LPU-based LPX racks. Announced at GTC, LPX addressed the opposite end of the spectrum. Rather than accelerating prefill, Nvidia's Groq accelerators aimed to improve user experiences and inference economics by juicing token generation.
But the use case for something like a Rubin CPX hasn't gone away. In a round table with press this spring, Ian Buck, VP of Hyperscale and HPC at Nvidia, said CPX was still a good idea and we may see the concept resurface in future generations.
Intel clearly sees an opportunity to fill the void. The company, which has grown closer to Nvidia since CEO Lip-Bu Tan took the reins last year, hasn't said much about Crescent Island's intended use case, but it has suggested that Nvidia Dynamo is coming to the platform. Dynamo is Nvidia's framework for disaggregating prefill and decode across multiple GPUs.
Whether Crescent Island actually makes sense for this use case will depend heavily on its performance profile, something for which we have very few data points right now.
Intel hasn't shared FLOPS figures yet, but we know the GPU will use its Xe3P microarchitecture, which adds support for FP8 and FP4 datatypes, and will ship as a 350 watt air-cooled PCIe card.
While Intel has signaled support for disaggregated inference via Dynamo, it's not the company's only option. Back in February, Intel and friends funneled $350 million into AI chip startup SambaNova. Then in April, the company revealed plans for a disaggregated inference platform using Intel Xeons, SambaNova RDUs, and what turned out to be Nvidia GPUs. That platform went live this week.
However, there is no reason that Intel couldn't use something like llm-d - the open source, vendor-neutral contemporary of Dynamo - to combine its own GPUs with SambaNova RDUs instead. ®
