Intel and pals cram 36,864 CPU cores into a 100kW rack while chasing the agentic AI dragon

from www.theregister.com - Articles on 2026-06-02 09:37 (#761CE)

COMPUTEX 2026 Intel is working with Foxconn and other infrastructure providers to develop rack-scale reference designs based on the chipmaker's Xeon processors.

Announced during Intel's Computex keynote on Tuesday, these blueprints aim to provide greater CPU compute densities for running AI agents at scale. While AI models predominantly run on GPUs and other AI accelerators, the agent harnesses, like OpenClaw, which are used to connect them to tools, terminal shells, code interpreters, and other APIs, still run on CPUs.

"Our customers are asking us to think at the system level to help them serve real agentic workloads at scale," Intel CEO Lip-Bu Tan said.

On stage, Tan revealed two examples of these blueprints: one aimed at latency-sensitive agentic workloads and another designed for maximum density. Both designs support up to 128 of either Intel's 128-core Granite Rapids Xeon 6 or 288-core Clearwater Forest Xeon 6+ processors, for a total of either 16,384 P-cores or 36,864 E-cores, alongside up to 384 TB of DDR5 in a 100 kW power envelope.

The reference designs come just months after Nvidia announced a similar rack-scale CPU platform packing 256 of its 88-core Vera CPUs. Arm is also working on a pair of rack-scale reference designs for agentic workloads based on its new AGI CPUs: a 36 kW air-cooled system with 8,160 cores and a 200 kW liquid-cooled rack with 45,696 cores.

Tan expects systems based on these reference designs to be broadly available from Intel's ODM and OEM partners.

Alongside agentic AI workloads, the company also revealed that newly launched inference cloud provider Vector Core Compute will be among the first to deploy the platform, and that Together.AI is its first commercial customer. The approach is based on an earlier disaggregated AI blueprint that Intel co-developed with partner SambaNova.
The architecture disaggregates compute-heavy prefill operations to Nvidia GPUs while using SambaNova's AI accelerators for bandwidth-intensive decode operations, boosting per-user token output by 2-3x. If that sounds familiar, it's not dissimilar to what Nvidia is doing with Groq's LPUs or what AWS is doing with Trainium and Cerebras' waferscale AI accelerators. ®
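
The rack-level core counts quoted for the Intel and Nvidia designs follow directly from sockets multiplied by cores per socket. As a rough sanity check, the sketch below reproduces those totals and derives two figures the article does not state: memory per socket and power per core. The power-per-core value is only an upper bound, on the assumption that the whole 100 kW envelope is divided among the cores, when in practice it also covers memory, networking, and cooling. The function and variable names are illustrative.

```python
# Figures quoted in the article.
INTEL_SOCKETS_PER_RACK = 128
INTEL_RACK_POWER_W = 100_000      # 100 kW envelope
INTEL_RACK_MEMORY_TB = 384        # "up to 384 TB of DDR5"

INTEL_CORES_PER_SOCKET = {
    "Granite Rapids Xeon 6 (P-cores)": 128,
    "Clearwater Forest Xeon 6+ (E-cores)": 288,
}

NVIDIA_SOCKETS_PER_RACK = 256
NVIDIA_CORES_PER_SOCKET = 88      # Vera CPU


def rack_totals(cores_per_socket, sockets, rack_power_w=None):
    """Return total cores and, if a power budget is given, watts per core.

    Watts per core is an upper bound: it assumes the entire rack budget
    goes to CPU cores, which is not true of a real system.
    """
    total_cores = cores_per_socket * sockets
    result = {"total_cores": total_cores}
    if rack_power_w is not None:
        result["max_watts_per_core"] = rack_power_w / total_cores
    return result


if __name__ == "__main__":
    for name, cores in INTEL_CORES_PER_SOCKET.items():
        totals = rack_totals(cores, INTEL_SOCKETS_PER_RACK, INTEL_RACK_POWER_W)
        print(f"{name}: {totals['total_cores']:,} cores, "
              f"at most {totals['max_watts_per_core']:.2f} W per core")

    # Derived, not stated in the article: memory available per socket.
    tb_per_socket = INTEL_RACK_MEMORY_TB / INTEL_SOCKETS_PER_RACK
    print(f"Memory per Intel socket: up to {tb_per_socket:g} TB")

    nvidia = rack_totals(NVIDIA_CORES_PER_SOCKET, NVIDIA_SOCKETS_PER_RACK)
    print(f"Nvidia Vera rack: {nvidia['total_cores']:,} cores")
```

By this arithmetic the Nvidia platform lands at 22,528 cores, between Intel's two configurations, while Arm's 200 kW liquid-cooled design tops the list at the 45,696 cores quoted above.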
