AMD says its $4K Ryzen AI Halo workstation practically pays for itself

from www.theregister.com - Articles on (#75RCD)
AMD's answer to Nvidia's DGX Spark AI workstations, codenamed the Ryzen AI Halo, will be available for pre-order next month for anyone with $3,999 burning a hole in their pocket.

That might sound like a lot for an AI mini PC, but don't worry. Compared to cloud APIs, it practically pays for itself. Or, well, that's AMD's sales pitch. The House of Zen argues that if you spend eight hours a day vibe coding, the system could save you $750 a month.

Whether or not this helps you justify paying for hardware that could be found for between $2,200 and $2,999 less than a year ago, it's (probably) not AMD being greedy here; the RAMpocalypse has been hard on everyone.

Much like the DGX Spark, which now retails for $4,699, up from $3,999 when we reviewed it last fall, AMD's rendition aims to provide a curated developer environment for running local models and agentic AI frameworks.

This is really the core value proposition behind either of these devices. They aren't the most powerful or fastest AI systems, but they're able to run models that a few years ago would have required $20K or more of hardware.

A little box of TOPS

The diminutive system measures 5.9 x 5.9 x 1.7 inches (150 x 150 x 43 mm) and is powered by a 120 watt Ryzen AI Max+ 395 APU, better known by its codename Strix Halo.

The chip is backed by 128 GB of LPDDR5x 8000 MT/s memory, which feeds both its 16 Zen 5 cores and 40 RDNA 3.5 GPU compute units, providing up to 256 GB/s of bandwidth, more than a Ryzen Threadripper 9000 (non-Pro) system. For local AI enthusiasts, that's enough memory to run models up to 200 billion parameters in size at 4-bit precision, just like the more expensive Spark.

The bulk of the Ryzen AI Halo's compute comes from its integrated graphics, which are capable of delivering roughly 56 teraFLOPS at 16-bit precision. While impressive for onboard graphics, that's still between 55 and 88 percent slower than what the DGX Spark advertises.
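Several of the figures quoted so far can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function names are made up for this example, the 256-bit memory bus width is an assumption that does not appear in the article, and the weight-size estimate ignores KV cache, activations, and quantization metadata.

```python
# Back-of-the-envelope checks on figures quoted in the article.
# Assumption (not stated in the article): a 256-bit memory bus on Strix Halo.


def payback_months(price_usd: float, monthly_savings_usd: float) -> float:
    """Months until the claimed monthly savings add up to the purchase price."""
    return price_usd / monthly_savings_usd


def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s times bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return transfer_rate_mts * bytes_per_transfer / 1000


def weights_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (weights only, no KV cache)."""
    return params_billions * bits_per_param / 8


if __name__ == "__main__":
    print(f"Payback at AMD's claimed savings: {payback_months(3999, 750):.1f} months")
    print(f"Peak bandwidth: {peak_bandwidth_gbs(8000, 256):.0f} GB/s")
    print(f"200B parameters at 4-bit: {weights_gb(200, 4):.0f} GB of weights")
```

Under these assumptions, AMD's claimed $750 a month would cover the $3,999 price in a little over five months, the bandwidth works out to the quoted 256 GB/s, and a 200 billion parameter model at 4-bit precision needs about 100 GB for weights alone, which fits within 128 GB.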
Unlike the Spark's Blackwell-based GB10 APU, Strix Halo doesn't support FP8 or FP4 data types in hardware. The Spark delivers 125 teraFLOPS at BF16, 250 at FP8, and 500 at FP4. Double those figures if you happen to find a workload that can leverage Nvidia's 2:4 sparsity.

That performance discrepancy won't necessarily be obvious in every workload. In fact, in LLM inference, AMD claims the AI Halo generates tokens 4-14 percent faster than the Spark.

The lower end of that roughly matches what we saw when we pitted the Spark against a similarly equipped HP Z2 Mini G1a back in December. The G1a packs the same silicon as the AI Halo, and in Llama.cpp with the Vulkan backend, it eked out a small but meaningful lead over the Spark in tokens per second generated.

However, the speed at which any GPU can generate tokens is largely dictated by effective memory bandwidth, not floating point performance. GPU compute has a much bigger impact on things like prompt processing time. In our testing, the Spark's more capable tensor cores gave it a 2-3x lead in prompt processing. For shorter prompts, this isn't all that noticeable, usually the difference between waiting 100 ms versus 200 ms or 300 ms, but for longer prompts, it did become more pronounced.

We saw the Spark take similar leads in our image generation and fine tuning benchmarks, but it's worth noting that AMD's software stack has matured greatly since our initial review and the performance gap has likely closed somewhat since then.

AMD's AI Halo does have two things going for it that can't be said of the Spark. Alongside the GPU is an XDNA 2-based neural processing unit (NPU) that AMD rates for 50 TOPS. What good that'll do you depends heavily on the application in question. Many content creation apps have now been updated to take advantage of it, but the number of generative AI inference engines that could properly harness it was quite limited the last time we looked.
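The compute gap and the bandwidth-bound nature of token generation described above can be put into rough numbers. This is a minimal sketch, not a benchmark: the function names are invented for illustration, the 64 GB model size is a hypothetical value, and the decode estimate uses the common rule of thumb that a dense model reads all of its weights once per generated token, which ignores KV cache traffic and software overhead.

```python
# Rough numbers for the compute gap and for bandwidth-bound token generation.
# The teraFLOPS and GB/s figures are the ones quoted in the article.


def percent_lower(value: float, reference: float) -> float:
    """How far `value` falls below `reference`, as a percentage."""
    return (1 - value / reference) * 100


def decode_tokens_per_sec_upper_bound(bandwidth_gbs: float, weights_gb: float) -> float:
    """Upper bound on tokens/s if every token requires reading all weights once.

    Rule-of-thumb model for dense LLMs; real throughput will be lower.
    """
    return bandwidth_gbs / weights_gb


if __name__ == "__main__":
    halo_tflops = 56  # AI Halo at 16-bit precision
    for label, spark_tflops in [("BF16", 125), ("FP8", 250), ("FP4", 500)]:
        gap = percent_lower(halo_tflops, spark_tflops)
        print(f"Halo vs Spark {label}: {gap:.1f} percent lower")

    # Hypothetical 64 GB of weights on the Halo's 256 GB/s memory.
    bound = decode_tokens_per_sec_upper_bound(256, 64)
    print(f"Decode upper bound: {bound:.1f} tokens/s")
```

The three comparisons come out to about 55.2, 77.6, and 88.8 percent, which lines up with the 55 to 88 percent range quoted earlier. The decode bound depends only on memory bandwidth and model size, which is why a large gap in teraFLOPS need not show up in tokens per second.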
The second thing AMD's Ryzen AI Halo has going for it is that it's a standard x86 box at its heart, and you can run Windows or your preferred flavor of Linux on it if that's more your style. On the Spark, you're stuck with a lightly customized version of Ubuntu 24.04. Beyond that, you're coloring outside the lines. Particularly for developers building for Microsoft's NPU-accelerated AI PC ecosystem, this is an obvious advantage.

In terms of networking, AMD's Spark-clone falls a bit flat. One of the hallmark features of Nvidia's AI workstation is a 200 Gbps ConnectX-7 NIC, which allows for clustering of up to two and eventually four systems. AMD's AI Halo has a single 10 Gbps NIC, which should help with downloading large model files in a timely manner.

In theory, the system should be able to achieve high-speed networking over USB4, but it's not clear whether this is actually a supported use case. That said, Apple has already demonstrated just this using RDMA over Thunderbolt, so it should work so long as AMD has a playbook for configuring RDMA on its systems.

AMD's own AI lab

As we mentioned earlier, much of the Ryzen AI Halo's value proposition comes from being validated hardware with well-documented playbooks for common use cases and known-good software.

Finding the right combination of device drivers, ROCm, HIP, SYCL, CUDA, PyTorch, TensorFlow, and JAX has long plagued AI/ML devs, regardless of which ecosystem they opt for. Having validated environments for workloads, whether it be vLLM, Llama.cpp, Ollama, ComfyUI, or something else, ensures users spend more time doing something productive than debugging mismatched dependencies.

At launch, AMD says the Ryzen AI Halo will ship with five preinstalled playbooks, with another 10 available online and additional playbooks to be added monthly. Additionally, customers will gain access to AMD's developer program, cloud credits, and exclusive playbooks.
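Returning to the networking figures quoted earlier in this section, a quick calculation gives a sense of scale for the gap between a 10 Gbps and a 200 Gbps link. This is an idealized sketch: the function name and the 100 GB payload are hypothetical, and it assumes the link runs at full line rate with no protocol, storage, or server-side bottlenecks.

```python
# Idealized transfer times at line rate for the two NIC speeds in the article.
# The 100 GB payload is a hypothetical model-file size chosen for illustration.


def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Seconds to move `size_gb` gigabytes over a `link_gbps` gigabit/s link."""
    size_gigabits = size_gb * 8
    return size_gigabits / link_gbps


if __name__ == "__main__":
    payload_gb = 100
    for name, gbps in [("AI Halo, 10 Gbps", 10), ("DGX Spark, 200 Gbps", 200)]:
        print(f"{name}: {transfer_seconds(payload_gb, gbps):.0f} s")
```

At these rates the same payload takes 80 seconds on the 10 Gbps link and 4 seconds on the 200 Gbps link. That difference matters little for a one-off download, but much more for clustering, where systems exchange data continuously.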
More memory on the way

The 128 GB Ryzen AI Halo will be available for pre-order next month starting at $3,999, but if that isn't enough for you, AMD is already prepping a higher capacity version of the system with 192 GB of memory on board.

That system will feature a refreshed Ryzen APU in the AI Max+ 495, which just like the rest of AMD's 400-series lineup gets a modest clock bump to the CPU, GPU, and NPU, and not a whole lot else. Still, 192 GB of unified memory opens the door to even larger, more capable models, if you can stomach the presumably higher asking price. ®