China's Secretive Sunway Pro CPU Quadruples Performance Over Its Predecessor
An anonymous reader shares a report: Earlier this year, the National Supercomputing Center in Wuxi (an entity blacklisted in the U.S.) launched its new supercomputer based on the enhanced China-designed Sunway SW26010 Pro processors with 384 cores. Sunway's SW26010 Pro CPU not only packs more cores than its non-Pro SW26010 predecessor, but it more than quadrupled FP64 compute throughput due to microarchitectural and system architecture improvements, according to Chips and Cheese. However, while the manycore CPU is good on paper, it has several performance bottlenecks. The first details of the manycore Sunway SW26010 Pro CPU and supercomputers that use it emerged back in 2021. Now, the company has showcased actual processors and disclosed more details about their architecture and design, which represent a significant leap in performance, recently at SC23. The new CPU is expected to enable China to build high-performance supercomputers based entirely on domestically developed processors. Each Sunway SW26010 Pro has a maximum FP64 throughput of 13.8 TFLOPS, which is massive. For comparison, AMD's 96-core EPYC 9654 has a peak FP64 performance of around 5.4 TFLOPS. The SW26010 Pro is an evolution of the original SW26010, so it maintains the foundational architecture of its predecessor but introduces several key enhancements. The new SW26010 Pro processor is based on an all-new proprietary 64-bit RISC architecture and packs six core groups (CG) and a protocol processing unit (PPU). Each CG integrates 64 2-wide compute processing elements (CPEs) featuring a 512-bit vector engine as well as 256 KB of fast local store (scratchpad cache) for data and 16 KB for instructions; one management processing element (MPE), which is a superscalar out-of-order core with a vector engine, 32 KB/32 KB L1 instruction/data cache, 256 KB L2 cache; and a 128-bit DDR4-3200 memory interface.
Read more of this story at Slashdot.