Cerebras To Enable 'Condor Galaxy' Network of AI Supercomputers

BeauHD

from Slashdot on 2023-07-21 23:40 (#6D669)

Cerebras Systems and G42 have introduced the Condor Galaxy project, a network of nine interconnected supercomputers designed for AI model training with a combined performance of 36 FP16 ExaFLOPs. The first supercomputer, CG-1, located in California, offers 4 ExaFLOPs of FP16 performance and 54 million cores, focusing on Large Language Models and Generative AI without the need for complex distributed programming languages. AnandTech reports: CG-2 and CG-3 will be located in the U.S. and will follow in 2024. The remaining systems will be located across the globe and the total cost of the project will be over $900 million. The CG-1 supercomputer, situated in Santa Clara, California, combines 64 Cerebras CS-2 systems into a single user-friendly AI supercomputer, capable of providing 4 ExaFLOPs of dense, systolic FP16 compute for AI training. Based around Cerebras's 2.6 trillion transistor second-generation wafer scale engine processors, the machine is designed specifically for Large Language Models and Generative AI. It supports up to 600 billion parameter models, with configurations that can be expanded to support up to 100 trillion parameter models. Its 54 million AI-optimized compute cores and massivefabric network bandwidth of 388 Tb/s allow for nearly linear performance scaling from 1 to 64 CS-2 systems, according to Cerebras. The CG-1 supercomputer also offers inherent support for long sequence length training (up to 50,000 tokens) and does not require any complex distributed programming languages, which is common in case of GPU clusters. This supercomputer is provided as a cloud service by Cerebras and G42 and since it is located in the U.S., Cerebras and G42 assert that it will not be used by hostile states. CG-1 is the first of three 4 FP16 ExaFLOP AI supercomputers (CG-1, CG-2, and CG-3) created by Cerebras and G42 in collaboration and located in the U.S. Once connected, these three AI supercomputers will form a 12 FP16 ExaFLOP, 162 million core distributed AI supercomputer, though it remains to be seen how efficient this network will be. In 2024, G42 and Cerebras plan to launch six additional Condor Galaxy supercomputers across the world, which will increase the total compute power to 36 FP16 ExaFLOPs delivered by 576 CS-2 systems. The Condor Galaxy project aims to democratize AI by offering sophisticated AI compute technology in the cloud. "Delivering 4 exaFLOPs of AI compute at FP16, CG-1 dramatically reduces AI training timelines while eliminating the pain of distributed compute," said Andrew Feldman, CEO of Cerebras Systems. "Many cloud companies have announced massive GPU clusters that cost billions of dollars to build, but that are extremely difficult to use. Distributing a single model over thousands of tiny GPUs takes months of time from dozens of people with rare expertise. CG-1 eliminates this challenge. Setting up a generative AI model takes minutes, not months and can be done by a single person. CG-1 is the first of three 4 ExaFLOP AI supercomputers to be deployed across the U.S. Over the next year, together with G42, we plan to expand this deployment and stand up a staggering 36 exaFLOPs of efficient, purpose-built AI compute."

Source	RSS or Atom Feed
Feed Location	https://rss.slashdot.org/Slashdot/slashdotMain
Feed Title	Slashdot
Feed Link	https://slashdot.org/
Feed Copyright	Copyright Slashdot Media. All Rights Reserved.