Patents Enabling Millions of GPUs for AI With Microsecond Timing WITHOUT Software Control
by Brian Wang from NextBigFuture.com on (#6SXYF)
Tesla and xAI have been able to scale coherent GPU AI clusters beyond the 33,000 GPU limit by NOT synchronizing all nodes simultaneously. Because synchronizing every node becomes increasingly challenging at scale, the system instead implements a partition-based architecture with coordinated timing offsets. The nodes communicate over an Ethernet-based network using a transport layer without software control ...
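
As a rough illustration of what a partition-based layout with coordinated timing offsets could look like, here is a minimal Python sketch. It is not taken from the patents: the function name, the partition size, the node count, and the 5-microsecond offset step are all hypothetical values chosen only to show the idea of staggering partitions instead of synchronizing every node at once.

```python
def partition_offsets(num_nodes, partition_size, offset_us):
    """Split nodes into fixed-size partitions and give each a staggered start offset.

    All parameters are illustrative assumptions, not values from the patents.
    """
    schedule = []
    for index, start in enumerate(range(0, num_nodes, partition_size)):
        end = min(start + partition_size, num_nodes)
        schedule.append({
            "partition": index,
            "nodes": range(start, end),        # node IDs in this partition
            "offset_us": index * offset_us,    # start offset relative to partition 0
        })
    return schedule


# Hypothetical cluster: 100,000 nodes, partitions of 32,000, 5 us between partitions.
schedule = partition_offsets(num_nodes=100_000, partition_size=32_000, offset_us=5)
for entry in schedule:
    print(entry["partition"], len(entry["nodes"]), entry["offset_us"])
```

In this sketch each partition only has to stay synchronized internally, and the offsets tell the partitions when to act relative to one another. How the real system derives or enforces those offsets in hardware is not described in the excerpt above.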