Microsoft and Nvidia Create 105-Layer, 530-Billion-Parameter Language Model That Needs 280 A100 GPUs
by martyb on (#5QN6R)
takyon writes: Microsoft and Nvidia create a 105-layer, 530-billion-parameter language model that needs 280 A100 GPUs, but it's still biased