NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Meta’s Llama 4 Maverick
by staff from High-Performance Computing News Analysis | insideHPC on (#6XGCK)

NVIDIA said it has set a record for large language model (LLM) inference speed, announcing that an NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs delivered more than 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model. NVIDIA said the model is the largest and most powerful in the Llama 4 [...]