NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Meta’s Llama 4 Maverick

staff

from on 2025-05-23 16:10 (#6XGCK)

NVIDIA said it has achieved a record large language model (LLM) inference speed, announcing that an NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs achieved more than 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model. NVIDIA said the model is the largest and most powerful in the Llama 4 [...]

The post NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Meta's Llama 4 Maverick appeared first on High-Performance Computing News Analysis | insideHPC.

Source	RSS or Atom Feed
Feed Location	http://insidehpc.com/feed/
Feed Title
Feed Link	http://insidehpc.com/