Cerebras Claims Fastest AI Inference
by staff from High-Performance Computing News Analysis | insideHPC
AI compute company Cerebras Systems today announced what it calls the fastest AI inference solution. According to the company, Cerebras Inference delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, making it 20 times faster than GPU-based solutions in hyperscale clouds.
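As a rough sanity check on the quoted figures, the 20× claim implies certain per-model GPU baseline rates. The sketch below is my own back-of-envelope arithmetic from the numbers in this article; the implied baselines are derived, not figures Cerebras published.

```python
# Back-of-envelope check: what GPU throughput the "20x faster" claim
# implies, given the Cerebras rates quoted in the article.
# NOTE: the implied baselines are derived here for illustration only.

CEREBRAS_TOKENS_PER_SEC = {"Llama 3.1 8B": 1800, "Llama 3.1 70B": 450}
CLAIMED_SPEEDUP = 20  # vs. GPU-based solutions in hyperscale clouds

for model, rate in CEREBRAS_TOKENS_PER_SEC.items():
    implied_gpu_rate = rate / CLAIMED_SPEEDUP
    print(f"{model}: {rate} tok/s -> implied GPU baseline ~{implied_gpu_rate:g} tok/s")
```

Run as-is, this prints an implied baseline of 90 tok/s for the 8B model and 22.5 tok/s for the 70B model.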