Cerebras Claims Fastest AI Inference
by staff from High-Performance Computing News Analysis | insideHPC
AI compute company Cerebras Systems today announced what it calls the fastest AI inference solution. According to the company, Cerebras Inference delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, making it 20 times faster than GPU-based solutions in hyperscale clouds.
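As a rough sanity check on the quoted figures, the 20× claim implies certain per-model GPU baseline rates. The sketch below is my own back-of-envelope arithmetic from the numbers in this article; the implied baselines are derived, not figures Cerebras published.

```python
# Back-of-envelope check: what GPU throughput the "20x faster" claim
# implies, given the Cerebras rates quoted in the article.
# NOTE: the implied baselines are derived here for illustration only.

CEREBRAS_TOKENS_PER_SEC = {"Llama 3.1 8B": 1800, "Llama 3.1 70B": 450}
CLAIMED_SPEEDUP = 20  # vs. GPU-based solutions in hyperscale clouds

for model, rate in CEREBRAS_TOKENS_PER_SEC.items():
    implied_gpu_rate = rate / CLAIMED_SPEEDUP
    print(f"{model}: {rate} tok/s -> implied GPU baseline ~{implied_gpu_rate:g} tok/s")
```

Run as-is, this prints an implied baseline of 90 tok/s for the 8B model and 22.5 tok/s for the 70B model.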