Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s from Hacker News on 2024-10-25 03:04 (#6RQRE)
Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference from Hacker News on 2024-11-19 00:15 (#6SANS)