Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference by from Hacker News on 2024-11-19 00:15 (#6SANS)