Cerebras gives waferscale chips inferencing twist, claims 1,800 token per sec generation rates

from The Register (#6Q9B1)
Faster than you can read? More like blink and you'll miss the hallucination

Hot Chips Inference performance in many modern generative AI workloads is usually a function of memory bandwidth rather than compute. The faster you can shuttle bits in and out of high-bandwidth memory (HBM), the faster the model can generate a response....
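To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. It assumes single-stream (batch size 1) decoding in which every model weight is read from memory once per generated token, and it ignores KV-cache traffic, compute time, and interconnect overhead. The function name and the figures in the example are hypothetical illustrations, not numbers from the article or from any vendor.

```python
def max_tokens_per_second(num_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode rate when generation is memory-bandwidth bound.

    Assumption: each generated token requires streaming all weights
    from memory exactly once (batch size 1, no caching of weights).
    """
    if num_params <= 0 or bytes_per_param <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("all arguments must be positive")
    # Bytes that must cross the memory bus to produce one token.
    bytes_per_token = num_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


if __name__ == "__main__":
    # Illustrative values only: an 8-billion-parameter model stored at
    # 2 bytes per parameter, on hardware with 3.2 TB/s of memory bandwidth.
    rate = max_tokens_per_second(8e9, 2, 3.2e12)
    print(f"bandwidth-bound ceiling: {rate:.0f} tokens/s")
```

Under these assumptions the ceiling scales linearly with bandwidth: doubling the rate at which weights can be read doubles the achievable tokens per second, which is why faster memory translates so directly into faster responses.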

External Content
Source RSS or Atom Feed
Feed Location http://www.theregister.co.uk/headlines.atom
Feed Title The Register
Feed Link https://www.theregister.com/
Feed Copyright Copyright © 2024, Situation Publishing