Cerebras gives waferscale chips inferencing twist, claims 1,800 token per sec generation rates

from www.theregister.com - Articles on 2024-08-27 16:00 (#6Q9B1)

Faster than you can read? More like blink and you'll miss the hallucination

Hot Chips: Inference performance in many modern generative AI workloads is usually a function of memory bandwidth rather than compute. The faster you can shuttle bits in and out of a high-bandwidth memory (HBM) the faster the model can generate a response....
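
The bandwidth claim can be made concrete with a rough back-of-the-envelope sketch. It assumes single-stream (batch size 1) generation in which every model weight is read from memory once per generated token, and it ignores KV-cache traffic and compute time. The function name and all the numbers below are illustrative assumptions, not figures from the article or from any vendor.

```python
def max_decode_tokens_per_sec(params, bytes_per_param, mem_bandwidth_bytes_per_sec):
    """Rough upper bound on tokens/sec when reading the weights dominates.

    Assumption: each generated token requires streaming all weights
    from memory once, so the rate is capped at bandwidth / model size.
    """
    bytes_per_token = params * bytes_per_param
    return mem_bandwidth_bytes_per_sec / bytes_per_token


# Hypothetical example: an 8-billion-parameter model stored at 16 bits
# (2 bytes) per weight, on a device with 3 TB/s of memory bandwidth.
rate = max_decode_tokens_per_sec(
    params=8e9,
    bytes_per_param=2,
    mem_bandwidth_bytes_per_sec=3e12,
)
print(f"{rate:.1f} tokens/sec upper bound")
```

Under these assumptions the bound scales linearly with bandwidth: doubling memory bandwidth doubles the ceiling, while adding compute does not move it.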
