Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands

from The Register on 2024-08-23 21:00 (#6Q6W0)

For 100 concurrent users, the card delivered 12.88 tokens per second, just slightly faster than average human reading speed
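
As a back-of-the-envelope check on that figure, here is a minimal sketch of the implied aggregate throughput. It assumes the 12.88 tokens per second is what each of the 100 concurrent users sees, which the summary suggests but does not state outright. The function name `aggregate_throughput` is illustrative and not taken from Backprop's benchmark.

```python
def aggregate_throughput(concurrent_users: int, tokens_per_user_per_sec: float) -> float:
    """Total tokens per second generated across all concurrent users.

    Assumes every user is served at the same per-user rate.
    """
    return concurrent_users * tokens_per_user_per_sec


# Figures from the summary: 100 concurrent users at 12.88 tokens/s each (assumed per-user).
total = aggregate_throughput(100, 12.88)
print(f"{total:.0f} tokens per second in aggregate")
```

Under that assumption, the card would be generating roughly 1,288 tokens per second in total.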

If you want to scale a large language model (LLM) to a few thousand users, you might think a beefy enterprise GPU is a hard requirement. However, at least according to Backprop, all you actually need is a four-year-old graphics card....
