Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands
For 100 concurrent users, the card delivered 12.88 tokens per second, just slightly faster than average human reading speed. If you want to scale a large language model (LLM) to a few thousand users, you might think a beefy enterprise GPU is a hard requirement. However, at least according to Backprop, all you actually need is a four-year-old graphics card....