AI Models Are Undertrained by 100-1000 Times – AI Will Be Better With More Training Resources
by Brian Wang from NextBigFuture.com
The Chinchilla compute-optimal point for an 8B (8 billion parameter) model would be to train it for ~200B (billion) tokens (if you were only interested in getting the most "bang-for-the-buck" w.r.t. model performance at that size). So this is training ~75X beyond that point, which is unusual, but personally, [Karpathy] thinks this is extremely welcome. ...
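For readers who want the arithmetic spelled out, here is a minimal back-of-the-envelope sketch using only the figures quoted above (8B parameters, ~200B compute-optimal tokens, ~75X beyond that point); the tokens-per-parameter ratio and the implied total token count follow directly from those numbers:

```python
# Back-of-the-envelope arithmetic for the quote above.
# The ~200B-token "compute optimal" point for an 8B model corresponds to
# roughly 25 training tokens per parameter (the Chinchilla-style heuristic).

params = 8e9                        # 8B-parameter model
chinchilla_optimal_tokens = 200e9   # ~200B tokens, as quoted
overtrain_factor = 75               # "~75X beyond that point"

tokens_per_param = chinchilla_optimal_tokens / params
implied_training_tokens = chinchilla_optimal_tokens * overtrain_factor

print(f"Tokens per parameter at the optimal point: ~{tokens_per_param:.0f}")
print(f"Implied total training tokens: ~{implied_training_tokens / 1e12:.0f}T")
# -> ~25 tokens/param at the optimal point, and ~15T tokens when trained 75X past it
```

In other words, training 75X past the compute-optimal point for an 8B model works out to on the order of 15 trillion tokens, which is why Karpathy frames such models as heavily overtrained relative to the Chinchilla recipe rather than undertrained in absolute terms.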