AI Models Are Undertrained by 100-1000 Times – AI Will Be Better With More Training Resources
by Brian Wang from NextBigFuture.com
The Chinchilla compute-optimal point for an 8B (8 billion parameter) model would be to train it for ~200B (billion) tokens (if you were only interested in getting the most "bang-for-the-buck" w.r.t. model performance at that size). So this is training ~75X beyond that point, which is unusual, but personally, [Karpathy] thinks this is extremely welcome. ...
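For readers who want the arithmetic spelled out, here is a minimal back-of-the-envelope sketch using only the figures quoted above (8B parameters, ~200B compute-optimal tokens, ~75X beyond that point); the tokens-per-parameter ratio and the implied total token count follow directly from those numbers:

```python
# Back-of-the-envelope arithmetic for the quote above.
# The ~200B-token "compute optimal" point for an 8B model corresponds to
# roughly 25 training tokens per parameter (the Chinchilla-style heuristic).

params = 8e9                        # 8B-parameter model
chinchilla_optimal_tokens = 200e9   # ~200B tokens, as quoted
overtrain_factor = 75               # "~75X beyond that point"

tokens_per_param = chinchilla_optimal_tokens / params
implied_training_tokens = chinchilla_optimal_tokens * overtrain_factor

print(f"Tokens per parameter at the optimal point: ~{tokens_per_param:.0f}")
print(f"Implied total training tokens: ~{implied_training_tokens / 1e12:.0f}T")
# -> ~25 tokens/param at the optimal point, and ~15T tokens when trained 75X past it
```

In other words, training 75X past the compute-optimal point for an 8B model works out to on the order of 15 trillion tokens, which is why Karpathy frames such models as heavily overtrained relative to the Chinchilla recipe rather than undertrained in absolute terms.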