RedPajama Is a 1.2 Trillion Token Dataset for Large Language Models
by Brian Wang from NextBigFuture.com
RedPajama is a project to create a set of leading, fully open-source models. Today, the project announced the completion of its first step: reproducing the LLaMA training dataset, which contains over 1.2 trillion tokens. AI is having its Linux moment. Stable Diffusion showed that open-source can not only rival the quality of ...