ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math

From Hacker News (#75F3F)
Source RSS or Atom Feed
Feed Location http://news.ycombinator.com/rss
Feed Title Hacker News
Feed Link https://news.ycombinator.com/