Hierarchical transformers are more efficient language models (from Hacker News on 2021-11-04 21:56, #5RH9P)