FlashAttention: Fast Transformer training with long sequences, from Hacker News on 2023-10-01 11:23 (#6F75C)