Article 6BF67 MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks

From Hacker News (#6BF67)
Source RSS or Atom Feed
Feed Location http://news.ycombinator.com/rss
Feed Title Hacker News
Feed Link https://news.ycombinator.com/