Introducing Translatotron: an end-to-end speech-to-speech translation model
In "Direct speech-to-speech translation with a sequence-to-sequence model", we propose an experimental new system that is based on a single attentive sequence-to-sequence model for direct speech-to-speech translation without relying on an intermediate text representation. Dubbed Translatotron, this system avoids dividing the task into separate stages, which gives it a few advantages over cascaded systems: faster inference, natural avoidance of compounding errors between recognition and translation, straightforward retention of the original speaker's voice after translation, and better handling of words that do not need to be translated (e.g., names and proper nouns).
As a translator, I feel less and less job-secure every time Google I/O rolls around.