Should We Still Use Text for Speech-to-Speech Translation? Promise Meets Practice

Date: May 5, 2023


Abstract

In this talk, I describe the current benefits and limitations of techniques for direct speech-to-speech translation (S2ST). I discuss my work at Roblox on speaker-preserving cascaded speech-to-text translation systems, along with the adaptations made to these systems to support simultaneous inference. I finish by outlining methods for preserving prosody through text and by discussing the steps needed to develop robust direct S2ST systems.

Location

This talk was given on May 5, 2023, at the Human Language Technology Center of Excellence (HLTCOE) at Johns Hopkins University in Baltimore, Maryland, as part of its biweekly work-in-progress seminar series.
