Using AI to translate speech for a primarily spoken language | Tech Rasta


AI-based speech translation has so far focused primarily on written languages, even though roughly 3,500 living languages are primarily spoken and lack a standard or widely used writing system. This makes it impossible to build machine translation tools for them using standard techniques, which require large amounts of written text to train an AI model.

To address this challenge, we designed an AI-based speech-to-speech translation system for Hokkien, a primarily oral language that is widely spoken in the Chinese diaspora, but has no standard written form. We are open sourcing our Hokkien translation models, evaluation datasets and research papers so that others can reproduce and build on our work.

Chart showing the number of Hokkien speakers worldwide.

The translation system is part of our Universal Speech Translator project, which is developing new AI techniques that we hope will eventually enable real-time speech-to-speech translation across many languages. We believe that spoken communication can bring people together wherever they are, even in the metaverse.

A new modeling approach

Many speech translation systems rely on transcriptions. However, because primarily oral languages do not have standard written forms, producing transcribed text as the translation output is not workable. Therefore, we focus on speech-to-speech translation.

To do this, we developed several techniques. One is speech-to-unit translation, which translates input speech into a sequence of discrete acoustic units and then generates waveforms from them. Another is relying on text from a related language, in this case Mandarin.
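To make the idea of representing speech as a series of discrete sound units more concrete, here is a minimal, hypothetical Python sketch. It is not the released model: it assumes that units are obtained by assigning each feature frame to its nearest entry in a small codebook and then merging consecutive repeats. The 2-D features, the `CENTROIDS` codebook, and the function names are all made up for illustration. In a full system, a translation model would predict target-language units and a separate vocoder would turn them into a waveform; neither step is shown here.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, float]  # toy 2-D feature vector (assumption, real features are much larger)


def quantize(frames: Sequence[Frame], centroids: Sequence[Frame]) -> List[int]:
    """Map each feature frame to the index of its nearest centroid (its 'unit')."""
    def sq_dist(a: Frame, b: Frame) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda k: sq_dist(frame, centroids[k]))
        for frame in frames
    ]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


# Hypothetical codebook and input frames, chosen only to illustrate the idea.
CENTROIDS: List[Frame] = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames: List[Frame] = [(0.1, 0.0), (0.05, 0.1), (0.9, 1.1), (1.0, 0.9), (0.1, 0.95)]

units = collapse_repeats(quantize(frames, CENTROIDS))
print(units)  # [0, 1, 2]
```

The point of the sketch is that a unit sequence plays the role text would normally play as the translation target, which is why no written form of the language is needed.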

A chart showing the model architecture of the UnitY speech translation system.

Looking to the future of translation

The Hokkien translation model is still a work in progress and can only translate one complete sentence at a time, but it is a step toward a future where simultaneous translation between languages is possible. The techniques we have introduced can be extended to many other written and unwritten languages.

We’re also releasing SpeechMatrix, a large collection of speech-to-speech translations mined with our innovative natural language processing toolkit, called LASER. These tools will enable other researchers to create their own speech-to-speech translation systems and build on our work. In addition, our progress in what researchers refer to as unsupervised learning demonstrates the feasibility of building high-quality speech-to-speech translation models without any human annotations. This helps extend those models to languages for which no labeled training data is available.

Our AI research helps break down language barriers in both the physical world and the metaverse to promote connection and mutual understanding. We look forward to expanding our research and making this technology available to more people in the future.

Learn more about our AI-based speech translation.

