Meta (formerly Facebook) has just unveiled Voicebox, a state-of-the-art generative AI model for speech.
It is a multilingual text-to-speech tool, and the quality of its output is remarkably good.
What can you do with Voicebox?
1. In-context text-to-speech synthesis
Think of this like a parrot that’s learned to mimic your voice. All it needs is a short clip of your speech. Then you can type anything you want, and it will read it out in your voice.
2. Speech editing and noise reduction
Imagine you’ve recorded a beautiful birthday message for a friend, but a car honked loudly in the background. Instead of re-recording the whole thing, Voicebox can simply ‘erase’ that car honk from your message.
Similarly, if you stumble on a word or say something wrong, you don’t need to start over. Voicebox can fix those mistakes in your original voice.
3. Cross-lingual style transfer
Suppose you speak English, but you want to surprise your Spanish-speaking friend with a birthday message in their language. You can type your message in Spanish, and Voicebox will read it out loud in your voice, even though the original recording you provided was in English.
4. Diverse speech sampling
People all around the world talk differently, with different accents, tones, and styles. Voicebox learns from a wide range of these speech patterns in six languages.
So, it can generate realistic speech that sounds like a native speaker in English, French, Spanish, German, Polish, or Portuguese. This could make things like your GPS or virtual assistant sound much more natural and familiar.
Who could use this tool?
Voicebox could be useful to several different audiences.
- Content creators: Voicebox can be a powerful tool for audio editing and creation. It can help creators produce high-quality audio tracks for videos without needing to re-record entire segments due to minor disturbances or errors.
- Visually impaired individuals: Voicebox could turn written messages from friends into high-quality audio read in those friends’ own voices, making digital communication more accessible.
- Podcasters: Voicebox’s speech editing and noise reduction capabilities could let podcasters edit recorded episodes seamlessly. Whether it’s removing background noise or correcting mispronounced words, Voicebox could deliver a clean, professional-sounding podcast without the need for re-recording.
Is Voicebox available to the public?
As of now, Meta has not made the Voicebox model or code publicly available.
This is primarily due to concerns about the potential misuse of the technology. Can you imagine what prank calls are going to be like in the future?
I want to learn more about AI
If you want to stay up to date with the latest AI tools and updates (and how to use them to your advantage), make sure you are subscribed to the WGMI newsletter.