Meta unveils Voicebox AI: Should we all be worried?

You’ve probably heard about deepfakes for images and videos, those eerily realistic clips created with AI. Now Meta (formerly known as Facebook) has developed a new AI model called Voicebox that’s all about audio. It’s like a supercharged text-to-speech system that can create synthetic voices from just a text prompt.


What is Voicebox?

At its core, Voicebox is an AI model that creates synthetic voices based on simple text prompts. In other words, you give it some text, and it will read it out loud in a voice that sounds human. It’s similar to the text-to-speech function you might use on your phone or computer, but it takes things to a whole new level.

One thing that sets Voicebox apart is its ability to replicate specific voice styles based on a very short audio sample – we’re talking as little as two seconds! This means you could potentially have a synthetic voice that sounds like your favorite celebrity or even your own voice. It’s almost like having a voice actor on demand, ready to read out anything you want in the voice style of your choosing.

Competing AI voice models

Speechify

Speechify and ElevenLabs are also players in the text-to-speech game. Speechify is an app that turns any text into audio. It can read books, articles, notes, emails, PDFs, images, and web pages aloud. Speechify also claims to offer voice cloning, voice editing, and voice sampling features. It offers hundreds of free classic audiobooks, has a desktop app, and is designed to help people with reading disabilities.

ElevenLabs

ElevenLabs, on the other hand, is a startup that uses AI to generate synthetic voices with context-relevant emotions and natural language understanding. It offers a platform for creating and customizing high-quality spoken audio in any voice and style for various industries, such as video games, animations, digital assistants, education, entertainment, advertising, and podcasting. It also has a tool for detecting synthetic voices and verifying their authenticity. ElevenLabs works with actors who provide voice samples and get paid when their voice clones are used, and it uses proprietary deep learning models to generate its AI speech.

They’re both pretty cool, but they don’t quite have the same versatility as Voicebox, which can mimic real voices from just a few seconds of audio. It’s like comparing a Swiss Army knife to a few really good spoons. They all have their uses, but one is definitely more multipurpose.

The power of Voicebox

But it’s not just about creating fake voices. Voicebox can also tidy up your audio by removing annoying background noise – let’s say, a dog yapping while you’re trying to record. And it’s not just about English. This AI speaks French, Spanish, German, Polish, and Portuguese too, and can even translate passages from one language to another while keeping the same voice style.


Meta’s Voicebox: a breakthrough or a threat?

Unfortunately, or fortunately, depending on where you stand regarding AI, Meta isn’t planning to open-source Voicebox right away. That’s got people wondering if the company is trying to avoid some potential issues. For example, AI voice tech can be misused in harassment campaigns. Or it might be that Meta has future plans to make money off this model.

The source of Voicebox’s massive training data

One interesting thing about Voicebox is the sheer amount of data it’s been trained on: over 60,000 hours of speech from English audiobooks and another 50,000 hours from multilingual audiobooks. Meta says it used public domain audiobooks as its main data source, but it also used other sources such as podcasts, speeches, and radio shows. However, public-domain audiobooks come with challenges and limitations, such as uneven recording quality, inconsistency between recordings, alignment between text and audio, and speaker identity. Meta claims to have addressed some of these issues with its data processing and model design.


The double-edged sword of technology

The rise of AI voices is a bit of a touchy subject, especially for voice actors and, more recently, writers. They’re worried about companies using AI to synthesize their voices without paying them. The audiobook market has been growing a lot, and companies are always looking to cut costs, so this could end up being another problem for voice professionals.

Don’t be mistaken, however; it’s not just about jobs. There are some real concerns about how deepfake voices can be used in scams. For instance, there was a case where a synthetic voice impersonating a CEO was used in a major heist. There’s also the worry that deepfake voices could be used to fool voice-biometric systems, which some services, such as online banking, use to verify identity.

You see, as cool as this technology sounds, there’s a darker side to it. Imagine getting a call from your boss asking you to transfer a massive sum of money to close out an account. You do as told because, well, it’s your boss. Except, it wasn’t. That’s right; it was a fake, synthetic voice created using AI that sounded just like your boss. Wild, isn’t it? But this isn’t some movie plot; it actually happened! This was one of the first times a fake voice was used in a heist, and it left law enforcement and AI experts scratching their heads.



And it’s not just heists. Deepfake voices can be used to trick systems that rely on voice recognition. We’re talking about things like online banking, which use your voice as a form of identification. If criminals can create a convincing fake voice of you, they could potentially access your accounts. It’s a bit like forging a signature but with your voice instead.
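For the curious, here is a rough sketch of why a good voice clone is a problem for voice-based logins. It assumes a simplified system that boils each voice down to a short list of numbers (a "voiceprint") and accepts a caller when their voiceprint is similar enough to the one on file. The numbers, the threshold, and the function names are all made up for illustration; real banking systems are far more sophisticated and are not described by this example.

```python
import math

def cosine_similarity(a, b):
    """Return how closely two voiceprints point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical acceptance threshold, chosen only for this illustration.
THRESHOLD = 0.95

def is_accepted(enrolled, sample, threshold=THRESHOLD):
    """Accept the caller if their voiceprint is similar enough to the enrolled one."""
    return cosine_similarity(enrolled, sample) >= threshold

# Made-up three-number voiceprints; real systems use hundreds of values.
enrolled_voiceprint = [0.90, 0.10, 0.40]  # the account owner, stored at sign-up
genuine_caller = [0.88, 0.12, 0.41]       # the owner calling in on another day
stranger = [0.10, 0.90, 0.20]             # someone with a very different voice
cloned_voice = [0.87, 0.13, 0.43]         # an AI clone that closely mimics the owner

print(is_accepted(enrolled_voiceprint, genuine_caller))  # True
print(is_accepted(enrolled_voiceprint, stranger))        # False
print(is_accepted(enrolled_voiceprint, cloned_voice))    # True
```

In this toy setup the stranger is turned away, but the clone gets in because the check only measures how similar the voice sounds, not who is actually speaking.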

Countering the deepfake threat

So, while we’re marveling at the amazing things technology can do, it’s also important to be aware of the potential risks and to stay one step ahead. It’s like a high-tech game of cat and mouse, with AI experts and businesses working hard to spot and stop these deepfake voices before they can do any harm.

Luckily, there are folks out there trying to fight back against the potential misuse of deepfake voices. For example, some countries have started to pass laws to regulate deepfakes. Also, there are projects like the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof), where scientists and engineers are working on ways to counter deepfake voice attacks.

Kurt’s key takeaways

We’re in an era where tech is evolving at breakneck speed and changing how we work, communicate, and even hear things. While the potential of AI like Meta’s Voicebox is undoubtedly exciting, it’s clear we also need to tread carefully. There’s a fine line between innovation and invasion, a balance we’re all still figuring out.

With all these advancements and potential risks, how do you feel about the future of AI and deepfake technology? Do you see it as a boon or a bane? Let us know in the comments below!
