- OpenAI has shown a demo of Voice Engine, its ambitious AI model that can generate natural-sounding speech
- However, no official release date has been announced yet, as the company fears misuse of voice synthesis
- OpenAI is working on ways to verify a speaker’s consent to the use of their voice
OpenAI has unveiled a preview of Voice Engine, an AI technology that can generate natural speech from a 15-second audio sample and text input. The model was developed in late 2022 and already powers the ‘read aloud’ feature in ChatGPT.
However, the company has not yet given a release date, citing the risk of misuse of voice synthesis.
OpenAI said: “We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations, we will make a more informed decision about whether and how to deploy this technology at scale.”
OpenAI has not revealed the model’s pricing either, but a close reading of its documents suggests it might be priced at around $15 per 1 million characters (roughly 162,500 words), which works out to about $1 per hour of generated speech. This is a lot cheaper than competitors like ElevenLabs, which charges around $11 for 100,000 characters.
However, this is also where the problem lies. Voice actors usually charge anywhere between $12 and $79 for every hour of work. Widespread adoption of Voice Engine may leave these actors with little to no work.
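The per-hour figure can be checked with a few lines of arithmetic. The sketch below uses the prices quoted in this article and assumes a narration pace of about 150 words per minute; that pace is an assumption for illustration, not a number from OpenAI.

```python
# Figures quoted in the article.
PRICE_PER_MILLION_CHARS = 15.00    # USD, reported Voice Engine price
WORDS_PER_MILLION_CHARS = 162_500  # rough word count for 1 million characters
ACTOR_RATE_RANGE = (12.00, 79.00)  # USD per hour charged by voice actors

# Assumption (not from the article): speech at roughly 150 words per minute.
WORDS_PER_MINUTE = 150


def cost_per_hour(price, words, words_per_minute):
    """Price of one hour of speech, given the price for `words` words."""
    hours_of_speech = words / words_per_minute / 60
    return price / hours_of_speech


hourly = cost_per_hour(PRICE_PER_MILLION_CHARS, WORDS_PER_MILLION_CHARS, WORDS_PER_MINUTE)
low, high = ACTOR_RATE_RANGE

print(f"Voice Engine: about ${hourly:.2f} per hour of speech")
print(f"Voice actors: about {low / hourly:.0f}x to {high / hourly:.0f}x that rate")
```

Under that assumed pace, 1 million characters cover about 18 hours of speech, so the hourly cost comes to about $0.83, in line with the roughly $1 per hour cited above. A faster or slower speaking pace would shift the result slightly.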
Risks Associated with Voice Synthesis
OpenAI is concerned about the misuse of Voice Engine, especially in an election year. The FCC has already recorded a case in which President Biden’s voice was deepfaked to influence voters in New Hampshire. After this, the FCC banned AI-voiced robocalls.
Similarly, users of 4chan, a controversial message board, spread hateful messages by mimicking the voice of Emma Watson using ElevenLabs.
Introducing Voice Engine at this time and at this price would empower such miscreants and make their work a whole lot easier.
How Does OpenAI Plan to Face Such Threats?
OpenAI has revealed a few steps it will be taking to ensure Voice Engine isn’t exploited.
1. For starters, it is currently only available to 10 trusted partners who have agreed to OpenAI’s strict usage policies.
OpenAI requires these partners to obtain explicit and informed consent from the speakers whose voices are used.
OpenAI has hinted that it might introduce real-time voice authentication to ensure that speakers are aware of their voices being used. This authentication would require the speaker to read out a randomly generated paragraph, much like completing a CAPTCHA.
2. Secondly, users will have to inform their audience, through proper labels, which voices are generated by AI.
3. Lastly, OpenAI plans to use watermarking technology to trace audio generated with Voice Engine back to its source.
Just like encryption, these watermarks add inaudible identifiers to AI recordings, which can only be detected by technology owned by the company.
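OpenAI has not published how its watermark works, so the following is only a toy illustration of the general idea: an identifier is mixed into the audio at low amplitude, and only someone holding the secret key can check for it. It uses a simple keyed additive (spread-spectrum style) scheme, and every name in it (`keyed_sequence`, `embed_watermark`, `has_watermark`, the key values, the test tone) is hypothetical.

```python
import math
import random


def keyed_sequence(key, n):
    """Pseudo-random +1/-1 sequence that only the holder of `key` can regenerate."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed_watermark(samples, key, strength=0.01):
    """Add the keyed sequence to the audio at a low amplitude."""
    marks = keyed_sequence(key, len(samples))
    return [s + strength * m for s, m in zip(samples, marks)]


def has_watermark(samples, key, strength=0.01):
    """Correlate the audio with the keyed sequence.

    Marked audio scores close to `strength`; unmarked audio, or audio
    checked with the wrong key, scores close to zero.
    """
    marks = keyed_sequence(key, len(samples))
    score = sum(s * m for s, m in zip(samples, marks)) / len(samples)
    return score > strength / 2


# Stand-in for generated speech: ten seconds of a 220 Hz tone at 16 kHz.
RATE = 16_000
clip = [0.5 * math.sin(2 * math.pi * 220 * t / RATE) for t in range(10 * RATE)]
marked = embed_watermark(clip, key=1234)

print("marked clip, right key:", has_watermark(marked, key=1234))
print("marked clip, wrong key:", has_watermark(marked, key=9999))
print("original clip:", has_watermark(clip, key=1234))
```

In this sketch, detection should succeed only for the marked clip checked with the matching key, which mirrors the idea that the identifier is traceable only by whoever owns the detection technology. A production watermark would also need to survive compression, re-recording, and editing, none of which this toy handles.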
In its blog post, OpenAI stressed the importance of educating the public at large so that they can understand the downside of deceptive AI content. The AI giant looks keen to start conversations around the healthy development of AI tech with stakeholders like researchers, policymakers, and developers.
OpenAI’s measured approach is a welcome move in an industry that will shape what our future technologies look like.