Recently, I experimented with two AI voice generators: Resemble AI and ElevenLabs. After spending a few days with this technology, I'm completely blown away by how advanced it is. I had assumed we were still years away from anything this good.
Overall, Resemble AI has advanced project management features and offers a ton of fine-tuning options for your audio. However, the voice output from ElevenLabs was unmatched. Keep in mind that ElevenLabs' platform is still in beta and lacks plenty of features. If you haven't already, check out the full write-up with my impressions and takeaways.
Resemble AI
Founded in 2018, Resemble AI has raised $4M in funding and is based in Toronto.
Pricing (Basic Plan): $0.006 per second of generated audio. Up to 10 voices per account.
Default voices: The platform offers a reasonable selection of default voices. Above are two of Resemble AI's default voices, Beth and Justin.
Project management: Resemble AI is built around the idea of managing audio projects. Each project contains multiple “clips,” and each clip can have multiple speakers conversing (with up to 3,000 characters read per statement). This format makes a ton of sense for audiobook narration or script voiceover; projects can be broken out by chapter or scene. Once you’ve perfected your lines and added voice effects (see below), you can generate the entire clip as combined or separate audio files.
Voice effects: After you add a block of speech to a clip, there is an impressive array of options for tweaking the voice audio. For example, you can tune the pitch, intensity, and pacing of individual words. You can change the pronunciation of individual phonemes, or make the voice spell out each letter in a word. You can change the syllable emphasis, or simply add a pause before speaking. A particularly neat feature is the ability to turn a word (say, a character's name) into a variable. You can then set its pronunciation and emphasis once and have those settings reused everywhere the word appears.
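To make the "define once, reuse everywhere" idea concrete, here is a minimal sketch of how such variable substitution could work in principle. It is purely illustrative: the `VARIABLES` dictionary, the `expand` helper, and the name "Siobhan" are hypothetical, this is not Resemble AI's actual data model or API, and the output uses generic SSML-style tags (`<emphasis>`, `<sub alias>`) only as a stand-in for whatever markup the platform uses internally.

```python
import re

# Hypothetical per-word settings, defined once per variable.
# Illustrative only; not Resemble AI's real data model or API.
VARIABLES = {
    "hero": {"text": "Siobhan", "alias": "shi-VAWN", "emphasis": "strong"},
}


def expand(script: str, variables: dict) -> str:
    """Replace every {name} placeholder with SSML-style markup built from its single definition."""

    def substitute(match: re.Match) -> str:
        v = variables[match.group(1)]
        return (
            f'<emphasis level="{v["emphasis"]}">'
            f'<sub alias="{v["alias"]}">{v["text"]}</sub>'
            "</emphasis>"
        )

    return re.sub(r"\{(\w+)\}", substitute, script)


script = "{hero} opened the door. Nobody had expected {hero} to come back."
print(expand(script, VARIABLES))
```

Because both occurrences of `{hero}` are expanded from the same entry, changing the pronunciation in one place updates every line that uses it.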
While these tools look very impressive, I must say they also feel a bit daunting. As someone who’s exploring voice AI for the first time, I don’t want to have to spend hours and hours fiddling with different voice effects to get a good result. But there are certainly people out there who care that deeply about their final product.
Voice marketplace: A big strength of the platform is the voice marketplace. After cloning their voice, users can add it to the marketplace for anyone else to use in their projects. The marketplace is organized by gender and accent, but also by style (e.g. conversation, narration) and use-case (e.g. games, advertising).
Voice cloning: The platform lets you clone your voice by live-recording a series of sentences (file upload is available for enterprise users). It takes two days to generate the voice, though I’m not entirely sure why.
Localization: I wasn’t able to test this, but their marketing touts the ability to automatically translate a voice into various languages.
API: There are API docs available, but I wasn’t able to try them.
ElevenLabs
Founded in 2022, ElevenLabs has raised $2M in funding and is based in New York and London.
Pricing (Starter Plan): $0.17 per 1000 characters generated. Up to 10 voices per account.
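The two services bill in different units (seconds of audio versus characters of text), so they are hard to compare at a glance. The rough sketch below converts both to a cost for the same length of audio. It assumes a speaking rate of about 15 characters per second (roughly 150 words per minute), which is my own estimate rather than a figure from either company, and it ignores monthly quotas, voice limits, and any price changes since writing.

```python
# Prices as listed in this post; check current pricing before relying on them.
RESEMBLE_USD_PER_SECOND = 0.006        # Resemble AI Basic plan
ELEVENLABS_USD_PER_1000_CHARS = 0.17   # ElevenLabs Starter plan

# Assumption: typical narration runs at roughly 15 characters per second
# (about 150 words per minute). Adjust for your own scripts.
DEFAULT_CHARS_PER_SECOND = 15.0


def resemble_cost(seconds: float) -> float:
    """Resemble AI bills by seconds of generated audio."""
    return seconds * RESEMBLE_USD_PER_SECOND


def elevenlabs_cost(characters: float) -> float:
    """ElevenLabs bills by characters of input text."""
    return characters / 1000 * ELEVENLABS_USD_PER_1000_CHARS


def compare(minutes: float, chars_per_second: float = DEFAULT_CHARS_PER_SECOND) -> tuple[float, float]:
    """Return (Resemble AI cost, ElevenLabs cost) in USD for the same length of audio."""
    seconds = minutes * 60
    return resemble_cost(seconds), elevenlabs_cost(seconds * chars_per_second)


def break_even_chars_per_second() -> float:
    """Speaking rate at which both services cost the same per second of audio."""
    return RESEMBLE_USD_PER_SECOND / (ELEVENLABS_USD_PER_1000_CHARS / 1000)


if __name__ == "__main__":
    resemble, elevenlabs = compare(minutes=10)
    print(f"10 minutes -> Resemble AI: ${resemble:.2f}, ElevenLabs: ${elevenlabs:.2f}")
    print(f"Break-even rate: {break_even_chars_per_second():.1f} chars/sec")
```

Under these assumptions, 10 minutes of audio comes to about $3.60 on Resemble AI and $1.53 on ElevenLabs. The break-even point is around 35 characters per second, far faster than normal speech, so for typical narration ElevenLabs' per-character pricing works out cheaper.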
Default voices: The platform offers a reasonable selection of default voices. Above are two of ElevenLabs' default voices, Adam and Antoni.
Voice settings: In contrast to Resemble AI, ElevenLabs has only two dials to turn when generating speech. Lowering stability makes the voice more varied and expressive, but can introduce audio artifacts. Clarity should generally be set to a high value, but setting it too high can also lead to audio artifacts.
Compare the previous Antoni recording with this one, where stability has been turned all the way down:
Voice design: This was by far the most fun part of generating AI voices. You can choose from two genders (male/female), three ages (young/middle-aged/old), and five accents (American/British/Australian/African/Indian). Between those inputs and the accent slider, it's possible to generate endless voices, each slightly (or not so slightly) different from the last. When you find a voice that you like, you can add it to your account for future use. If you don't save it, you'll have to keep regenerating voices until you stumble on a similar one again.
Voice cloning: The platform has a very simple way of cloning a voice (only available to paid subscribers). Simply upload a few audio clips, and it will instantly create a new voice. In practice, I found that the voice sounded somewhat natural, but didn’t really sound like me.
API: There are API docs available, but I wasn’t able to try them.