Designing an AI voiceover system for my newsletter

Meet the cast of Artificial Ignorance.

Charlie Guo
May 31, 2023
Artwork created with Midjourney.

If you've read the archives, you're familiar with Hugh. Hugh is a prototype smart assistant I built, with a persona and voice to match.

Hugh's voice comes from ElevenLabs, the runner-up in my "most realistic voices" review. Hugh's voice was so realistic, in fact, that it fooled NPR for a hot second:

[Audio clip, 0:40]

I deeply enjoy Hugh's voice because most AI-narrated content is mediocre. It gets the job done but in a sterile, perfunctory way. It's hard to find synthetic speakers with emotion or pauses for dramatic effect. I want fewer robot voices and more realistic AI speakers!

So, in an effort to be the change I want to see, I decided to add voiceovers to the Artificial Ignorance archives. Along the way, I learned a lot about what not to do and built a bunch of custom software to make the process go faster.

Keep reading to learn more about the process. But without further ado, here's the audio cast of Artificial Ignorance:

Hugh will be voicing long-form essays.

[Audio clip, 0:13]

Evelyn is your go-to voice for weekly news roundups.

[Audio clip, 0:09]

Chase walks through the nuts and bolts of AI projects.

[Audio clip, 0:12]

Lauren nicely narrates the product review posts.

[Audio clip, 0:12]


The quick and dirty approach

I want to acknowledge something upfront: I put way more effort into this project than I had to. There are plenty of voiceover tools available, though they don't have Hugh.

And even if I had to keep using Hugh's voice, I could have pasted the entire text of a post into ElevenLabs' dashboard. But the quick and dirty approach has a few problems:

  1. Fixing mistakes is pretty costly. A word or phrase may get mispronounced, or I might have a typo in my post. Not something everyone will care about! But I did, so fixing it would mean regenerating the entire text - and burning thousands of credits. Instead, I wanted to regenerate small sections to correct the pronunciation errors.

  2. There's no reliable way to add pauses. Like any good narration, I wanted silence after the title or between major sections. There isn't a way to control this in ElevenLabs yet, and I didn't want to use an audio editor. Plus, there's no way to test different-length pauses without regenerating the entire text.

  3. Each post can only have a single voice. I have a few ideas for multi-speaker content, but creating said content would mean stitching together multiple files.

So, I did what any good engineer would do: I made my own half-baked solution.


Building the narration rig

With my use case in mind, the architecture of what I wanted was pretty straightforward.

  1. Given a post URL, break the post down into roughly paragraph-sized clips.

  2. For each clip, configure the default settings: text, voice, and starting/ending pauses.

  3. Create a UI to adjust the settings and generate the audio. Audio clips should be playable via the UI.

  4. Export a final file that stitches together the different clips, including the silences before and after each one.
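The steps above hinge on a per-clip data model. Here's a minimal sketch of what that might look like (the names and defaults are my own for illustration, not the actual code from my rig):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    """One paragraph-sized chunk of a post, plus its narration settings."""
    text: str
    voice: str = "Hugh"
    pause_before_ms: int = 0
    pause_after_ms: int = 500
    audio_path: Optional[str] = None  # set once the clip's audio is generated


def clips_from_paragraphs(paragraphs: List[str], voice: str = "Hugh") -> List[Clip]:
    """Step 2: build clips with default settings from extracted paragraphs."""
    return [Clip(text=p, voice=voice) for p in paragraphs if p.strip()]


clips = clips_from_paragraphs(["Title", "", "First paragraph."])
# Empty paragraphs are dropped; each clip carries its own voice and pauses.
```

Because each clip owns its text, voice, and pauses, regenerating one mispronounced section or tweaking a single pause never touches the rest of the post.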

The end result:

In practice, most of the code was scaffolding to store and easily edit the clips. All of the heavy lifting takes place within a few Python functions:

[Screenshot: the core Python functions.]

I won't bore you by going into all the coding details; if you're interested, drop a reply/comment - I'm happy to share more. But I wanted to touch on the underlying Python and JavaScript libraries:

Django: I've been writing Django code for over a decade. It has its tradeoffs, but as a solo developer it is the fastest framework I know for getting something off the ground.

Newspaper: In planning this project, I discovered a fascinating library called newspaper. It's gone unmaintained for a while, but it worked remarkably well at extracting the text of an article and breaking it into paragraphs.
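For a sense of how that extraction step works: `newspaper` (installed as `newspaper3k`) returns the article body as plain text with blank lines between paragraphs, so splitting it down to clip-sized pieces is simple. The library calls below require network access, so I've left them commented; the split function is the runnable part:

```python
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """newspaper's article.text separates paragraphs with blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


# With the library itself (requires network access):
# from newspaper import Article
# article = Article(url)
# article.download()
# article.parse()
# paragraphs = split_paragraphs(article.text)

sample = "First paragraph.\n\nSecond paragraph.\n\n"
print(split_paragraphs(sample))  # ['First paragraph.', 'Second paragraph.']
```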

HTMX: The first version of the narrator was simple HTML forms, but that became clunky pretty fast. Rather than integrating an entire front-end JS framework, I opted to use HTMX. It's a lightweight library to build more modern user interfaces.

ElevenLabs: After months of hacking together my own library, I was finally able to use the official ElevenLabs Python SDK. At some point I'll need to migrate the original Hugh code to the official library.

PyDub: A pretty versatile tool for managing audio clips in Python.
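The final export (step 4) is conceptually just concatenation with silence in between. My rig does this with PyDub, but the same idea can be sketched with nothing but the standard library's `wave` module, which makes it easy to show here:

```python
import wave
from typing import List


def stitch_wavs(in_paths: List[str], out_path: str, pause_ms: int = 500) -> None:
    """Concatenate WAV clips, inserting pause_ms of silence between them.

    A stdlib sketch of what PyDub does for me; all clips must share the
    same sample rate, sample width, and channel count.
    """
    with wave.open(in_paths[0], "rb") as w:
        params = w.getparams()
    silent_frames = int(params.framerate * pause_ms / 1000)
    silence = b"\x00" * (silent_frames * params.sampwidth * params.nchannels)
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for i, path in enumerate(in_paths):
            if i > 0:
                out.writeframes(silence)
            with wave.open(path, "rb") as w:
                out.writeframes(w.readframes(w.getnframes()))
```

The PyDub version is even shorter: `AudioSegment.silent(duration=ms)` gives you the pause, and segments concatenate with `+`.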


How to get the most from your AI voices

I learned a few tips and tricks for working with AI-generated voices. Some are broadly applicable, but others are specific to the tool I was using.

Use longer clips. Clips composed of a single phrase or short sentence had a tendency to be less accurate than longer clips. I often had to try regenerating a title or header multiple times to get the right results.

Play with formatting. While the voices can’t understand bold or italic text, they did adjust when given quotes, dashes, and ellipses. For what it’s worth, ElevenLabs plans to introduce more tools this year to help control tone and emphasis.

Try a lot of voices. Specifically for ElevenLabs, I generated a ton of different voices. Each one has a unique personality. Many can sound somewhat dull (at least to me), but every so often you find a voice that’s the perfect fit.

Some edits are required. Not all written text translates well to a spoken medium. Terms like “this/that” should probably be converted to “this and that”. Bullet points might make more sense as numbers, and section titles might benefit from a “part one” when read aloud.
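Substitutions like these are easy to automate with a small rule table run before generation. The rules below are illustrative examples, not my actual list, which keeps growing as I hit new edge cases:

```python
import re

# Read-aloud substitutions applied to clip text before generating audio.
SPEECH_FIXES = [
    (re.compile(r"\b(\w+)/(\w+)\b"), r"\1 and \2"),  # "this/that" -> "this and that"
    (re.compile(r"\bGithub\b"), "GitHub"),           # the error-prone casing
]


def prep_for_speech(text: str) -> str:
    for pattern, replacement in SPEECH_FIXES:
        text = pattern.sub(replacement, text)
    return text


print(prep_for_speech("Use this/that on Github."))
# Use this and that on GitHub.
```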

Be prepared for edge cases. It was interesting to see which words tripped the AI voices up. "GitHub" is pronounced correctly 99% of the time, but "Github" had around a 20% error rate. Spelled-out words like "LLM" or "www" were often slightly incorrect.

When in doubt, regenerate. The audio is non-deterministic, so regenerating will change up the cadence. While there are some basic controls, I find that you can often get wildly different results by regenerating the audio with different settings.
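Since regeneration is cheap at the clip level, it's worth generating several takes up front and picking the best one by ear. A minimal sketch of that idea, kept signature-agnostic by accepting any text-to-speech callable (such as a wrapper around the ElevenLabs SDK) rather than assuming a particular SDK API:

```python
import random
from typing import Callable, List, Tuple


def regenerate(
    tts: Callable[[str, str], bytes],
    text: str,
    settings_grid: List[str],
    picks: int = 3,
) -> List[Tuple[str, bytes]]:
    """Generate several takes of the same text with varied settings,
    returning (settings, audio) pairs so a human can pick the best read.

    `tts` maps (text, settings) -> audio bytes; since the audio is
    non-deterministic, every call is a fresh take.
    """
    takes = []
    for settings in random.sample(settings_grid, k=min(picks, len(settings_grid))):
        takes.append((settings, tts(text, settings)))
    return takes
```

In the real rig, `settings_grid` would hold stability/similarity presets, and the UI plays each take back to back.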


Next steps

This is just a prototype, but I’m going to keep working on this as Artificial Ignorance grows. It’s got plenty of rough edges, and I want to make a bunch of usability improvements. Stuff like:

  • Rearranging the order of clips.

  • Regenerating all the clips at once.

  • Uploading separate audio to splice between clips - like the audio from this post.

  • Adding background music or sound effects.

  • Making custom formatting tweaks automatic.


One more thing

A nice bonus of creating these narrations is that Artificial Ignorance articles are now available as a podcast! Find it on Apple, Spotify, or wherever you get your podcasts.

As of today, all published essays have voiceovers, as does the latest AI roundup. Project writeups, product reviews, and the rest of the archives will get voiceovers shortly (as soon as I buy more ElevenLabs credits).
