Artificial Ignorance

Share this post

I made an Alexa

www.ignorance.ai

Discover more from Artificial Ignorance

The AI newsletter for founders. A nuanced exploration of AI with projects, essays, news, and interviews.
Over 4,000 subscribers
Continue reading
Sign in

I made an Alexa

Meet Hugh.

Charlie Guo
Mar 8, 2023
7
Share this post

I made an Alexa

www.ignorance.ai
3
Share

I recently came across this tweet1:

Twitter avatar for @DrJimFan
Jim Fan @DrJimFan
Here’s the recipe to make Siri/Alexa 10x better: 1. Whisper to convert speech to text. Best open-source speech model out there. 2. ChatGPT to generate smart home API calls and/or text response. 3. VALL-E to synthesize speech. It can mimic anyone’s voice sample! Quick figure 1/3
Image
5:08 PM ∙ Jan 9, 2023
5,571Likes1,105Retweets

And I realized that with OpenAI and ElevenLabs, you could do this - make your own smart assistant - just by connecting APIs. No fancy machine learning necessary.

So I built it. Meet Hugh.

Share

The plan

To build Hugh, I first sketched out my plan at a high level.

  1. Write the server code:

    • Transcribe audio and return the resulting text.

    • Generate an AI response given a text prompt.

    • Convert an AI response into speech and save it as an mp3 file.

    • Send an mp3 file to the user to be played.

  2. Write the browser code:

    • Record audio and send it off for transcription.

    • Submit a text box to generate a spoken AI response.

    • Play an audio file.

  3. Add some polish (nice design, images, etc).

I also wanted to try using a few different AI tools:

  • ChatGPT to write the code itself (as much as it could)

  • Whisper to transcribe audio

  • ChatGPT (API) to respond to questions

  • ElevenLabs (and the Hugh voice I created) to generate speech

  • Midjourney to create an avatar

The code

Apart from the newfangled AI bits, I wanted to stick with tools I already knew, so I used Python, HTML, and JavaScript. I started by asking ChatGPT to write the Python server code I needed.

My ChatGPT Python prompt.

I was pleasantly surprised by how good the results were. ChatGPT created backend code that was about 80% of what I needed. And to top it off, it explained the results step by step. I wish I could've had something like this back when I was studying programming!

ChatGPT’s very detailed response.

Ultimately, I did have to edit a decent bit of the code. But I'm guessing it saved at least 20-30 minutes of fiddling around with initial setup work.

I tried the same thing with the HTML and JavaScript, but the results were less accurate. I had to re-word my prompt a few times to get ChatGPT to stop barking up the wrong tree. The issue was partly due to my inability to articulate what I wanted. For example, I had never built a project that records browser audio before, so I didn't know what to ask for.

Then it was time to actually add the AI! There were three pieces to this:

  • OpenAI's Whisper, which transcribes audio to text

  • OpenAI's ChatGPT, which answers questions given a user prompt

  • ElevenLabs, which converts text to spoken audio

The OpenAI software was ridiculously easy to use. Seriously, here's the entire code for both transcribing audio prompts and generating ChatGPT responses:

Using Whisper and ChatGPT in 13 lines of code.

But after filling in the AI gaps, I had a working (if barebones) app.

Ask Hugh V1

Unfortunately, ChatGPT didn’t have a flair for the artistic.

The polish

Luckily, I could ask it to help with that too.

ChatGPT helpfully writes stylish HTML

With some more massaging, I had a reasonable-looking page.

Ask Hugh V2

But the last thing missing was some character. Specifically, an avatar for Hugh. To generate Hugh's Avatar, I turned to Midjourney, an image-generating AI.

If you aren't familiar, the way Midjourney works is you have to join the Midjourney Discord server, then use the `imagine` prompt to whip up an image. Writing an effective prompt is an art form in itself, but even vague prompts usually get decent results.

In this case, I started with “a smart british gentleman, studio ghibli, chibi, digital art.” Midjourney generates four different low-resolution images per prompt, which can be further modified. Here were Hugh's:

The initial Hugh concepts

For any image, you can choose to 1) generate more variations of the image, or 2) upscale the image to a higher resolution. I liked the top right image the best, so I made some variations.

Variations on one of the concepts

I was happy with the bottom left image, so I had Midjourney create an upscaled version.

The final high-resolution avatar

And with a little bit of image editing work, I had my avatar!

Ask Hugh V3

Artificial Ignorance is reader-supported. If you found this interesting or insightful, consider becoming a free or paid subscriber.

Next steps

Building this project was a ton of fun. You can find the code here: https://github.com/IgnoranceAI/hugh.2

Hugh is a decent conversation partner, but there are certainly some upgrades he could benefit from3. What I’d love to do is dig deeper into the mechanics of GPT - prompt engineering is well and good, but what does it take to fine-tune a model? And I'm still planning on building more voice projects, including narration for Artificial Ignorance.

In the meantime, I don’t think I’ll be asking Hugh for help with any coding.

1

Thanks, Fed!

2

You'll need your own API keys for OpenAI and ElevenLabs. They both have a reasonable free tier to play with, but heavy usage will cost a little bit.

3

Streaming the audio rather than saving it to a file would make things faster. And currently, there’s a limit on how long conversations can run.

7
Share this post

I made an Alexa

www.ignorance.ai
3
Share
Previous
Next
3 Comments
Share this discussion

I made an Alexa

www.ignorance.ai
R. S. Mills
Writes All Good Things Come from God
Mar 30

Dope!

Expand full comment
Reply
Share
Alex
Mar 9

Hi, my name is Alejandro from Costa Rica, how can I contact you to talk about a project

Expand full comment
Reply
Share
1 reply by Charlie Guo
1 more comment...
Top
New
Community

No posts

Ready for more?

© 2023 Charlie Guo
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great writing