Happy 2022 everyone 🎉🎉! I hope this is a year of good health and prosperity for everyone reading! This is a post I've been thinking about in my spare time for quite a while, and when I finally sat down to figure out the nuts and bolts of it, I was shocked to see how easy it was to implement.

Text to Speech

Sometime in mid-2021, I was reading the latest posts on the AWS Blog and noticed something that really made me check my privilege:

voiced by Amazon Polly

It's a small detail to see on a website, but it instantly made me think about two things for my own blog:

  1. My blog is not accessibility (a11y) friendly
  2. Setting up Amazon Polly would be a fun automation experiment to solve this problem.

What better way to be more inclusive and to learn a new skill!

Amazon Polly

Having never used Amazon Polly before, I probably benefited from only trying it now, after the newer neural voices were added. I wanted this website to have close-to-natural sounding speech (you know, the opposite of Microsoft Sam from back in the day), but as much as I would love to, recording every post myself would unfortunately be too much of a time commitment. Enter Amazon Polly.

What struck me most when trying out the service was how easy and intuitive it was to set up; I don't think I've experienced that with any other AWS service. I took the text directly from my website (not the markdown or HTML), pasted it into the “Input text” field, hit “Listen” and was pleasantly surprised by how it sounded! I played around with the language and voice options and settled on “English (US)” and “Matthew, Male” as my preferred options (pretty vanilla, I know). I will periodically check on the Australian voices in future though 😉
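For anyone who would rather script this than click through the console, the same choices map onto parameters of Polly's SynthesizeSpeech API. Here's a minimal sketch in Python that only builds the request, assuming boto3 would be used to send it; the helper name `polly_request` is my own, and the neural engine and MP3 output format are assumptions, since the console steps above don't pin those down.

```python
def polly_request(text: str) -> dict:
    """Build keyword arguments for boto3's Polly synthesize_speech call.

    Mirrors the console choices: English (US) with the voice Matthew.
    The neural engine and MP3 output are assumptions, not confirmed settings.
    """
    return {
        "Text": text,
        "VoiceId": "Matthew",
        "LanguageCode": "en-US",
        "Engine": "neural",  # assumption: neural rather than standard voice
        "OutputFormat": "mp3",
    }


if __name__ == "__main__":
    params = polly_request("Happy 2022 everyone!")
    print(params["VoiceId"], params["Engine"])
    # With boto3 installed and AWS credentials configured, sending it would look like:
    # import boto3
    # audio = boto3.client("polly").synthesize_speech(**params)["AudioStream"].read()
```

Keeping the request as plain data makes it easy to check the settings without calling AWS at all.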

I considered making the creation process more efficient by folding it into my CI/CD pipeline, but because my website is still small, with only a few blog posts, the effort wouldn't pay off yet. The following process is perfectly fine for me when creating and publishing articles:

  1. Write up blog post in markdown
  2. Run hexo generate to preview the page locally
  3. Copy the text from the generated page into Amazon Polly
  4. Save the audio to S3
  5. Add the audio link to the markdown and upload to GitHub
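Step 3 is the only one that involves hand-copying text. As a rough sketch, Python's standard-library `html.parser` can pull the visible text out of a page produced by hexo generate; the names `TextExtractor` and `page_text` are hypothetical, and real Hexo themes wrap each post in navigation and footer markup that this simple version does not filter out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def page_text(html: str) -> str:
    """Return the page's visible text as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    sample = "<style>p{}</style><h1>Recap</h1><p>Polly is easy.</p>"
    print(page_text(sample))  # Recap Polly is easy.
```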

I may investigate an automated process for steps 3-5 in future, but for now these steps really don’t add much overhead for the value they provide.
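If I do automate steps 3-5 one day, the synchronous SynthesizeSpeech call limits each request to 3,000 billed characters, which a full post can exceed, so Polly's asynchronous StartSpeechSynthesisTask API, which writes the finished audio straight into an S3 bucket, looks like the better fit. A minimal sketch, assuming a boto3 Polly client is passed in; the function name `start_post_audio` and the `FakePolly` stand-in (used so the example runs offline) are hypothetical, and the neural engine is again an assumption.

```python
def start_post_audio(client, text: str, bucket: str, prefix: str = "audio/") -> str:
    """Start an asynchronous Polly synthesis task that saves an MP3 to S3.

    `client` is expected to behave like boto3.client("polly").
    Returns the OutputUri that Polly reports for the audio file.
    """
    response = client.start_speech_synthesis_task(
        Text=text,
        VoiceId="Matthew",
        LanguageCode="en-US",
        Engine="neural",  # assumption: neural voice
        OutputFormat="mp3",
        OutputS3BucketName=bucket,
        OutputS3KeyPrefix=prefix,
    )
    return response["SynthesisTask"]["OutputUri"]


class FakePolly:
    """Offline stand-in for a boto3 Polly client; the URI it returns is illustrative."""

    def start_speech_synthesis_task(self, **kwargs):
        key = kwargs["OutputS3KeyPrefix"] + "example-task.mp3"
        uri = f"https://s3.amazonaws.com/{kwargs['OutputS3BucketName']}/{key}"
        return {"SynthesisTask": {"TaskStatus": "scheduled", "OutputUri": uri}}


if __name__ == "__main__":
    print(start_post_audio(FakePolly(), "Happy 2022 everyone!", "www.thecloudonmymind.com"))
```

Swapping `FakePolly()` for `boto3.client("polly")` would send the real request; the returned URI is what would end up in the markdown link in step 5.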

Amazon Polly's pricing also seems reasonable, with a news article costing around 10 cents to convert. Ten cents per blog post is certainly something I can afford, but I will be keeping a close eye on it to make sure it doesn't balloon unexpectedly. Thankfully, my budget notification should help with that!
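Polly bills by the number of characters synthesized, so a back-of-the-envelope estimate is easy. The rates below (US$4 per million characters for standard voices, US$16 for neural) are assumptions based on the published list prices at the time of writing; check the Amazon Polly pricing page for your region before relying on them.

```python
# Assumed list prices in USD per 1 million characters; verify against the
# Amazon Polly pricing page for your region.
PRICE_PER_MILLION = {"standard": 4.00, "neural": 16.00}


def estimated_cost(text: str, engine: str = "neural") -> float:
    """Rough cost in USD, since Polly charges per character of input text."""
    return len(text) * PRICE_PER_MILLION[engine] / 1_000_000


if __name__ == "__main__":
    post = "x" * 6_000  # stand-in for a ~6,000-character blog post
    print(f"${estimated_cost(post):.3f}")  # $0.096
```

At the assumed neural rate, 10 cents covers roughly 6,250 characters, which is in the same ballpark as a typical post here.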

The final piece of the puzzle was figuring out how to add these recordings to each blog post in Hexo. Luckily, the hexo-tag-mmedia plugin is super easy to install via npm install hexo-tag-mmedia@1 --save. After that, add the following block to the site's _config.yml file:

```yaml
mmedia:
  audio:
    default:
      controls:
```

After each file is generated by Amazon Polly and saved in S3, it is added to the blog post's markdown like so:

{% mmedia "audio" "src:https://s3.amazonaws.com/www.thecloudonmymind.com/audio/Add-Amazon-Polly-to-Hexo-Blog.mp3" %}
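Since every tag follows the same pattern, a tiny helper can build it from the post's name. A sketch only: the function name `mmedia_audio_tag` is hypothetical, and it assumes each audio file is named after the post's slug in the same S3 `audio/` folder, as in the example URL.

```python
# Base URL taken from the example tag; assumes all audio lives in this folder.
BUCKET_URL = "https://s3.amazonaws.com/www.thecloudonmymind.com/audio"


def mmedia_audio_tag(slug: str, base_url: str = BUCKET_URL) -> str:
    """Return the hexo-tag-mmedia tag that embeds <base_url>/<slug>.mp3."""
    # Doubled braces are f-string escapes for the literal {% and %} delimiters.
    return f'{{% mmedia "audio" "src:{base_url}/{slug}.mp3" %}}'


if __name__ == "__main__":
    print(mmedia_audio_tag("Add-Amazon-Polly-to-Hexo-Blog"))
```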

And that's it! I then went back through all my old posts and added text-to-speech audio to each one. Going forward, I will be adding text-to-speech to all new blog posts as well. Accessibility is something I will pay a lot closer attention to in future, and learning a new AWS service along the way makes it a win-win.
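When backfilling older posts, it helps to know which ones still lack a player. A minimal sketch, assuming Hexo's default `source/_posts` folder and that every post with audio contains a `{% mmedia "audio"` tag; the function name `posts_without_audio` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def posts_without_audio(posts_dir: str = "source/_posts") -> list[str]:
    """Return the markdown posts that don't yet contain a hexo-tag-mmedia audio tag."""
    missing = []
    for path in sorted(Path(posts_dir).glob("*.md")):
        if '{% mmedia "audio"' not in path.read_text(encoding="utf-8"):
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    # Run from the root of the Hexo site; prints nothing if every post has audio.
    for name in posts_without_audio():
        print(name)
```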

Recap

  • Accessibility is important, and I will be investing more time understanding a11y in future
  • Amazon Polly is quite intuitive, quick and easy to use
  • Hexo websites can support audio playback with some minor plugin additions!