Tuesday, September 23, 2014

Free Speech Synthesis For Your Robot

Newly painted Hero Jr.
I ran across a surprisingly good speech synthesis package. No, it's not Festival.

My Hero Jr has now been painted blue, with holes drilled in the head and other modifications. There's no better Hero Jr to modify and hack than this one. I'm hoping to have it ready in time for the NoCo Mini Maker Faire in Ft. Collins, Oct 4-5.

With the paint done, the next step is to give it a brain: either my recently purchased Raspberry Pi B+ or the dusty pcDuino in my closet.

After that comes speech synthesis. The vision: a robot "tour guide" for my Maker Faire exhibit.

While I adore the stock Votrax SC-01A speech synthesizer (it's the quintessential robot voice), I like the idea of a 20-year-old robot with a modern voice even more.

Here are the fruits of my research so far...

Festival

I learned about the Festival speech synthesis system years ago. The now-defunct Bot Thoughts podcast was hosted by Festival's rab voice (UK). I thought it was the best of the stock voices, maybe because I'm not used to UK speech patterns, so its flaws aren't as jarring as those of the US voices.

The kal voice (US) has a few digital-sounding artifacts, and its intonation is noticeably wrong in several places, too.

Other voices are available. The MBROLA US voices are decent. I also tried the enhanced CMU Arctic voices, which are a separate download. Here's the rms voice:

rms_festvox.wav
rms_festvox.ogg
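For a robot brain like the Pi, the simplest way to drive Festival from a script is probably its bundled text2wave utility. The Python sketch below assumes Festival is installed with text2wave on the PATH and with the kal and rab diphone voices. The helper names festival_command and render are hypothetical, and the voice-selection function for the CMU Arctic voices (something like voice_cmu_us_rms_arctic_clunits) is an assumption that depends on which package you install.

```python
import shutil
import subprocess


def festival_command(text_file, wav_file, voice="voice_kal_diphone"):
    """Build a text2wave command line that renders text_file to wav_file.

    `voice` names a Festival voice-selection function, such as
    voice_kal_diphone (US) or voice_rab_diphone (UK). Names for extra
    voices (e.g. the CMU Arctic ones) are assumptions: they depend on
    how those voices were installed.
    """
    # -eval runs the Scheme expression that selects the voice before synthesis.
    return ["text2wave", "-o", wav_file, "-eval", f"({voice})", text_file]


def render(text_file, wav_file, voice="voice_kal_diphone"):
    """Run text2wave; raises if Festival is missing or synthesis fails."""
    if shutil.which("text2wave") is None:
        raise RuntimeError("text2wave not found; is Festival installed?")
    subprocess.run(festival_command(text_file, wav_file, voice), check=True)
```

Something like render("greeting.txt", "greeting.wav", "voice_rab_diphone") would produce a WAV file that the robot can then play with any audio player.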

Mary TTS

Mary TTS is the package I ran across. It's written in Java, it's particularly Linux-friendly, and it's comparatively amazing, primarily because it does such a great job with prosody (intonation, stress, and rhythm in speech). Unfortunately, it's resource-hungry: it needs too much memory to run on a Raspberry Pi. I plan to test it on a pcDuino3, which I just bought.

Here's rms again. It's clearly the same voice (easily downloaded with Mary TTS's own installer tool), but... well, judge for yourself how it compares.

rms_marytts.wav
rms_marytts.ogg

Admittedly, both packages may have tuning parameters that would improve the results. Out of the box, though, Mary TTS clearly does a much better job with pitch and timing.

Other voices, including more of the CMU Arctic set, are available for Mary TTS. I haven't settled on one yet, but I kind of like Poppy, the female British voice.
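Mary TTS can also run as a server, which suits a robot whose main program only needs to send text and play back audio. The sketch below assumes a Mary TTS 5.x server running on the same board at its default port, 59125, and uses that server's /process HTTP interface. The voice name cmu-rms-hsmm is an assumption about which rms voice package is installed (Poppy would be something like dfki-poppy-hsmm with locale en_GB), and mary_url and speak_to_file are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def mary_url(text, voice="cmu-rms-hsmm", locale="en_US",
             host="localhost", port=59125):
    """Build a Mary TTS /process request that returns a WAV file.

    The default voice and port are assumptions about a local install.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
        "VOICE": voice,
    }
    # urlencode escapes spaces and punctuation in the text.
    return f"http://{host}:{port}/process?{urlencode(params)}"


def speak_to_file(text, wav_file, **kwargs):
    """Fetch synthesized audio from the server and save it to wav_file."""
    with urlopen(mary_url(text, **kwargs)) as resp, open(wav_file, "wb") as out:
        out.write(resp.read())
```

Keeping synthesis in a separate server process means a heavyweight Java engine only has to start once, and the tour-guide script can stay small; a call like speak_to_file("Welcome to my exhibit!", "welcome.wav") is all it would need.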

Free TTS

I also came across Free TTS, which, like Mary TTS, is written in Java. Unfortunately, based on the samples I heard, its prosody is pretty poor compared to both Festival and Mary TTS.

Others?

Any other tips on free speech synthesis packages out there?
