Questions about scripting with the Festival text-to-speech engine

MirceaKitsune

from LinuxQuestions.org on 2021-08-26 20:08 (#5NTNA)

I'm trying to script a series of spoken messages with the Festival text-to-speech engine, specifically the text2wave command. I've looked at the basics of how Festival scripting works via scm and xml files, but there are some things I can't find any useful information on. If anyone is familiar with the software, I'd like to ask how the following is meant to be done in that format.

What I essentially want is to have different phrases spoken at different points in the resulting audio, using different voices if possible. Something along the lines of: wait 5 seconds, say "foo" in voice X, wait 10 seconds, say "bar" in voice Y. Is this possible to script in a single scm / xml definition, and are there any examples of how to do it?
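To make the timeline concrete, here is a minimal Python sketch of the layout being asked for, done outside Festival. It assumes each phrase has already been rendered to its own WAV file (for instance by separate text2wave runs, one per voice) and that every clip is 16-bit mono at the same sample rate. The names `silence` and `build_timeline`, the event tuples, and the file names are hypothetical, made up for illustration; this is not a Festival feature.

```python
import wave


def silence(seconds, rate=16000):
    """Return raw 16-bit mono PCM bytes of digital silence (all zero samples)."""
    return b"\x00\x00" * int(seconds * rate)


def build_timeline(events, out_path, rate=16000):
    """Write events in order to out_path.

    events is a list of ("wait", seconds) or ("clip", wav_path) tuples.
    Assumption: every clip is 16-bit mono at `rate` Hz.
    """
    with wave.open(out_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        for kind, value in events:
            if kind == "wait":
                out.writeframes(silence(value, rate))
            elif kind == "clip":
                with wave.open(value, "rb") as clip:
                    fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
                    if fmt != (1, 2, rate):
                        raise ValueError(f"{value}: unexpected format {fmt}")
                    out.writeframes(clip.readframes(clip.getnframes()))
            else:
                raise ValueError(f"unknown event kind: {kind!r}")


# Example call (hypothetical file names):
# build_timeline(
#     [("wait", 5), ("clip", "foo_voice_x.wav"),
#      ("wait", 10), ("clip", "bar_voice_y.wav")],
#     "messages.wav",
# )
```

The open question is whether the same sequence can be expressed directly in one scm / xml definition, so that no post-processing like this is needed.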

I'd also like to include other sounds in the mix. Can the script passed to the text2wave command take another wav / ogg file and combine it with the spoken voices? Overlap is okay... I was thinking of using this to add music without needing a separate ffmpeg command.
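In case Festival cannot do this directly, the overlap could be done afterwards. Below is a rough Python sketch using only the standard library, assuming both files are uncompressed 16-bit WAVs with the same channel count and sample rate (an ogg would need converting first). The function name `mix_wavs` and the `music_gain` parameter are hypothetical, chosen for this example.

```python
import array
import wave


def mix_wavs(speech_path, music_path, out_path, music_gain=0.3):
    """Overlay music under speech, sample by sample, with clipping.

    Assumption: both inputs are 16-bit PCM WAVs in the same format.
    The output is as long as the longer of the two inputs.
    """
    with wave.open(speech_path, "rb") as s, wave.open(music_path, "rb") as m:
        fmt_s = (s.getnchannels(), s.getsampwidth(), s.getframerate())
        fmt_m = (m.getnchannels(), m.getsampwidth(), m.getframerate())
        if fmt_s != fmt_m or s.getsampwidth() != 2:
            raise ValueError(f"formats must match and be 16-bit: {fmt_s} vs {fmt_m}")
        speech = array.array("h", s.readframes(s.getnframes()))
        music = array.array("h", m.readframes(m.getnframes()))

    n = max(len(speech), len(music))
    mixed = array.array("h", bytes(2 * n))  # n zero samples
    for i in range(n):
        a = speech[i] if i < len(speech) else 0
        b = music[i] if i < len(music) else 0
        total = a + int(b * music_gain)
        # Clamp to the signed 16-bit range to avoid wrap-around distortion.
        mixed[i] = max(-32768, min(32767, total))

    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt_s[0])
        out.setsampwidth(2)
        out.setframerate(fmt_s[2])
        out.writeframes(mixed.tobytes())
```

A pure-Python loop like this is slow for long tracks, so it is only meant to show the idea; doing it inside the text2wave definition would still be preferable if that is supported.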

In addition: is there a way to change the pitch of a voice? I only found a way to set the speed in the scm, using the line (Parameter.set 'Duration_Stretch 1). Do I need to make my own variant of a voice to do that, and if so, how is that done?

Source	RSS or Atom Feed
Feed Location	https://feeds.feedburner.com/linuxquestions/latest
Feed Title	LinuxQuestions.org
Feed Link	https://www.linuxquestions.org/questions/