New – Long-Form voices for Amazon Polly

November 17, 2023

We are launching three new voices for Polly. Powered by a new long-form engine, the voices are natural and expressive, with appropriate pauses, emphasis, and tone.

New Voices
The new long-form voices are well suited to blog posts, news articles, training videos, and marketing content. The underlying machine learning model extracts meaning from the text, learning about speech segments, prosody (the pattern of rhythm and pauses), intonation, and other aspects of expressive speech, so that the synthesized audio can express emotion, especially in dialogs. The long-form engine uses a deep learning text-to-speech (TTS) model trained to build a contextual understanding of the text, which lets it apply appropriate prosody. As a result, the intention of the story drives the vocal performance and produces the emphasis, pauses, and tone of a realistic human voice.

Here are the new voices:

Name      Locale  Gender  Language
Danielle  en_US   Female  English (US)
Gregory   en_US   Male    English (US)
Ruth      en_US   Female  English (US)

Using the New Voices
You can access the new voices using the AWS Management Console, AWS Command Line Interface (AWS CLI), or the AWS SDKs. Using the CLI, I start by listing the voices that use the new long-form engine:

$ aws --region us-east-1 polly describe-voices --output json \
  | jq -r '.Voices[] | select(.SupportedEngines | index("long-form")) | .Name'
Danielle
Gregory
Ruth

I can pick one, or I can try all of them:

for v in `aws polly describe-voices --output json \
          | jq -r '.Voices[] | select(.SupportedEngines | index("long-form")) | .Name'`; do
    Text="Hello my name is $v and I can read blog posts, articles, 
and other long-form content for you. I am the best!"
    aws polly synthesize-speech --output-format 'mp3' \
    --text "$Text" --voice-id $v $v.mp3 --engine long-form;
    aws s3 cp $v.mp3 s3://jbarr-voices;
done

My shell script had a small quoting bug, but the resulting audio was too funny not to include!

Programmatically, you can reproduce my example by writing code that calls the DescribeVoices and SynthesizeSpeech functions.
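
Here is a minimal Python sketch of that approach, assuming boto3 (the AWS SDK for Python) is installed and AWS credentials are configured. The helper names long_form_voice_names, synthesize_to_file, and main are hypothetical, chosen only for illustration. Unlike the shell loop, this sketch skips the upload to S3.

```python
from pathlib import Path


def long_form_voice_names(polly):
    """Return the names of voices whose SupportedEngines include long-form.

    `polly` is any object exposing describe_voices(), such as a boto3
    Polly client. This mirrors the jq filter used with the CLI.
    """
    names = []
    kwargs = {}
    while True:
        page = polly.describe_voices(**kwargs)
        for voice in page["Voices"]:
            if "long-form" in voice.get("SupportedEngines", []):
                names.append(voice["Name"])
        # DescribeVoices may return a NextToken when the list is truncated.
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token


def synthesize_to_file(polly, voice_id, text, out_dir="."):
    """Synthesize `text` with the long-form engine and save <voice_id>.mp3."""
    response = polly.synthesize_speech(
        Engine="long-form",
        OutputFormat="mp3",
        Text=text,
        VoiceId=voice_id,
    )
    path = Path(out_dir) / f"{voice_id}.mp3"
    # AudioStream is a file-like object holding the encoded audio.
    path.write_bytes(response["AudioStream"].read())
    return path


def main():
    # Imported here so the helpers above can be tried with a stub client.
    import boto3

    polly = boto3.client("polly", region_name="us-east-1")
    for name in long_form_voice_names(polly):
        text = (
            f"Hello my name is {name} and I can read blog posts, articles, "
            "and other long-form content for you."
        )
        print(synthesize_to_file(polly, name, text))
```

Calling main() writes one MP3 file per long-form voice to the current directory. Because the client is passed in as a parameter, the two helpers can also be exercised with a stand-in object before making real (billable) requests.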

Things to Know
Here are some interesting things that you should know about the new voices:

Pricing – Long-form voices are priced at $100 per million characters or Speech Marks requests. Check out the Amazon Polly Pricing page to learn more.

Engines & Voices – Some of the voices that I listed above can be used with more than one engine. For example, the Danielle voice can be used with the new long-form engine and the existing neural engine.

Regions – The new engine and voices are available in the US East (N. Virginia) Region.
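
To put the pricing above in concrete terms, here is a rough Python estimate based on the $100 per million characters rate. The function name estimate_long_form_cost is hypothetical, and the sketch assumes that every character in the string is billable; actual billing rules may differ, so treat the Amazon Polly Pricing page as the authority.

```python
from decimal import Decimal

# Rate quoted above: $100 per one million characters.
LONG_FORM_USD_PER_MILLION_CHARS = Decimal("100")


def estimate_long_form_cost(text):
    """Rough USD estimate, assuming every character in `text` is billed."""
    return LONG_FORM_USD_PER_MILLION_CHARS * len(text) / Decimal(1_000_000)
```

For example, a 5,000-character blog post works out to 100 × 5,000 / 1,000,000 = $0.50 under this assumption.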

Check out the new voices, build something awesome, and let me know what you think!

Jeff;