Speech Services

Manim Voiceover can plug into various speech synthesizers to generate voiceover audio. Below is a comparison of the available services, their pros and cons, and how to set them up.

Choosing a speech service

Manim Voiceover defines the SpeechService class for adding new speech synthesizers. The classes introduced below are all derived from SpeechService.

Comparison of available speech services

| Speech service | Quality | Can run offline? | Paid / requires an account? | Notes |
|---|---|---|---|---|
| RecorderService | N/A | N/A | N/A | This is a utility class to record your own voiceovers with a microphone. |
| AzureService | Very good, human-like | No | Yes | Azure gives a free TTS quota of 500 min/month. However, registration still requires a credit or debit card. See the Azure free account FAQ for more details. |
| ElevenLabsService | Very good, human-like | No | Yes | Requires an ElevenLabs account. |
| CoquiService | Good, human-like | Yes | No | Requires PyTorch to run. May be difficult to set up on certain platforms. |
| GTTSService | Good | No | No | It is a free API subsidized by Google, so it may stop working in the future. |
| OpenAIService | Very good, human-like | No | Yes | Requires an OpenAI developer account. Sign up on the OpenAI platform, and see the pricing page for more details. |
| PyTTSX3Service | Bad | Yes | No | Requires espeak. Does not work reliably on macOS. |

It is on our roadmap to provide a high quality TTS engine that runs locally for free. If you have any suggestions, please let us know in the Discord server.
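The comparison above can also be read as a small lookup table. The following is a minimal sketch that encodes the "Can run offline?" and "Paid / requires an account?" columns as plain Python data and filters by constraint. The `SERVICES` dictionary and the `pick_services` helper are hypothetical names used for illustration and are not part of Manim Voiceover; RecorderService is left out because its columns are N/A.

```python
# Illustrative only: SERVICES and pick_services are hypothetical names,
# not part of the Manim Voiceover API. Values mirror the comparison table.
SERVICES = {
    "AzureService": {"offline": False, "paid": True},
    "ElevenLabsService": {"offline": False, "paid": True},
    "CoquiService": {"offline": True, "paid": False},
    "GTTSService": {"offline": False, "paid": False},
    "OpenAIService": {"offline": False, "paid": True},
    "PyTTSX3Service": {"offline": True, "paid": False},
}


def pick_services(offline=None, paid=None):
    """Return service names matching the constraints (None means "don't care")."""
    return [
        name
        for name, traits in SERVICES.items()
        if (offline is None or traits["offline"] == offline)
        and (paid is None or traits["paid"] == paid)
    ]


print(pick_services(offline=True))  # services that can run offline
print(pick_services(paid=False))    # services that need no paid account
```

Quality is deliberately not encoded here, since it is a subjective rating; the table remains the place to weigh it.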

RecorderService

This is not a speech synthesizer but a utility class to record your own voiceovers with a microphone. It provides a command line interface to record voiceovers during rendering.

Install Manim Voiceover with the recorder extra in order to use RecorderService:

pip install "manim-voiceover[recorder]"

Refer to the example usage to get started.

AzureService

As of now, the highest quality text-to-speech service available in Manim Voiceover is Microsoft Azure Speech Service. To use it, you will need to create an Azure account.

Tip

Azure currently offers free TTS of 500 minutes/month. This is more than enough for most projects.

Install Manim Voiceover with the azure extra in order to use AzureService:

pip install "manim-voiceover[azure]"

Then, you need to find out your subscription key and service region:

  • Sign in to the Azure portal and create a new Speech Service resource.

  • Go to the Azure Cognitive Services page.

  • Click on the resource you created and go to the Keys and Endpoint tab. Copy the Key 1 and Location values.

Create a file called .env that contains your authentication information in the same directory where you call Manim.

AZURE_SUBSCRIPTION_KEY="..." # insert Key 1 here
AZURE_SERVICE_REGION="..."   # insert Location here
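Before rendering, it can help to confirm that the .env file actually contains both values. The following is a stdlib-only sketch of such a sanity check; it is not how Manim Voiceover itself loads the file, and `parse_env` and `missing_keys` are hypothetical helper names. It only understands simple KEY="value" lines like the ones shown above.

```python
from pathlib import Path

# Variable names come from the .env example above; helper names are illustrative.
REQUIRED = ("AZURE_SUBSCRIPTION_KEY", "AZURE_SERVICE_REGION")


def parse_env(text):
    """Parse simple KEY="value" lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, rest = line.partition("=")
        rest = rest.strip()
        if rest[:1] in ('"', "'"):
            # Quoted value: keep what lies between the first pair of quotes.
            quote = rest[0]
            end = rest.find(quote, 1)
            value = rest[1:end] if end != -1 else rest[1:]
        else:
            # Unquoted value: drop a trailing " # comment" if present.
            value = rest.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(text, required=REQUIRED):
    """Return the required names that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


env_file = Path(".env")
if env_file.exists():
    print("Missing:", missing_keys(env_file.read_text()))
```

An empty list means both values are present; it does not verify that the key or region is valid for your Azure resource.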

Check out Azure docs for more details.

Refer to the example usage to get started.

CoquiService

Coqui TTS is an open-source neural text-to-speech engine. It is a fork of Mozilla TTS, which is an implementation of Tacotron 2. It is a very good TTS engine that produces human-like speech. However, it requires PyTorch, which may be difficult to set up on certain platforms.

Install Manim Voiceover with the coqui extra in order to use CoquiService:

pip install "manim-voiceover[coqui]"

If you run into issues with PyTorch or NumPy, try changing your Python version to 3.9.
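To see which interpreter you are actually running before troubleshooting, a quick check like the one below may help. This is a minimal sketch; `is_python_39` is a hypothetical helper name, and the 3.9 target is taken from the suggestion above rather than from any stated requirement of the coqui extra.

```python
import sys


def is_python_39(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3.9.x."""
    return tuple(version_info[:2]) == (3, 9)


print("Running Python", ".".join(map(str, sys.version_info[:3])))
if not is_python_39():
    print("Not Python 3.9; if the coqui extra fails to install, consider a 3.9 environment.")
```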

Refer to the example usage to get started.

GTTSService

gTTS is a text-to-speech library that wraps Google Translate’s text-to-speech API. It needs an internet connection to work.

Install Manim Voiceover with the gtts extra in order to use GTTSService:

pip install "manim-voiceover[gtts]"

Refer to the example usage to get started.

OpenAIService

OpenAI provides a text-to-speech service through an API, so it requires an internet connection and an API key. You can register for a key on the OpenAI platform.

Install Manim Voiceover with the openai extra in order to use OpenAIService:

pip install "manim-voiceover[openai]"

Then, you need to find out your API key:

  • Sign in to the OpenAI platform and open API keys from the left panel.

  • Create a new secret key and copy it.

Create a file called .env that contains your authentication information in the same directory where you call Manim.

OPENAI_API_KEY="..." # insert the secret key here. It should start with "sk-"
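A quick way to catch a pasted-in wrong value is to check the "sk-" prefix mentioned above. The sketch below assumes the variable is already present in the process environment (for example, exported in your shell or loaded from .env by other code); `check_openai_key` is a hypothetical helper name, not part of Manim Voiceover or the OpenAI library.

```python
import os


def check_openai_key(env=os.environ):
    """Return a short status string for OPENAI_API_KEY in the given mapping."""
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        return "missing"
    if not key.startswith("sk-"):
        return "unexpected prefix"
    return "ok"


print("OPENAI_API_KEY:", check_openai_key())
```

A result of "ok" only means the value looks like a secret key; it does not confirm that the key is active or has quota.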

Check out OpenAI docs for more details.

Refer to the example usage to get started.

PyTTSX3Service

pyttsx3 is a text-to-speech library that wraps espeak, a formant synthesis speech synthesizer.

Install Manim Voiceover with the pyttsx3 extra in order to use PyTTSX3Service:

pip install "manim-voiceover[pyttsx3]"

Refer to the example usage to get started.

ElevenLabsService

ElevenLabs offers one of the most natural-sounding speech service APIs. It has a range of realistic and emotive voices, and also allows you to clone your own voice by uploading a few minutes of your speech. To use it, you will need to create an ElevenLabs account.

Tip

ElevenLabs currently offers free TTS of 10,000 characters/month and up to 3 custom voices.

Install Manim Voiceover with the elevenlabs extra in order to use ElevenLabsService:

pip install "manim-voiceover[elevenlabs]"

Then, you need to find out your API key:

  • Sign in to the ElevenLabs portal and go to your profile to obtain the key.

  • Set the environment variable ELEVEN_API_KEY to your key.

Create a file called .env that contains your authentication information in the same directory where you call Manim.

ELEVEN_API_KEY="..." # insert your API key here

Check out ElevenLabs docs for more details.

Refer to the example usage to get started.