HyperTTS: Advanced Mode

Advanced Mode provides a more powerful and flexible way to configure how audio should be added to notes from the Anki editor. It allows you to do things like:

Adding audio with a single click (or keyboard shortcut) depending on the Deck or Note Type you are editing.
Adding audio to two or more fields with a single click.
Setting up alternate presets that you trigger manually under specific conditions.

In Advanced Mode, the Speaker and Play buttons have the following functions:

Speaker Button: Apply audio to the current note.
Play Button: Listen to the sound, allowing you to confirm the sound is correct before adding the audio.

How to enable Advanced Mode

If you previously enabled Easy Mode, you can enable Advanced Mode by clicking the Gear Button in the Anki editor.

You can enable or disable Easy Mode with the check box here:

Configuring Preset Rules

After clicking the Gear Button, you will be presented with an empty Preset Rules screen:

In the upper left, you are shown the Note Type and Deck for which you are editing Preset Rules. This is important, because the presets you are adding here will be associated with the Note Type, or Note Type + Deck combination. Click Add Rule to add a new preset rule.

You can create a new preset, or choose an existing preset. A Preset just contains the following pieces of information:

Which Source Field to use
Which Target Field to put the sound tag into
How to choose the voice
How to process the text before generating audio

The preset itself doesn't know which Deck it's for. This allows you to use the same preset for multiple decks. Let's create a new preset.
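To make these four pieces concrete, here is a minimal Python sketch of a preset as a plain data structure. The class name, field names, and voice name are illustrative assumptions, not HyperTTS's actual internal format.

```python
from dataclasses import dataclass, field


@dataclass
class Preset:
    """Hypothetical model of a preset; names are illustrative only."""
    name: str
    source_field: str            # field whose text is turned into audio
    target_field: str            # field that receives the sound tag
    voice_mode: str = "single"   # "single", "random" or "priority"
    voices: list = field(default_factory=list)
    text_replacements: list = field(default_factory=list)  # (pattern, replacement) pairs


# There is no deck attribute, so the same preset can be reused by several decks.
mandarin = Preset(
    name="Mandarin",
    source_field="Chinese",
    target_field="Sound",
    voices=["hypothetical-mandarin-voice"],
)
```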

Creating New Preset

The Preset screen is now open.

Source

Look at the numbers on the screenshot:

It's a good idea to rename your preset so that you know what it's for. I will name mine Mandarin (for Chinese).
Make sure you choose the right source field. I chose Chinese.
Check the source text preview. This is the text which will be used to generate the audio. In my case, Chinese text will be used.

Let's move on to the Target tab.

Target

Select the field that the sound tag should be inserted into. This can be the same as the source field (in which case you'll have to select Text and Sound Tag below), but some people (including myself) find it cleaner to use a separate field.
Since I'm using a standalone Sound field, I select Sound Tag only here. There will be no text in the Sound field.
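The difference between the two target options can be sketched in a few lines of Python. Anki sound tags have the form `[sound:filename]`. The helper name `apply_sound_tag`, its parameters, and the file names are made up for illustration, and how HyperTTS actually combines text and tag is an assumption here.

```python
def apply_sound_tag(existing_text, audio_filename, text_and_sound_tag):
    """Return the new content of the target field (illustrative helper only)."""
    sound_tag = f"[sound:{audio_filename}]"  # Anki's sound tag syntax
    if text_and_sound_tag:
        # Text and Sound Tag: keep the field's text and append the tag.
        return f"{existing_text} {sound_tag}".strip()
    # Sound Tag only: the field holds nothing but the tag.
    return sound_tag


# Standalone Sound field, as in the walkthrough above.
sound_field = apply_sound_tag("", "example-audio.mp3", text_and_sound_tag=False)

# Source and target are the same field, so the text must be kept.
chinese_field = apply_sound_tag("你好", "example-audio.mp3", text_and_sound_tag=True)
```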

Let's move on to the Voice Selection tab.

Voice Selection

You will have to choose the appropriate service and voice. I recommend choosing a Language first; you can then refine by Locale (for example, if you want British English or Australian English). You'll also need to select a service. If you don't know which one to pick, try Azure. Once you've selected a voice, click Preview Sound to confirm that the sound is playing correctly.

You can also configure random voices, or priority mode if you want to dynamically select from multiple voices.

Single: always choose the same voice.
Random: randomly choose from any number of voices that you would have selected ahead of time. You can change the random selection weight to make a given voice occur more often.
Priority: choose one voice first, and move to another if audio is not found. This allows you to try dictionary services such as Forvo first (which might only have recordings for single words), and fall back to TTS services which can pronounce everything.
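The three modes can be illustrated with a short Python sketch. This is not HyperTTS code: the function names, the voice names "dictionary" and "tts", and the fake fetch function are all assumptions, used only to show how weighted random choice and priority fallback behave.

```python
import random


def pick_single(voice):
    """Single: always the same voice."""
    return voice


def pick_random(weighted_voices):
    """Random: weighted_voices is a list of (voice, weight) pairs."""
    voices = [voice for voice, _ in weighted_voices]
    weights = [weight for _, weight in weighted_voices]
    return random.choices(voices, weights=weights, k=1)[0]


def pick_priority(voices, fetch_audio):
    """Priority: try voices in order, return the first one that yields audio."""
    for voice in voices:
        audio = fetch_audio(voice)
        if audio is not None:
            return voice, audio
    return None, None


def make_fake_fetch(text):
    """Stand-in for real services: the 'dictionary' voice only knows single words."""
    def fetch_audio(voice):
        if voice == "dictionary" and " " in text:
            return None  # no recording found, so the next voice is tried
        return f"<audio of '{text}' by {voice}>"
    return fetch_audio


word_voice, _ = pick_priority(["dictionary", "tts"], make_fake_fetch("hello"))
phrase_voice, _ = pick_priority(["dictionary", "tts"], make_fake_fetch("hello world"))
```

In this sketch the single word is served by the dictionary voice, while the two-word phrase falls through to the TTS voice. Raising a voice's weight in `pick_random` makes it come up more often.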

You don't have to change anything in the Text Processing tab, but we can cover it anyway:

Text Processing

You can configure text replacement rules. This can be useful if you want to do something like:

Replace acronyms and abbreviations with actual words to make it easier for the TTS engine to pronounce them
Ignore parts of the source field
Do more complex processing, such as adding SSML pauses when a certain string or character appears (see HyperTTS Tips and Tricks)
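As a rough illustration of what such rules do, the sketch below applies a list of regular-expression replacements in order. The rule list, the function name `process_text`, and the sample sentence are hypothetical; the rule options that HyperTTS itself offers are configured in the Text Processing tab and may differ.

```python
import re

# Hypothetical rules as (regex pattern, replacement) pairs, applied in order.
RULES = [
    (r"\bASAP\b", "as soon as possible"),  # expand an acronym
    (r"\([^)]*\)", ""),                    # ignore anything in parentheses
]


def process_text(text, rules=RULES):
    """Apply each replacement rule, then tidy up leftover whitespace."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return re.sub(r"\s+", " ", text).strip()


spoken = process_text("see you ASAP (informal)")
```

Here `spoken` ends up as "see you as soon as possible", which is the text that would be sent to the TTS engine instead of the raw field content.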

Once you are satisfied with your Preset settings, click Save and Close.

Using Preset Rules

Assume we have added multiple preset rules. Here's an example screen. Normally, you only need to use this screen the first time you set up your presets, but you can also do the following:

Preview audio just for one preset
Apply audio just for one preset. This is useful for a preset that you only need occasionally and don't want applied automatically.
Edit a preset, if you want to change the voice or other settings
Indicate whether the preset should apply to all notes with this Note Type, or only to a specific Deck + Note Type combination. In my case, I have a Chinese-Words note type, and a Cantonese and a Mandarin deck. They each need different presets, so I choose Deck and Note Type. But some people have multiple decks which share the same settings, for example Japanese 01 to Japanese 20, so in that case it makes sense to apply the presets at the Note Type level.
You can enable or disable a preset from being run automatically when you add/preview audio. This allows you to keep some presets as manual, to be applied only from this screen.
You can delete a preset.
Finally, if you are happy with the setup, click Save and Close.
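The way rules select presets can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not HyperTTS's implementation: the `PresetRule` class, the `rules_to_run` function, the "Japanese-Words" note type, and the "Slow Mandarin" preset are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresetRule:
    preset_name: str
    note_type: str
    deck: Optional[str] = None  # None means "any deck with this note type"
    automatic: bool = True      # disabled rules are only run manually


def rules_to_run(rules, note_type, deck):
    """Return the presets applied automatically for this note type and deck."""
    return [
        rule.preset_name
        for rule in rules
        if rule.automatic
        and rule.note_type == note_type
        and (rule.deck is None or rule.deck == deck)
    ]


rules = [
    PresetRule("Mandarin", "Chinese-Words", deck="Mandarin"),
    PresetRule("Cantonese", "Chinese-Words", deck="Cantonese"),
    PresetRule("Japanese", "Japanese-Words"),  # Note Type level, any deck
    PresetRule("Slow Mandarin", "Chinese-Words", deck="Mandarin", automatic=False),
]

mandarin_presets = rules_to_run(rules, "Chinese-Words", "Mandarin")
japanese_presets = rules_to_run(rules, "Japanese-Words", "Japanese 07")
```

In this model, a Chinese-Words note in the Mandarin deck only gets the Mandarin preset (the disabled "Slow Mandarin" rule stays manual), while a note in any of the Japanese decks gets the Japanese preset.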

When you are satisfied with your presets and preset rules, you can add or preview audio using the Speaker Button or Play Button in the editor. You can also configure keyboard shortcuts in the HyperTTS Preferences.

April 6, 2025