Introducing Model 2.0

Our next-generation text-to-speech engine brings fine-grained emotion control, faster rendering, and support for more languages. It is now the default for all Vocloner users.

Today we are launching Model 2.0, a major upgrade to the text-to-speech engine that powers Vocloner. It is now the default model for all users: there are no settings to change and no action needed. You will notice the difference immediately.

What's new in Model 2.0

Model 2.0 has been rebuilt from the ground up with three core improvements: more expressive output, faster generation, and support for a significantly wider range of languages.

What changed

🎭

More Expressive

Place emotion and tone markers anywhere in your text, not just at the start of a sentence. More natural, more human.

⚡

Faster Output

Significantly reduced time-to-first-audio. Long passages generate faster without sacrificing quality.

🌍

Wider Language Support

Many more languages now supported with high accuracy, including improved multilingual voice cloning.

Better expressiveness, word by word

The biggest improvement in Model 2.0 is how it handles emotion and tone markers such as (whispering) or (excited). In the previous model, markers had to be placed at the beginning of a sentence. Model 2.0 understands markers placed anywhere in your text, so you can direct the voice at exactly the right moment.

Marker placement: Model 1.0 vs Model 2.0

Model 1.0

(whispering) Don't let them hear you.

(excited) I finally found it!

(sad) I don't know what to do.

Markers only at the start of a sentence.

Model 2.0

Don't let them (whispering) hear you.

I (pause) finally found it. (excited) Yes!

I really don't (sad) know what to do.

Markers placed anywhere in the text.
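If you switch back to Model 1.0, any marker placed mid-sentence no longer fits its rule that markers go at the start of a sentence. The Python sketch below flags such markers in a script before you switch. It is a hypothetical helper, not part of Vocloner, and it assumes that markers are lowercase words in parentheses (as in the examples above) and that a sentence ends with ".", "!" or "?" followed by whitespace. Under that assumption, ordinary parenthetical text such as "(see above)" would also be counted as a marker.

```python
import re

# Assumption: a marker is a lowercase word (or words) in parentheses,
# e.g. "(whispering)" or "(pause)", as in the examples above.
MARKER = re.compile(r"\(([a-z][a-z ]*)\)")

# Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def find_mid_sentence_markers(text: str) -> list[str]:
    """Return the markers that do NOT sit at the start of a sentence.

    An empty list means every marker follows the Model 1.0 placement rule.
    Several markers stacked at the start of a sentence all count as "at the start".
    """
    misplaced = []
    for sentence in SENTENCE_SPLIT.split(text.strip()):
        for match in MARKER.finditer(sentence):
            # Only whitespace and other markers may precede a start-of-sentence marker.
            prefix = sentence[: match.start()]
            if MARKER.sub("", prefix).strip():
                misplaced.append(match.group(1))
    return misplaced


if __name__ == "__main__":
    print(find_mid_sentence_markers("(whispering) Don't let them hear you."))  # → []
    print(find_mid_sentence_markers("I (pause) finally found it. (excited) Yes!"))  # → ['pause']
```

A script written in the Model 2.0 style, such as "I really don't (sad) know what to do.", would be flagged, while the Model 1.0 examples pass unchanged.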

Performance at a glance

We ran extensive tests comparing Model 1.0 and Model 2.0 across four key dimensions. Model 2.0 scored higher on all four.

Model 1.0 vs Model 2.0: performance comparison

Expressiveness: Model 1.0 55%, Model 2.0 92%
Naturalness: Model 1.0 63%, Model 2.0 95%
Languages: Model 1.0 40%, Model 2.0 90%
Speed: Model 1.0 60%, Model 2.0 88%

Model 2.0 is now your default

Starting today, all new text-to-speech generations on Vocloner use Model 2.0 automatically. There is nothing you need to do: just open the TTS page and start generating.

If you prefer the previous generation for any reason, you can switch back to Model 1.0 at any time from the Advanced Controls section on the TTS page.

All your existing voices work as before

All voices you have already cloned are fully compatible with Model 2.0. There is no need to re-clone or re-upload anything. Your voice library is unchanged.

Try Model 2.0 now

Open Vocloner
