Today we are launching Model 2.0, a major upgrade to the text-to-speech engine powering Vocloner. It is now the default model for all users — no settings to change, no action needed. You will notice the difference immediately.
What's new in Model 2.0
Model 2.0 has been rebuilt from the ground up with three core improvements: more expressive output, faster generation, and support for a significantly wider range of languages.
More Expressive
Place emotion and tone markers anywhere in your text, not just at the start of a sentence. More natural, more human.
Faster Output
Significantly reduced time-to-first-audio. Long passages generate faster without sacrificing quality.
Wider Language Support
Many more languages now supported with high accuracy, including improved multilingual voice cloning.
Better expressiveness, word by word
The biggest improvement in Model 2.0 is how it handles emotion and tone markers. In the previous model, markers had to be placed at the beginning of a sentence. Model 2.0 understands markers placed anywhere in your text — so you can direct the voice at exactly the right moment.
Model 1.0: tags only at the start of a sentence.
Model 2.0: tags placed anywhere in the text.
Performance at a glance
We ran extensive tests comparing Model 1.0 and Model 2.0 across several key dimensions.
Model 2.0 is now your default
Starting today, all new text-to-speech generations on Vocloner use Model 2.0 automatically. There is nothing you need to do — just open the TTS page and start generating.
If you prefer the previous generation for any reason, you can switch back to Model 1.0 at any time from the Advanced Controls section on the TTS page.
All your existing voices work as before
All voices you have already cloned are fully compatible with Model 2.0. There is no need to re-clone or re-upload anything. Your voice library is unchanged.
Try Model 2.0 now
Log in to your Vocloner account and generate your first speech with the new model — it takes seconds.