About - Vocloner

How it works

Vocloner is an online voice cloning tool based on open source models, primarily those released by Coqui AI. It offers two demos to experiment with. The first is a simple voice cloning tool that works in English and can convert up to 2000 characters of text in a single step. The second is based on XTTS and is multilingual.

Version 1 (TTS Classic)

Based on one of the first open source voice cloning tools by Coqui, it works primarily in English. It is quite fast and does not need a GPU, although the voice can sometimes sound robotic or metallic. A voice can be cloned with as little as 3 seconds of audio, but results are better if the audio is at least a minute long. It represents a natural evolution of the first scripts released in this field, such as Real Time Voice Cloning.

Using the tool requires just a reference audio of the voice you are going to clone. It can be a recording, a WhatsApp or Telegram audio message, or an audio clip cut from a YouTube video. Alternatively, you can record your voice from the microphone.
The Classic TTS Cloning does not need a GPU: it simply adapts the weights of a pretrained neural network to a spectrogram of the target voice. In this way it can achieve a fast yet faithful copy of the desired voice.
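
Since the length of the reference audio affects the result, it can help to check a clip before uploading it. The following is a minimal sketch, not part of Vocloner itself: it assumes the reference is an uncompressed WAV file, and the helper names and the `MIN_SECONDS` constant are hypothetical, chosen only to mirror the 3-second minimum mentioned above.

```python
import wave

# Assumed threshold, taken from the "as little as 3 seconds" figure above.
MIN_SECONDS = 3.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path, min_seconds=MIN_SECONDS):
    """Return True if the clip is at least min_seconds long."""
    return wav_duration_seconds(path) >= min_seconds
```

Other formats, such as the compressed audio produced by messaging apps, would need to be converted to WAV first or read with a different library.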

Version 2 (XTTS)

The XTTS foundation model is the result of years of effort by the Coqui team and, according to Coqui, surpasses both open and closed models in a wide range of tasks. Coqui describes XTTS as producing speech of exceptional quality, often exceeding professional production standards. It is multilingual: it launched with support for 13 languages, and more have been added since.
XTTS makes voice cloning easy, allowing you to replicate a voice from just a small sample, whether in the same language or across languages. Coqui's innovation extends beyond XTTS to open model licensing. Teaming up with Heather Meeker, an authority on open-source licenses, Coqui introduced the Coqui Public Model License (CPML), and XTTS is the first model released under it.

How to use Version 2

To start using XTTS Voice Cloning, just upload a reference audio of the voice you want to use. Set a language (supported ones are Arabic, Brazilian Portuguese, Mandarin Chinese, Czech, Dutch, English, French, German, Italian, Polish, Russian, Spanish, Turkish, Japanese, Korean, Hungarian, Hindi).
Then enter the text you want the voice to say, just as you would in text-to-speech software. There is a limit of 200 characters per conversion to guarantee fair use for everybody. The model also tends to perform better on short phrases of at most one or two sentences.
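
For longer texts, one option is to split the input into pieces that respect these limits and convert them one at a time. The sketch below is only an illustration under stated assumptions: `split_for_conversion`, `MAX_CHARS`, and `MAX_SENTENCES` are hypothetical names, sentences are detected naively by `.`, `!`, or `?` followed by whitespace, and nothing here reflects how Vocloner or XTTS process text internally.

```python
import re

MAX_CHARS = 200      # per-conversion limit stated above
MAX_SENTENCES = 2    # assumed cap, from the "1 or 2 sentences" advice above


def split_for_conversion(text, max_chars=MAX_CHARS, max_sentences=MAX_SENTENCES):
    """Split text into chunks of at most max_chars and max_sentences each."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], "", 0
    for sentence in sentences:
        # A sentence longer than the limit is broken at word boundaries.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current, count = "", 0
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space found: fall back to a hard cut
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if count < max_sentences and len(candidate) <= max_chars:
            current, count = candidate, count + 1
        else:
            chunks.append(current)
            current, count = sentence, 1
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be pasted into the text field as a separate conversion. Grouping by sentence rather than cutting at a fixed length keeps every piece a complete phrase, which matches the advice about short inputs.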

XTTS automatically detects the language of the text in the input field; you can disable this by checking the box in the additional parameters section. You can also choose whether to use the microphone for the conversion, and enable the voice cleanup tool if the microphone is noisy.
Remember to check the 'Agree' option before starting the conversion, to acknowledge the Coqui license terms.