parler_tts_mini_v0.1 model to create a French version.
The model generates high-quality speech from input text, which can be controlled using a description prompt (e.g., gender, speaking rate, etc.).
The training uses a French text-to-speech dataset, enabling the model to produce natural and expressive speech in this language.
Connect to GitHub (if needed)
If you haven’t already connected FlexAI to GitHub, you’ll need to set up a code registry connection. This will allow FlexAI to pull repositories directly from GitHub using the -u flag in training commands.
Getting the Dataset
You can download the pre-processed version of the dataset by running the following command:
If you’d like to reproduce the pre-processing steps yourself, whether to use a different dataset or simply to learn more about the process, refer to the Manual Dataset Pre-processing section below.
Next, push the contents of the text-to-speech-fr/ directory as a new FlexAI dataset:
Training
To start the Training Job, run the following command:
Instead of passing a .json file as input, you can also set the arguments manually. For example:
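As a rough illustration of how entries in a .json configuration could map onto manually set arguments, the sketch below converts a flat mapping into `--key=value` strings. The helper `config_to_cli_args` is hypothetical, the two keys shown are only an excerpt based on the flags mentioned in this guide, and the lowercase `true`/`false` spelling for booleans is an assumption about how the training script parses them.

```python
import json
import shlex


def config_to_cli_args(config: dict) -> list[str]:
    """Turn a flat config mapping into ``--key=value`` style arguments."""
    args = []
    for key, value in config.items():
        if value is None:
            continue  # unset options are simply left out
        if isinstance(value, bool):
            # Assumed boolean spelling; check what the training script accepts.
            value = "true" if value else "false"
        args.append(f"--{key}={value}")
    return args


# Hypothetical excerpt; the real french_training.json contains more entries.
example_config = json.loads(
    '{"preprocessing_only": false, "save_to_disk": "./text-to-speech-fr/"}'
)
cli_args = config_to_cli_args(example_config)

# Print the arguments as a single shell-safe string.
print(shlex.join(cli_args))
```

Nested values and lists are not handled here; they would need their own formatting rules depending on how the script's argument parser expects them.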
Optional Extra Steps
You can run these extra steps in a FlexAI Interactive Session or in a local environment (e.g. pipenv install --python 3.10), if your hardware is capable of running inference.
Inference
A simple inference script that you can easily adapt to your needs is available at code/text-to-speech/predict.py.
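For orientation, here is a minimal sketch of what Parler-TTS inference typically looks like. It is not the contents of predict.py, which may differ. It assumes the `parler_tts`, `transformers`, `torch`, and `soundfile` packages are installed; `build_description`, `synthesize`, and the checkpoint path in the usage comment are hypothetical names chosen for illustration.

```python
def build_description(gender: str = "female", rate: str = "moderate") -> str:
    """Compose a plain-language voice description (wording is illustrative)."""
    return f"A {gender} speaker talks at a {rate} pace, with clear audio quality."


def synthesize(text: str, description: str, checkpoint: str,
               out_path: str = "out.wav") -> str:
    """Generate speech for `text` in the style given by `description`."""
    # Heavy dependencies are imported lazily so build_description stays
    # usable even when they are not installed.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description conditions the voice; the prompt is the text to speak.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids,
                                prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path


description = build_description("male", "slow")
print(description)

# Example call (the checkpoint path is a placeholder, not a real location):
# synthesize("Bonjour, comment allez-vous ?", description,
#            checkpoint="path/to/your/fine-tuned-checkpoint")
```

Keeping the description in a separate helper makes it easy to vary attributes such as gender or speaking rate without touching the generation code.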
Manual Dataset Pre-processing
If you’d prefer to perform the dataset pre-processing step yourself, you can follow these instructions.
Clone this repository
If you haven’t already, clone this repository on your host machine:
Install the dependencies
Depending on your environment, you might need to install the experiments’ dependencies, if they are not already installed, by running:
Dataset preparation
Prepare the dataset by running the training command with the --preprocessing_only flag set in ./code/text-to-speech/french_training.json.
For large datasets, it is recommended to run the preprocessing on a single machine to avoid timeouts when running the script in distributed mode.
The output location is set with --save_to_disk=./text-to-speech-fr/.
Run the dataset preparation using:
Remove the --preprocessing_only flag before attempting to run the script for training purposes.
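Switching between a pre-processing run and a training run comes down to toggling one option in the configuration. The sketch below shows one way to do that programmatically. It assumes the .json keys mirror the command-line flag names (`preprocessing_only`, `save_to_disk`); the helper names and the output file name in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def with_preprocessing(config: dict, enabled: bool, save_dir: str) -> dict:
    """Return a copy of `config` with the pre-processing options set."""
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    updated["preprocessing_only"] = enabled
    updated["save_to_disk"] = save_dir
    return updated


def rewrite_config(src: str, dst: str, enabled: bool, save_dir: str) -> None:
    """Read a JSON config from `src`, toggle pre-processing, write to `dst`."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    updated = with_preprocessing(config, enabled, save_dir)
    Path(dst).write_text(json.dumps(updated, indent=2), encoding="utf-8")


# Example call (the destination file name is a placeholder):
# rewrite_config("./code/text-to-speech/french_training.json",
#                "./french_preprocessing.json",
#                enabled=True, save_dir="./text-to-speech-fr/")
```

Writing the result to a separate file leaves the original training configuration untouched, so there is no flag to remember to remove afterwards.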