Prerequisites
Before starting, make sure you have:

- A FlexAI account with access to the platform
- The `flexai` CLI installed and configured
Start the FlexAI Inference Endpoint
Start the FlexAI endpoint for the Kokoro text-to-speech model:

- Create an inference endpoint named `text-to-speech`
- Use the `flexserve` runtime optimized for text-to-speech tasks
- Load the Kokoro model for natural voice synthesis
- Configure it for English language text-to-speech generation
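The command that performs these steps was not captured in this text. A hedged sketch of its likely shape follows; apart from the `flexai` CLI, the `flexserve` runtime, the endpoint name, and the `--lang-code` flag mentioned later in this guide, every subcommand, flag, and value below is an assumption — check `flexai --help` for the real syntax:

```shell
# Sketch only: subcommand and flag names other than --lang-code are
# assumptions, not confirmed FlexAI CLI syntax.
flexai inference serve text-to-speech \
  --runtime flexserve \
  --hf-model-name hexgrad/Kokoro-82M \
  --lang-code a
# 'a' is Kokoro's code for American English; hexgrad/Kokoro-82M is the
# assumed Hugging Face model id for Kokoro.
```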
Get Endpoint Information
Once the endpoint is deployed, you’ll see the API key displayed in the output. Store it in an environment variable:You’ll notice these
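The export lines this guide refers to were not captured here. A sketch of their likely shape, assuming the inspect output exposes the endpoint URL and API key as `url` and `apiKey` JSON fields — the subcommand, `--json` flag, and field names are all assumptions:

```shell
# Assumptions: the inspect subcommand, the --json flag, and the .url /
# .apiKey field names are illustrative, not confirmed FlexAI CLI syntax.
export ENDPOINT_URL=$(flexai inference inspect text-to-speech --json | jq -r '.url')
export API_KEY=$(flexai inference inspect text-to-speech --json | jq -r '.apiKey')
```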
You’ll notice these export lines use the `jq` tool to extract values from the JSON output of the inspect command. If you don’t have it already, you can get `jq` from its official website: https://jqlang.org/

Generate Speech
Now you can generate speech by making HTTP POST requests to your endpoint. Here is an example:

Parameters Explanation
The API accepts the following parameters:

- `inputs`: The text you want to convert to speech
- `voice`: The voice model to use for synthesis
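The example request mentioned above did not survive extraction. A minimal curl sketch, assuming the endpoint accepts a JSON body with the `inputs` and `voice` parameters described here; the voice name and output format are assumptions, and `ENDPOINT_URL` and `API_KEY` are the environment variables set earlier:

```shell
# Sketch: the voice name (af_heart) and audio output format are assumptions.
# ENDPOINT_URL and API_KEY come from the export lines set up earlier.
curl -X POST "$ENDPOINT_URL" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello from FlexAI!", "voice": "af_heart"}' \
  --output speech.wav
```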
Supported Languages
You can configure different languages by changing the `--lang-code` parameter when starting the inference endpoint. The voice model to use is specified in the `voice` parameter of your request.
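For instance, to serve British English instead of American English — Kokoro's single-letter language codes come from its repository, while the command shape here is an assumption:

```shell
# Kokoro language codes include: a = American English, b = British English,
# f = French, j = Japanese, z = Mandarin Chinese. Voice names are prefixed
# to match (e.g. bf_emma for a British English voice).
# The subcommand shape is an assumption; only --lang-code is from this guide.
flexai inference serve text-to-speech --lang-code b
```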
For a complete list of supported languages and available voices, check the Kokoro GitHub repository.