This experiment demonstrates how to deploy and use a text-to-speech model (Kokoro) using FlexAI’s inference serving capabilities.

Prerequisites

Before starting, make sure you have:
  • A FlexAI account with access to the platform
  • The flexai CLI installed and configured

Start the FlexAI Inference Endpoint

Start the FlexAI endpoint for the Kokoro text-to-speech model:
INFERENCE_NAME=text-to-speech
flexai inference serve $INFERENCE_NAME --runtime flexserve -- --model hexgrad/Kokoro-82M --task text-to-speech --lang-code a
This command will:
  • Create an inference endpoint named text-to-speech
  • Use the flexserve runtime optimized for text-to-speech tasks
  • Load the Kokoro model for natural voice synthesis
  • Configure it for English language text-to-speech generation
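Deployment can take a moment while the model is loaded. You can check on the endpoint at any time with the inspect subcommand (the same one used below to retrieve the endpoint URL):
flexai inference inspect $INFERENCE_NAME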

Get Endpoint Information

Once the endpoint is deployed, you’ll see the API key displayed in the output. Store it in an environment variable:
export INFERENCE_API_KEY=<API_KEY_FROM_ENDPOINT_CREATION_OUTPUT>
Then retrieve the endpoint URL:
export INFERENCE_URL=$(flexai inference inspect $INFERENCE_NAME -j | jq .config.endpointUrl -r)
The INFERENCE_URL export uses the jq tool to extract the endpoint URL from the JSON output of the inspect command. If you don’t have jq already, you can get it from its official website: https://jqlang.org/
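As a quick sanity check before sending requests, confirm that both environment variables are set (this sketch relies only on standard shell behavior):
echo "INFERENCE_URL: $INFERENCE_URL"
[ -n "$INFERENCE_API_KEY" ] && echo "API key is set" || echo "API key is NOT set"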

Generate Speech

Now you can generate speech by making HTTP POST requests to your endpoint. Here is an example:
curl -X POST \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "Welcome to FlexAI! Produce lifelike speech and spin up production inference endpoints in seconds.",
    "parameters": {
      "voice": "af_heart"
    }
  }' \
  -o welcome.wav \
  "$INFERENCE_URL/v1/speeches/generations"

Request Parameters

The request body accepts the following fields:
  • inputs: The text you want to convert to speech
  • parameters.voice: The voice to use for synthesis (af_heart in the example above)
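To avoid hand-writing JSON for every request, you can wrap the call in a small shell function. This is a minimal sketch, not part of the FlexAI CLI: the tts name and argument order are made up here, and jq is used so that arbitrary text is escaped safely into the inputs field:
# Hypothetical helper: tts "<text>" [voice] [output file]
tts() {
  local text="$1" voice="${2:-af_heart}" out="${3:-output.wav}"
  printf '%s' "$text" |
    jq -Rs --arg v "$voice" '{inputs: ., parameters: {voice: $v}}' |
    curl -sS -X POST \
      -H "Authorization: Bearer $INFERENCE_API_KEY" \
      -H 'Content-Type: application/json' \
      -d @- \
      -o "$out" \
      "$INFERENCE_URL/v1/speeches/generations"
}

# Usage example
tts "Hello from FlexAI." af_heart hello.wav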

Supported Languages

You can configure different languages by changing the --lang-code parameter when starting the inference endpoint. The voice to use is specified in the voice parameter of your request. For a complete list of supported languages and available voices, check the Kokoro GitHub repository.
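For example, the Kokoro repository lists b as the lang code for British English, with matching bf_/bm_ voice prefixes. A sketch of a second endpoint configured that way (the endpoint name and voice choice here are illustrative):
flexai inference serve text-to-speech-gb --runtime flexserve -- --model hexgrad/Kokoro-82M --task text-to-speech --lang-code b
Requests to this endpoint would then pass a matching voice, such as bf_emma, in the voice parameter.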