This Speech-to-Text application provides an interactive interface for recording audio with your microphone and transcribing it with a Whisper model served on a FlexAI inference endpoint.

Start the FlexAI endpoint

Create the FlexAI secret containing your Hugging Face (HF) token, which the endpoint needs in order to pull the inference model:
# Enter your HF token value when prompted
flexai secret create hf-token
Start the FlexAI inference endpoint serving the Whisper model:
LLM_INFERENCE_NAME=speech2text
flexai inference serve $LLM_INFERENCE_NAME --hf-token-secret hf-token --runtime vllm-nvidia-0.10.1 -- --model=openai/whisper-large-v3
Store the returned inference endpoint API key and endpoint URL:
export LLM_API_KEY=<store the given API key>
export LLM_URL=$(flexai inference inspect $LLM_INFERENCE_NAME -j | jq .config.endpointUrl -r)
You’ll notice the last export line uses the jq tool to extract the value of endpointUrl from the JSON output of the inspect command. If you don’t have it already, you can get jq from its official website: https://jqlang.org/
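If jq isn’t available, the same value can be pulled out with a few lines of Python instead. This is a minimal sketch, assuming the `-j` flag emits JSON with a `config.endpointUrl` field as shown above; `extract_endpoint_url` is a hypothetical helper, not part of the FlexAI CLI:

```python
import json


def extract_endpoint_url(inspect_json: str) -> str:
    """Return config.endpointUrl from the JSON emitted by `flexai inference inspect -j`."""
    return json.loads(inspect_json)["config"]["endpointUrl"]


# Usage sketch: capture the inspect output yourself, then parse it, e.g.
#   raw = subprocess.run(["flexai", "inference", "inspect", "speech2text", "-j"],
#                        capture_output=True, text=True, check=True).stdout
#   url = extract_endpoint_url(raw)
```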

Setup

1. Navigate to the speech-to-text directory:

   cd code/speech-to-text/

2. Install the required dependencies:

   pip install -r requirements.txt

3. Run the application:

   python main.py
The application will start and display two URLs:
  • Local URL: For local access (e.g., http://127.0.0.1:7860)
  • Public URL: For sharing (e.g., https://xxxxxxxxxx.gradio.live)
Open either URL in your browser to start transcribing audio.
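Before opening the UI, you can sanity-check that the endpoint and credentials are wired up correctly. The sketch below mirrors how `main.py` builds its client; `base_url_for` and `smoke_check` are hypothetical helpers (not part of the app), and the `models.list()` call assumes the vLLM server exposes the standard OpenAI-compatible `/v1/models` route:

```python
import os


def base_url_for(llm_url: str) -> str:
    """Append /v1 to the endpoint URL, tolerating a trailing slash."""
    return llm_url.rstrip("/") + "/v1"


def smoke_check() -> None:
    # Assumes LLM_API_KEY and LLM_URL are exported as in the setup steps above.
    from openai import OpenAI  # from requirements.txt

    client = OpenAI(
        api_key=os.environ["LLM_API_KEY"],
        base_url=base_url_for(os.environ["LLM_URL"]),
    )
    # vLLM's OpenAI-compatible server lists the served model(s) here.
    print([m.id for m in client.models.list()])
```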

Usage

  1. Access the Interface: Open the Gradio interface in your web browser. To avoid microphone access permission errors, prefer the public URL over the local one.
  2. Record Audio:
    • Click the record icon to start recording
    • Speak your message
    • Click stop when finished recording
  3. Get Transcription: Click the “Transcribe” button to process your audio and receive the text transcription.
  4. View Results: The transcribed text will appear in the results panel on the right side of the interface.

Code

main.py

# Copyright (c) 2025 FlexAI
# This file is part of the FlexAI Experiments repository.
# SPDX-License-Identifier: MIT

import os

import gradio as gr
from openai import OpenAI


def check_env() -> None:
    if "LLM_API_KEY" not in os.environ:
        raise ValueError("Please set the LLM_API_KEY environment variable.")
    if "LLM_URL" not in os.environ:
        raise ValueError("Please set the LLM_URL environment variable.")


def infer(audio_path: str, client: OpenAI):
    with open(audio_path, "rb") as f:
        transcription = client.audio.transcriptions.create(
            file=f,
            model="openai/whisper-large-v3",
            response_format="json",
            temperature=0.0,
            extra_body=dict(
                seed=42,
                repetition_penalty=1.3,
            ),
        )
        return transcription.text


def transcribe_audio(audio_file):
    if audio_file is None:
        return "No audio file provided"

    client = OpenAI(
        api_key=os.environ.get("LLM_API_KEY"),
        base_url=os.environ.get("LLM_URL") + "/v1",
    )

    result = infer(audio_file, client)

    return result


def create_gradio_interface():
    with gr.Blocks(title="Speech-to-Text Transcription") as demo:
        gr.Markdown("# Speech-to-Text Transcription")
        gr.Markdown("Record audio using your microphone and get the transcription.")

        with gr.Row():
            with gr.Column():
                audio_input = gr.Audio(
                    sources=["microphone"], type="filepath", label="Record Audio"
                )
                transcribe_btn = gr.Button("Transcribe", variant="primary")

            with gr.Column():
                output_text = gr.Textbox(
                    label="Transcription Result",
                    lines=10,
                    max_lines=20,
                    placeholder="Your transcription will appear here...",
                )

        transcribe_btn.click(
            fn=transcribe_audio, inputs=[audio_input], outputs=[output_text]
        )

    return demo


if __name__ == "__main__":
    check_env()
    demo = create_gradio_interface()
    demo.launch(share=True)
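For quick checks without the Gradio UI, the same endpoint can be called directly. This sketch reuses the request parameters from `infer()` above; `transcription_kwargs` and `transcribe_file` are hypothetical helpers, and the request still requires `LLM_API_KEY` and `LLM_URL` to be set:

```python
import os


def transcription_kwargs() -> dict:
    # Same parameters main.py passes to client.audio.transcriptions.create()
    return {
        "model": "openai/whisper-large-v3",
        "response_format": "json",
        "temperature": 0.0,
        "extra_body": {"seed": 42, "repetition_penalty": 1.3},
    }


def transcribe_file(path: str) -> str:
    from openai import OpenAI  # from requirements.txt

    client = OpenAI(
        api_key=os.environ["LLM_API_KEY"],
        base_url=os.environ["LLM_URL"] + "/v1",
    )
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(file=f, **transcription_kwargs()).text
```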

requirements.txt

gradio>=5.43.1
openai>=1.101.0