
# Deploy Speech-to-Text with FlexAI Inference Endpoints

> Build a speech-to-text transcription app using FlexAI inference endpoints. Record audio and get transcriptions in real time, with hardware-agnostic deployment.

This Speech-to-Text application provides an interactive Gradio interface where users record audio with their microphone and receive a transcription from the `openai/whisper-large-v3` model.

## Start the FlexAI endpoints

Create the FlexAI secret that holds your Hugging Face (HF) token, which is needed to access the inference models:

```bash theme={null}
# Enter your HF token value when prompted
flexai secret create hf-token
```

Start the FlexAI endpoint for the speech-to-text model (the variable names keep the `LLM_` prefix that the application code expects):

```bash theme={null}
LLM_INFERENCE_NAME=speech2text
flexai inference serve $LLM_INFERENCE_NAME --hf-token-secret hf-token --runtime vllm-nvidia-0.10.1 -- --model=openai/whisper-large-v3
```

Store the returned Inference Endpoint API key and endpoint URL:

```bash theme={null}
export LLM_API_KEY=<store the given API key>
export LLM_URL=$(flexai inference inspect $LLM_INFERENCE_NAME -j | jq .config.endpointUrl -r)
```

<Note>
  The last `export` line uses the `jq` tool to extract the value of `endpointUrl` from the JSON output of the `inspect` command.

  If you don't have it already, you can get `jq` from its official website: [https://jqlang.org/](https://jqlang.org/)
</Note>
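
If installing `jq` is not an option, the same value can be read with a few lines of Python. This is a minimal sketch that assumes the `inspect` JSON has the shape implied by the `jq` filter above (a top-level `config` object with an `endpointUrl` field). `extract_endpoint_url` is a hypothetical helper name, not part of the FlexAI CLI, and the sample URL is a placeholder.

```python
import json


def extract_endpoint_url(inspect_json: str) -> str:
    """Return config.endpointUrl from the JSON text of the inspect command.

    Assumes the same structure as the jq filter `.config.endpointUrl`.
    """
    data = json.loads(inspect_json)
    try:
        return data["config"]["endpointUrl"]
    except (KeyError, TypeError) as exc:
        raise ValueError("endpointUrl not found under 'config'") from exc


# Example with made-up data; the URL is a placeholder, not a real endpoint.
sample = '{"config": {"endpointUrl": "https://example.invalid/endpoint"}}'
print(extract_endpoint_url(sample))
```

To use it from the shell, one option is to read the CLI output from `sys.stdin` and pass that text to the function.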

## Setup

<Steps>
  <Step title="Navigate to the speech-to-text directory">
    ```bash theme={null}
    cd code/speech-to-text/
    ```
  </Step>

  <Step title="Install the required dependencies">
    ```bash theme={null}
    pip install -r requirements.txt
    ```
  </Step>

  <Step title="Run the application">
    ```bash theme={null}
    python main.py
    ```

    The application will start and display two URLs:

    * **Local URL**: For local access (e.g., `http://127.0.0.1:7860`)
    * **Public URL**: For sharing (e.g., `https://xxxxxxxxxx.gradio.live`)

    Open either URL in your browser to start transcribing audio.
  </Step>
</Steps>
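
`main.py` checks for `LLM_API_KEY` and `LLM_URL` at startup and stops at the first one that is missing. To see everything that is missing in one pass before launching, a small preflight check can help. This is only a sketch: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and treating an empty value as missing is an assumption that goes slightly beyond what `main.py` does.

```python
import os
from typing import Mapping, Sequence

# The two variables that main.py requires.
REQUIRED_VARS = ("LLM_API_KEY", "LLM_URL")


def missing_env_vars(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS
) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Environment looks ready for main.py")
```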

## Usage

1. **Access the Interface**: Open the Gradio interface in your web browser. To avoid microphone permission errors, use the **public URL** rather than the local one.

2. **Record Audio**:
   * Click the **record icon** to start recording
   * Speak your message
   * Click **stop** when finished recording

3. **Get Transcription**: Click the **"Transcribe"** button to process your audio and receive the text transcription.

4. **View Results**: The transcribed text will appear in the results panel on the right side of the interface.
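
The request that the **Transcribe** button triggers can also be sent from a script, which is handy for audio files already on disk. The sketch below mirrors the `client.audio.transcriptions.create` call used in `main.py`, minus its `extra_body` options. `transcribe_file` is a hypothetical helper, and the client is passed in as an argument so that any object exposing the same method can be used.

```python
from typing import Any


def transcribe_file(
    audio_path: str, client: Any, model: str = "openai/whisper-large-v3"
) -> str:
    """Send one audio file to the endpoint and return the transcribed text."""
    with open(audio_path, "rb") as audio:
        transcription = client.audio.transcriptions.create(
            file=audio,
            model=model,
            response_format="json",
            temperature=0.0,
        )
    return transcription.text


# Usage sketch (needs the `openai` package plus LLM_API_KEY and LLM_URL;
# "sample.wav" is a placeholder path):
#
#   import os
#   from openai import OpenAI
#
#   client = OpenAI(
#       api_key=os.environ["LLM_API_KEY"],
#       base_url=os.environ["LLM_URL"] + "/v1",
#   )
#   print(transcribe_file("sample.wav", client))
```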

## Code

### main.py

```python theme={null}
# Copyright (c) 2025 FlexAI
# This file is part of the FlexAI Experiments repository.
# SPDX-License-Identifier: MIT

import os

import gradio as gr
from openai import OpenAI


def check_env() -> None:
    if "LLM_API_KEY" not in os.environ:
        raise ValueError("Please set the LLM_API_KEY environment variable.")
    if "LLM_URL" not in os.environ:
        raise ValueError("Please set the LLM_URL environment variable.")


def infer(audio_path: str, client: OpenAI):
    with open(audio_path, "rb") as f:
        transcription = client.audio.transcriptions.create(
            file=f,
            model="openai/whisper-large-v3",
            response_format="json",
            temperature=0.0,
            extra_body=dict(
                seed=42,
                repetition_penalty=1.3,
            ),
        )
        return transcription.text


def transcribe_audio(audio_file):
    if audio_file is None:
        return "No audio file provided"

    client = OpenAI(
        api_key=os.environ.get("LLM_API_KEY"),
        base_url=os.environ.get("LLM_URL") + "/v1",
    )

    result = infer(audio_file, client)

    return result


def create_gradio_interface():
    with gr.Blocks(title="Speech-to-Text Transcription") as demo:
        gr.Markdown("# Speech-to-Text Transcription")
        gr.Markdown("Record audio using your microphone and get the transcription.")

        with gr.Row():
            with gr.Column():
                audio_input = gr.Audio(
                    sources=["microphone"], type="filepath", label="Record Audio"
                )
                transcribe_btn = gr.Button("Transcribe", variant="primary")

            with gr.Column():
                output_text = gr.Textbox(
                    label="Transcription Result",
                    lines=10,
                    max_lines=20,
                    placeholder="Your transcription will appear here...",
                )

        transcribe_btn.click(
            fn=transcribe_audio, inputs=[audio_input], outputs=[output_text]
        )

    return demo


if __name__ == "__main__":
    check_env()
    demo = create_gradio_interface()
    demo.launch(share=True)
```
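
`main.py` builds the client's base URL by appending `/v1` to `LLM_URL`. If the stored URL ends with a slash, or already ends in `/v1`, that concatenation produces a malformed path. The sketch below shows one way to normalize the value first. `build_base_url` is a hypothetical helper that is not part of the original file, and the URLs are placeholders.

```python
def build_base_url(endpoint_url: str) -> str:
    """Return the OpenAI-compatible base URL for an endpoint URL.

    Strips surrounding whitespace and trailing slashes, then appends "/v1"
    unless the URL already ends with it.
    """
    url = endpoint_url.strip().rstrip("/")
    if not url:
        raise ValueError("endpoint URL is empty")
    return url if url.endswith("/v1") else url + "/v1"


# Placeholder URLs for illustration only.
for raw in (
    "https://example.invalid",
    "https://example.invalid/",
    "https://example.invalid/v1/",
):
    print(build_base_url(raw))
```

All three placeholder inputs normalize to the same base URL. In `transcribe_audio`, the helper could replace the string concatenation, for example `base_url=build_base_url(os.environ["LLM_URL"])`.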

### requirements.txt

```text theme={null}
gradio>=5.43.1
openai>=1.101.0
```

