This Speech-to-Text application provides an interactive interface where users record audio messages with their microphone and receive text transcriptions produced by a Whisper model served on a FlexAI inference endpoint.
Start the FlexAI endpoints
Create a FlexAI secret containing your Hugging Face (HF) token, which is needed to access the inference models:
# Enter your HF token value when prompted
flexai secret create hf-token
Start the FlexAI inference endpoint that serves the Whisper model (the variable names below keep the LLM_ prefix):
LLM_INFERENCE_NAME=speech2text
flexai inference serve $LLM_INFERENCE_NAME --hf-token-secret hf-token --runtime vllm-nvidia-0.10.1 -- --model=openai/whisper-large-v3
Store the returned Inference Endpoint API KEY and Endpoint URL:
export LLM_API_KEY=<store the given API key>
export LLM_URL=$(flexai inference inspect $LLM_INFERENCE_NAME -j | jq .config.endpointUrl -r)
The last export line uses the jq tool to extract the value of endpointUrl from the JSON output of the inspect command. If you don’t have jq installed, you can get it from its official website: https://jqlang.org/
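If jq is not available, the same value can be read with a few lines of Python. This is a minimal sketch that assumes only what the jq filter above implies: the JSON printed by the inspect command contains a `config` object with an `endpointUrl` field. The function name `extract_endpoint_url` and the sample payload are illustrative and not part of the FlexAI tooling.

```python
import json


def extract_endpoint_url(inspect_json: str) -> str:
    """Return config.endpointUrl from the JSON text of the inspect command.

    Mirrors the jq filter `.config.endpointUrl`; the rest of the JSON layout
    is unknown here and is not relied upon.
    """
    data = json.loads(inspect_json)
    try:
        return data["config"]["endpointUrl"]
    except (KeyError, TypeError) as exc:
        raise ValueError("config.endpointUrl not found in inspect output") from exc


# Made-up payload for illustration only; real output contains more fields.
sample = '{"config": {"endpointUrl": "https://example.invalid/endpoint"}}'
print(extract_endpoint_url(sample))
```

Raising a `ValueError` when the field is missing makes a failed lookup visible, whereas `jq -r` would print `null` and leave LLM_URL set to that string.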
Setup
Navigate to the speech-to-text directory
Install the required dependencies
pip install -r requirements.txt
Run the application
The application will start and display two URLs:
- Local URL: For local access (e.g., http://127.0.0.1:7860)
- Public URL: For sharing (e.g., https://xxxxxxxxxx.gradio.live)
Open either URL in your browser to start transcribing audio.
Usage
- Access the Interface: Open the Gradio interface in your web browser. To avoid microphone permission errors, use the public URL rather than the local one.
- Record Audio:
  - Click the record icon to start recording
  - Speak your message
  - Click stop when finished recording
- Get Transcription: Click the “Transcribe” button to process your audio and receive the text transcription.
- View Results: The transcribed text will appear in the results panel on the right side of the interface.
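The record and transcribe steps above can also be reproduced without the browser, which is useful for checking that the endpoint responds before launching the interface. The following is a minimal sketch, assuming a pre-recorded audio file on disk and the LLM_API_KEY and LLM_URL variables exported earlier. The names `transcribe_file` and `make_client` are hypothetical helpers; the request mirrors the `client.audio.transcriptions.create` call in main.py, without its extra sampling parameters.

```python
import os


def transcribe_file(audio_path: str, client, model: str = "openai/whisper-large-v3") -> str:
    """Send one audio file to the endpoint and return the transcribed text.

    `client` is expected to be an openai.OpenAI instance, or any object
    exposing the same audio.transcriptions.create method.
    """
    with open(audio_path, "rb") as f:
        transcription = client.audio.transcriptions.create(
            file=f,
            model=model,
            response_format="json",
            temperature=0.0,
        )
    return transcription.text


def make_client():
    """Build the client from the environment variables exported earlier."""
    # Imported lazily so transcribe_file can be exercised with a stub client.
    from openai import OpenAI

    return OpenAI(
        api_key=os.environ["LLM_API_KEY"],
        base_url=os.environ["LLM_URL"] + "/v1",
    )
```

A call such as `print(transcribe_file("sample.wav", make_client()))` would then print the transcription, where `sample.wav` is a placeholder for your own recording.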
Code
main.py
# Copyright (c) 2025 FlexAI
# This file is part of the FlexAI Experiments repository.
# SPDX-License-Identifier: MIT

import os

import gradio as gr
from openai import OpenAI


def check_env() -> None:
    """Fail early if the endpoint credentials are not set."""
    if "LLM_API_KEY" not in os.environ:
        raise ValueError("Please set the LLM_API_KEY environment variable.")
    if "LLM_URL" not in os.environ:
        raise ValueError("Please set the LLM_URL environment variable.")


def infer(audio_path: str, client: OpenAI):
    """Send the recorded audio file to the endpoint and return the text."""
    with open(audio_path, "rb") as f:
        transcription = client.audio.transcriptions.create(
            file=f,
            model="openai/whisper-large-v3",
            response_format="json",
            temperature=0.0,
            # Extra sampling parameters passed through to the serving runtime
            extra_body=dict(
                seed=42,
                repetition_penalty=1.3,
            ),
        )
    return transcription.text


def transcribe_audio(audio_file):
    """Gradio callback: audio_file is the path of the recording, or None."""
    if audio_file is None:
        return "No audio file provided"

    # The endpoint exposes an OpenAI-compatible API under /v1
    client = OpenAI(
        api_key=os.environ.get("LLM_API_KEY"),
        base_url=os.environ.get("LLM_URL") + "/v1",
    )
    result = infer(audio_file, client)
    return result


def create_gradio_interface():
    """Build the two-column UI: recorder on the left, result on the right."""
    with gr.Blocks(title="Speech-to-Text Transcription") as demo:
        gr.Markdown("# Speech-to-Text Transcription")
        gr.Markdown("Record audio using your microphone and get the transcription.")

        with gr.Row():
            with gr.Column():
                audio_input = gr.Audio(
                    sources=["microphone"], type="filepath", label="Record Audio"
                )
                transcribe_btn = gr.Button("Transcribe", variant="primary")

            with gr.Column():
                output_text = gr.Textbox(
                    label="Transcription Result",
                    lines=10,
                    max_lines=20,
                    placeholder="Your transcription will appear here...",
                )

        transcribe_btn.click(
            fn=transcribe_audio, inputs=[audio_input], outputs=[output_text]
        )

    return demo


if __name__ == "__main__":
    check_env()
    demo = create_gradio_interface()
    # share=True also creates the public gradio.live URL
    demo.launch(share=True)
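main.py builds the base URL by appending "/v1" to LLM_URL, so a trailing slash in the variable would produce "//v1" in the request path, and an empty variable passes the presence check. The following is a minimal sketch of a stricter startup check, offered as one possible hardening and not as part of the original application. The helper names `missing_env_vars` and `build_base_url` are hypothetical.

```python
import os

REQUIRED_VARS = ("LLM_API_KEY", "LLM_URL")


def missing_env_vars(env=None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def build_base_url(endpoint_url: str) -> str:
    """Append the /v1 suffix, tolerating a trailing slash in the input."""
    return endpoint_url.rstrip("/") + "/v1"
```

With these helpers, the startup code could report every missing variable in a single error message, and the client could be created with `base_url=build_base_url(os.environ["LLM_URL"])`.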
requirements.txt
gradio>=5.43.1
openai>=1.101.0