Prerequisites
Before starting, make sure you have:

- A FlexAI account with access to the platform
- A Hugging Face token with access to the stabilityai/stable-audio-open-1.0 model
- The flexai CLI installed and configured
Set Up a FlexAI Secret for Your Hugging Face Token
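The step this section describes can be sketched as the command below. The subcommand and flag names are assumptions about the flexai CLI syntax, as is the secret name HF_TOKEN; check the CLI help for the exact invocation:

```shell
# Hypothetical syntax -- the real flexai CLI subcommand and flags may differ.
# HF_TOKEN is an assumed secret name that the endpoint will reference later;
# replace the placeholder value with your actual Hugging Face token.
flexai secret create HF_TOKEN --value "hf_xxxxxxxxxxxxxxxxxxxx"
```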
First, create a FlexAI secret that contains your Hugging Face token to access the inference model:

Make sure your Hugging Face token has access to the stabilityai/stable-audio-open-1.0 model. You may need to accept the model’s license terms on Hugging Face first.

Start the FlexAI Inference Endpoint
Start the FlexAI endpoint for the Stable Audio Open 1.0 model:

- Create an inference endpoint named stable-audio-open
- Use the flexserve runtime optimized for text-to-audio tasks
- Load the Stable Audio Open 1.0 model from Hugging Face
- Configure it for text-to-audio generation
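The steps above can be sketched as a single command along these lines. The subcommand and every flag name here are assumptions about the flexai CLI; consult the CLI reference for the real invocation:

```shell
# Hypothetical syntax -- subcommand and flag names are assumptions, not the
# documented flexai CLI interface.
flexai inference serve stable-audio-open \
  --hf-token-secret HF_TOKEN \
  --runtime flexserve \
  --model stabilityai/stable-audio-open-1.0
```

The endpoint name (stable-audio-open) is reused in the inspection step below, so keep it consistent.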
Get Endpoint Information
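A sketch of the export lines this section describes, assuming an inspect subcommand that can emit JSON and an output object exposing apiKey and url fields (the subcommand, flag, and field names are all assumptions about the flexai CLI):

```shell
# Hypothetical: assumes `flexai inference inspect <name> --json` prints a JSON
# object with "apiKey" and "url" fields; adjust the jq filters to the real output.
export FLEXAI_API_KEY=$(flexai inference inspect stable-audio-open --json | jq -r '.apiKey')
export FLEXAI_ENDPOINT_URL=$(flexai inference inspect stable-audio-open --json | jq -r '.url')
```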
Once the endpoint is deployed, you’ll see the API key displayed in the output. Store it in an environment variable:

These export lines use the jq tool to extract values from the JSON output of the inspect command. If you don’t already have jq, you can download it from its official website: https://jqlang.org/

Generate Audio
Now you can generate audio by making HTTP POST requests to your endpoint. Here are some examples:

Example 1: Relaxing Music
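A sketch of such a request, assuming the FLEXAI_ENDPOINT_URL and FLEXAI_API_KEY environment variables from the previous step, bearer-token authentication, and a JSON payload built from the parameters documented below; the exact request path, payload schema, and response format depend on your deployment:

```shell
# Assumed request shape -- verify the path, auth header, and response format
# against your deployed endpoint before relying on this.
curl -X POST "$FLEXAI_ENDPOINT_URL" \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Calm ambient music with soft piano and gentle synth pads",
    "audio_length_in_s": 30,
    "num_inference_steps": 100,
    "guidance_scale": 7
  }' \
  --output relaxing_music.wav
```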
Example 2: Nature Soundscape
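A similar hedged sketch for an ambient soundscape prompt, under the same assumptions about the endpoint URL, auth header, and payload schema:

```shell
# Assumed request shape -- same caveats as the previous example.
curl -X POST "$FLEXAI_ENDPOINT_URL" \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Forest ambience with birdsong, rustling leaves, and a distant stream",
    "audio_length_in_s": 40,
    "num_inference_steps": 150
  }' \
  --output nature_soundscape.wav
```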
Example 3: Electronic Beat
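And a sketch for a rhythmic electronic prompt, again assuming the same endpoint URL, auth header, and payload schema:

```shell
# Assumed request shape -- same caveats as the previous examples.
curl -X POST "$FLEXAI_ENDPOINT_URL" \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Upbeat electronic dance track at 128 BPM with a punchy kick and synth arpeggios",
    "audio_length_in_s": 20,
    "num_inference_steps": 100,
    "guidance_scale": 7
  }' \
  --output electronic_beat.wav
```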
Parameters Explanation
The API accepts the following parameters:

- inputs: The text prompt describing the audio you want to generate
- audio_length_in_s: Output audio duration in seconds (typically up to 47 seconds for Stable Audio Open)
- num_inference_steps: Number of denoising steps (higher = better quality but slower, recommended: 100-200)
- guidance_scale: How closely the output should follow your prompt; higher values make the model adhere more strictly to the text
- negative_prompt: Description of what you don’t want in the audio (helps improve quality)
- seed: Random seed for reproducible results
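For reference, here is a sketch of a request that exercises every parameter above, including negative_prompt and a fixed seed for reproducible output. As with the earlier examples, the request path, auth header, and payload schema are assumptions about the deployed endpoint:

```shell
# Assumed request shape -- a fixed "seed" should make repeated runs of this
# request produce the same audio, per the parameter list above.
curl -X POST "$FLEXAI_ENDPOINT_URL" \
  -H "Authorization: Bearer $FLEXAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Warm lo-fi hip hop beat with vinyl crackle",
    "audio_length_in_s": 30,
    "num_inference_steps": 150,
    "guidance_scale": 7,
    "negative_prompt": "distortion, clipping, harsh noise",
    "seed": 42
  }' \
  --output lofi_beat.wav
```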