FlexAI Inference Endpoints
Deploying a model for inference is a common task in machine learning workflows, and FlexAI provides a streamlined process for doing so.
In this guide, we will walk you through the steps you need to take to deploy an Inference endpoint without any code or configuration files: just a model name and, in some cases, a Hugging Face Access Token.
Prerequisites
- A FlexAI account. If you don’t have one, you can sign up for free 🔗.
- A vLLM-supported model that you want to deploy. A list of vLLM-supported models can be found here 🔗.
- Depending on the model you want to deploy, you may need a Hugging Face Access Token. You can create one by following the instructions in the Hugging Face documentation 🔗.
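Once your endpoint is deployed, you can query it like any other inference API. Since vLLM serves an OpenAI-compatible API, requests follow the standard chat-completions schema. Below is a minimal sketch in Python; the endpoint URL, API key, and model name are placeholders, not real values — substitute the ones shown for your deployment.

```python
import json

# Placeholder values — replace with the endpoint URL and API key
# for your own deployed FlexAI Inference endpoint.
ENDPOINT_URL = "https://<your-endpoint>/v1/chat/completions"
API_KEY = "<your-api-key>"

# vLLM exposes an OpenAI-compatible API, so the request body uses
# the standard chat-completions fields.
payload = {
    "model": "<model-name>",  # e.g. the Hugging Face model ID you deployed
    "messages": [
        {"role": "user", "content": "Hello! What can you do?"},
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# To send the request (requires a live endpoint and network access):
# import urllib.request
# req = urllib.request.Request(ENDPOINT_URL, data=body.encode(), headers=headers)
# print(urllib.request.urlopen(req).read().decode())
```

The same request works with any OpenAI-compatible client library by pointing its base URL at your endpoint.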