Command: inference
The flexai inference command is used to deploy and manage Inference Endpoints.
An Inference Endpoint is a hosted model that can be used for inference tasks, such as text generation, image classification, and more.
Inference Endpoints are created from models hosted on Hugging Face, and they can be deployed to FlexAI’s infrastructure for easy access and scalability.
You can manage Inference Endpoints using the flexai inference set of subcommands.
Available subcommands
flexai inference delete - Deletes an Inference Endpoint.
flexai inference inspect - Displays detailed information about an Inference Endpoint.
flexai inference list - Lists all Inference Endpoints.
flexai inference logs - Streams logs from an Inference Endpoint.
flexai inference scale - Defines scaling policies for an Inference Endpoint.
flexai inference serve - Creates an Inference Endpoint from a model hosted on Hugging Face.
flexai inference stop - Stops an Inference Endpoint.
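As an illustration, a typical session might chain these subcommands as sketched below. The endpoint name used here is a hypothetical placeholder, and the exact arguments each subcommand accepts may differ; consult the CLI's help output for the authoritative syntax.

```sh
# Create an Inference Endpoint from a Hugging Face model.
# ("my-endpoint" is a placeholder name; the serve subcommand may
# require additional arguments, such as a model reference, that are
# not shown in this sketch.)
flexai inference serve my-endpoint

# Confirm the endpoint exists and review its details.
flexai inference list
flexai inference inspect my-endpoint

# Stream logs while the endpoint handles requests.
flexai inference logs my-endpoint

# Stop the endpoint when it is no longer needed.
flexai inference stop my-endpoint
```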