Inference-ready Checkpoints
FlexAI Checkpoints can be used not only to resume Training and Fine-tuning jobs but also to deploy Inference Endpoints.
When a Checkpoint is marked as Inference Ready, it contains all the files and metadata required to deploy an Inference Endpoint directly from it.
Checkpoint Metadata Extraction Process
Currently, the FlexAI Checkpoint Manager automatically extracts metadata from Checkpoints created using the Hugging Face Transformers library. This metadata is used to determine whether a Checkpoint contains the necessary information to be marked as Inference Ready and deployed as an Inference Endpoint:
Hugging Face Transformers Checkpoints
Currently, the FlexAI runtime supports Hugging Face Transformers checkpoints, which include the `trainer_state.json` and `config.json` files containing metadata about the training process and model configuration:
- STEP, TRAIN LOSS & EVAL LOSS: Extracted from `trainer_state.json`'s `log_history` field (last entry).
- MODEL: Determined from `config.json`'s `architectures` field.
- VERSION: Retrieved from `config.json`'s `transformers_version` field.
- INFERENCE READY: Set to `true` if the `architectures` field is present in `config.json`.
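As an illustration of that mapping, the sketch below reads the two files and derives the same fields in plain Python. It is a minimal approximation of the logic described above, not FlexAI's actual implementation; the `summarize_checkpoint` helper and the `output/checkpoint-300` path are hypothetical.

```python
import json
from pathlib import Path

def summarize_checkpoint(checkpoint_dir: str) -> dict:
    """Illustrative sketch: derive the Checkpoint table fields from a
    Hugging Face Transformers checkpoint directory."""
    ckpt = Path(checkpoint_dir)
    summary = {"STEP": None, "TRAIN LOSS": None, "EVAL LOSS": None,
               "MODEL": None, "VERSION": None, "INFERENCE READY": False}

    trainer_state = ckpt / "trainer_state.json"
    if trainer_state.exists():
        state = json.loads(trainer_state.read_text())
        # STEP, TRAIN LOSS and EVAL LOSS come from the last log_history entry
        if state.get("log_history"):
            last = state["log_history"][-1]
            summary["STEP"] = last.get("step")
            summary["TRAIN LOSS"] = last.get("loss")
            summary["EVAL LOSS"] = last.get("eval_loss")

    config = ckpt / "config.json"
    if config.exists():
        cfg = json.loads(config.read_text())
        architectures = cfg.get("architectures")
        # MODEL comes from architectures, VERSION from transformers_version
        if architectures:
            summary["MODEL"] = architectures[0]
        summary["VERSION"] = cfg.get("transformers_version")
        # A Checkpoint counts as Inference Ready when architectures is present
        summary["INFERENCE READY"] = architectures is not None

    return summary

print(summarize_checkpoint("output/checkpoint-300"))
```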
Deploying an Inference-ready Checkpoint
To deploy an Inference Endpoint from an Inference-ready Checkpoint, follow these steps:
Note: Available soon. For now, please use the FlexAI CLI.
1. List the Checkpoints associated with a Training or Fine-tuning job:

   ```sh
   flexai training checkpoints <training_job_name>
   ```

   Which will return an output similar to:

   ```
   ID                                   │ NAME              │ NODE │ STEP │ TRAIN LOSS │ EVAL LOSS │ MODEL              │ VERSION │ INFERENCE READY │ TIMESTAMP
   ─────────────────────────────────────┼───────────────────┼──────┼──────┼────────────┼───────────┼────────────────────┼─────────┼─────────────────┼──────────────────────────
   a494f07f-e183-4a53-a6e6-e7116ca177fd │ checkpoint-250    │ 0    │ 250  │ 0.8438     │           │                    │         │ false           │ 2025-09-25 02:39:42 (1d)
   3784735b-d7b6-4978-bd76-c6e9158d2ecc │ checkpoint-300    │ 0    │ 300  │ 0.7895     │           │                    │         │ false           │ 2025-09-25 02:41:02 (1d)
   7f7fe96a-c649-4c94-bfc7-218e17d392ba │ hf_checkpoint     │ 0    │ 300  │ 0.7895     │           │ MistralForCausalLM │ 4.44.2  │ true            │ 2025-09-25 02:41:02 (1d)
   ```
2. Create an Inference Endpoint using the `flexai inference serve` command, specifying the Checkpoint's name or UUID with the `--checkpoint` flag:

   ```sh
   flexai inference serve <inference_endpoint_name> \
     --checkpoint <checkpoint_name_or_uuid> \
     [<other_inference_args> ...] \
     -- [<vLLM_specific_args> ...]
   ```

   Which can look like:

   ```sh
   flexai inference serve mistral7b-inference \
     --checkpoint 7f7fe96a-c649-4c94-bfc7-218e17d392ba \
     --model-type mistral \
     --accels 2
   ```

   Once the Endpoint is running, you can send requests to it; see the sketch after this list.
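Since everything after `--` is forwarded to vLLM, a deployed Endpoint can be expected to expose vLLM's OpenAI-compatible HTTP API. The sketch below assumes that API shape; the endpoint URL, the API key, and the use of the Endpoint name as the model id are placeholder assumptions to replace with your deployment's actual values.

```python
import requests

# Placeholders: substitute the URL and credentials of your deployed
# Inference Endpoint (assumes vLLM's OpenAI-compatible API).
ENDPOINT_URL = "https://<your-endpoint-host>/v1/chat/completions"
API_KEY = "<your_api_key>"  # drop the header if your Endpoint is unauthenticated

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "mistral7b-inference",  # assumption: Endpoint name serves as model id
        "messages": [{"role": "user", "content": "What is an inference-ready checkpoint?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```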