Inference-ready Checkpoints
FlexAI Checkpoints can be used not only to resume Training and Fine-tuning jobs but also to deploy Inference Endpoints.
When a Checkpoint is marked as Inference Ready, it contains all the files and metadata required to deploy an Inference Endpoint directly from it.
Checkpoint Metadata Extraction Process
Currently, the FlexAI Checkpoint Manager automatically extracts metadata from Checkpoints created using the Hugging Face Transformers library. This metadata is used to determine whether a Checkpoint contains the necessary information to be marked as Inference Ready and deployed as an Inference Endpoint:
Hugging Face Transformers Checkpoints
Currently, the FlexAI runtime supports Hugging Face Transformers checkpoints, which include the `trainer_state.json` and `config.json` files containing metadata about the training process and model configuration:
- STEP, TRAIN LOSS & EVAL LOSS: Extracted from `trainer_state.json`'s `log_history` field (last entry).
- MODEL: Determined from `config.json`'s `architectures` field.
- VERSION: Retrieved from `config.json`'s `transformers_version` field.
- INFERENCE READY: Set to `true` if the `architectures` field is present in `config.json`.
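As an illustration of that mapping, the sketch below reads the two files and derives the same fields in plain Python. It is a minimal approximation of the logic described above, not FlexAI's actual implementation; the `summarize_checkpoint` helper and the `output/checkpoint-300` path are hypothetical.

```python
import json
from pathlib import Path

def summarize_checkpoint(checkpoint_dir: str) -> dict:
    """Illustrative sketch: derive the Checkpoint table fields from a
    Hugging Face Transformers checkpoint directory."""
    ckpt = Path(checkpoint_dir)
    summary = {"STEP": None, "TRAIN LOSS": None, "EVAL LOSS": None,
               "MODEL": None, "VERSION": None, "INFERENCE READY": False}

    trainer_state = ckpt / "trainer_state.json"
    if trainer_state.exists():
        state = json.loads(trainer_state.read_text())
        # STEP, TRAIN LOSS and EVAL LOSS come from the last log_history entry
        if state.get("log_history"):
            last = state["log_history"][-1]
            summary["STEP"] = last.get("step")
            summary["TRAIN LOSS"] = last.get("loss")
            summary["EVAL LOSS"] = last.get("eval_loss")

    config = ckpt / "config.json"
    if config.exists():
        cfg = json.loads(config.read_text())
        architectures = cfg.get("architectures")
        # MODEL comes from architectures, VERSION from transformers_version
        if architectures:
            summary["MODEL"] = architectures[0]
        summary["VERSION"] = cfg.get("transformers_version")
        # A Checkpoint counts as Inference Ready when architectures is present
        summary["INFERENCE READY"] = architectures is not None

    return summary

print(summarize_checkpoint("output/checkpoint-300"))
```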
Deploying an Inference-ready Checkpoint
To deploy an Inference Endpoint from an Inference-ready Checkpoint, follow these steps:
Note: Available soon. For now, please use the FlexAI CLI.
1. List the Checkpoints associated with a Training or Fine-tuning job:

   ```sh
   flexai training checkpoints <training_job_name>
   ```

   Which will return an output similar to:

   ```
   ID                                   │ NAME              │ NODE │ STEP │ TRAIN LOSS │ EVAL LOSS │ MODEL              │ VERSION │ INFERENCE READY │ TIMESTAMP
   ─────────────────────────────────────┼───────────────────┼──────┼──────┼────────────┼───────────┼────────────────────┼─────────┼─────────────────┼──────────────────────────
   a494f07f-e183-4a53-a6e6-e7116ca177fd │ checkpoint-250    │ 0    │ 250  │ 0.8438     │           │                    │         │ false           │ 2025-09-25 02:39:42 (1d)
   3784735b-d7b6-4978-bd76-c6e9158d2ecc │ checkpoint-300    │ 0    │ 300  │ 0.7895     │           │                    │         │ false           │ 2025-09-25 02:41:02 (1d)
   7f7fe96a-c649-4c94-bfc7-218e17d392ba │ hf_checkpoint     │ 0    │ 300  │ 0.7895     │           │ MistralForCausalLM │ 4.44.2  │ true            │ 2025-09-25 02:41:02 (1d)
   ```
2. Create an Inference Endpoint using the `flexai inference serve` command, specifying the Checkpoint's name or UUID with the `--checkpoint` flag:

   ```sh
   flexai inference serve <inference_endpoint_name> \
     --checkpoint <checkpoint_name_or_uuid> \
     [<other_inference_args> ...] \
     -- [<vLLM_specific_args> ...]
   ```

   Which can look like:

   ```sh
   flexai inference serve mistral7b-inference \
     --checkpoint 7f7fe96a-c649-4c94-bfc7-218e17d392ba \
     --model-type mistral \
     --accels 2
   ```

   Once the Endpoint is running, you can send requests to it; see the sketch after this list.
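Since everything after `--` is forwarded to vLLM, a deployed Endpoint can be expected to expose vLLM's OpenAI-compatible HTTP API. The sketch below assumes that API shape; the endpoint URL, the API key, and the use of the Endpoint name as the model id are placeholder assumptions to replace with your deployment's actual values.

```python
import requests

# Placeholders: substitute the URL and credentials of your deployed
# Inference Endpoint (assumes vLLM's OpenAI-compatible API).
ENDPOINT_URL = "https://<your-endpoint-host>/v1/chat/completions"
API_KEY = "<your_api_key>"  # drop the header if your Endpoint is unauthenticated

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "mistral7b-inference",  # assumption: Endpoint name serves as model id
        "messages": [{"role": "user", "content": "What is an inference-ready checkpoint?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```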