
Getting a Fine-tuning Job's Output

FlexAI Managed Checkpoints

FlexAI’s Managed Checkpoints feature lets you retrieve the final result of a Training Job after it completes, as well as any intermediate checkpoints generated by your Training script.

All you need to do is make sure your Training script calls the torch.save() function and writes its output to the path specified by the FLEXAI_OUTPUT_CHECKPOINT_DIR environment variable. FlexAI’s Managed Checkpoints will handle the rest.
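As an illustration, a Training script could write its checkpoints along the lines of the sketch below. Only torch.save() and the FLEXAI_OUTPUT_CHECKPOINT_DIR environment variable come from this page; the helper names, the file naming scheme, the dictionary keys, and the local fallback directory are assumptions made for the example.

```python
import os
from pathlib import Path


def checkpoint_path(filename: str) -> Path:
    """Return a path for `filename` inside the checkpoint output directory."""
    # FLEXAI_OUTPUT_CHECKPOINT_DIR is expected to be set inside the Training Job.
    # The "./checkpoints" fallback is an assumption for running the script locally.
    out_dir = Path(os.environ.get("FLEXAI_OUTPUT_CHECKPOINT_DIR", "./checkpoints"))
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename


def save_checkpoint(model, optimizer, step: int) -> Path:
    """Save model and optimizer state for `step` with torch.save()."""
    import torch  # imported here so checkpoint_path() works without PyTorch

    # Hypothetical file name and dictionary keys; use whatever your script expects.
    path = checkpoint_path(f"checkpoint-step-{step}.pt")
    torch.save(
        {
            "step": step,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        path,
    )
    return path
```

Calling save_checkpoint(model, optimizer, step) at regular intervals in the training loop, and once more at the end, would produce both intermediate checkpoints and a final one.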


Once the Training Job is running, every time its code calls torch.save(), the Managed Checkpoints feature automatically captures a Checkpoint and stores it in the /output-checkpoint directory.

Each Checkpoint will be assigned a unique ID and its creation time will be recorded.

This means you can go back to a specific point in the training run and retrieve the state of the model at that moment, so you can resume training from that point or evaluate the model’s performance on a validation dataset.
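Resuming from a downloaded checkpoint might look like the following sketch. It assumes the checkpoint file was written with torch.save() as a dictionary holding the hypothetical keys "step", "model_state_dict", and "optimizer_state_dict"; adjust the keys to match what your Training script actually saves.

```python
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Load a checkpoint dictionary that was written with torch.save()."""
    import torch  # imported here so the restore helper works without PyTorch

    # map_location="cpu" lets the file load on machines without a GPU.
    return torch.load(path, map_location="cpu")


def restore_training_state(checkpoint: dict, model, optimizer) -> int:
    """Restore model and optimizer state and return the step to resume from."""
    # The key names below are assumptions, not a format defined by FlexAI.
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["step"] + 1
```

A training loop could then start from the returned step, or the restored model could be switched to evaluation mode and run against a validation dataset.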

After a Training Job completes, the last Checkpoint will be the one with the most recent creation timestamp.

Getting Checkpoints

To view a Training Job’s checkpoints, select the gear icon ⚙️ (labeled Configure) in the Actions field of the Training Jobs list page. This opens a “Details” panel with the Details tab selected by default, showing all the relevant information about your Training Job.

Navigate to the Checkpoints tab to view the list of checkpoints created during the Training Job. Each checkpoint entry includes details such as:

  • Actions:
    • Download
    • Deploy: If the Checkpoint is an “Inference-ready Checkpoint”, you can deploy it directly as an Inference Endpoint
  • Created: Creation date and age
  • Training Loss: The reported training loss at the time the checkpoint was created
  • Evaluation Loss: The reported evaluation loss at the time the checkpoint was created
  • Status: The status of the checkpoint (available or processing)

Workload Raw Outputs

The FlexAI Console does not currently support retrieving a Workload’s raw outputs. Please refer to the "Using the FlexAI CLI" instructions instead.