Getting a Training Job's Output
Currently, a Training Job's output can only be obtained through the FlexAI CLI, not directly from the FlexAI Web Console. This is a temporary measure, and this functionality will be integrated into the Web Console in the future.
FlexAI Managed Checkpoints
FlexAI’s Managed Checkpoints feature lets you retrieve the final result of your Training Job after it completes, as well as any intermediate checkpoints generated by your Training script.
All you need to do is make sure your Training script calls the torch.save() function and writes its output to the path specified by the FLEXAI_OUTPUT_CHECKPOINT_DIR environment variable. FlexAI’s Managed Checkpoints will handle the rest.
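As an illustration of that requirement, here is a minimal sketch of a Training script saving checkpoints with torch.save() into the directory named by FLEXAI_OUTPUT_CHECKPOINT_DIR. The helper names (checkpoint_path, save_checkpoint), the file names, the toy model, and the local ./checkpoints fallback are assumptions made for this example and are not part of FlexAI; adapt them to your own script.

```python
import os
from pathlib import Path

import torch


def checkpoint_path(filename: str) -> Path:
    """Build a file path inside the checkpoint output directory."""
    # FLEXAI_OUTPUT_CHECKPOINT_DIR comes from the documentation above.
    # The "./checkpoints" fallback is an assumption for runs outside FlexAI.
    out_dir = Path(os.environ.get("FLEXAI_OUTPUT_CHECKPOINT_DIR", "./checkpoints"))
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename


def save_checkpoint(model: torch.nn.Module, filename: str) -> Path:
    """Write the model's state dict with torch.save() and return the path."""
    path = checkpoint_path(filename)
    torch.save(model.state_dict(), path)
    return path


if __name__ == "__main__":
    # Toy model standing in for a real training setup (hypothetical).
    model = torch.nn.Linear(4, 2)

    # An intermediate checkpoint, e.g. written at the end of an epoch.
    print(save_checkpoint(model, "checkpoint_epoch_1.pt"))

    # The final result, written once training completes.
    print(save_checkpoint(model, "final_model.pt"))
```

Reading the directory from the environment at save time, rather than hard-coding a path, keeps the same script usable both locally and inside a Training Job.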
Getting Checkpoints
You can download and export a Training Job’s Checkpoints using the FlexAI CLI. To learn how, follow the steps in the Getting a Training Job’s Output section of the CLI Quickstart Tutorial.