Getting a Training Job's Output

Training Job’s Checkpoints

You can retrieve the checkpoints generated by FlexAI’s Managed Checkpoints at any time. This lets you go back to an earlier point in a Training Job to resume training, test your model, or use it for inference.

Listing Checkpoints

You can list all available checkpoints for a specific Training Job by running the flexai training checkpoints command:

flexai training checkpoints quickstart-training-job

This will return a table with a list of Checkpoint IDs and their corresponding creation timestamps, similar to the following:

 ID                                   │ TIMESTAMP
──────────────────────────────────────┼────────────────────────────────────
 50e5ec69-32b6-e483-9c49-38a73cc34294 │ 2025-06-30 12:42:55.214 +0100 WEST
 82d21263-8ba8-dd73-9c61-732d3b7b0adc │ 2025-06-30 12:43:01.77 +0100 WEST
 32d07a60-61cc-4598-b4f6-2073a4f8d0af │ 2025-06-30 12:43:14.734 +0100 WEST
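
When scripting this step, it can help to pull the most recent Checkpoint ID out of that table automatically. The following is a minimal sketch, not part of the FlexAI CLI: latest_checkpoint_id is a hypothetical helper, and it assumes the table keeps the layout shown above, with one UUID-shaped ID per row and rows listed oldest first.

```shell
# Hypothetical helper (not a FlexAI command): reads the checkpoints table
# on standard input and prints the ID found on the last row.
# Assumes IDs are UUID-shaped and rows are listed oldest first, as in the
# sample output above. Prints nothing if no ID is found.
latest_checkpoint_id() {
  grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | tail -n 1
}

# Example usage (requires the FlexAI CLI to be installed and configured):
#   flexai training checkpoints quickstart-training-job | latest_checkpoint_id
```

If the row order ever differs from the sample, sort on the TIMESTAMP column before taking the last row instead of relying on position.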

Fetching a Checkpoint

Once you have the desired Checkpoint ID, you can download it to your host machine using the flexai checkpoint fetch command:

flexai checkpoint fetch 32d07a60-61cc-4598-b4f6-2073a4f8d0af

Writing in:  /home/diego/ckpt.pt
Progress: 0.4% (1.31 MB / 343.79 MB)
// ...
Progress: 100% (343.79 MB / 343.79 MB)

You can use this checkpoint file to resume training from the exact point at which it was saved, or to evaluate the model’s performance on a validation dataset.

A Training Job’s Output

Any data written to the /output directory will be compressed into a zip file and made available to you via the flexai training fetch command:

flexai training fetch quickstart-training-job

This will download a .zip file to the current working directory on your host machine.

Once extracted, you’ll get a local directory named output. It will contain any files written to the /output directory by the training scripts.
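
The extraction step can be sketched as a small shell helper. This is an illustrative example rather than a FlexAI feature: extract_training_output is a hypothetical function name, the archive name is left as a placeholder because it depends on what was downloaded, and the sketch assumes the unzip utility is installed on your host machine.

```shell
# Hypothetical helper (not a FlexAI command): extracts a downloaded archive
# into a destination directory, which defaults to the current directory.
# Assumes the `unzip` utility is installed on the host machine.
extract_training_output() {
  archive="$1"
  dest="${2:-.}"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # -q keeps the output quiet; -o overwrites existing files without prompting.
  unzip -q -o "$archive" -d "$dest"
}

# Example usage; replace the placeholder with the name of the downloaded file:
#   extract_training_output <downloaded-file>.zip
#   ls output/
```

Because -o overwrites files silently, consider extracting into a fresh directory if you fetch the output of several Training Jobs.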

Success!

You're ready to get started!

You’ve learned how to upload a Dataset and then use it to run a Training Job using training code hosted on a public GitHub repository.

You now have the knowledge required to create and run your own Training Jobs on FlexAI by integrating your own public or private Code Repositories and loading your Datasets.

Next Steps

Private Code Repositories

You can use any public or private GitHub repository as the source of your training code when using the --repository-url flag.

You can also use the flexai code-registry command to connect your GitHub account to FlexAI, which gives you access to your private repositories.

Dataset Upload

FlexAI makes it easy to upload Datasets from your host machine through the flexai dataset push command.

But wait, there’s more! You can also push Datasets from remote sources, such as S3, GCS, MinIO or R2.

Interactive Training

With FlexAI, you can use the flexai training debug-ssh command to run an “Interactive Training Job session”, which lets you SSH into a Training Environment where you have access to the entire system.

This is useful for debugging and testing: you can try out your training code in the environment it will run in, which reduces iteration times.

CLI Command Reference

Explore the CLI Command Reference pages to learn about all the ways you can use the FlexAI CLI to manage your workloads.

You will find a page for each CLI Command along with each of its subcommands, example usage, recommendations, flags you can use, output messages, and more!