Lists the Checkpoints that have been generated for a Training Job.
Checkpoints are generated by the FlexAI runtime when a Training script's code calls the torch.save()
function.
flexai training checkpoints <training_job_name>
The name of the Training Job to list checkpoints for.
gpt2training-1
Output the information in JSON format.
--json
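For scripting, the `--json` flag can be combined with a small wrapper. The following is a minimal sketch, assuming the `flexai` CLI is available on the `PATH` and that the command prints a single JSON document to standard output. The helper names `build_command` and `list_checkpoints` are hypothetical, and the shape of the JSON payload is not described in this section, so the parsed value is returned without assuming any fields.

```python
import json
import subprocess


def build_command(training_job_name: str) -> list:
    """Build the argument list for `flexai training checkpoints <name> --json`."""
    return ["flexai", "training", "checkpoints", training_job_name, "--json"]


def list_checkpoints(training_job_name: str):
    """Run the CLI and return its parsed JSON output.

    Assumption: the command writes one JSON document to stdout. The structure
    of that document is not specified here, so no field names are assumed.
    """
    result = subprocess.run(
        build_command(training_job_name),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


# Example call (requires the flexai CLI and an existing Training Job):
#   checkpoints = list_checkpoints("gpt2training-1")
```

Passing the arguments as a list rather than a shell string avoids quoting problems if a Training Job name ever contains unusual characters.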
flexai training checkpoints gpt2training-1
Which will output:
ID                                   │ NAME          │ NODE │ STEP │ TRAIN LOSS │ EVAL LOSS │ MODEL           │ VERSION │ INFERENCE READY │ TIMESTAMP
─────────────────────────────────────┼───────────────┼──────┼──────┼────────────┼───────────┼─────────────────┼─────────┼─────────────────┼──────────────────────────────────
ce4d8def-b6bf-4cc6-8067-de9d312a82c5 │ checkpoint-50 │ 0    │ 50   │ 3.3707     │ 3.1356    │ GPT2LMHeadModel │ 4.53.2  │ true            │ 2025-07-15 09:30:10.549 +0000 UTC
cabf2516-59f5-4ce7-8d14-220a8ca57ba7 │ checkpoint-99 │ 0    │ 90   │ 3.2857     │ 3.1356    │ GPT2LMHeadModel │ 4.53.2  │ true            │ 2025-07-15 09:30:51.46 +0000 UTC
c26d1452-6d41-4b23-805e-30eeeadca729 │               │ 0    │ 99   │ 3.2857     │ 3.1356    │ GPT2LMHeadModel │ 4.53.2  │ true            │ 2025-07-15 09:30:51.459 +0000 UTC
Column | Description |
---|---|
ID | The unique identifier of the checkpoint. |
NAME | The human-readable name of the checkpoint. |
NODE | The Node where the checkpoint was created. |
STEP | The training step at which the checkpoint was created. |
TRAIN LOSS | The training loss at the time the checkpoint was created. |
EVAL LOSS | The evaluation loss at the time the checkpoint was created. |
MODEL | The name of the base model used in the checkpoint. |
VERSION | The version of the model used in the checkpoint, such as 4.53.2. |
INFERENCE READY | Indicates whether the checkpoint is ready for inference, meaning it includes all the files and metadata required for inference tasks. |
TIMESTAMP | The timestamp of when the checkpoint was created, shown in UTC (for example, 2025-07-15 09:30:10.549 +0000 UTC). |
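
As an illustration of how these columns might be consumed, the sketch below picks the inference-ready checkpoint with the lowest evaluation loss, breaking ties by the most recent timestamp. The rows are copied by hand from the sample output above. The dictionary keys, the `parse_timestamp` and `best_checkpoint` helpers, and the selection rule itself are assumptions made for illustration; they are not part of the FlexAI CLI.

```python
from datetime import datetime

# Rows copied by hand from the sample output; the key names are hypothetical.
ROWS = [
    {"id": "ce4d8def-b6bf-4cc6-8067-de9d312a82c5", "name": "checkpoint-50",
     "step": 50, "eval_loss": 3.1356, "inference_ready": True,
     "timestamp": "2025-07-15 09:30:10.549 +0000 UTC"},
    {"id": "cabf2516-59f5-4ce7-8d14-220a8ca57ba7", "name": "checkpoint-99",
     "step": 90, "eval_loss": 3.1356, "inference_ready": True,
     "timestamp": "2025-07-15 09:30:51.46 +0000 UTC"},
    {"id": "c26d1452-6d41-4b23-805e-30eeeadca729", "name": "",
     "step": 99, "eval_loss": 3.1356, "inference_ready": True,
     "timestamp": "2025-07-15 09:30:51.459 +0000 UTC"},
]


def parse_timestamp(text: str) -> datetime:
    """Parse a TIMESTAMP cell such as '2025-07-15 09:30:10.549 +0000 UTC'."""
    # Drop the trailing zone name; the numeric offset already carries the zone.
    trimmed = text.rsplit(" ", 1)[0]
    return datetime.strptime(trimmed, "%Y-%m-%d %H:%M:%S.%f %z")


def best_checkpoint(rows):
    """Return the inference-ready row with the lowest eval loss.

    Ties on eval loss are broken in favour of the most recent timestamp.
    Returns None when no row is inference-ready.
    """
    ready = [row for row in rows if row["inference_ready"]]
    if not ready:
        return None
    return min(
        ready,
        key=lambda row: (row["eval_loss"],
                         -parse_timestamp(row["timestamp"]).timestamp()),
    )


best = best_checkpoint(ROWS)
print(best["id"], best["step"])
```

Because the fractional seconds in the sample output vary in length (`.46` versus `.459`), parsing the timestamp is safer than comparing the strings directly.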