
# Getting a Training Job's Output

> Retrieve output files and results from a completed training job

## FlexAI Managed Checkpoints

<Tabs>
  <Tab title="Using the FlexAI Console">
    [FlexAI's Managed Checkpoints](/platform-services/checkpoint-manager/) feature lets you retrieve the final result of your Training Job after it completes, as well as any intermediate checkpoints generated by your Training script.

    You only need to make sure your Training script calls the `torch.save()` function and writes its output to the path specified by the `FLEXAI_OUTPUT_CHECKPOINT_DIR` environment variable. FlexAI's Managed Checkpoints will handle the rest.

    Once the Training Job is running, every time its code calls the `torch.save()` function, FlexAI's Managed Checkpoints feature will automatically capture a Checkpoint and store it in the `/output-checkpoint` directory.

    Each Checkpoint will be assigned a unique ID and its creation time will be recorded.

    This means that you can go to a specific point in time and retrieve the state of the model at that moment, allowing you to resume training from that point or evaluate the model's performance on a validation dataset.

    After a Training Job completes, the last Checkpoint will be the one with the most recent creation timestamp.
  </Tab>

  <Tab title="Using the FlexAI CLI">
    You can retrieve checkpoints generated by *FlexAI's Managed Checkpoints* at any point, which allows you to go back to an earlier state of the model to resume training, to test your model, or to use it for inference.

    ### Listing Checkpoints

    You can list all available checkpoints for a specific Training Job by running the [`flexai training checkpoints`](/cli/reference/training/checkpoints) command:

    ```bash theme={null}
    flexai training checkpoints quickstart-training-job
    ```

    This will return a table with a list of Checkpoint IDs and their corresponding creation timestamps, similar to the following:

    ```text theme={null}

     ID                                   │ TIMESTAMP
    ──────────────────────────────────────┼────────────────────────────────────
     50e5ec69-32b6-e483-9c49-38a73cc34294 │ 2025-06-30 12:42:55.214 +0100 WEST
     82d21263-8ba8-dd73-9c61-732d3b7b0adc │ 2025-06-30 12:43:01.77 +0100 WEST
     32d07a60-61cc-4598-b4f6-2073a4f8d0af │ 2025-06-30 12:43:14.734 +0100 WEST

    ```
  </Tab>
</Tabs>
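
The save path described above can be resolved inside a training script roughly as follows. This is a minimal sketch, not part of the FlexAI API: the helper name `checkpoint_path`, the `ckpt-step-{step}.pt` file naming, and the fallback to `/output-checkpoint` when the variable is unset are all assumptions made for illustration.

```python
import os
from pathlib import Path


def checkpoint_path(step: int) -> Path:
    """Build a checkpoint file path inside the managed checkpoint directory."""
    # Path the training script is expected to write checkpoints to.
    # The fallback value is an assumption for running the script elsewhere.
    out_dir = Path(os.environ.get("FLEXAI_OUTPUT_CHECKPOINT_DIR", "/output-checkpoint"))
    # Hypothetical naming scheme: zero-padded step number.
    return out_dir / f"ckpt-step-{step:06d}.pt"
```

In a PyTorch training loop, the result would then be passed to `torch.save()`, for example `torch.save(model.state_dict(), checkpoint_path(step))`, where `model` and `step` come from your own script.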

## Getting Checkpoints

<Tabs>
  <Tab title="Using the FlexAI Console">
    Select the gear icon ⚙️ (labeled *Configure*) in the `Actions` field of the Training Jobs list page. This opens the "Details" panel with the **Details** tab selected by default, showing all the relevant information about your Training Job.

    Navigate to the **Checkpoints** tab to view the list of checkpoints created during the Training Job. Each checkpoint entry includes details such as:

    * Actions:
      * Download
      * Deploy: If the Checkpoint is an "Inference-ready Checkpoint", you can deploy it directly as an Inference Endpoint
    * Created: The checkpoint's creation date and age
    * Training Loss: The reported training loss at the time the checkpoint was created
    * Evaluation Loss: The reported evaluation loss at the time the checkpoint was created
    * Status: The status of the checkpoint (`available` or `processing`)
  </Tab>

  <Tab title="Using the FlexAI CLI">
    Once you have the desired Checkpoint ID, you can download it to your host machine using the [`flexai checkpoint fetch`](/cli/reference/checkpoint/fetch) command:

    ```bash theme={null}
    flexai checkpoint fetch 32d07a60-61cc-4598-b4f6-2073a4f8d0af
    ```

    ```text theme={null}
    Writing in:  /home/diego/ckpt.pt
    Progress: 0.4% (1.31 MB / 343.79 MB)
    // ...
    Progress: 100% (343.79 MB / 343.79 MB)
    ```

    You can use this checkpoint file to resume training from the exact point it was saved, or to evaluate the model's performance on a validation dataset.

    <Tip>
      **Exporting a Checkpoint**

      Use the [`flexai checkpoint export`](/cli/reference/checkpoint/export) command to export a Checkpoint to a remote location by using a previously registered Remote Storage Connection, such as S3, GCS, MinIO or R2.

      This allows you to store your checkpoints in a more permanent location for later use.
    </Tip>
  </Tab>
</Tabs>
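
If you always want the newest checkpoint, the listing output can be parsed to find the row with the latest timestamp before calling `flexai checkpoint fetch`. The sketch below assumes the table layout shown in the sample listing on this page (an `ID │ TIMESTAMP` header and timestamps with a UTC offset followed by a zone name); the function name `latest_checkpoint_id` is hypothetical, and the real CLI output may differ.

```python
from datetime import datetime


def latest_checkpoint_id(table_text: str) -> str:
    """Return the ID of the row with the most recent timestamp."""
    rows = []
    for line in table_text.splitlines():
        if "│" not in line:
            continue  # skips blank lines and the ─┼─ separator row
        ckpt_id, _, stamp = (part.strip() for part in line.partition("│"))
        if ckpt_id == "ID":
            continue  # header row
        # Keep date, time and UTC offset; drop the trailing zone name (e.g. WEST).
        date_part, time_part, offset = stamp.split()[:3]
        fmt = "%Y-%m-%d %H:%M:%S.%f %z" if "." in time_part else "%Y-%m-%d %H:%M:%S %z"
        created = datetime.strptime(f"{date_part} {time_part} {offset}", fmt)
        rows.append((created, ckpt_id))
    if not rows:
        raise ValueError("no checkpoint rows found")
    return max(rows)[1]


# Sample listing copied from this page.
listing = """
 ID                                   │ TIMESTAMP
──────────────────────────────────────┼────────────────────────────────────
 50e5ec69-32b6-e483-9c49-38a73cc34294 │ 2025-06-30 12:42:55.214 +0100 WEST
 82d21263-8ba8-dd73-9c61-732d3b7b0adc │ 2025-06-30 12:43:01.77 +0100 WEST
 32d07a60-61cc-4598-b4f6-2073a4f8d0af │ 2025-06-30 12:43:14.734 +0100 WEST
"""
print(latest_checkpoint_id(listing))  # 32d07a60-61cc-4598-b4f6-2073a4f8d0af
```

The returned ID can then be passed to `flexai checkpoint fetch`.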

## Workload Raw Outputs

<Tabs>
  <Tab title="Using the FlexAI Console">
    Currently, the FlexAI Console does not support this feature. Please refer to the "Using the FlexAI CLI" instructions instead.
  </Tab>

  <Tab title="Using the FlexAI CLI">
    Any data written to the `/output` directory will be compressed into a zip file and made available to you via the [`flexai training fetch`](/cli/reference/training/fetch) command:

    ```bash theme={null}
    flexai training fetch quickstart-training-job
    ```

    This will download a `.zip` file to the current working directory on your host machine.

    Once extracted, you'll get a local directory named `output` containing any files written to the `/output` directory by the training scripts.

    <Note>
      For this quickstart example, we configured the training code to write to the checkpoints directory (`/output-checkpoint`) when we set the value of `--out_dir`, so the `/output` directory won't be used.

      However, you can run another Training Job that writes to the `/output` directory instead. In this case, only the last checkpoint produced by the training process will be saved in the `/output` directory.

      Note that your code can use both locations (`/output` and `/output-checkpoint`) simultaneously if needed.
    </Note>
  </Tab>
</Tabs>
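
The extraction step for the archive downloaded by `flexai training fetch` can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module; the helper name `extract_output` is hypothetical, and the archive's file name is not specified on this page, so it is left as a parameter.

```python
import zipfile


def extract_output(zip_path: str, dest_dir: str = ".") -> list[str]:
    """Extract the archive and return the names of the files it contained."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Directory entries end with "/"; keep only regular files.
        return [name for name in archive.namelist() if not name.endswith("/")]
```

Calling `extract_output(path_to_zip)` with the path of the downloaded file extracts it into the current working directory and returns the list of extracted file names.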
