Checking the Training Job's Details
Select the gear icon ⚙️ (labeled Configure) in the Actions field of the Training Jobs list page to open the “Details” drawer. The Details tab is selected by default and shows all the relevant information about your Training Job.
Training Summary
Field | Description |
---|---|
Name | The name you assigned to the Training Job. |
Status | The current status of the Training Job (e.g., pending, scheduling, building, in progress, succeeded, failed, stopped). |
Created At | The timestamp when the Training Job was created. |
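
If you track the Status field in your own tooling, it can help to separate statuses that mean the job has finished from those that mean it is still moving. The following is a minimal Python sketch, not a FlexAI API: the status names are copied from the table above, but the grouping into "terminal" and "active" is an assumption based on the names alone, the list may not be exhaustive, and `is_terminal` is a hypothetical helper.

```python
# Status names as listed in the Training Summary table.
# ASSUMPTION: the split into terminal vs. active is inferred from the names.
TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}
ACTIVE_STATUSES = {"pending", "scheduling", "building", "in progress"}


def is_terminal(status: str) -> bool:
    """Return True if a status shown in the Details drawer looks final."""
    normalized = status.strip().lower()
    if normalized in TERMINAL_STATUSES:
        return True
    if normalized in ACTIVE_STATUSES:
        return False
    # Fail loudly on statuses this sketch does not know about.
    raise ValueError(f"Unknown status: {status!r}")
```

Raising on unknown values, rather than guessing, keeps the helper honest if the platform reports a status that is not listed here.
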
Training Configuration
Field | Description |
---|---|
Dashboard URL | The URL of the Training Job dashboard, where you can monitor the performance and resource usage of your Training Job. |
Tensorboard Dashboard URL | The URL of the FlexAI-hosted TensorBoard dashboard, where you can visualize the training process of your models. |
Node Count | The number of nodes allocated to the Training Job. |
Accelerator Count | The number of accelerators (GPUs) allocated to the Training Job. |
Repository URL | The URL of the Git repository containing your training code. |
Repository Revision | The specific commit or branch of the repository that was used to create the Training Job. |
Repository Revision SHA | The SHA hash of the commit that the Repository Revision pointed to when the Training Job was created. |
Entry Point | The entry point script along with its arguments. |
Datasets | The datasets that were attached to the Training Job. |
Environment | The environment variables and secrets that were set for the Training Job. Displayed in a Key-Value pair format, where the Key is the name of the environment variable within the Training Runtime and the Value is either the raw value (for Environment Variables) or the name of the FlexAI secret containing the secret value. |
Checkpoints | The checkpoints that were created during the Training Job. These are stored in the FlexAI object storage and can be used to resume training or to create an Inference Endpoint (depending on the type of model). |
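
To make the Environment row more concrete, the sketch below models the two kinds of entries it describes: plain environment variables, whose raw value is shown, and secrets, for which only the secret's name is shown. This is an illustrative Python data structure, not FlexAI's API or output format: `EnvEntry`, `render_environment`, and the sample names (`BATCH_SIZE`, `HF_TOKEN`, `hf-token-secret`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvEntry:
    key: str         # name of the variable inside the Training Runtime
    value: str       # raw value, or the NAME of the secret (never its content)
    is_secret: bool  # True when `value` is a secret name


def render_environment(entries: list[EnvEntry]) -> list[str]:
    """Format entries as key-value lines, marking which ones are secrets."""
    lines = []
    for entry in entries:
        kind = "secret" if entry.is_secret else "variable"
        lines.append(f"{entry.key} = {entry.value} ({kind})")
    return lines


# Hypothetical sample data for illustration only.
entries = [
    EnvEntry("BATCH_SIZE", "32", is_secret=False),
    EnvEntry("HF_TOKEN", "hf-token-secret", is_secret=True),
]
for line in render_environment(entries):
    print(line)
```

The point of the `is_secret` flag is that a secret entry only ever carries a reference to the secret, so the secret value itself never appears in the displayed details.
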
The Details drawer also contains a Logs tab, which provides you with real-time logs from your Training Job, allowing you to monitor its activity and troubleshoot any issues that may arise. Check the Monitoring a Training Job’s Progress page for more information on how to use the logs and further monitoring options.