Checking the Training Job's Details

Training Summary

General Details about your Training Jobs

The “All Trainings” table in the FlexAI Console provides a summary of all your Training Jobs.

To view a Training Job's details, select the gear icon ⚙️ (labeled Configure) in the Actions column of the Training Jobs list page. This opens a "Details" panel with the Details tab selected by default, showing all the relevant information about your Training Job.

| Field | Description |
| --- | --- |
| Name | The name you assigned to the Training Job. |
| Status | The current status of the Training Job (e.g., pending, scheduling, building, in progress, succeeded, failed, stopped). |
| Created At | The age of the Training Job, i.e., how long ago it was created. |
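The Status values listed above fall into two groups: statuses where the job is still active and statuses where it has finished. The Python sketch below shows one way to model that split, for example when a script checks a job's status. `TrainingStatus`, `TERMINAL_STATUSES`, and `is_finished` are hypothetical names, not part of any FlexAI API. Treating succeeded, failed, and stopped as final states is an assumption based on their names. Because the Console's list of statuses is not exhaustive, the sketch raises an error for an unknown value instead of guessing.

```python
from enum import Enum


class TrainingStatus(str, Enum):
    # Status values as shown in the Details panel (that list is not exhaustive).
    PENDING = "pending"
    SCHEDULING = "scheduling"
    BUILDING = "building"
    IN_PROGRESS = "in progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    STOPPED = "stopped"


# Assumption: these statuses are final, i.e. the job will not change status again.
TERMINAL_STATUSES = frozenset(
    {TrainingStatus.SUCCEEDED, TrainingStatus.FAILED, TrainingStatus.STOPPED}
)


def is_finished(status: str) -> bool:
    """Return True if the status string denotes a final state.

    Raises ValueError for a status string not covered by TrainingStatus.
    """
    return TrainingStatus(status.strip().lower()) in TERMINAL_STATUSES


print(is_finished("in progress"))  # False
print(is_finished("Succeeded"))    # True
```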

Training Configuration

| Field | Description |
| --- | --- |
| Dashboard URL | The URL of the Training Job dashboard, where you can monitor the performance and resource usage of your Training Job. |
| Tensorboard Dashboard URL | The URL of the FlexAI-hosted TensorBoard dashboard, where you can visualize the training process of your models. |
| Node Count | The number of nodes allocated to the Training Job. |
| Accelerator Count | The number of accelerators (GPUs) allocated to the Training Job. |
| Repository URL | The URL of the Git repository containing your training code. |
| Repository Revision | The specific commit or branch of the repository that was used to create the Training Job. |
| Repository Revision SHA | The SHA hash of the commit that the Repository Revision pointed to when the Training Job was created. |
| Entry Point | The entry point script, along with its arguments. |
| Datasets | The datasets attached to the Training Job. |
| Environment | The environment variables and secrets set for the Training Job, displayed as Key-Value pairs. The Key is the name of the environment variable within the Training Runtime. The Value is either the raw value (for Environment Variables) or the name of the FlexAI secret that holds the secret value. |
| Checkpoints | The checkpoints created during the Training Job. They are stored in FlexAI object storage and can be used to resume training or, depending on the type of model, to create an Inference Endpoint. |
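The Environment field mixes two kinds of entries: plain Environment Variables, whose raw value is displayed, and secrets, for which only the FlexAI secret's name is displayed. The Python sketch below models that distinction. It shows what the Details panel would display and what the Training Runtime would see once secret names are resolved. `PlainValue`, `SecretRef`, `displayed_environment`, `runtime_environment`, and the `secret_store` mapping are all hypothetical stand-ins, not FlexAI APIs. The keys `BATCH_SIZE` and `HF_TOKEN` and the secret name `hf-token` are made-up examples.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Union


@dataclass(frozen=True)
class PlainValue:
    """An Environment Variable: its raw value is shown in the Details panel."""
    value: str


@dataclass(frozen=True)
class SecretRef:
    """A secret: only the name of the FlexAI secret is shown, never its value."""
    secret_name: str


EnvEntry = Union[PlainValue, SecretRef]


def displayed_environment(env: Mapping[str, EnvEntry]) -> Dict[str, str]:
    """Key-Value pairs as they would appear in the Environment field."""
    return {
        key: entry.value if isinstance(entry, PlainValue) else entry.secret_name
        for key, entry in env.items()
    }


def runtime_environment(
    env: Mapping[str, EnvEntry], secret_store: Mapping[str, str]
) -> Dict[str, str]:
    """Variables the Training Runtime would see.

    secret_store is a hypothetical stand-in for wherever secret values live.
    """
    return {
        key: entry.value if isinstance(entry, PlainValue) else secret_store[entry.secret_name]
        for key, entry in env.items()
    }


env = {
    "BATCH_SIZE": PlainValue("32"),
    "HF_TOKEN": SecretRef("hf-token"),
}
print(displayed_environment(env))
# {'BATCH_SIZE': '32', 'HF_TOKEN': 'hf-token'}
print(runtime_environment(env, {"hf-token": "example-secret-value"}))
# {'BATCH_SIZE': '32', 'HF_TOKEN': 'example-secret-value'}
```

Keeping secrets as references rather than values mirrors the description above: the Details panel can safely show the secret's name, while the actual value only appears inside the runtime.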

Next Steps

Once a Training Job is running, you can monitor its progress by checking its logs and leveraging the FlexAI Observability Services.

You'll learn more about this in the next step of the Quickstart Tutorial: Monitoring a Training Job's Progress.