Monitoring a Training Job's Progress

Logs

On the Training or Fine-tuning Jobs list page, select the gear icon ⚙️ (labeled Configure) in the job's Actions field to open the “Details” panel.

Select the Logs tab to view the messages written to standard output (stdout) by the runtime environment build process and by your own code.

Use the Search bar to filter the logs by keyword and quickly find the entries you need.
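Because the Logs tab shows what your code writes to stdout, keyword filtering works best when your training script emits consistent, searchable lines. The following is a minimal sketch using only Python's standard library. The `train` logger name, the `METRIC` keyword, and the helper functions are illustrative assumptions, not part of FlexAI's API.

```python
import logging
import sys


def build_stdout_logger(name: str = "train") -> logging.Logger:
    """Return a logger that writes one line per message to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages from being duplicated by the root logger
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("[%(name)s] %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def format_metrics(step: int, **metrics: float) -> str:
    """Build a line prefixed with the (hypothetical) METRIC keyword."""
    parts = [f"step={step}"]
    parts += [f"{key}={value:.4f}" for key, value in sorted(metrics.items())]
    return "METRIC " + " ".join(parts)


if __name__ == "__main__":
    log = build_stdout_logger()
    # Placeholder loss values stand in for a real training loop.
    for step, loss in enumerate([0.9, 0.7, 0.55], start=1):
        log.info(format_metrics(step, loss=loss))
```

With lines in this shape, typing a keyword such as `METRIC` or `step=2` into the Search bar should narrow the view to the matching entries.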

Infrastructure Metrics

You can monitor your Training Job's infrastructure metrics with the FlexAI Infrastructure Monitor. It shows the job's resource usage, such as CPU and memory usage, disk I/O, and network traffic.

Access the FlexAI Infrastructure Monitor at https://dashboards.flex.ai/. To learn more, visit the FlexAI Infrastructure Monitor page.

TensorBoard

You can also use FlexAI's hosted TensorBoard to visualize your model's training process. TensorBoard provides a suite of tools for inspecting and understanding how your Training Job evolves.

Visit https://dashboards.flex.ai/tensorboard and log in using your credentials. Learn more at the FlexAI TensorBoard page.

Next Steps

After a few minutes, your Training Job should have completed successfully!

The next step of this Quickstart Tutorial guides you through retrieving the Training Job's outputs.