
Command: checkpoint

A Checkpoint is a FlexAI entity that represents a snapshot of a model’s state at a given point in time.

Checkpoints are captured at various stages of training. Each snapshot includes the model weights, the optimizer state, and other relevant training data. This lets you resume training from a specific point, which protects against lost progress, avoids repeating training iterations unnecessarily, and makes it possible to experiment with different training paths from the same starting point.
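To make the idea of resuming concrete, here is a minimal sketch in plain Python. It does not use FlexAI's checkpoint format or API. The toy objective, the `train`, `save_checkpoint`, and `load_checkpoint` helpers, and the JSON layout are all assumptions made for illustration. It shows why a snapshot needs the optimizer state as well as the weights: with both saved, an interrupted run that resumes from the snapshot ends up in the same state as an uninterrupted one.

```python
import json
import os
import tempfile


def train(state, steps, lr=0.1, momentum=0.9):
    """Run `steps` iterations of momentum gradient descent on f(w) = (w - 3)^2.

    `state` is a hypothetical checkpoint layout: the model weight, the
    optimizer's momentum buffer, and the number of steps taken so far.
    """
    w = state["weights"]
    v = state["optimizer_state"]
    step = state["step"]
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)        # derivative of (w - 3)^2
        v = momentum * v - lr * grad  # update the momentum buffer
        w = w + v                     # apply the update to the weight
        step += 1
    return {"weights": w, "optimizer_state": v, "step": step}


def save_checkpoint(path, state):
    """Write the training state to disk as JSON (illustrative format only)."""
    with open(path, "w") as f:
        json.dump(state, f)


def load_checkpoint(path):
    """Read a training state previously written by save_checkpoint."""
    with open(path) as f:
        return json.load(f)


initial = {"weights": 0.0, "optimizer_state": 0.0, "step": 0}

# Reference: 10 steps with no interruption.
full_run = train(initial, 10)

# Interrupted run: train 4 steps, snapshot, then resume for the remaining 6.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "step_4.json")
    save_checkpoint(ckpt_path, train(initial, 4))
    resumed_run = train(load_checkpoint(ckpt_path), 6)

print(resumed_run == full_run)
```

If the snapshot stored only `weights` and the momentum buffer were reset to zero on resume, the two runs would diverge. That is the practical reason a checkpoint carries more than the model weights.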

Checkpoints can be pushed to FlexAI directly from the host machine running the FlexAI CLI or from a Remote Storage Provider Connection, such as Amazon S3, Cloudflare R2, or GCP Cloud Storage. A Checkpoint can be a single file or an entire directory.

You will find more information about Managed Checkpoints, and the benefits they bring to your AI training workflows, on the Managed Checkpoints page.

You can manage Checkpoints using the flexai checkpoint set of subcommands.