2025-02-04

Highlights

  • Enabled support for FlashAttention: Training Jobs can now use the FlashAttention package with no setup beyond listing it in the repository's requirements.txt file.

Added

  • Checkpoint Details: Use the flexai checkpoint inspect command to view the detailed contents and metadata of checkpoints uploaded via the flexai checkpoint push command, including their files, source, and creation and update times.
  • Storage Connection Details: Use the storage inspect command to review connections to storage providers (e.g. AWS S3) and associated metadata.
  • Enabled support for FlashAttention: The Training runtime now sets the FLASH_ATTENTION_SKIP_CUDA_BUILD=1 environment variable so that the flash-attn package can be used during Training Jobs.
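
A training script may want to confirm that flash-attn is actually available before relying on it. The following is a minimal sketch, not part of the FlexAI tooling: the function names and the "flash-attn" / "fallback" labels are hypothetical, and the only assumptions taken from the notes above are the FLASH_ATTENTION_SKIP_CUDA_BUILD variable and the flash-attn package (imported in Python as flash_attn).

```python
import importlib.util
import os


def flash_attention_status() -> dict:
    """Report whether flash-attn is importable and whether the skip-build flag is set.

    Hypothetical helper for illustration; it only inspects the local environment.
    """
    return {
        # The flash-attn distribution is imported under the module name "flash_attn".
        "installed": importlib.util.find_spec("flash_attn") is not None,
        # Variable named in the release notes; checked here, never modified.
        "skip_cuda_build": os.environ.get("FLASH_ATTENTION_SKIP_CUDA_BUILD") == "1",
    }


def pick_attention_backend() -> str:
    """Return a hypothetical label: "flash-attn" if importable, else "fallback"."""
    return "flash-attn" if flash_attention_status()["installed"] else "fallback"


if __name__ == "__main__":
    print(flash_attention_status())
    print("Attention backend:", pick_attention_backend())
```

Calling pick_attention_backend() at the start of a training script lets the same code run both where flash-attn is listed in requirements.txt and where it is not.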

Changed

  • Validation for Storage Connections: When creating a Remote Storage Provider Connection using flexai storage create, an error will be returned if the specified Secret name cannot be found, providing immediate feedback instead of creating an invalid connection.
  • Enhanced Training command messages: Improved the messages for flexai training subcommands to provide more context and better guidance during the different workflows.

Fixed

  • Last Checkpoint Availability: Fixed an issue where the final FCS-managed checkpoint created for a Training Job was sometimes inaccessible.