Highlights
- Enabled support for FlashAttention: Training Jobs can now use the FlashAttention package with no setup beyond listing it in the repository's requirements.txt file.
Added
- Checkpoint Details: Use the flexai checkpoint inspect command to view the detailed contents and metadata of checkpoints uploaded via the flexai checkpoint push command, including their files, source, and creation and update times.
- Storage Connection Details: Use the storage inspect command to review connections to storage providers (e.g. AWS S3) and their associated metadata.
- Enabled support for FlashAttention: The Training runtime now includes the FLASH_ATTENTION_SKIP_CUDA_BUILD=1 environment variable so that the flash-attn package can be used during Training Jobs.
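As a rough illustration of how a training script might take advantage of this, the sketch below checks whether the flash_attn module (installed by the flash-attn package) is importable and picks an attention backend accordingly. The helper names are hypothetical, and the returned strings assume the script passes them to the attn_implementation argument of Hugging Face Transformers; adapt them to whatever framework the Training Job actually uses.

```python
import importlib.util
import os


def flash_attention_available() -> bool:
    """Return True if the flash_attn module can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_attention_backend() -> str:
    """Prefer FlashAttention when installed, otherwise fall back to PyTorch SDPA."""
    # Assumption: these names match the attn_implementation values used by
    # Hugging Face Transformers; other frameworks use different identifiers.
    return "flash_attention_2" if flash_attention_available() else "sdpa"


if __name__ == "__main__":
    # Per the release note, the Training runtime sets this variable; it is
    # typically unset on a local machine.
    skip_build = os.environ.get("FLASH_ATTENTION_SKIP_CUDA_BUILD", "<unset>")
    print(f"FLASH_ATTENTION_SKIP_CUDA_BUILD={skip_build}")
    print(f"attention backend: {pick_attention_backend()}")
```

Falling back instead of failing keeps the same script usable in environments where flash-attn is not listed in requirements.txt.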
Changed
- Validation for Storage Connections: When creating a Remote Storage Provider Connection using flexai storage create, an error is now returned if the specified Secret name cannot be found, providing immediate feedback instead of creating an invalid connection.
- Enhanced Training command messages: Improved the messages for flexai training subcommands to provide more context and better guidance throughout the different workflows.
Fixed
- Last Checkpoint Availability: Fixed an issue where the final FCS-managed checkpoint created for a Training Job was sometimes inaccessible.