
Fine-Tuning

Fine-tune pre-trained AI models with your own data to create custom models tailored to your specific use cases.

You can fine-tune public and private models hosted on the Hugging Face Model Hub or models you’ve previously trained with FlexAI.

Custom Models

Bring your own Custom Model tailored to your needs

Start from Existing Model or Checkpoint

Launch fine-tuning jobs from an existing model or a FlexAI Checkpoint

Managed Checkpoints

Automatic checkpoint management

Progress Monitoring

Real-time training progress tracking

Dataset Flexibility

Support for various Datasets and any kind of file format

Third-Party Sources

Pull base models and Datasets from third-party sources such as Hugging Face

The resources you need, when you need them

Pick from a single GPU to multiple GPUs running on different Nodes

A Fine-tuning Job in FlexAI consists of the following components:

  1. Base Model Checkpoint: The base model Checkpoint you want to use to fine-tune your new model.
  2. Fine-tuning Script: The script that defines the fine-tuning process, including data loading, model training, evaluation, Checkpoint generation, and so on.
  3. Dataset: The Dataset you will use to fine-tune your new model.
    • It can be an existing FlexAI Dataset, or it can be pulled/streamed from an external source during runtime.
  4. Secrets: Any sensitive information (e.g., API keys, passwords) required for the fine-tuning process.
  5. Hyperparameters: Configuration settings that control the training process (e.g., learning rate, batch size).

A Fine-tuning Job by definition takes a base model as its starting point. It can be a base model you pull from an external source such as Hugging Face, or it can be a FlexAI Checkpoint from a previous Training or Fine-tuning Job.

A Fine-tuning Job can also generate multiple Checkpoints during its execution, which you can use to go back in time and pick the one that performs the best for your use case.
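
For example, a fine-tuning script can prefer the Checkpoint mounted by FlexAI at the /input-checkpoint/ path and only fall back to pulling a base model from Hugging Face when no Checkpoint was provided. The sketch below assumes a transformers-based workflow; the fallback model ID is purely illustrative.

    import os

    from transformers import AutoModelForCausalLM, AutoTokenizer  # assumed tooling

    INPUT_CHECKPOINT_DIR = "/input-checkpoint"       # where FlexAI mounts the base Checkpoint
    FALLBACK_MODEL_ID = "mistralai/Mistral-7B-v0.1"  # illustrative Hugging Face model ID

    def load_base_model():
        """Prefer a mounted FlexAI Checkpoint; otherwise pull a base model from Hugging Face."""
        if os.path.isdir(INPUT_CHECKPOINT_DIR) and os.listdir(INPUT_CHECKPOINT_DIR):
            source = INPUT_CHECKPOINT_DIR
        else:
            source = FALLBACK_MODEL_ID
        model = AutoModelForCausalLM.from_pretrained(source)
        tokenizer = AutoTokenizer.from_pretrained(source)
        return model, tokenizer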

The fine-tuning script is a Python script that defines the fine-tuning process. It handles tasks such as the following (a sketch of a minimal script appears after the list):

  • Loading the base model Checkpoint from the /input-checkpoint/ path, or downloading it from an external source.
  • Loading a Dataset mounted through the FlexAI Dataset Manager on the /input/ path or from an external source.
  • Performing any necessary data processing steps.
  • Reading the values of any Secrets passed to the Fine-tuning Job’s runtime environment.
  • Applying any hyperparameter configurations passed to the Fine-tuning Script.
  • Running the training loop and periodically saving Checkpoints to the /output-checkpoints/ path.
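
A minimal sketch of such a script is shown below, assuming a PyTorch/transformers workflow. The Secret name, hyperparameter flags, dataset file name, and text column are illustrative; the /input-checkpoint/, /input/, and /output-checkpoints/ paths are the ones described above.

    import argparse
    import os

    import torch
    from datasets import load_dataset                      # assumed dataset tooling
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def parse_args():
        # Hyperparameters passed to the Fine-tuning Job end up as script arguments (names illustrative).
        parser = argparse.ArgumentParser()
        parser.add_argument("--learning-rate", type=float, default=2e-5)
        parser.add_argument("--batch-size", type=int, default=8)
        parser.add_argument("--epochs", type=int, default=1)
        parser.add_argument("--save-steps", type=int, default=500)
        return parser.parse_args()

    def main():
        args = parse_args()

        # Secrets are injected as environment variables (the name is illustrative).
        hf_token = os.environ.get("HF_TOKEN")

        # Load the base model Checkpoint mounted at /input-checkpoint/.
        model = AutoModelForCausalLM.from_pretrained("/input-checkpoint", token=hf_token)
        tokenizer = AutoTokenizer.from_pretrained("/input-checkpoint", token=hf_token)
        if tokenizer.pad_token is None:
            tokenizer.pad_token = tokenizer.eos_token

        # Load the FlexAI Dataset mounted at /input/ (file name and format are illustrative).
        dataset = load_dataset("json", data_files="/input/train.jsonl", split="train")

        optimizer = torch.optim.AdamW(model.parameters(), lr=args.learning_rate)
        model.train()
        step = 0

        for _ in range(args.epochs):
            for batch in dataset.iter(batch_size=args.batch_size):
                # Tokenize the raw text column (padding tokens are not masked, for brevity).
                inputs = tokenizer(batch["text"], return_tensors="pt",
                                   padding=True, truncation=True, max_length=512)
                loss = model(**inputs, labels=inputs["input_ids"]).loss
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()

                step += 1
                if step % args.save_steps == 0:
                    # Checkpoints written under /output-checkpoints/ are managed by FlexAI.
                    save_dir = f"/output-checkpoints/step-{step}"
                    model.save_pretrained(save_dir)
                    tokenizer.save_pretrained(save_dir)

    if __name__ == "__main__":
        main()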

A Fine-tuning Job generally leverages a Dataset to adjust a pre-trained model’s weights using new data. The Dataset can be:

  • An existing FlexAI Dataset.
  • Pulled/streamed from an external source during runtime.

The Dataset can be in any format, and it can contain any kind of files. The Fine-tuning Script is responsible for loading and processing the data as needed.
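
As an illustration, the same script can work with a FlexAI Dataset mounted at /input/ or stream records from an external source. The sketch below uses the Hugging Face datasets library (an assumption); the file name and external dataset ID are placeholders.

    import os

    from datasets import load_dataset  # assumed tooling; any loader the script prefers will do

    def load_training_data():
        """Use the mounted FlexAI Dataset if present, otherwise stream from an external source."""
        if os.path.isdir("/input") and os.listdir("/input"):
            # The Dataset's files are mounted under /input/; the JSONL layout is only an example.
            return load_dataset("json", data_files="/input/train.jsonl", split="train")
        # Streaming avoids downloading the entire dataset before training starts.
        return load_dataset("wikitext", "wikitext-2-raw-v1", split="train", streaming=True)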

Secrets are used to securely pass sensitive information to the Fine-tuning Job’s runtime environment. This can include API keys, passwords, or any other confidential data required for the fine-tuning process.

Secrets are managed through the FlexAI Secret Manager and are injected into the Fine-tuning Job’s environment as environment variables.
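
Inside the script, a Secret is therefore read like any other environment variable; the variable names below are illustrative and correspond to whatever names you give your Secrets.

    import os

    # Names are illustrative — use the names you registered with the Secret Manager.
    hf_token = os.environ["HF_TOKEN"]            # required Secret: fail fast if it was not injected
    wandb_key = os.environ.get("WANDB_API_KEY")  # optional Secret: None when absent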

Hyperparameters are configuration settings that control the behavior of the Fine-tuning Script. They can include parameters such as learning rate, batch size, number of epochs, where to load the dataset from, where to save the model checkpoints, how often to save checkpoints, and so on.
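
How hyperparameters reach the script is up to you; a common pattern is to expose them as command-line arguments, as in the sketch below (flag names and defaults are illustrative).

    import argparse

    parser = argparse.ArgumentParser(description="Fine-tuning hyperparameters (illustrative names)")
    parser.add_argument("--learning-rate", type=float, default=2e-5)
    parser.add_argument("--batch-size", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--dataset-path", default="/input")             # where to load the Dataset from
    parser.add_argument("--output-dir", default="/output-checkpoints")  # where to save Checkpoints
    parser.add_argument("--save-steps", type=int, default=500)          # how often to save Checkpoints
    args = parser.parse_args()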

Your FlexAI Fine-tuning Jobs can interact with third-party APIs and services by leveraging the FlexAI Secret Manager to securely store sensitive information such as API keys and tokens. These Secrets are injected into the Fine-tuning Job’s runtime environment as environment variables, allowing your fine-tuning script to access them securely.
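
For instance, a Hugging Face access token stored as a Secret can be used to authenticate before pulling a gated or private base model; the environment variable name and model ID below are assumptions.

    import os

    from huggingface_hub import login
    from transformers import AutoModelForCausalLM

    # Authenticate with a token injected by the FlexAI Secret Manager (variable name illustrative).
    login(token=os.environ["HF_TOKEN"])

    # Gated or private repositories become accessible once authenticated; the model ID is a placeholder.
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")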

Learn more about the different statuses a Fine-tuning Job can have in the Fine-tuning Job Lifecycle page.

The flexai training command manages Training Jobs as well as Fine-tuning Jobs. These two types of workloads share the same command set. A Fine-tuning Job is a Training Job that begins its execution from an existing model Checkpoint.

FlexAI Fine-tuning Jobs can be launched in a few steps. The Quickstart guide will walk you through preparing your data and starting your first Fine-tuning Job. Here’s a brief overview of the steps involved:

  1. Preparing your dataset for fine-tuning.
  2. Launching a fine-tuning job using FlexAI.
  3. Monitoring training progress and managing checkpoints.

The button below will lead you to the FlexAI Fine-Tuning Quickstart guide’s overview, where you’ll find more details.