# training run

> Start a new training job on FlexAI with custom configuration

Starts a new Training Job. This command allows you to specify the dataset, repository, hardware requirements, and other parameters for the job.

## Usage

```bash theme={null}
flexai training run <training_or_fine_tuning_job_name> [flags] -- <entry_point_script_path> [script_args]
```

## Arguments

| Argument                           | Type   | Required | Description                                                             |
| ---------------------------------- | ------ | -------- | ----------------------------------------------------------------------- |
| `training_or_fine_tuning_job_name` | string | Yes      | A unique name for the Training Job.                                     |
| `entry_point_script_path`          | string | Yes      | The path to the entry point script for the Training or Fine-tuning Job. |

## Flags

| Flag                    | Short | Type      | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| ----------------------- | ----- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--accels`              | `-a`  | integer   | Number of accelerators/GPUs to use. Default: `1`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| `--affinity`            |       | key-value | Pins the workload to a specific cluster. Format: `cluster=<cluster_name>`. Use [`flexai cluster list`](/cli/reference/cluster/list) to see clusters available to your organization. Useful when you need a specific accelerator type (for example, NVIDIA H100 vs A100) that lives on a particular cluster. The only recognized key today is `cluster`. When set, this takes precedence over the cluster FlexAI would have selected from `--device-arch`.                                                                                                                                      |
| `--build-secret`        |       | key-value | FlexAI Secrets to make available during the image build process. Format: `<flexai_secret_name>`=`<environment_variable_name>`                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| `--checkpoint`          | `-C`  | string    | A Checkpoint to use as a starting point for a Fine-tuning Job. The name of a previously pushed Checkpoint (use [`flexai checkpoint list`](/cli/reference/checkpoint/list) to see available Checkpoints) or the UUID of an [*Inference Ready Checkpoint*](/platform-services/checkpoint-manager/inference-ready-checkpoints/) generated during the execution of a Training or Fine-tuning job (use [`flexai training checkpoints`](/cli/reference/training/checkpoints) to see available Checkpoints).                                                                                          |
| `--dataset`             | `-D`  | string    | Dataset to mount on the runtime environment. Can be specified as a simple dataset name, or as `<dataset-name>=<mount-path>` to use a custom mount path under `/input`.                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `--device-arch`         | `-d`  | string    | The architecture of the device to run the Training Job on. Default: `nvidia` |
| `--env`                 | `-E`  | key-value | Environment variables to set in the Training Runtime. Format: `<env_var_name>=<value>` |
| `--help`                | `-h`  | boolean   | Displays this help page.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| `--no-queuing`          |       | boolean   | Disables queuing for this workload: If no resources are available, the workload will fail immediately instead of waiting for resources to become available.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `--nodes`               | `-n`  | integer   | The number of nodes across which to distribute the workload. Selecting more than 1 node overrides the value provided in the `--accels` flag and sets it to 8 accelerators per node. Default: `1` |
| `--repository-revision` | `-b`  | string    | The branch name of the repository (default: `main`), a commit SHA hash, or a tag name.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `--repository-url`      | `-u`  | string    | Git repository URL containing code to mount on the workload environment. Will be mounted on the `/workspace` directory.                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `--requirements-path`   | `-q`  | string    | Path to a pip requirements.txt file in the repository.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `--runtime`             | `-r`  | string    | Name of the runtime to use. |
| `--secret`              | `-S`  | key-value | FlexAI Secrets to expose as environment variables in the Training Runtime. Secrets are sensitive values like API keys, tokens, or credentials that your Training Job needs to access but that should not be exposed in logs or command history. When you use the `--secret` flag, the actual secret values are retrieved from the Secrets Storage and injected into the environment at runtime. Format: `<env_var_name>=<flexai_secret_name>`, where `<env_var_name>` is the name of the environment variable to set and `<flexai_secret_name>` is the name of the Secret containing the sensitive value. |
| `--verbose`             | `-v`  | boolean   | Enables verbose logging for detailed output during training job execution.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
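
The `--nodes` description above implies that a multi-node job always runs with 8 accelerators per node. The following is a minimal sketch of that arithmetic, assuming a hypothetical job name (`gpt2training-multinode`) and reusing the repository URL from the examples on this page. It only prints the command so that it can be reviewed before launching.

```bash
# Hypothetical job name; the repository URL comes from the examples on this page.
job_name="gpt2training-multinode"
nodes=2

# Per the --nodes description, jobs with more than 1 node use
# 8 accelerators per node, regardless of the --accels value.
accels_per_node=8
total_accels=$((nodes * accels_per_node))

# Assemble the command as a string so it can be inspected first.
cmd="flexai training run $job_name --repository-url https://github.com/flexaihq/nanoGPT/ --nodes $nodes -- train.py"

echo "$cmd"
echo "Accelerators requested in total: $total_accels"
```

To launch the job, run the printed command in a shell where `flexai` is installed and authenticated.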

## Examples

### Start a training job with dataset and repository

```bash theme={null}
flexai training run gpt2training-1 \
  --dataset my_data=wikitext-2-raw-v1 \
  --repository-url https://github.com/flexaihq/nanoGPT/ \
  --accels 4 \
  --secret HF_TOKEN=hf-token-dev \
  --env BATCH_SIZE=32 \
  -- train.py --batch-size 32 --epochs 10
```
```
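
The key-value flags in the example above are easy to misread, because `--secret` puts the environment variable name first (`<env_var_name>=<flexai_secret_name>`) while `--build-secret` puts the Secret name first. The sketch below splits the example's `--secret` and `--env` values with plain shell parameter expansion to show which side is which. It does not call the `flexai` CLI, and the `<name>=<value>` reading of `--env` is an assumption based on the example.

```bash
# Values copied from the example above.
secret_flag="HF_TOKEN=hf-token-dev"
env_flag="BATCH_SIZE=32"

# --secret: <env_var_name>=<flexai_secret_name>
secret_env_var="${secret_flag%%=*}"   # text before the first "="
secret_name="${secret_flag#*=}"       # text after the first "="

# --env: assumed to follow the usual <name>=<value> form
env_name="${env_flag%%=*}"
env_value="${env_flag#*=}"

echo "Secret '$secret_name' is exposed to the job as \$$secret_env_var"
echo "Plain variable $env_name is set to $env_value"
```

Under this reading, `train.py` would find the token in the `HF_TOKEN` environment variable, while the value stored in the `hf-token-dev` Secret never appears on the command line.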

### Pin a training job to a specific cluster

Use `--affinity cluster=<cluster_name>` to target a specific cluster, for example when you need a particular accelerator type. Run [`flexai cluster list`](/cli/reference/cluster/list) to discover the cluster names available to your organization.

```bash theme={null}
flexai training run gpt2training-1 \
  --repository-url https://github.com/flexaihq/nanoGPT/ \
  --affinity cluster=<cluster_name_from_flexai_cluster_list> \
  --accels 4 -- train.py
```
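
A pinned job can only run on the named cluster, so it may be useful to combine `--affinity` with `--no-queuing` when you would rather fail immediately than wait for that cluster to free up. The sketch below assembles such a command. The cluster name `my-h100-cluster` is a placeholder, and the `fail_fast` switch is a hypothetical convenience of this script, not a FlexAI option.

```bash
# Placeholder: replace with a name reported by `flexai cluster list`.
cluster_name="my-h100-cluster"
fail_fast="yes"   # set to "no" to let the job wait in the queue

cmd="flexai training run gpt2training-1 --repository-url https://github.com/flexaihq/nanoGPT/ --affinity cluster=$cluster_name --accels 4"

# Add --no-queuing only when failing fast is requested.
if [ "$fail_fast" = "yes" ]; then
  cmd="$cmd --no-queuing"
fi

# Everything after "--" is the entry point script and its arguments.
cmd="$cmd -- train.py"
echo "$cmd"
```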
