training run

Starts a new Training Job. This command allows you to specify the dataset, repository, hardware requirements, and other parameters for the job. Everything after the -- separator is the entry point script and its arguments, which are passed through to the script.

flexai training run <training_or_fine_tuning_job_name> [flags] -- <entry_point_script_path> [script_args]

<training_or_fine_tuning_job_name>
Required

A unique name for the Training Job.

Examples
  • gpt2training-1
  • my-model-training

<entry_point_script_path>
Required

The path to the entry point script for the Training or Fine-tuning Job.

Examples
  • train.py
-a , --accels
<integer>
Optional
Default Value: 1
Integer

Number of accelerators/GPUs to use.
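
Examples
  • --accels 4
  • --accels 8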

--affinity
<key=value>
Optional
--build-secret
<key=value>
Optional

FlexAI Secrets to make available during the image build process. Format: <flexai_secret_name>=<environment_variable_name>

Examples
  • --build-secret build_config_secret=SECRET_ENV_VAR_TO_USE
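
For instance, a build Secret can be supplied alongside a repository build (a sketch; the job name and secret names reuse examples from this page):

flexai training run gpt2training-1 --repository-url https://github.com/flexaihq/nanoGPT/ --build-secret build_config_secret=SECRET_ENV_VAR_TO_USE -- train.py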
--checkpoint
<checkpoint_name>
Optional

A Checkpoint to use as a starting point for a Fine-tuning Job.

The name of a previously pushed Checkpoint. Use flexai checkpoint list to see available Checkpoints.

Examples
  • --checkpoint Mixtral-8x7B-v0_1
  • --checkpoint gemma-3n-E4B-it
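
For instance, a Fine-tuning Job might start from a previously pushed Checkpoint (a sketch; the job name is a placeholder, and the other values reuse examples from this page):

flexai training run finetune-1 --checkpoint gemma-3n-E4B-it --repository-url https://github.com/flexaihq/nanoGPT/ -- train.py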
--dataset
<dataset_name[=mount_path]>
Optional

Dataset to mount on the runtime environment.

Examples
  • --dataset open_web
  • --dataset fineweb-edu

Datasets to mount on the runtime environment using a custom mount path.

Examples
  • --dataset open_web=data/train/ow --dataset fineweb-edu=/data/train/fineweb-edu
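
For instance, a Dataset mounted at a custom path can be handed to the entry point script (a sketch; the --data-dir script argument is illustrative and belongs to the script, not to this command):

flexai training run gpt2training-1 --dataset fineweb-edu=/data/train/fineweb-edu --repository-url https://github.com/flexaihq/nanoGPT/ -- train.py --data-dir /data/train/fineweb-edu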
--device-arch
<option>
Optional
Default Value: nvidia
Option list

The architecture of the device to run the Training Job on.

One of:

  • nvidia
  • amd
  • tt
Examples
  • --device-arch nvidia
-E , --env
<key=value>
Optional

Environment variables to set in the interactive environment.

Examples
  • --env WANDB_ENTITY=georgec123 --env WANDB_PROJECT=gpt-j
-h , --help
<boolean>
Optional
Flag

Displays this help page.

--no-queuing
<boolean>
Optional
Flag

Disables queuing for this workload: if no resources are available, the workload fails immediately instead of waiting for resources to become available.
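
For instance, to fail fast when no capacity is free (a sketch; names reuse examples from this page):

flexai training run gpt2training-1 --no-queuing --repository-url https://github.com/flexaihq/nanoGPT/ -- train.py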

-n , --nodes
<integer>
Optional
Default Value: 1
Integer

The number of nodes across which to distribute the workload.

Selecting more than one node overrides the value provided in the --accels flag, setting it to 8 accelerators per node.

Examples
  • --nodes 1
  • --nodes 4
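
For instance, a four-node run (a sketch; the job name is a placeholder, and with more than one node each node uses 8 accelerators regardless of --accels):

flexai training run gpt2training-4n --nodes 4 --repository-url https://github.com/flexaihq/nanoGPT/ -- train.py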
--repository-revision
<branch|tag|commit_sha>
Optional
Default Value: main
String

The branch name of the repository. main by default.

Examples
  • --repository-revision secondary
  • --repository-revision testing
Commit SHA

A commit SHA to use.

Examples
  • --repository-revision 9fceb02
  • --repository-revision e5bd391
String

A tag name to use.

Examples
  • --repository-revision v1.0.0
  • --repository-revision release-2024
--repository-url
<url>
Optional
URL

Git repository URL containing code to mount on the workload environment.

Will be mounted on the /workspace directory.

Examples
  • --repository-url https://github.com/flexaihq/nanoGPT/
  • --repository-url https://github.com/flexaihq/nanoGPT.git
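
A revision can be pinned alongside the repository URL (a sketch; values reuse examples from this page):

flexai training run gpt2training-1 --repository-url https://github.com/flexaihq/nanoGPT/ --repository-revision v1.0.0 -- train.py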
--requirements-path
<path>
Optional
Path

Path to a pip requirements.txt file in the repository.

Examples
  • --requirements-path code/project/requirements.txt
Optional
String

Name of the runtime to use.

-S , --secret
<key=value>
Optional

Environment variables that will be set in the Training Runtime.

Secrets are sensitive values like API keys, tokens, or credentials that need to be accessed by your Training Job but should not be exposed in logs or command history. When using the --secret flag, the actual secret values are retrieved from the Secrets Storage and injected into the environment at runtime.

Syntax:

  • <env_var_name>=<flexai_secret_name>

Where <env_var_name> is the name of the environment variable to set, and <flexai_secret_name> is the name of the Secret containing the sensitive value.

Examples
  • --secret HF_TOKEN=hf-token-dev
  • --secret WANDB_API_KEY=wandb-key
Optional
Flag

Enables verbose logging for detailed output during Training Job execution.

Start a training job with dataset and repository

flexai training run gpt2training-1 --dataset wikitext-2-raw-v1 --repository-url https://github.com/flexaihq/nanoGPT/ --accels 4 --secret HF_TOKEN=hf-token-dev --env BATCH_SIZE=32 -- train.py --batch-size 32 --epochs 10