Starts a new Training Job. This command allows you to specify the dataset, repository, hardware requirements, and other parameters for the job.

Usage:
flexai training run <training_or_fine_tuning_job_name> [flags] -- <entry_point_script_path> [script_args]

Arguments

<training_or_fine_tuning_job_name>
A unique name for the Training Job.
Examples:
- gpt2training-1
- my-model-training

<entry_point_script_path>
The path to the entry point script for the Training or Fine-tuning Job.
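For instance, a minimal invocation could look like the sketch below. The repository URL, dataset name, script name, and --epochs argument reuse values from the examples on this page; everything after -- is the entry point script and its own arguments:
flexai training run my-model-training --repository-url https://github.com/flexaihq/nanoGPT/ --dataset wikitext-2-raw-v1 -- train.py --epochs 10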
Flags

--accels
Number of accelerators/GPUs to use. Default: 1
Affinity rules for the workload.

--build-secret
FlexAI Secrets to make available during the image build process. Format: <flexai_secret_name>=<environment_variable_name>
Example:
- --build-secret build_config_secret=SECRET_ENV_VAR_TO_USE
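As a sketch, assuming a Secret named build_config_secret already exists in your Secrets Storage and that your build step reads the SECRET_ENV_VAR_TO_USE environment variable (for example, to authenticate against a private package index), the flag can be added to a run like this:
flexai training run my-model-training --repository-url https://github.com/flexaihq/nanoGPT/ --build-secret build_config_secret=SECRET_ENV_VAR_TO_USE -- train.py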
--checkpoint
A Checkpoint to use as a starting point for a Fine-tuning Job. It accepts either of the following:
The name of a previously pushed Checkpoint. Use flexai checkpoint list to see available Checkpoints.
Examples:
- --checkpoint Mixtral-8x7B-v0_1
- --checkpoint gemma-3n-E4B-it
The UUID of an Inference Ready Checkpoint generated during the execution of a Training or Fine-tuning Job. Use flexai training checkpoints to see available Checkpoints.
Example:
- --checkpoint 3fa85f64-5717-4562-b3fc-2c963f66afa6
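A Fine-tuning Job that starts from a previously pushed Checkpoint could therefore look like the sketch below. The Checkpoint, dataset, and repository names reuse the examples above; the script name and its arguments are assumptions about your own code:
flexai training run my-model-training --checkpoint Mixtral-8x7B-v0_1 --dataset fineweb-edu --repository-url https://github.com/flexaihq/nanoGPT/ -- train.py --epochs 3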
--dataset
Dataset to mount on the runtime environment.
Examples:
- --dataset open_web
- --dataset fineweb-edu
Datasets can also be mounted on the runtime environment using a custom mount path:
- --dataset open_web=data/train/ow --dataset fineweb-edu=/data/train/fineweb-edu
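When a custom mount path is used, the entry point script is typically pointed at that same path. In the sketch below, the --data-dir argument is an assumption about the training script, not part of the flexai CLI:
flexai training run gpt2training-1 --repository-url https://github.com/flexaihq/nanoGPT/ --dataset open_web=/data/train/ow -- train.py --data-dir /data/train/ow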
--device-arch
The architecture of the device to run the Training Job on. Default: nvidia
One of:
- nvidia
- amd
- tt
Example:
- --device-arch nvidia

--env
Environment variables to set in the runtime environment.
Examples:
- --env WANDB_ENTITY=georgec123 --env WANDB_PROJECT=gppt-j

Displays this help page.

Disables queuing for this workload: if no resources are available, the workload will fail immediately instead of waiting for resources to become available.
--nodes
The number of nodes across which to distribute the workload. Default: 1
Selecting more than 1 node will override the value provided in the --accels flag to 8 accelerators per node.
Examples:
- --nodes 1
- --nodes 4
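For example, a multi-node run could look like the sketch below. With --nodes 2, each node is allocated 8 accelerators regardless of the --accels value, so this job would use 16 accelerators in total:
flexai training run gpt2training-1 --repository-url https://github.com/flexaihq/nanoGPT/ --dataset wikitext-2-raw-v1 --nodes 2 -- train.py --epochs 10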
--repository-revision
The revision of the repository to use. It can be one of the following:
The branch name of the repository (main by default):
- --repository-revision secondary
- --repository-revision testing
A commit SHA hash to use:
- --repository-revision 9fceb02
- --repository-revision e5bd391
A tag name to use:
- --repository-revision v1.0.0
- --repository-revision release-2024
--repository-url
Git repository URL containing code to mount on the workload environment.
The repository will be mounted on the /workspace directory.
Examples:
- --repository-url https://github.com/flexaihq/nanoGPT/
- --repository-url https://github.com/flexaihq/nanoGPT.git
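To run against a specific revision of the mounted repository, --repository-url can be combined with --repository-revision. The sketch below reuses example values from this page; the tag name is illustrative:
flexai training run gpt2training-1 --repository-url https://github.com/flexaihq/nanoGPT.git --repository-revision v1.0.0 --dataset wikitext-2-raw-v1 -- train.py --epochs 10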
--requirements-path
Path to a pip requirements.txt file in the repository.
Example:
- --requirements-path code/project/requirements.txt

Name of the runtime to use.
--secret
Environment variables that will be set in the Training Runtime.
Secrets are sensitive values like API keys, tokens, or credentials that need to be accessed by your Training Job but should not be exposed in logs or command history. When using the --secret flag, the actual secret values are retrieved from the Secrets Storage and injected into the environment at runtime.
Syntax:
<env_var_name>=<flexai_secret_name>
Where <env_var_name> is the name of the environment variable to set, and <flexai_secret_name> is the name of the Secret containing the sensitive value.
Examples:
- --secret HF_TOKEN=hf-token-dev
- --secret WANDB_API_KEY=wandb-key

Enables verbose logging for detailed output during training job execution.

Example:
flexai training run gpt2training-1 --dataset wikitext-2-raw-v1 --repository-url https://github.com/flexaihq/nanoGPT/ --accels 4 --secret HF_TOKEN=hf-token-dev --env BATCH_SIZE=32 -- train.py --batch-size 32 --epochs 10