Number of accelerators to use for the Training Job.
Examples
-
--accels 4
Starts an Interactive Training Job that allows connecting through SSH or VSCode to the Training Runtime, useful for fast test iterations.
The --vscode flag is optional, but highly recommended to leverage the full potential of the Interactive Training Runtime.
flexai training debug-ssh \
  --repository-url <repository_url> \
  [--dataset <dataset_name>...] \
  [--repository-revision <branch_name>] \
  [--checkpoint <checkpoint_id>] \
  [--env <env_var>=<value>...] \
  [--secret <env_var>=<secret_value>...] \
  [--nodes <node_count>] \
  [--accels <accelerator_count>] \
  [--device-arch <device_architecture>] \
  [--git-author-name <git_author_name>] \
  [--git-author-email <git_author_email>] \
  [--dotfiles <dotfiles_repository>] \
  [--authorized-keys <ssh_public_key>] \
  [--session-timeout <timeout_in_seconds>] \
  [--vscode]
Visit the Interactive Training guide to get more details on how to get started, recommendations and troubleshooting options.
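As an illustration of how the flags in the synopsis combine, the following sketch assembles one possible invocation and prints it instead of launching a job. Every flag and value is taken from the examples on this page. The particular combination is only an assumption about a typical setup, not a recommended configuration.

```shell
# Dry run: build the argument list and print it instead of starting a job.
# All values below are the example values used elsewhere on this page.
set -- training debug-ssh \
  --repository-url https://github.com/flexaihq/nanoGPT.git \
  --repository-revision main \
  --dataset wikitext-2-raw-v1=/wikitext2/v1 \
  --env BATCH_SIZE=32 \
  --secret HF_TOKEN=hf-token-dev \
  --accels 4 \
  --session-timeout 1200 \
  --vscode

# Join the arguments into one printable line.
cmd="flexai $*"
echo "$cmd"

# To actually start the Interactive Training Job, run: flexai "$@"
```

Printing the command first makes it easy to review the flags before any resources are requested.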
1
Number of accelerators to use for the Training Job.
--accels 4
Path to an SSH public key.
Note that this flag is required if no ssh-agent is running on the host.
See GitHub’s guide "Generating a new SSH key and adding it to the ssh-agent".
~/.ssh/id_ed25519.pub
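A minimal sketch of how a wrapper script might decide whether to pass --authorized-keys. It assumes that a running agent can be recognized by SSH_AUTH_SOCK pointing at an existing socket. That is a common convention, but it is not confirmed to be the check the flexai CLI itself performs. The function name needs_authorized_keys is hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) when no ssh-agent socket is reachable,
# meaning --authorized-keys would have to be passed explicitly.
# $1: the value of SSH_AUTH_SOCK (may be empty).
needs_authorized_keys() {
  if [ -n "$1" ] && [ -S "$1" ]; then
    return 1   # an agent socket exists, the flag can be omitted
  fi
  return 0     # no agent detected, the flag is needed
}

key_path="$HOME/.ssh/id_ed25519.pub"   # example key path from this page

if needs_authorized_keys "${SSH_AUTH_SOCK:-}"; then
  echo "add: --authorized-keys $key_path"
else
  echo "ssh-agent detected, --authorized-keys can be omitted"
fi
```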
main
The branch name of the repository.
--repository-revision main
The commit hash of the repository.
--repository-revision 53f6b645fc5d039152aef884def64288e3eeb56b
The tag name of the repository.
--repository-revision v1.0.0
The ID of the Checkpoint generated during the Training Job’s execution (see flexai training checkpoints).
--checkpoint a1b18a7f-9b85-4c74-91a9-6aca526e8ce4
The name of a user-provided Checkpoint (see flexai checkpoint).
--checkpoint mistral-500-checkpoint
The name of a Dataset (see flexai dataset list).
--dataset wikitext-2-raw-v1
A key=value pair representing a Dataset to use and its destination mount path on the Training Runtime.
Syntax: <dataset_name>=<dataset_mount_path>
--dataset wikitext-2-raw-v1=/wikitext2/v1
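To make the <dataset_name>=<dataset_mount_path> syntax concrete, the sketch below splits the example pair into its two parts with POSIX parameter expansion. It only illustrates how the pair reads. How the flexai CLI parses the value internally is not documented here.

```shell
# Split a <dataset_name>=<dataset_mount_path> pair at the first "=".
spec="wikitext-2-raw-v1=/wikitext2/v1"

dataset_name="${spec%%=*}"   # everything before the first "="
mount_path="${spec#*=}"      # everything after the first "="

echo "dataset:    $dataset_name"
echo "mounted at: $mount_path"
```

With the example above, the Dataset wikitext-2-raw-v1 would be available under /wikitext2/v1 in the Training Runtime.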
nvidia
nvidia
--device-arch nvidia
URL of a GitHub dotfiles repository that will be installed in the Interactive Training Runtime with yadm.
--dotfiles https://github.com/mathiasbynens/dotfiles
Environment variables that will be set in the Training Runtime.
--env BATCH_SIZE=32
--env WANDB_PROJECT=my-project-123
The pre-configured git config user.email in the Interactive Training Runtime. Defaults to git config user.email in the host environment.
--git-author-email 'george@vandelay-industri.es'
The pre-configured git config user.name in the Interactive Training Runtime. Defaults to git config user.name in the host environment.
--git-author-name 'George Costanza'
1
Number of nodes to use to run the Training Job.
--nodes 4
./
Path to the requirements.txt file that will be used to install the dependencies in the Training Runtime.
This path is relative to the root of the repository (specified by the --repository-url flag).
--requirements-path path/to/requirements.txt
Environment variables that will be set in the Training Runtime. The values of these variables are the names of Secrets (see flexai secret list).
Secrets are sensitive values like API keys, tokens, or credentials that need to be accessed by your Training Job but should not be exposed in logs or command history. When using the --secret flag, the actual secret values are retrieved from the Secrets Storage and injected into the environment at runtime.
Syntax: <env_var_name>=<secret_name>
Where <env_var_name> is the name of the environment variable to set, and <secret_name> is the name of the Secret to use as the value.
--secret HF_TOKEN=hf-token-dev
--secret WANDB_API_KEY=wandb-key
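The following sketch shows one way to assemble several --secret flags from a list of <env_var_name>=<secret_name> pairs. The right-hand side is always the name of a stored Secret, never the sensitive value itself, so nothing confidential ends up in the shell history. The pairs are the examples from this page. The loop is only an illustration, not part of the CLI.

```shell
# Collect one --secret flag per <env_var_name>=<secret_name> pair.
set --   # start with an empty argument list
for pair in HF_TOKEN=hf-token-dev WANDB_API_KEY=wandb-key; do
  set -- "$@" --secret "$pair"
done

# Join the flags into one printable line.
secret_flags="$*"
echo "$secret_flags"
```

The resulting flags can then be appended to a flexai training debug-ssh command.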
600
Time in seconds after which the SSH session will time out.
--session-timeout 1200
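Because the timeout is given in seconds, a duration expressed in minutes has to be converted first. A small sketch, assuming a desired session length of 20 minutes (the variable names are illustrative):

```shell
# Convert a session length in minutes to the seconds expected by --session-timeout.
minutes=20
timeout_seconds=$((minutes * 60))

echo "--session-timeout $timeout_seconds"
```

Here 20 minutes corresponds to the value 1200 used in the example above.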
The URL of the Git repository containing the model’s training code.
--repository-url https://github.com/flexaihq/nanoGPT/
--repository-url https://github.com/flexaihq/nanoGPT.git
Immediately open a new instance of Visual Studio Code and attach to the Interactive Training Runtime when it is ready.
Requires the Remote SSH VSCode extension to be installed.
--vscode