training debug-ssh

Starts an Interactive Training Job that allows connecting through SSH or VSCode to the Training Runtime, useful for fast test iterations.

The --vscode flag is optional, but highly recommended to leverage the full potential of the Interactive Training Runtime.

flexai training debug-ssh \
--repository-url <repository_url> \
[--dataset <dataset_name>...] \
[--repository-revision <branch_name>] \
[--checkpoint <checkpoint_id>] \
[--env <env_var>=<value>...] \
[--secret <env_var>=<secret_name>...] \
[--nodes <node_count>] \
[--accels <accelerator_count>] \
[--device-arch <device_architecture>] \
[--git-author-name <git_author_name>] \
[--git-author-email <git_author_email>] \
[--dotfiles <dotfiles_repository>] \
[--authorized-keys <ssh_public_key>] \
[--session-timeout <timeout_in_seconds>] \
[--vscode]

See the Interactive Training guide for details on getting started, recommendations, and troubleshooting options.
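
As a minimal sketch, the script below assembles a typical invocation from flags documented on this page and prints it instead of running it, so it can be checked without the flexai CLI installed. The repository URL and dataset mapping come from this page's own examples; the particular combination of flags is illustrative, not a recommended configuration.

```shell
#!/bin/sh
# Dry run: assemble a debug-ssh invocation and print it.
# All values are taken from the examples on this page; adjust them for your project.
set -- training debug-ssh \
  --repository-url https://github.com/flexaihq/nanoGPT/ \
  --repository-revision main \
  --dataset wikitext-2-raw-v1=/wikitext2/v1 \
  --accels 4 \
  --vscode

cmd="flexai $*"
echo "$cmd"

# To launch for real (assumes the flexai CLI is installed and you are logged in):
# flexai "$@"
```

Printing the command first makes it easy to review the flags before starting a job that consumes accelerators.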

-a, --accels
<integer>
Optional
Default Value: 1
Integer

Number of accelerators to use for the Training Job.

Examples
  • --accels 4
--repository-revision
Optional
Default Value: main
String

The branch name of the repository.

Examples
  • --repository-revision main
Commit Hash

The commit hash of the repository.

Examples
  • --repository-revision 53f6b645fc5d039152aef884def64288e3eeb56b
String

The tag name of the repository.

Examples
  • --repository-revision v1.0.0
--checkpoint
Optional
UUID

The name of a user-provided Checkpoint (see flexai checkpoint).

Examples
  • --checkpoint a1b18a7f-9b85-4c74-91a9-6aca526e8ce4
-D, --dataset
<string><key=value>
Optional
Key Value Path Mapping

A key=value pair representing a Dataset to use and its destination mount path on the Training Runtime.

Syntax: <dataset_name>=<dataset_mount_path>

Examples
  • --dataset wikitext-2-raw-v1=/wikitext2/v1
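
To make the <dataset_name>=<dataset_mount_path> syntax concrete, here is a small sketch that splits the mapping from the example above into its two parts using POSIX parameter expansion. The variable names are hypothetical and chosen for illustration; the flexai CLI does this parsing itself.

```shell
#!/bin/sh
# Split a dataset mapping into its name and mount path.
mapping='wikitext-2-raw-v1=/wikitext2/v1'

dataset_name=${mapping%%=*}   # everything before the first '='
mount_path=${mapping#*=}      # everything after the first '='

echo "dataset:    $dataset_name"
echo "mounted at: $mount_path"
```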
--device-arch
Optional
Default Value: nvidia
Option list
  • nvidia
Examples
  • --device-arch nvidia
--dotfiles
<string>
Optional
Git Repository

URL of a GitHub dotfiles repository that will be installed in the Interactive Training Runtime with yadm.

Examples
  • --dotfiles https://github.com/mathiasbynens/dotfiles
-E, --env
<key=value>
Optional

Environment variables that will be set in the Training Runtime.

Examples
  • --env BATCH_SIZE=32
  • --env WANDB_PROJECT=my-project-123
--git-author-email
Optional
String

The pre-configured git config user.email in the Interactive Training Runtime, defaults to git config user.email in the host environment.

Examples
  • --git-author-email 'george@vandelay-industri.es'
--git-author-name
Optional
String

The pre-configured git config user.name in the Interactive Training Runtime, defaults to git config user.name in the host environment.

Examples
  • --git-author-name 'George Costanza'
-n, --nodes
<integer>
Optional
Default Value: 1
Integer

Number of nodes to use to run the Training Job.

Examples
  • --nodes 4
--requirements-path
Optional
Default Value: ./
String

Path to the requirements.txt file that will be used to install the dependencies in the Training Runtime.

This path is relative to the root of the repository (specified by the --repository-url flag).

Examples
  • --requirements-path path/to/requirements.txt
-S, --secret
<key=value>
Optional

Environment variables that will be set in the Training Runtime from Secrets. The value given for each variable is the name of a Secret (see flexai secret list), not the secret value itself.

Secrets are sensitive values like API keys, tokens, or credentials that need to be accessed by your Training Job but should not be exposed in logs or command history. When using the --secret flag, the actual secret values are retrieved from the Secrets Storage and injected into the environment at runtime.

Syntax:

  • <env_var_name>=<secret_name>

Where <env_var_name> is the name of the environment variable to set, and <secret_name> is the name of the Secret to use as the value.

Examples
  • --secret HF_TOKEN=hf-token-dev
  • --secret WANDB_API_KEY=wandb-key
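
The difference between --env and --secret can be sketched with another dry run: the literal value of BATCH_SIZE appears on the command line, while for HF_TOKEN only the Secret's name does. This assumes a Secret named hf-token-dev (the name used in the example above) already exists in the Secrets Storage.

```shell
#!/bin/sh
# Dry run: --env carries a literal value, --secret carries only a Secret's name.
set -- training debug-ssh \
  --repository-url https://github.com/flexaihq/nanoGPT/ \
  --env BATCH_SIZE=32 \
  --secret HF_TOKEN=hf-token-dev

cmd="flexai $*"
echo "$cmd"

# Nothing sensitive is present in the printed command or in shell history;
# per the description above, the real token is injected at runtime.
```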
--session-timeout
Optional
Default Value: 600
Integer

Time in seconds after which the SSH session will time out.

Examples
  • --session-timeout 1200
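
Because the value is expressed in seconds, it can help to compute it from minutes with shell arithmetic. The sketch below derives the 1200 used in the example above from a 20-minute timeout; the variable names are hypothetical.

```shell
#!/bin/sh
# Convert a timeout in minutes to the seconds value the flag expects.
minutes=20
timeout_seconds=$((minutes * 60))

echo "--session-timeout $timeout_seconds"
```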
--repository-url
Required
Git Repository

The URL of the Git repository containing the model’s training code.

Examples
  • --repository-url https://github.com/flexaihq/nanoGPT/
  • --repository-url https://github.com/flexaihq/nanoGPT.git
--vscode
<boolean>
Optional
Flag

Immediately open a new instance of Visual Studio Code and attach to the Interactive Training Runtime when it is ready.

Requires the Remote SSH VSCode extension to be installed.

Examples
  • --vscode