training debug-ssh

Starts an Interactive Training Job that allows connecting through SSH or VSCode to the Training Runtime, useful for fast test iterations.

The --vscode flag is optional, but highly recommended to leverage the full potential of the Interactive Training Runtime.

flexai training debug-ssh \
--repository-url <repository_url> \
[--dataset <dataset_name>...] \
[--repository-revision <branch_name>] \
[--checkpoint <checkpoint_id>] \
[--env <env_var>=<value>...] \
[--secret <env_var>=<flexai_secret_name>...] \
[--nodes <node_count>] \
[--accels <accelerator_count>] \
[--device-arch <device_architecture>] \
[--git-author-name <git_author_name>] \
[--git-author-email <git_author_email>] \
[--dotfiles <dotfiles_repository>] \
[--authorized-keys <ssh_public_key>] \
[--session-timeout <timeout_in_seconds>] \
[--vscode]
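
As an illustration of the synopsis above, the following sketch assembles a typical invocation using only flags and example values that appear on this page. The `repo_url` and `preview` variables are illustrative helpers, not part of the CLI, and the leading `echo` makes this a dry run that only prints the command.

```shell
repo_url="https://github.com/flexaihq/nanoGPT.git"

# Dry run: "echo" only prints the assembled command. Remove it to start the job.
preview=$(echo flexai training debug-ssh \
  --repository-url "$repo_url" \
  --dataset fineweb-edu \
  --session-timeout 3600 \
  --vscode)
echo "$preview"
```

Printing the command first is a cheap way to check quoting and flag order before any resources are requested.
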
-a , --accels
<integer>
Optional
Default Value: 1
Integer

Number of accelerators/GPUs to use.

--affinity
<key=value>
Optional
--authorized-keys
Optional
Default Value: gathered from ssh-agent

List of SSH public keys to allow connecting to the interactive environment. If not provided, keys will be gathered from the local ssh-agent, if available.
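
Since the default keys come from the local ssh-agent, it can help to check what the agent currently holds before starting a session. This is a minimal sketch using the standard OpenSSH `ssh-add -L` command; the `key_source` variable is a hypothetical helper, and how the CLI itself queries the agent is not described here.

```shell
# List the public keys currently held by the local ssh-agent.
# ssh-add -L exits non-zero when the agent is unreachable or holds no keys.
if agent_keys=$(ssh-add -L 2>/dev/null); then
  key_source="ssh-agent"
  echo "Keys available from ssh-agent:"
  echo "$agent_keys"
else
  key_source="none"
  echo "No keys found in ssh-agent; pass --authorized-keys explicitly." >&2
fi
```
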

--build-secret
<key=value>
Optional

FlexAI Secrets to make available during the image build process. Format: <flexai_secret_name>=<environment_variable_name>

Examples
  • --build-secret build_config_secret=SECRET_ENV_VAR_TO_USE
--checkpoint
Optional

A Checkpoint to mount on the runtime environment.

The name of a previously pushed Checkpoint. Use flexai checkpoint list to see available Checkpoints.

Examples
  • --checkpoint Mixtral-8x7B-v0_1
  • --checkpoint gemma-3n-E4B-it
--dataset
Optional

Dataset to mount on the runtime environment.

Examples
  • --dataset open_web
  • --dataset fineweb-edu

Datasets to mount on the runtime environment using a custom mount path

Examples
  • --dataset open_web=data/train/ow --dataset fineweb-edu=/data/train/fineweb-edu
--device-arch
Optional
Default Value: nvidia
Option list

The architecture of the device to run the Interactive Training Job on.

One of:

  • nvidia
  • amd
  • tt
Examples
  • --device-arch nvidia
--dotfiles
<string>
Optional
Path

URL of a GitHub dotfiles repository that will be installed in the home directory of the interactive environment.

Examples
  • --dotfiles https://github.com/funnierinspanish/dotfiles.git
-E , --env
<key=value>
Optional

Environment variables to set in the interactive environment.

Examples
  • --env WANDB_ENTITY=georgec123 --env WANDB_PROJECT=gppt-j
--git-author-email
Optional
Default Value: Git config user.email value
String

The Git commit author email to use in the interactive training environment.

Examples
  • --git-author-email diego@flex.ai
  • --git-author-email george@vandelay-industries.biz
--git-author-name
Optional
Default Value: Git config user.name value
String

The Git commit author name to use in the interactive training environment.

-h , --help
<boolean>
Optional
Flag

Displays this help page.

--no-queuing
<boolean>
Optional
Flag

Disables queuing for this workload: If no resources are available, the workload will fail immediately instead of waiting for resources to become available.

-n , --nodes
<integer>
Optional
Default Value: 1
Integer

The number of nodes across which to distribute the workload.

Selecting more than 1 node overrides the value provided in the --accels flag, setting it to 8 accelerators per node.

Examples
  • --nodes 1
  • --nodes 4
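
To make the interaction between --nodes and --accels concrete, here is a small sketch of the arithmetic described above. The variable names are illustrative only, and the computation simply restates the documented rule (8 accelerators per node when more than 1 node is selected); it does not call the CLI.

```shell
nodes=4
accels=2   # value passed to --accels; only honored on a single node

# Multi-node workloads use 8 accelerators per node, regardless of --accels.
if [ "$nodes" -gt 1 ]; then
  accels_per_node=8
else
  accels_per_node=$accels
fi

total_accels=$((nodes * accels_per_node))
echo "nodes=$nodes accels_per_node=$accels_per_node total=$total_accels"
```
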
--repository-revision
Default Value: main
String

The branch name of the repository. main by default.

Examples
  • --repository-revision secondary
  • --repository-revision testing
UUID

A commit SHA hash to use.

Examples
  • --repository-revision 9fceb02
  • --repository-revision e5bd391
String

A tag name to use.

Examples
  • --repository-revision v1.0.0
  • --repository-revision release-2024
--repository-url
Optional
URL

Git repository URL containing code to mount on the workload environment.

Will be mounted on the /workspace directory.

Examples
  • --repository-url https://github.com/flexaihq/nanoGPT/
  • --repository-url https://github.com/flexaihq/nanoGPT.git
--requirements-path
Optional
Path

Path to a pip requirements.txt file in the repository.

Examples
  • --requirements-path code/project/requirements.txt
Optional
String

Name of the runtime to use.

-S , --secret
<key=value>
Optional

Environment variables that will be set in the Training or Fine-tuning Runtime.

Secrets are sensitive values like API keys, tokens, or credentials that need to be accessed by your Training Job but should not be exposed in logs or command history. When using the --secret flag, the actual secret values are retrieved from the Secrets Storage and injected into the environment at runtime.

Syntax:

  • <env_var_name>=<flexai_secret_name>

Where <env_var_name> is the name of the environment variable to set, and <flexai_secret_name> is the name of the Secret containing the sensitive value.

Examples
  • --secret HF_TOKEN=hf-token-dev
  • --secret WANDB_API_KEY=wandb-key
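
The direction of the mapping is easy to get backwards, so the sketch below splits one of the example values into its two halves with plain shell parameter expansion. The variable names mirror the placeholders in the syntax above; nothing here contacts the Secrets Storage.

```shell
mapping="HF_TOKEN=hf-token-dev"

env_var_name=${mapping%%=*}        # text before the first "="
flexai_secret_name=${mapping#*=}   # text after the first "="

echo "Runtime variable: $env_var_name"
echo "FlexAI Secret:    $flexai_secret_name"
```

Note that --build-secret, as documented on this page, uses the opposite order: the Secret name comes first and the environment variable name second.
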
--session-timeout
Optional
Default Value: 600
Integer

Timeout in seconds after which the interactive training session will be stopped if no activity is detected.

Examples
  • --session-timeout 666
  • --session-timeout 3600
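
Because the timeout is expressed in seconds, longer sessions are easier to read when computed from hours. A minimal sketch, with `hours` and `session_timeout` as illustrative variable names:

```shell
hours=2
session_timeout=$((hours * 60 * 60))   # convert hours to seconds

# Print the flag as it would be passed to flexai training debug-ssh.
printf '%s\n' "--session-timeout $session_timeout"
```
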
Optional
Flag

Provides more detailed output when running a debug-ssh session.

--vscode
<boolean>
Optional
Flag

Opens the Visual Studio Code editor connected to the runtime environment via SSH. If not installed, the runtime will still be started and accessible via SSH.