
Interactive Training Session

The flexai training debug-ssh command starts an Interactive Training Job on a Training Runtime similar to the one used by regular Training Jobs. It allocates the required resources and then sets up an Interactive Training Runtime that you can connect to over SSH.

Access to this Interactive Training Runtime lets you iterate quickly: you can edit scripts, modify files, review logs and outputs, push changes to your GitHub repository and, in general, inspect your Training results from the inside as they happen, before you commit to running a full Training Job.

The environment in which the flexai training debug-ssh command is run should either:

  • Have an ssh-agent running and SSH keys loaded into it, or
  • Have an SSH key pair available, whose public key you pass with the --authorized-keys flag

If no ssh-agent is running and you want one so that you can push your changes from the Interactive Training Runtime to GitHub, run eval $(ssh-agent) in your terminal, then load your keys into it with ssh-add <path_to_private_key>. Confirm that the keys are loaded with ssh-add -l, then start your Interactive Training Job; it will use your ssh-agent automatically.
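The steps above can be wrapped in a small shell function. This is a sketch, not part of the flexai CLI: the name ensure_agent_with_key is hypothetical, and the function relies on ssh-add -l exiting with status 2 when it cannot contact an agent, as documented in the ssh-add manual page. Define it in your current shell (paste it or source it from a file) so that the SSH_AUTH_SOCK variable exported by ssh-agent stays set for the flexai command you run afterwards.

```shell
# Hypothetical helper: ensure an ssh-agent is reachable from this shell and
# holds the given private key, then list the loaded keys.
ensure_agent_with_key() {
  local key="${1:-}" status=0
  if [ -z "$key" ]; then
    echo "usage: ensure_agent_with_key <path_to_private_key>" >&2
    return 1
  fi
  # ssh-add -l exits with 2 when it cannot contact an agent
  # (0: keys listed, 1: agent reachable but holding no identities).
  ssh-add -l >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 2 ]; then
    # Start a new agent; eval exports SSH_AUTH_SOCK and SSH_AGENT_PID here.
    eval "$(ssh-agent -s)" >/dev/null
  fi
  # Load the key (prompting for its passphrase if it has one) and confirm.
  ssh-add "$key" && ssh-add -l
}
```

For example, run ensure_agent_with_key ~/.ssh/id_ed25519 and then start flexai training debug-ssh from the same shell.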

If the SSH keys that authenticate you to GitHub are loaded into your ssh-agent, you can push changes to GitHub from the Interactive Training Runtime by enabling agent forwarding (the ForwardAgent option) for the FlexAI gateway in your SSH configuration file (~/.ssh/config):

~/.ssh/config
Host debug-gw.flex.ai
    ForwardAgent yes
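If you prefer to script this change, the following sketch appends that block only when the file does not already contain a matching Host line. The function name add_forward_agent_config is hypothetical, and its duplicate check only recognizes a line that reads exactly Host debug-gw.flex.ai; entries listing several host patterns on one line must be edited by hand.

```shell
# Hypothetical helper: append the ForwardAgent block for the FlexAI gateway
# to an SSH config file (default ~/.ssh/config) unless it is already there.
add_forward_agent_config() {
  local config="${1:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  touch "$config"
  chmod 600 "$config"  # ssh rejects a config file that others can write to
  # Only an exact "Host debug-gw.flex.ai" line counts as already configured.
  if grep -Eq '^[[:space:]]*Host[[:space:]]+debug-gw\.flex\.ai[[:space:]]*$' "$config"; then
    echo "Host debug-gw.flex.ai already exists in $config; edit it by hand." >&2
    return 0
  fi
  printf '\nHost debug-gw.flex.ai\n    ForwardAgent yes\n' >> "$config"
}
```

Because ssh uses the first value it obtains for each option, an earlier Host * block that sets ForwardAgent no would still take precedence over an entry appended at the end of the file.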

To check whether an ssh-agent is running in your local environment and has your keys loaded, run:

Terminal window
ssh-add -L

You can load more keys into your ssh-agent with:

Terminal window
ssh-add path/to/private/key

When you are not using an ssh-agent, pass the path to the public key you want to use with the --authorized-keys flag, for example:

Terminal window
flexai training debug-ssh --repository-url https://github.com/flexaihq/nanoGPT --vscode --authorized-keys ~/.ssh/id_ed25519.pub
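If you are unsure which public key to pass, a helper like the one below prints the first key it finds among common default file names. It is a hypothetical sketch, not part of the flexai CLI, and the preference order (ed25519, then ECDSA, then RSA) is an assumption you can change.

```shell
# Hypothetical helper: print the first public key found in an SSH directory
# (default ~/.ssh), preferring ed25519, then ECDSA, then RSA.
find_public_key() {
  local dir="${1:-$HOME/.ssh}" name
  for name in id_ed25519.pub id_ecdsa.pub id_rsa.pub; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  echo "No public key found in $dir (create one with: ssh-keygen -t ed25519)" >&2
  return 1
}

# Example, using only the flags shown above:
# flexai training debug-ssh --repository-url https://github.com/flexaihq/nanoGPT \
#   --vscode --authorized-keys "$(find_public_key)"
```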

In its simplest form, starting an Interactive Training Job only requires running the debug-ssh subcommand with the --repository-url flag. The example below also passes the optional --vscode flag:

Terminal window
flexai training debug-ssh --repository-url https://github.com/flexaihq/nanoGPT --vscode

The terminal reports each stage of the Interactive Training Job as it progresses. Once the job is ready, it prints the SSH command to connect to the Interactive Training Runtime. Example output:

Terminal window
Interactive training interactive-training-9fe8631b-aa6f-4f92-8ed4-b8a16df810b5 launching...
✅ Looking for an interactive training builder
[Node 0] To connect using ssh:
ssh -o ForwardAgent=yes -p 44417 flexai@debug-gw.flex.ai
[Node 0] To open VSCode, click the following URL (or open in your browser if not clickable):
"vscode://vscode-remote/ssh-remote+flexai@debug-gw.flex.ai:44417/workspace?windowId=_blank"
✅ Automatically configuring ~/.ssh/known_hosts

You can also attach to the Interactive Training Runtime using VSCode. To do so, install the Remote - SSH extension.

Once the extension is installed, connect to the Interactive Training Runtime by clicking the VSCode URL printed in the terminal output. This opens a new VSCode window with the SSH connection to the Interactive Training Runtime already established.

Alternatively, pass the --vscode flag to open a VSCode window into the Interactive Training Runtime automatically. This is useful if the VSCode URL is not clickable in your terminal.

If there is no active SSH session into the Interactive Training Runtime, the Interactive Training Job is stopped automatically once the session timeout elapses. You can set the timeout with the --session-timeout flag; it defaults to 600 seconds (10 minutes).

Stopping a running Interactive Training Job

Like regular Training Jobs, Interactive Training Jobs can be stopped manually with the flexai training stop command:

Terminal window
flexai training stop <interactive_training_job_name>
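If you saved the output of flexai training debug-ssh, for example by piping it through tee, you can recover the job identifier from the "Interactive training ... launching..." line in the example output above and pass it to flexai training stop. This sketch assumes that line keeps the format shown, and that the printed identifier is the name flexai training stop expects; the helper name is hypothetical.

```shell
# Hypothetical helper: read saved `flexai training debug-ssh` output on stdin and
# print the identifier from the first "Interactive training <name> launching..." line.
job_name_from_output() {
  sed -n 's/^Interactive training \([^ ][^ ]*\) launching.*$/\1/p' | head -n 1
}

# Example, assuming the output was captured with tee:
# flexai training debug-ssh --repository-url https://github.com/flexaihq/nanoGPT | tee debug-ssh.log
# flexai training stop "$(job_name_from_output < debug-ssh.log)"
```

Piping through tee may change how the CLI renders its output, so copying the name from the terminal remains the simplest option.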

The Interactive Training Job fails to start

You can check the logs of an Interactive Training Job just as you would for a regular Training Job:

Terminal window
flexai training logs <training_job_name>

Likewise, you can inspect the job to get more information on its lifecycle:

Terminal window
flexai training inspect <training_job_name>

Make sure that the ssh-agent is running and has the correct keys loaded. You can verify this by running ssh-add -L and checking that the keys you expect are listed.

A common issue is that the ssh-agent your shell talks to is not the one ssh uses when connecting to the Interactive Training Runtime. This can happen when several ssh-agent instances are running and the SSH_AUTH_SOCK environment variable points to a different agent than the one configured in your SSH configuration file (~/.ssh/config). Before loading keys with ssh-add /path/to/your-key or checking which keys are loaded with ssh-add -L, make sure SSH_AUTH_SOCK is set to the same socket path that ssh uses for the gateway, which you can find by running:

Terminal window
ssh -G debug-gw.flex.ai | awk '/^identityagent/ {print $2}'
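The following sketch automates that comparison. It reads the IdentityAgent value that ssh -G reports for the gateway, maps it to a socket path following the forms described in the ssh_config manual (the SSH_AUTH_SOCK keyword, a $NAME environment variable, none, a ~/ path or a literal path), warns if it differs from SSH_AUTH_SOCK, and lists the keys held by that agent. Both function names are hypothetical; an empty value is treated as "use SSH_AUTH_SOCK", which is ssh's default when IdentityAgent is not set, and ${NAME} expansions or % tokens embedded in a longer path are not handled.

```shell
# Hypothetical helper: map an IdentityAgent value, as printed by `ssh -G`,
# to an agent socket path.
resolve_agent_socket() {
  case "$1" in
    "" | SSH_AUTH_SOCK) printf '%s\n' "${SSH_AUTH_SOCK:-}" ;;  # use the env variable
    none) return 1 ;;                                           # agent use disabled
    \$*) printenv "${1#\$}" ;;                                  # $NAME: read that variable
    "~"/*) printf '%s\n' "$HOME/${1#"~/"}" ;;                   # expand a leading ~/
    *) printf '%s\n' "$1" ;;                                    # literal socket path
  esac
}

# Hypothetical helper: compare the agent ssh uses for a host with SSH_AUTH_SOCK.
check_flexai_agent() {
  local host="${1:-debug-gw.flex.ai}" raw sock
  raw="$(ssh -G "$host" 2>/dev/null | awk '/^identityagent / {sub(/^identityagent /, ""); print}')"
  if ! sock="$(resolve_agent_socket "$raw")"; then
    echo "Could not resolve an agent socket for $host (IdentityAgent: $raw)" >&2
    return 1
  fi
  if [ "$sock" != "${SSH_AUTH_SOCK:-}" ]; then
    echo "SSH_AUTH_SOCK (${SSH_AUTH_SOCK:-unset}) differs from the agent ssh uses ($sock)." >&2
    echo "To align this shell: export SSH_AUTH_SOCK='$sock'" >&2
  fi
  # List the public keys held by the agent that ssh will actually use.
  SSH_AUTH_SOCK="$sock" ssh-add -L
}
```

Run check_flexai_agent in the shell where you start flexai training debug-ssh; if it reports a mismatch, export the suggested SSH_AUTH_SOCK before loading keys with ssh-add.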