Interactive Training Session
The flexai training debug-ssh command allows you to start an Interactive Training Job on a Training Runtime similar to that of regular Training Jobs. This command will allocate the required resources and then set up an Interactive Training Runtime you can connect to via SSH.
Access to this debug Interactive Training Runtime lets you iterate quickly: you can edit scripts, modify files, review logs and outputs, push your changes to your GitHub repository, and generally get an inside look at your Training results as they happen, before you commit to running a Training Job.
Pre-requisites
The environment in which the flexai training debug-ssh command is run should either:
- Have an ssh-agent running and SSH keys loaded into it, or
- Have an SSH key pair available
Using an ssh-agent
If an ssh-agent is not running and you would like to use one to push your changes from the Interactive Training Runtime to GitHub, run eval $(ssh-agent) in your terminal, and load your keys into it by running ssh-add <path_to_private_key>. You can then confirm the keys have been loaded using ssh-add -l and start your Interactive Training Job; it will automatically use your ssh-agent.
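The check described above can be scripted. The following is a minimal sketch that relies on the exit status of ssh-add -l as documented by OpenSSH (0 when keys are listed, 1 when the agent holds no identities, 2 when no agent can be contacted). The helper name describe_agent_status is hypothetical and is not part of flexai or OpenSSH.

```shell
#!/usr/bin/env bash
# describe_agent_status is a hypothetical helper: it maps the exit status of
# `ssh-add -l` to a short hint about what to do next.
describe_agent_status() {
  case "$1" in
    0) echo "agent reachable, keys loaded" ;;
    1) echo "agent reachable, no keys loaded: run ssh-add <path_to_private_key>" ;;
    2) echo "no agent reachable: run eval \$(ssh-agent)" ;;
    *) echo "unexpected status $1: is OpenSSH installed?" ;;
  esac
}

# Query the agent quietly; only the exit status is needed here.
ssh-add -l >/dev/null 2>&1
describe_agent_status "$?"
```

Running this before flexai training debug-ssh gives a quick indication of whether the agent is ready to be used.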
If SSH key pairs authenticating you to GitHub are loaded into your ssh-agent, you will be able to push your changes to GitHub from the Interactive Training Runtime by enabling the ForwardAgent option in your SSH configuration file (~/.ssh/config):
Host debug-gw.flex.ai
ForwardAgent yes
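To confirm that the option is picked up for this host, you can ask ssh to print the configuration it would use. The sketch below assumes a recent OpenSSH client, whose ssh -G option prints the resolved options as lowercase keywords without connecting. The helper name forward_agent_setting is hypothetical.

```shell
#!/usr/bin/env bash
# forward_agent_setting is a hypothetical helper: it reads `ssh -G` output on
# stdin and prints the value of the forwardagent option, if present.
forward_agent_setting() {
  awk '$1 == "forwardagent" { print $2 }'
}

# Print the resolved ForwardAgent value for the gateway host.
# Nothing is printed if ssh is unavailable.
ssh -G debug-gw.flex.ai 2>/dev/null | forward_agent_setting
```

With the configuration above in place, the printed value is expected to be yes.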
To verify that an ssh-agent is running in your local environment and has your private keys loaded, run:
ssh-add -L
You can load more keys into your ssh-agent with:
ssh-add path/to/private/key
ssh-add relies on the SSH_AUTH_SOCK environment variable being correctly configured. If you are overriding it through the IdentityAgent option in ~/.ssh/config, make sure to set the SSH_AUTH_SOCK environment variable to the same socket path when running the ssh-add commands above. Otherwise, ssh-add will load keys into the wrong ssh-agent.
Using SSH key pairs
When not using the ssh-agent, you must set the path to the public key you will use through the --authorized-keys flag, e.g.:
flexai training debug-ssh --repository-url https://github.com/flexaihq/nanoGPT --repository-revision flexai-main --vscode --authorized-keys ~/.ssh/id_ed25519.pub
If your key pair is not stored at a standard location (i.e. the private key is none of ~/.ssh/id_rsa, ~/.ssh/id_ecdsa, ~/.ssh/id_ecdsa_sk, ~/.ssh/id_ed25519, ~/.ssh/id_ed25519_sk or ~/.ssh/id_dsa), you will need to pass an extra -i <path_to_corresponding_private_key> flag to the ssh command when connecting to the training environment.
Starting an Interactive Training Job
In its simplest form, starting an Interactive Training Job only requires running the debug-ssh subcommand with the --repository-url flag. The example below also pins a repository revision and opens VSCode:
flexai training debug-ssh --repository-url https://github.com/flexaihq/nanoGPT --repository-revision flexai-main --vscode
Information on the various stages of the Interactive Training Job will be displayed in the terminal. Once the job is ready, you will be provided with the SSH command to connect to the Interactive Training Runtime. Example output:
Interactive training interactive-training-618a7339-0412-463c-b864-81063d55f809 launching...
✅ Looking for an interactive training builder
[Node 0] To connect using ssh:
ssh -o ForwardAgent=yes -p 44417 flexai@debug-gw.flex.ai
[Node 0] To open VSCode, click the following URL (or open in your browser if not clickable):
"vscode://vscode-remote/ssh-remote+flexai@debug-gw.flex.ai:44417/workspace?windowId=_blank"
✅ Automatically configuring ~/.ssh/known_hosts
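If you want to reuse the connection details in a script, the port can be pulled out of the printed ssh command. This is only a sketch: it assumes the output keeps the shape shown in the example above, which is not a documented interface, and the helper name extract_ssh_port is hypothetical.

```shell
#!/usr/bin/env bash
# Sample lines copied from the example output above; in practice this would be
# the captured output of `flexai training debug-ssh`.
sample_output='[Node 0] To connect using ssh:
ssh -o ForwardAgent=yes -p 44417 flexai@debug-gw.flex.ai'

# extract_ssh_port is a hypothetical helper: on the first line whose first
# word is "ssh", it prints the argument that follows the -p option.
extract_ssh_port() {
  awk '$1 == "ssh" { for (i = 2; i < NF; i++) if ($i == "-p") { print $(i + 1); exit } }'
}

port=$(printf '%s\n' "$sample_output" | extract_ssh_port)
echo "Interactive Training Runtime port: $port"
```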
Attaching through VSCode
You can also attach to the Interactive Training Runtime using VSCode. To do so, you will need to install the Remote - SSH extension.
Once installed, you can connect to the Interactive Training Runtime by simply clicking on the VSCode URL provided in the terminal output. This will open a new VSCode window with the SSH connection to the Interactive Training Runtime already established.
The --vscode flag can be used to automatically open a VSCode window into the Interactive Training Runtime, which is useful if the VSCode URL is not clickable in your terminal.
Lifetime of an Interactive Training Job
Session timeout
Interactive Training Jobs are automatically stopped once the session timeout elapses with no active SSH sessions into the Interactive Training Runtime. The timeout can be set with --session-timeout and defaults to 600 seconds (10 minutes).
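As an illustration, a longer timeout could be requested as sketched below. This assumes that --session-timeout takes a whole number of seconds, as the 600-second default suggests. The command is printed rather than executed so that it can be reviewed first.

```shell
#!/usr/bin/env bash
# Convert a timeout expressed in minutes into seconds (assumed unit of the flag).
timeout_minutes=30
timeout_seconds=$((timeout_minutes * 60))

# Print the resulting command instead of running it.
echo "flexai training debug-ssh --repository-url https://github.com/flexaihq/nanoGPT --repository-revision flexai-main --session-timeout $timeout_seconds"
```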
Stopping a running Interactive Training Job
Interactive Training Jobs can be manually stopped like regular Training Jobs by using the flexai training stop command:
flexai training stop <interactive_training_name>
Troubleshooting
The Interactive Training Job fails to start
You can check the Interactive Training session logs, just as you would for a regular Training Job:
flexai training logs <training_job_name>
The same applies to inspecting a Training Session to get more information on its Lifecycle:
flexai training inspect <training_job_name>
SSH connection issues
When using an ssh-agent
Make sure that the ssh-agent is running and has the correct keys loaded. You can verify this by running ssh-add -L and checking that the keys you expect are listed.
A common issue is that the ssh-agent in your environment is not the same as the one used by the Interactive Training Job. This can happen if you have multiple ssh-agent instances running and the SSH_AUTH_SOCK environment variable points to a different agent than the one configured inside your ssh configuration file (~/.ssh/config). When loading keys into your agent with ssh-add /path/to/your-key or verifying which keys are loaded with ssh-add -L, make sure that the SSH_AUTH_SOCK environment variable is set to the same socket path as the one used by the Interactive Training Job, which you can find by running:
ssh -G debug-gw.flex.ai | awk '/^identityagent/ {print $2}'
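The comparison can be automated along the following lines. This is a sketch under a few assumptions: the IdentityAgent value printed by ssh -G is a plain socket path (values such as none, or paths containing ~ or tokens, are not handled), and an empty result means ssh falls back to SSH_AUTH_SOCK. The helper name check_agent_socket is hypothetical.

```shell
#!/usr/bin/env bash
# check_agent_socket is a hypothetical helper. It compares the agent socket
# ssh is configured to use for a host with the one ssh-add will talk to.
#   $1: IdentityAgent value reported by `ssh -G` (empty when not configured)
#   $2: current value of SSH_AUTH_SOCK
check_agent_socket() {
  if [ -z "$1" ] || [ "$1" = "SSH_AUTH_SOCK" ]; then
    echo "ok: ssh uses SSH_AUTH_SOCK"
  elif [ "$1" = "$2" ]; then
    echo "ok: sockets match"
  else
    echo "mismatch: export SSH_AUTH_SOCK=$1 before running ssh-add"
  fi
}

# Resolve the configured agent socket for the gateway host, then compare.
configured=$(ssh -G debug-gw.flex.ai 2>/dev/null | awk '/^identityagent/ {print $2}')
check_agent_socket "$configured" "${SSH_AUTH_SOCK:-}"
```

If a mismatch is reported, export SSH_AUTH_SOCK with the printed path in the same terminal before running ssh-add again.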