Interactive Training Session
The flexai training debug-ssh command allows you to start an Interactive Training Job on a Training Runtime similar to that of regular Training Jobs. This command will allocate the required resources and then set up an Interactive Training Runtime you can connect to via SSH.
Access to this debug Interactive Training Runtime lets you iterate quickly: you can edit scripts, modify files, review logs and outputs, push changes to your GitHub repository, and get an inside look at your training results as they happen, before you commit to running a Training Job.
Pre-requisites
The environment in which the flexai training debug-ssh command is run should either:
- Have an ssh-agent running with SSH keys loaded into it, or
- Have an SSH key pair available
Using an ssh-agent
If an ssh-agent is not running and you would like to use one to push your changes from the Interactive Training Runtime to GitHub, run eval $(ssh-agent) in your terminal, then load your keys into it by running ssh-add <path_to_private_key>. You can then confirm the keys have been loaded using ssh-add -l and start your Interactive Training Job, which will automatically use your ssh-agent.
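The agent check described above can be scripted. The sketch below relies on the exit status of ssh-add -l, which OpenSSH documents as 0 on success, 1 when the command fails (for example, when the agent holds no identities), and 2 when no agent can be contacted. The function names agent_state and check_agent are illustrative assumptions and are not part of the flexai CLI.

```shell
#!/bin/sh
# Translate the exit status of `ssh-add -l` into a readable state.
# (agent_state and check_agent are hypothetical helper names.)
agent_state() {
  case "$1" in
    0) echo "ready" ;;     # agent reachable, at least one key loaded
    1) echo "no-keys" ;;   # agent reachable, but no identities loaded
    2) echo "no-agent" ;;  # no agent reachable through SSH_AUTH_SOCK
    *) echo "unknown" ;;   # e.g. ssh-add is not installed
  esac
}

# Query the agent referenced by SSH_AUTH_SOCK and print its state.
check_agent() {
  ssh-add -l >/dev/null 2>&1
  agent_state "$?"
}
```

Calling check_agent prints one of the states. On no-agent, run eval $(ssh-agent); on no-keys, run ssh-add <path_to_private_key>.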
If SSH key pairs authenticating you to GitHub are loaded into your ssh-agent, you will be able to push your changes to GitHub from the Interactive Training Runtime by enabling the ForwardAgent option in your SSH configuration file (~/.ssh/config):

    Host debug-gw.flex.ai
        ForwardAgent yes

To verify that an ssh-agent is running in your local environment and has your private keys loaded, run:

    ssh-add -L

You can load more keys into your ssh-agent with:

    ssh-add path/to/private/key

Using SSH key pairs
When not using the ssh-agent, you must set the path to the public key you will use through the --authorized-keys flag, e.g.:

    flexai training debug-ssh \
      --repository-url https://github.com/flexaihq/nanoGPT \
      --authorized-keys ~/.ssh/id_ed25519.pub \
      --vscode

Starting an Interactive Training Job
In its simplest form, starting an Interactive Training Job only requires running the debug-ssh subcommand with the --repository-url flag, e.g.:

    flexai training debug-ssh \
      --repository-url https://github.com/flexaihq/nanoGPT \
      --vscode

Information on the various stages of the Interactive Training Job will be displayed in the terminal. Once the job is ready, you will be provided with the SSH command to connect to the Interactive Training Runtime. Example output:

    flexai training debug-ssh \
      --repository-url https://github.com/flexaihq/nanoGPT \
      --vscode
    Interactive training interactive-training-61a01afb-66bc-411c-a23b-d9fc566c1f9c launching...
    ✅ Looking for an interactive training builder
    ✅ Starting interactive training environment

    [Node 0] To connect using ssh: ssh -o ForwardAgent=yes -p 36663 flexai@debug-gw.flex.ai
    [Node 0] To open VSCode, click the following URL (or open in your browser if not clickable): "vscode://vscode-remote/ssh-remote+flexai@debug-gw.flex.ai:36663/workspace?windowId=_blank"
    ✅ Automatically configuring ~/.ssh/known_hosts

If the --vscode flag was used, Visual Studio Code will automatically open.
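If you keep a copy of this output, the advertised SSH command can be pulled out of it with a small filter. The sketch below assumes the line format shown in the example output above; extract_ssh_command and the session.log file mentioned afterwards are hypothetical names, not something the flexai CLI provides.

```shell
#!/bin/sh
# Print the SSH command advertised for a given node in the job output.
# Reads the output on stdin; the node index defaults to 0.
# (extract_ssh_command is a hypothetical helper name.)
extract_ssh_command() {
  node="${1:-0}"
  # Keep only the matching line and strip the "[Node N] ..." prefix.
  sed -n "s/^\[Node ${node}\] To connect using ssh: //p"
}
```

For example, with the output saved to a file named session.log, running extract_ssh_command 0 < session.log prints the ssh command for node 0.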
Attaching through Visual Studio Code
You can also attach to the Interactive Training Runtime using VSCode. To do so, you will need to install the Remote - SSH extension.
Once installed, you can connect to the Interactive Training Runtime by simply clicking on the VSCode URL provided in the terminal output. This will open a new VSCode window with the SSH connection to the Interactive Training Runtime already established.
The --vscode flag can be used to automatically open a VSCode window into the Interactive Training Runtime, useful if the VSCode URL is not clickable in your terminal.
Lifetime of an Interactive Training Job
Session timeout
Interactive Training Jobs are automatically stopped once the session timeout elapses with no active SSH session into the Interactive Training Runtime. The timeout can be set with --session-timeout and defaults to 600 seconds (10 minutes).
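Since the default is expressed in seconds, a small conversion helper can make longer timeouts easier to read. This is a minimal sketch that assumes --session-timeout accepts a plain integer number of seconds, as the default suggests; minutes_to_seconds and print_debug_ssh_command are hypothetical helper names, and the command is only printed, not executed.

```shell
#!/bin/sh
# Convert a timeout in whole minutes to seconds.
minutes_to_seconds() {
  echo $(( $1 * 60 ))
}

# Print (without running) a debug-ssh command with a timeout given in minutes.
# Assumption: --session-timeout takes an integer number of seconds.
print_debug_ssh_command() {
  repo_url="$1"
  timeout_minutes="$2"
  echo "flexai training debug-ssh --repository-url ${repo_url} --session-timeout $(minutes_to_seconds "$timeout_minutes")"
}
```

For example, print_debug_ssh_command https://github.com/flexaihq/nanoGPT 30 prints a command with a 1800-second timeout that you can review before running it.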
Stopping a running Interactive Training Job
Interactive Training Jobs can be manually stopped like regular Training Jobs by using the flexai training stop command:

    flexai training stop <interactive_training_job_name>

Troubleshooting
The Interactive Training Job fails to start
You can check the Interactive Training session logs, just like a regular Training Job:

    flexai training logs <training_job_name>

The same applies to inspecting a Training Session to get more information on its lifecycle:

    flexai training inspect <training_job_name>

SSH connection issues
When using an ssh-agent
Make sure that the ssh-agent is running and has the correct keys loaded. You can verify this by running ssh-add -L and checking that the keys you expect are listed.
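Checking by eye that the expected key is listed is error-prone with several keys loaded. The sketch below compares the key stored in a public key file against a listing in the format printed by ssh-add -L (key type, base64 key, comment). key_is_listed is a hypothetical helper name, and ~/.ssh/id_ed25519.pub in the usage note is only an example path.

```shell
#!/bin/sh
# Succeed if the key stored in a public key file appears in a listing
# formatted like `ssh-add -L` output (type, base64 key, comment).
# (key_is_listed is a hypothetical helper name.)
key_is_listed() {
  pubkey_file="$1"
  listing="$2"
  # The second field of a public key line is the base64-encoded key.
  key=$(awk '{print $2; exit}' "$pubkey_file")
  [ -n "$key" ] || return 1
  # Compare against the second field of every listed key.
  printf '%s\n' "$listing" |
    awk -v key="$key" '$2 == key {found = 1} END {exit !found}'
}
```

For example, key_is_listed ~/.ssh/id_ed25519.pub "$(ssh-add -L)" && echo "key is loaded" reports whether that key is currently held by the agent.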
A common issue is that the ssh-agent in your environment is not the same as the one used by the Interactive Training Job. This can happen if you have multiple ssh-agent instances running and the SSH_AUTH_SOCK environment variable points to a different agent than the one configured inside your ssh configuration file (~/.ssh/config). When loading keys into your agent with ssh-add /path/to/your-key or verifying which keys are loaded with ssh-add -L, make sure that the SSH_AUTH_SOCK environment variable is set to the same socket path as the one used by the Interactive Training Job, which you can find by running:

    ssh -G debug-gw.flex.ai | awk '/^identityagent/ {print $2}'

Visual Studio Code connection issues
If Visual Studio Code fails to open automatically:
- Make sure you have the Remote - SSH extension installed.
- Copy the vscode:// URL provided in the terminal output and either:
  - In a Visual Studio Code instance, open the command palette (Ctrl + Shift + P / Command + Shift + P), then paste the vscode:// URL, or
  - Open a terminal window, then use the open or xdg-open command (depending on your system) followed by the vscode:// URL you just copied:

        open "vscode://vscode-remote/ssh-remote+flexai@debug-gw.flex.ai:36663/workspace?windowId=_blank"

        xdg-open "vscode://vscode-remote/ssh-remote+flexai@debug-gw.flex.ai:36663/workspace?windowId=_blank"
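The choice between open and xdg-open can be automated. The sketch below assumes open is used on macOS (where uname -s reports Darwin) and xdg-open elsewhere; opener_for and open_vscode_url are hypothetical helper names.

```shell
#!/bin/sh
# Pick the URL opener for a kernel name as reported by `uname -s`.
# Assumption: `open` on macOS (Darwin), `xdg-open` on other systems.
opener_for() {
  case "$1" in
    Darwin) echo "open" ;;
    *)      echo "xdg-open" ;;
  esac
}

# Open a vscode:// URL with the opener matching the current system.
open_vscode_url() {
  "$(opener_for "$(uname -s)")" "$1"
}
```

Quote the URL when calling open_vscode_url, because it contains a ? character that some shells would otherwise try to expand as a filename pattern.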
Note that in order to establish the connection, you may be prompted to unlock your SSH key if it is passphrase-protected.