Checking the Fine-tuning Job's Details
Fine-tuning Summary
General Details about your Training Jobs
The “All Trainings” table in the FlexAI Console provides a summary of all your Training Jobs.
Select the gear icon ⚙️ (labeled Configure) in the Actions field of the Training Jobs list page to open the “Details” panel. The Details tab is selected by default and shows all the relevant information about your Training Job.
| Field | Description |
|---|---|
| Name | The name you assigned to the Training Job. |
| Status | The current status of the Training Job (e.g., pending, scheduling, building, in progress, succeeded, failed, stopped). |
| Created At | The time elapsed since the Training Job was created. |
General Details about your Training Jobs
You can use the list command to get a table with general information about all the Training Jobs you have access to through your FlexAI account:

    flexai training list

This provides an output similar to the following:

    name                       | device | node | accelerator | dataset         | repository                          | status   | age
    ---------------------------+--------+------+-------------+-----------------+-------------------------------------+----------+------
    quickstart-fine-tuning-job | nvidia | 1    | 1           | nanogpt-dataset | https://github.com/flexaihq/nanogpt | building | 15s

Fine-tuning Configuration
| Field | Description |
|---|---|
| Dashboard URL | The URL of the Training Job dashboard, where you can monitor the performance and resource usage of your Training Job. |
| Tensorboard Dashboard URL | The URL of the FlexAI-hosted TensorBoard dashboard, where you can visualize the training process of your models. |
| Node Count | The number of nodes allocated to the Training Job. |
| Accelerator Count | The number of accelerators (GPUs) allocated to the Training Job. |
| Repository URL | The URL of the Git repository containing your training code. |
| Repository Revision | The specific commit or branch of the repository that was used to create the Training Job. |
| Repository Revision SHA | The SHA hash of the commit that the Repository Revision pointed to when the Training Job was created. |
| Entry Point | The entry point script along with its arguments. |
| Datasets | The datasets that were attached to the Training Job. |
| Environment | The environment variables and secrets that were set for the Training Job, displayed as key-value pairs. The key is the name of the environment variable within the Training Runtime, and the value is either the raw value (for environment variables) or the name of the FlexAI secret containing the secret value. |
| Checkpoints | The checkpoints that were created during the Training Job. These are stored in the FlexAI object storage and can be used to resume training or to create an Inference Endpoint (depending on the type of model). |

    flexai training inspect quickstart-fine-tuning-job

Output example:

    kind: Training
    metadata:
      name: quickstart-training-job
      id: 75179cc2-ec63-4f93-b4da-44e49ea86049
      creatorUserID: 16e289cc-c81b-4a15-91d9-0e2aae00a317
      ownerOrgId: 270a5476-b91a-442f-8a13-852ef7bb5b9c
    config:
      device: nvidia
      nodes: 1
      accelerator: 1
      entrypoint:
        - train.py
        - config/train_shakespeare_char.py
        - --out_dir=/output-checkpoint
        - --max_iters=1500
      datasetsNames:
        - nanoGPT-dataset
      checkpointName: ""
      sourceName: ""
      repositoryURL: https://github.com/flexaihq/nanogpt
      repositoryRevision: main
      secrets: []
      environment: []
    runtime:
      status: succeeded
      queuePosition: 0
      repositoryRevisionSha: 116799dbae7b0fe33caf1b90f73a72f84bc32adc
      selectedAgentId: k8s-training-sesterce-001-CLIENT-PROD-client-prod
      lifecycleEvents:
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
             Name: aws-cloud
             AgentID: k8s-training-aws-001-CLIENT-PROD-client-prod
             Response: NoAnswer
             Conditions: [NonSchedulable: NoAnswer]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
             Name: sesterce-h100-bm-01
             AgentID: k8s-training-sesterce-001-CLIENT-PROD-client-prod
             Response: OK
             Conditions: []
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
             Name: sesterce-h200-bm-01
             AgentID: k8s-training-sesterce-002-CLIENT-PROD-client-prod
             Response: NoAnswer
             Conditions: [NonSchedulable: NoAnswer]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
             Name: sesterce-l40s-bm-01
             AgentID: k8s-training-sesterce-003-CLIENT-PROD-client-prod
             Response: NoAnswer
             Conditions: [NonSchedulable: NoAnswer]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
             Name: sesterce-a100-bm-01
             AgentID: k8s-training-sesterce-004-CLIENT-PROD-client-prod
             Response: NoAnswer
             Conditions: [NonSchedulable: NoAnswer]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
             Name: k8s-training-smc-001
             AgentID: k8s-training-smc-001-CLIENT-PROD-client-prod
             Response: NoAnswer
             Conditions: [NonSchedulable: NoAnswer, OrgNotAuthorized]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: Completed
          message: Selected agent k8s-training-sesterce-001-CLIENT-PROD-client-prod
          raisedAt: "2025-06-30T11:41:54Z"
        - type: BuildSubmission
          status: Succeeded
          message: Build request sent to flex-agent
          raisedAt: "2025-06-30T11:41:54Z"
        - type: BuildExecution
          status: Succeeded
          message: Build completed with image rg.fr-par.scw.cloud/paas-trainings-client-prod/9f9c379c-8d46-419b-8bf5-d0b0986a6dd9-arch_nvidia-1x1@sha256:0d854f75f698a549d2a8a0e024e930383b885bdac2863ee0cf74ebdc8a8f358c
          raisedAt: "2025-06-30T11:41:54Z"
        - type: TrainingPreparation
          status: Succeeded
          message: Training trainings-client-prod/training-75b79cc2-ec63-4f93-b4da-44e49a4a6049-zqg6d created
          raisedAt: "2025-06-30T11:41:54Z"
        - type: TrainingExecution
          status: InProgress
          message: Training in progress
          raisedAt: "2025-06-30T11:42:00Z"
        - type: TrainingExecution
          status: Succeeded
          message: Training complete, output available
          raisedAt: "2025-06-30T11:43:48Z"
      createdAt: "2025-06-30T11:41:54Z"
      lastUpdate: "2025-06-30T11:43:48Z"

The same output in JSON format:

    {
      "kind": "Training",
      "metadata": {
        "name": "quickstart-training-job",
        "id": "75179cc2-ec63-4f93-b4da-44e49ea86049",
        "creatorUserID": "16e2894c-c81b-4a15-91d9-0e2aae00a317",
        "ownerOrgID": "108dddec-e922-49b8-a466-4d7ed5dcc746"
      },
      "config": {
        "device": "nvidia",
        "nodes": 1,
        "accelerator": 1,
        "entrypoint": [
          "train.py",
          "config/train_shakespeare_char.py",
          "--out_dir=/output-checkpoint",
          "--max_iters=1500"
        ],
        "datasetsNames": [
          "nanoGPT-dataset"
        ],
        "checkpointName": "",
        "sourceName": "",
        "repositoryURL": "https://github.com/flexaihq/nanogpt",
        "repositoryRevision": "main",
        "secrets": [],
        "environment": []
      },
      "runtime": {
        "status": "succeeded",
        "queuePosition": 0,
        "repositoryRevisionSha": "116799dbae7b0fe33caf1b90f73a72f84bc32adc",
        "selectedAgentId": "k8s-training-sesterce-001-CLIENT-PROD-client-prod",
        "lifecycleEvents": [
          {
            "type": "AgentSelection",
            "status": "ResponseReceived",
            "message": "Cluster Scheduling result{\n Name: aws-cloud\n AgentID: k8s-training-aws-001-CLIENT-PROD-client-prod\n Response: NoAnswer\n Conditions: [NonSchedulable: NoAnswer]\n}",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "AgentSelection",
            "status": "ResponseReceived",
            "message": "Cluster Scheduling result{\n Name: sesterce-h100-bm-01\n AgentID: k8s-training-sesterce-001-CLIENT-PROD-client-prod\n Response: OK\n Conditions: []\n}",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "AgentSelection",
            "status": "ResponseReceived",
            "message": "Cluster Scheduling result{\n Name: sesterce-h200-bm-01\n AgentID: k8s-training-sesterce-002-CLIENT-PROD-client-prod\n Response: NoAnswer\n Conditions: [NonSchedulable: NoAnswer]\n}",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "AgentSelection",
            "status": "ResponseReceived",
            "message": "Cluster Scheduling result{\n Name: sesterce-l40s-bm-01\n AgentID: k8s-training-sesterce-003-CLIENT-PROD-client-prod\n Response: NoAnswer\n Conditions: [NonSchedulable: NoAnswer]\n}",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "AgentSelection",
            "status": "ResponseReceived",
            "message": "Cluster Scheduling result{\n Name: sesterce-a100-bm-01\n AgentID: k8s-training-sesterce-004-CLIENT-PROD-client-prod\n Response: NoAnswer\n Conditions: [NonSchedulable: NoAnswer]\n}",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "AgentSelection",
            "status": "ResponseReceived",
            "message": "Cluster Scheduling result{\n Name: k8s-training-smc-001\n AgentID: k8s-training-smc-001-CLIENT-PROD-client-prod\n Response: NoAnswer\n Conditions: [NonSchedulable: NoAnswer, OrgNotAuthorized]\n}",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "AgentSelection",
            "status": "Completed",
            "message": "Selected agent k8s-training-sesterce-001-CLIENT-PROD-client-prod",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "BuildSubmission",
            "status": "Succeeded",
            "message": "Build request sent to flex-agent",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "BuildExecution",
            "status": "Succeeded",
            "message": "Build completed with image rg.fr-par.scw.cloud/paas-trainings-client-prod/9f9c379c-8d46-419b-8bf5-d0b0986a6dd9-arch_nvidia-1x1@sha256:0d854f75f698a549d2a8a0e024e930383b885bdac2863ee0cf74ebdc8a8f358c",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "TrainingPreparation",
            "status": "Succeeded",
            "message": "Training trainings-client-prod/training-75b79cc2-ec63-4f93-b4da-44e49a4a6049-zqg6d created",
            "raisedAt": "2025-06-30T11:41:54Z"
          },
          {
            "type": "TrainingExecution",
            "status": "InProgress",
            "message": "Training in progress",
            "raisedAt": "2025-06-30T11:42:00Z"
          },
          {
            "type": "TrainingExecution",
            "status": "Succeeded",
            "message": "Training complete, output available",
            "raisedAt": "2025-06-30T11:43:48Z"
          }
        ],
        "createdAt": "2025-06-30T11:41:54Z",
        "lastUpdate": "2025-06-30T11:43:48Z"
      }
    }

Next Steps
Once a Fine-tuning Job is running, you can monitor its progress by checking its logs and leveraging the FlexAI Observability Services.
You'll learn more about this in the next step of the Quickstart Tutorial: Monitoring a Fine-tuning Job's Progress.
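
In the meantime, the inspect output on this page already tells you where a job stands. The Python sketch below is one possible way to condense that output into the latest status per lifecycle phase and the time elapsed between `createdAt` and `lastUpdate`. It assumes the JSON shape from the output example on this page and that `lifecycleEvents` is listed in chronological order. `SAMPLE`, `parse_time`, and `summarize` are illustrative names, not part of the FlexAI CLI, and the sample is an abbreviated copy of the example output.

```python
import json
from datetime import datetime

# Abbreviated copy of the JSON output example on this page
# (the per-cluster AgentSelection responses and build events are omitted).
SAMPLE = """
{
  "kind": "Training",
  "metadata": {"name": "quickstart-training-job"},
  "runtime": {
    "status": "succeeded",
    "lifecycleEvents": [
      {"type": "AgentSelection", "status": "Completed",
       "message": "Selected agent k8s-training-sesterce-001-CLIENT-PROD-client-prod",
       "raisedAt": "2025-06-30T11:41:54Z"},
      {"type": "BuildSubmission", "status": "Succeeded",
       "message": "Build request sent to flex-agent",
       "raisedAt": "2025-06-30T11:41:54Z"},
      {"type": "TrainingExecution", "status": "InProgress",
       "message": "Training in progress",
       "raisedAt": "2025-06-30T11:42:00Z"},
      {"type": "TrainingExecution", "status": "Succeeded",
       "message": "Training complete, output available",
       "raisedAt": "2025-06-30T11:43:48Z"}
    ],
    "createdAt": "2025-06-30T11:41:54Z",
    "lastUpdate": "2025-06-30T11:43:48Z"
  }
}
"""


def parse_time(value: str) -> datetime:
    """Parse a timestamp such as "2025-06-30T11:41:54Z" into an aware datetime."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(job: dict) -> dict:
    """Condense an inspect document into overall status, per-phase status, and elapsed time."""
    runtime = job["runtime"]

    # Assumption: events are in chronological order, so a later event of the
    # same type overwrites the earlier one and the last status wins.
    phases = {}
    for event in runtime["lifecycleEvents"]:
        phases[event["type"]] = event["status"]

    elapsed = parse_time(runtime["lastUpdate"]) - parse_time(runtime["createdAt"])
    return {
        "name": job["metadata"]["name"],
        "status": runtime["status"],
        "phases": phases,
        "elapsed_seconds": int(elapsed.total_seconds()),
    }


if __name__ == "__main__":
    print(json.dumps(summarize(json.loads(SAMPLE)), indent=2))
```

To use it on a real job, save the JSON form of the inspect output to a file and pass the result of `json.load` to `summarize`. How the CLI is asked for JSON is not covered on this page, so that step is left to you.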