Checking the Fine-tuning Job's Details

Fine-tuning Summary

General Details about your Training Jobs

The “All Trainings” table in the FlexAI Console provides a summary of all your Training Jobs. To view a specific job, select the gear icon ⚙️ (labeled Configure) in the Actions column of the Training Jobs list page. This opens a “Details” panel with the Details tab selected by default, showing the relevant information about your Training Job:

Field	Description
`Name`	The name you assigned to the Training Job.
`Status`	The current status of the Training Job (for example, `pending`, `scheduling`, `building`, `in progress`, `succeeded`, `failed`, or `stopped`).
`Created At`	How long ago the Training Job was created (its age).

You can learn more about the different Training Job statuses on the Lifecycle page.

From the FlexAI CLI, you can use the `list` command to get a table with general information about all the Training Jobs you have access to through your FlexAI account:

flexai training list

This provides an output similar to the following:

  NAME                     | DEVICE | NODE | ACCELERATOR |     DATASET     |             REPOSITORY              |  STATUS  | AGE
---------------------------+--------+------+-------------+-----------------+-------------------------------------+----------+------
quickstart-fine-tuning-job | nvidia | 1    | 1           | nanoGPT-dataset | https://github.com/flexaihq/nanogpt | building | 15s
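To use this table from a script, for example to see which jobs are still running, you can parse the pipe-separated output shown above into Python dictionaries. This is a minimal sketch with several assumptions:

- The column layout matches the example output.
- No cell contains a `|` character.
- The set of terminal statuses is an assumption. See the Lifecycle page for the authoritative list.

The helper names `parse_training_list`, `unfinished_jobs` and `fetch_training_list` are hypothetical.

```python
import subprocess

# Assumption: statuses after which a Training Job no longer changes.
# The documented status list is not exhaustive; check the Lifecycle page.
TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}

# Example output of `flexai training list`, copied from this page.
SAMPLE_OUTPUT = """\
  NAME                     | DEVICE | NODE | ACCELERATOR |     DATASET     |             REPOSITORY              |  STATUS  | AGE
---------------------------+--------+------+-------------+-----------------+-------------------------------------+----------+------
quickstart-fine-tuning-job | nvidia | 1    | 1           | nanoGPT-dataset | https://github.com/flexaihq/nanogpt | building | 15s
"""


def fetch_training_list() -> str:
    """Run `flexai training list` and return its stdout (hypothetical helper).

    Assumes the CLI prints the same table when its output is captured.
    """
    result = subprocess.run(
        ["flexai", "training", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_training_list(output: str) -> list[dict[str, str]]:
    """Turn the pipe-separated table into one dict per Training Job."""
    header = None
    rows = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "----+----" separator row.
        if not stripped or set(stripped) <= {"-", "+"}:
            continue
        cells = [cell.strip() for cell in stripped.split("|")]
        if header is None:
            header = [cell.lower() for cell in cells]  # e.g. "name", "status"
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unfinished_jobs(rows: list[dict[str, str]]) -> list[str]:
    """Names of jobs whose status is not (assumed to be) terminal."""
    return [row["name"] for row in rows if row["status"] not in TERMINAL_STATUSES]


if __name__ == "__main__":
    # Replace SAMPLE_OUTPUT with fetch_training_list() to use live data.
    print(unfinished_jobs(parse_training_list(SAMPLE_OUTPUT)))
```

Matching columns by header name rather than by position keeps the parser working if columns are reordered, as long as the header names stay the same.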

Fine-tuning Configuration

In the FlexAI Console, the Details tab also shows how the Training Job was configured:

Field	Description
`Dashboard URL`	The URL of the Training Job dashboard, where you can monitor the performance and resource usage of your Training Job.
`Tensorboard Dashboard URL`	The URL of the FlexAI-hosted TensorBoard dashboard, where you can visualize the training process of your models.
`Node Count`	The number of nodes allocated to the Training Job.
`Accelerator Count`	The number of accelerators (GPUs) allocated to the Training Job.
`Repository URL`	The URL of the Git repository containing your training code.
`Repository Revision`	The specific commit or branch of the repository that was used to create the Training Job.
`Repository Revision SHA`	The SHA of the exact commit used to create the Training Job, that is, what the Repository Revision (for example, a branch name such as `main`) resolved to.
`Entry Point`	The entry point script along with its arguments.
`Datasets`	The datasets that were attached to the Training Job.
`Environment`	The environment variables and secrets that were set for the Training Job, displayed as key-value pairs. The key is the name of the environment variable inside the Training Runtime. The value is either the raw value (for environment variables) or the name of the FlexAI secret that holds the value (for secrets).
`Checkpoints`	The checkpoints that were created during the Training Job. These are stored in the FlexAI object storage and can be used to resume training or to create an Inference Endpoint (depending on the type of model).
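Most of these fields also appear in the `config` and `runtime` sections of the `flexai training inspect` output shown further down this page. The sketch below pulls them out of the JSON form of that output. Note the following assumptions:

- The mapping from Console field names to JSON keys is inferred from the example output. It is not a documented contract.
- The dashboard URLs and checkpoints do not appear in that example, so they are left out.
- `summarize_config` is a hypothetical helper.

```python
import json

# Assumed mapping from Console field names to (section, key) in the JSON
# output of `flexai training inspect`, inferred from the example on this page.
CONSOLE_FIELDS = {
    "Node Count": ("config", "nodes"),
    "Accelerator Count": ("config", "accelerator"),
    "Repository URL": ("config", "repositoryURL"),
    "Repository Revision": ("config", "repositoryRevision"),
    "Repository Revision SHA": ("runtime", "repositoryRevisionSha"),
    "Entry Point": ("config", "entrypoint"),
    "Datasets": ("config", "datasetsNames"),
    "Environment": ("config", "environment"),
}

# Trimmed excerpt of the example JSON output on this page.
SAMPLE_INSPECT = """
{
  "kind": "Training",
  "config": {
    "device": "nvidia",
    "nodes": 1,
    "accelerator": 1,
    "entrypoint": ["train.py", "config/train_shakespeare_char.py",
                   "--out_dir=/output-checkpoint", "--max_iters=1500"],
    "datasetsNames": ["nanoGPT-dataset"],
    "repositoryURL": "https://github.com/flexaihq/nanogpt",
    "repositoryRevision": "main",
    "secrets": [],
    "environment": []
  },
  "runtime": {
    "status": "succeeded",
    "repositoryRevisionSha": "116799dbae7b0fe33caf1b90f73a72f84bc32adc"
  }
}
"""


def summarize_config(inspect_json: str) -> dict:
    """Collect the Console-style configuration fields from inspect JSON."""
    doc = json.loads(inspect_json)
    summary = {}
    for field, (section, key) in CONSOLE_FIELDS.items():
        value = doc.get(section, {}).get(key)  # None if the key is missing
        if field == "Entry Point" and isinstance(value, list):
            value = " ".join(value)  # script followed by its arguments
        summary[field] = value
    return summary


if __name__ == "__main__":
    for field, value in summarize_config(SAMPLE_INSPECT).items():
        print(f"{field}: {value}")
```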

Using the FlexAI CLI, you can take a deeper look at a Training Job, including the history of its status changes, with the `flexai training inspect` command. This is especially useful for debugging:

flexai training inspect quickstart-fine-tuning-job

Below is an example of the output of the `inspect` command, in YAML (the default format) and in JSON.

YAML output (default):

kind: Training
metadata:
    name: quickstart-training-job
    id: 75179cc2-ec63-4f93-b4da-44e49ea86049
    creatorUserID: 16e289cc-c81b-4a15-91d9-0e2aae00a317
    ownerOrgId: 270a5476-b91a-442f-8a13-852ef7bb5b9c
config:
    device: nvidia
    nodes: 1
    accelerator: 1
    entrypoint:
        - train.py
        - config/train_shakespeare_char.py
        - --out_dir=/output-checkpoint
        - --max_iters=1500
    datasetsNames:
        - nanoGPT-dataset
    checkpointName: ""
    sourceName: ""
    repositoryURL: https://github.com/flexaihq/nanogpt
    repositoryRevision: main
    secrets: []
    environment: []
runtime:
    status: succeeded
    queuePosition: 0
    repositoryRevisionSha: 116799dbae7b0fe33caf1b90f73a72f84bc32adc
    selectedAgentId: k8s-training-sesterce-001-CLIENT-PROD-client-prod
    lifecycleEvents:
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
              Name: aws-cloud
              AgentID: k8s-training-aws-001-CLIENT-PROD-client-prod
              Response: NoAnswer
              Conditions: [NonSchedulable: NoAnswer]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
              Name: sesterce-h100-bm-01
              AgentID: k8s-training-sesterce-001-CLIENT-PROD-client-prod
              Response: OK
              Conditions: []
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
              Name: sesterce-h200-bm-01
              AgentID: k8s-training-sesterce-002-CLIENT-PROD-client-prod
              Response: NoAnswer
              Conditions: [NonSchedulable: NoAnswer]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
              Name: sesterce-l40s-bm-01
              AgentID: k8s-training-sesterce-003-CLIENT-PROD-client-prod
              Response: NoAnswer
              Conditions: [NonSchedulable: NoAnswer]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
              Name: sesterce-a100-bm-01
              AgentID: k8s-training-sesterce-004-CLIENT-PROD-client-prod
              Response: NoAnswer
              Conditions: [NonSchedulable: NoAnswer]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: ResponseReceived
          message: |-
            Cluster Scheduling result{
              Name: k8s-training-smc-001
              AgentID: k8s-training-smc-001-CLIENT-PROD-client-prod
              Response: NoAnswer
              Conditions: [NonSchedulable: NoAnswer, OrgNotAuthorized]
            }
          raisedAt: "2025-06-30T11:41:54Z"
        - type: AgentSelection
          status: Completed
          message: Selected agent k8s-training-sesterce-001-CLIENT-PROD-client-prod
          raisedAt: "2025-06-30T11:41:54Z"
        - type: BuildSubmission
          status: Succeeded
          message: Build request sent to flex-agent
          raisedAt: "2025-06-30T11:41:54Z"
        - type: BuildExecution
          status: Succeeded
          message: Build completed with image rg.fr-par.scw.cloud/paas-trainings-client-prod/9f9c379c-8d46-419b-8bf5-d0b0986a6dd9-arch_nvidia-1x1@sha256:0d854f75f698a549d2a8a0e024e930383b885bdac2863ee0cf74ebdc8a8f358c
          raisedAt: "2025-06-30T11:41:54Z"
        - type: TrainingPreparation
          status: Succeeded
          message: Training trainings-client-prod/training-75b79cc2-ec63-4f93-b4da-44e49a4a6049-zqg6d created
          raisedAt: "2025-06-30T11:41:54Z"
        - type: TrainingExecution
          status: InProgress
          message: Training in progress
          raisedAt: "2025-06-30T11:42:00Z"
        - type: TrainingExecution
          status: Succeeded
          message: Training complete, output available
          raisedAt: "2025-06-30T11:43:48Z"
    createdAt: "2025-06-30T11:41:54Z"
    lastUpdate: "2025-06-30T11:43:48Z"

JSON output:

{
  "kind": "Training",
  "metadata": {
    "name": "quickstart-training-job",
    "id": "75179cc2-ec63-4f93-b4da-44e49ea86049",
    "creatorUserID": "16e2894c-c81b-4a15-91d9-0e2aae00a317",
    "ownerOrgID": "108dddec-e922-49b8-a466-4d7ed5dcc746"
  },
  "config": {
    "device": "nvidia",
    "nodes": 1,
    "accelerator": 1,
    "entrypoint": [
      "train.py",
      "config/train_shakespeare_char.py",
      "--out_dir=/output-checkpoint",
      "--max_iters=1500"
    ],
    "datasetsNames": [
      "nanoGPT-dataset"
    ],
    "checkpointName": "",
    "sourceName": "",
    "repositoryURL": "https://github.com/flexaihq/nanogpt",
    "repositoryRevision": "main",
    "secrets": [],
    "environment": []
  },
  "runtime": {
    "status": "succeeded",
    "queuePosition": 0,
    "repositoryRevisionSha": "116799dbae7b0fe33caf1b90f73a72f84bc32adc",
    "selectedAgentId": "k8s-training-sesterce-001-CLIENT-PROD-client-prod",
    "lifecycleEvents": [
      {
        "type": "AgentSelection",
        "status": "ResponseReceived",
        "message": "Cluster Scheduling result{\n  Name: aws-cloud\n  AgentID: k8s-training-aws-001-CLIENT-PROD-client-prod\n  Response: NoAnswer\n  Conditions: [NonSchedulable: NoAnswer]\n}",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "AgentSelection",
        "status": "ResponseReceived",
        "message": "Cluster Scheduling result{\n  Name: sesterce-h100-bm-01\n  AgentID: k8s-training-sesterce-001-CLIENT-PROD-client-prod\n  Response: OK\n  Conditions: []\n}",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "AgentSelection",
        "status": "ResponseReceived",
        "message": "Cluster Scheduling result{\n  Name: sesterce-h200-bm-01\n  AgentID: k8s-training-sesterce-002-CLIENT-PROD-client-prod\n  Response: NoAnswer\n  Conditions: [NonSchedulable: NoAnswer]\n}",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "AgentSelection",
        "status": "ResponseReceived",
        "message": "Cluster Scheduling result{\n  Name: sesterce-l40s-bm-01\n  AgentID: k8s-training-sesterce-003-CLIENT-PROD-client-prod\n  Response: NoAnswer\n  Conditions: [NonSchedulable: NoAnswer]\n}",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "AgentSelection",
        "status": "ResponseReceived",
        "message": "Cluster Scheduling result{\n  Name: sesterce-a100-bm-01\n  AgentID: k8s-training-sesterce-004-CLIENT-PROD-client-prod\n  Response: NoAnswer\n  Conditions: [NonSchedulable: NoAnswer]\n}",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "AgentSelection",
        "status": "ResponseReceived",
        "message": "Cluster Scheduling result{\n  Name: k8s-training-smc-001\n  AgentID: k8s-training-smc-001-CLIENT-PROD-client-prod\n  Response: NoAnswer\n  Conditions: [NonSchedulable: NoAnswer, OrgNotAuthorized]\n}",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "AgentSelection",
        "status": "Completed",
        "message": "Selected agent k8s-training-sesterce-001-CLIENT-PROD-client-prod",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "BuildSubmission",
        "status": "Succeeded",
        "message": "Build request sent to flex-agent",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "BuildExecution",
        "status": "Succeeded",
        "message": "Build completed with image rg.fr-par.scw.cloud/paas-trainings-client-prod/9f9c379c-8d46-419b-8bf5-d0b0986a6dd9-arch_nvidia-1x1@sha256:0d854f75f698a549d2a8a0e024e930383b885bdac2863ee0cf74ebdc8a8f358c",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "TrainingPreparation",
        "status": "Succeeded",
        "message": "Training trainings-client-prod/training-75b79cc2-ec63-4f93-b4da-44e49a4a6049-zqg6d created",
        "raisedAt": "2025-06-30T11:41:54Z"
      },
      {
        "type": "TrainingExecution",
        "status": "InProgress",
        "message": "Training in progress",
        "raisedAt": "2025-06-30T11:42:00Z"
      },
      {
        "type": "TrainingExecution",
        "status": "Succeeded",
        "message": "Training complete, output available",
        "raisedAt": "2025-06-30T11:43:48Z"
      }
    ],
    "createdAt": "2025-06-30T11:41:54Z",
    "lastUpdate": "2025-06-30T11:43:48Z"
  }
}
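When debugging, the `lifecycleEvents` list is usually the most informative part of this output. The sketch below reports which clusters answered the scheduling request. It also shows how long the job took overall and in the `TrainingExecution` phase. It assumes the following:

- You have saved the JSON form of the output to a file (the name `inspect.json` is hypothetical).
- The `Cluster Scheduling result{...}` messages can be parsed by their `Key: value` lines, as they appear in the example above. That message format is an observed example rather than a documented contract.
- The helper names are hypothetical.

```python
import json
from datetime import datetime

# Trimmed excerpt of the example JSON output above.
SAMPLE_INSPECT = r"""
{
  "runtime": {
    "lifecycleEvents": [
      {"type": "AgentSelection", "status": "ResponseReceived",
       "message": "Cluster Scheduling result{\n  Name: aws-cloud\n  AgentID: k8s-training-aws-001-CLIENT-PROD-client-prod\n  Response: NoAnswer\n  Conditions: [NonSchedulable: NoAnswer]\n}",
       "raisedAt": "2025-06-30T11:41:54Z"},
      {"type": "AgentSelection", "status": "ResponseReceived",
       "message": "Cluster Scheduling result{\n  Name: sesterce-h100-bm-01\n  AgentID: k8s-training-sesterce-001-CLIENT-PROD-client-prod\n  Response: OK\n  Conditions: []\n}",
       "raisedAt": "2025-06-30T11:41:54Z"},
      {"type": "TrainingExecution", "status": "InProgress",
       "message": "Training in progress", "raisedAt": "2025-06-30T11:42:00Z"},
      {"type": "TrainingExecution", "status": "Succeeded",
       "message": "Training complete, output available", "raisedAt": "2025-06-30T11:43:48Z"}
    ],
    "createdAt": "2025-06-30T11:41:54Z",
    "lastUpdate": "2025-06-30T11:43:48Z"
  }
}
"""


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as "2025-06-30T11:41:54Z" (UTC, second precision)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def scheduling_results(doc: dict) -> dict[str, str]:
    """Map each cluster Name in the AgentSelection messages to its Response."""
    results = {}
    for event in doc["runtime"]["lifecycleEvents"]:
        if event["type"] != "AgentSelection" or event["status"] != "ResponseReceived":
            continue
        fields = {}
        for line in event["message"].splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:  # keep only "Key: value" lines
                fields[key] = value
        results[fields.get("Name", "?")] = fields.get("Response", "?")
    return results


def training_seconds(doc: dict) -> float:
    """Seconds between the first and last TrainingExecution events."""
    times = [parse_timestamp(e["raisedAt"])
             for e in doc["runtime"]["lifecycleEvents"]
             if e["type"] == "TrainingExecution"]
    return (max(times) - min(times)).total_seconds()


def wall_clock_seconds(doc: dict) -> float:
    """Seconds from creation to the last update of the Training Job."""
    runtime = doc["runtime"]
    return (parse_timestamp(runtime["lastUpdate"])
            - parse_timestamp(runtime["createdAt"])).total_seconds()


if __name__ == "__main__":
    # With a saved file instead: doc = json.load(open("inspect.json"))
    doc = json.loads(SAMPLE_INSPECT)
    print(scheduling_results(doc))
    print(f"training: {training_seconds(doc):.0f}s, total: {wall_clock_seconds(doc):.0f}s")
```

For the example above, only `sesterce-h100-bm-01` answered `OK`, which matches the `selectedAgentId` in the `runtime` section.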

Next Steps

Once a Fine-tuning Job is running, you can monitor its progress by checking its logs and leveraging the FlexAI Observability Services. You’ll learn more about this in the next step of the Quickstart Tutorial: Monitoring a Fine-tuning Job’s Progress.
