# Creating a Fine-tuning Job

> Create and configure a new fine-tuning job on FlexAI

With a Dataset available on your FlexAI account, you can now create a Fine-tuning Job that will use it.

## The Model's repository

For this tutorial we will use the [**FlexAI fork of the nanoGPT repository**](https://github.com/flexaihq/nanogpt), originally created by [Andrej Karpathy](https://github.com/karpathy).

A Fine-tuning Job requires at least a **Name**, a link to a **GitHub repository** where its code resides, and the ***path to the entry point script*** that will initiate the Workload.

In addition, the *entry point script* can be followed by any arguments required, such as configuration files or Hyperparameters.

### Entry Point script arguments

The entry point script path for this quickstart tutorial is [`./train.py`](https://github.com/flexaihq/nanoGPT/blob/main/train.py), and it expects the following arguments:

* `config/train_shakespeare_char.py`: A configuration file, which contains the default Workload Parameters.
* `--dataset_dir`: The path within the `/input` directory of the Workload Runtime where the Dataset files are located.
* `--out_dir`: The output directory, which will be mounted into the Workload Runtime as `/output-checkpoint`.
* `--max_iters`: The maximum number of iterations to run the Workload script for (optional).

<Accordion title="Entry Point script arguments details">
  These include any **Environment Settings** and **Hyperparameters** the entry point script may require. For this tutorial:

  | Parameter                          | Type                | Description                                                                                                                                                                                    |
  | ---------------------------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
  | `config/train_shakespeare_char.py` | Environment Setting | A positional argument pointing to a configuration file used by nanoGPT's `train.py` script to set the default Workload Parameters                                                              |
  | `--out_dir=/output-checkpoint`     | Environment Setting | The output directory where the Workload script will write checkpoint files. To take advantage of FlexAI's Managed Checkpoints feature, this should always be **`/output-checkpoint`**       |
  | `--max_iters=1500`                 | Hyperparameter      | The maximum number of iterations to run the Workload script for. This is an optional hyperparameter that can be used to tweak the Workload execution                                           |
</Accordion>
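
To make the roles of these arguments more concrete, here is a minimal sketch of how a script could separate a positional configuration file from `--key=value` overrides. It is loosely modeled on the override style used by upstream nanoGPT; the function name `parse_entry_point_args` and the default values are hypothetical, and the FlexAI fork's actual parsing may differ.

```python
from ast import literal_eval


def parse_entry_point_args(argv, defaults):
    """Split argv into config file paths and --key=value overrides.

    Illustrative only: not the code used by train.py.
    """
    config_files = []
    settings = dict(defaults)
    for arg in argv:
        if "=" not in arg:
            # A bare argument is treated as a configuration file path.
            config_files.append(arg)
            continue
        key, _, raw = arg.partition("=")
        key = key.lstrip("-")
        try:
            # Turn "1500" into the int 1500 where possible.
            value = literal_eval(raw)
        except (ValueError, SyntaxError):
            # Paths and plain names stay as strings.
            value = raw
        settings[key] = value
    return config_files, settings


# Hypothetical defaults, standing in for values a config file would set.
defaults = {"dataset_dir": "data", "out_dir": "out", "max_iters": 5000}
argv = [
    "config/train_shakespeare_char.py",
    "--dataset_dir=my_dataset",
    "--out_dir=/output-checkpoint",
    "--max_iters=1500",
]
config_files, settings = parse_entry_point_args(argv, defaults)
print(config_files)
print(settings)
```

In this sketch the configuration file supplies defaults and each `--key=value` argument overrides one of them, which is why `--max_iters` can be omitted.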

## Starting a new Fine-tuning Job

<Tabs>
  <Tab title="Using the FlexAI Console">
    The **Start a new Fine-tuning job** form consists of a set of required and optional fields that you can use to configure your Fine-tuning Job.

    ### To open the **Start a new Fine-tuning job** form

    Either:

    * Follow the direct link to the "Start a new Fine-tuning Job" page.

    Or

    <Steps>
      <Step title="Navigate to the Fine-tuning section">
        Navigate to the **Fine-tuning** section from either the navigation bar or the card on the home page.
      </Step>

      <Step title="Select the New button">
        Select the **New** button to display the creation form.
      </Step>
    </Steps>

    A drawer menu with the creation form will be displayed.

    ### Required Fields

    * **Name**: Your Fine-tuning Job name. Should follow the [resource naming conventions](/best-practices/resource-naming-conventions/).
    * **Repository URL**: The URL of the Git repository containing your Fine-tuning code.
    * **Entry Point**: The path to the entry point script in your repository that will initiate the Fine-tuning Job.
      * The *entry point script* can be followed by any arguments you want to pass to it, such as configurations and Hyperparameters. **Value**: `train.py config/train_shakespeare_char.py --dataset_dir=my_dataset --out_dir=/output-checkpoint --max_iters=1500`.

    ### Other fields

    * **Repository Revision**: The Git revision (branch, tag, or commit) you want to use for this Fine-tuning Job. The `main` branch will be used by default.
    * **Node Count**: The number of nodes you want to use for this Fine-tuning Job. Defaults to `1`.
      * This determines the number of Accelerators available to your Fine-tuning Job:
        * 1 node will allow you to use up to 8 Accelerators.
        * Using more than 1 node will make all 8 Accelerators per Node available to your Fine-tuning Job.
    * **Accelerator Count**: The number of Accelerators you want to use for this Fine-tuning Job. Must follow the logic described above. Defaults to `1`.
    * **Datasets**: The Datasets to use for this Fine-tuning Job, selected from a dropdown list. You can add multiple Datasets and specify the mount path of each within the Fine-tuning Runtime (they will be mounted under `/input`). You can read more about this in the [Pushing a Dataset guide](/core-services/training/quickstart/uploading-a-dataset/).

    <Note>
      Don't forget to select the "Add" button after picking a Dataset; otherwise it won't be added to the Fine-tuning Job.
    </Note>

    * **Environment Variables & Secrets**: Add any environment variables you want to set for this Fine-tuning Job. These will be available to your Fine-tuning code as environment variables within the Fine-tuning Runtime.
      * You can also reference **Secrets**, which will be securely injected into the Fine-tuning Job's Runtime.
    * **Cluster**: The cluster on which the Fine-tuning workload will run. It can be selected from a dropdown list of the clusters available in your FlexAI account. If none is specified, a default cluster is selected automatically.
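
    The Node Count and Accelerator Count rule can be sketched as a small check. This is only an illustration under one assumption: that the Accelerator Count is counted per node, so any job with more than one node uses all 8 Accelerators on each node. The function name `check_resources` is hypothetical and is not part of FlexAI.

    ```python
    ACCELERATORS_PER_NODE = 8  # taken from the Node Count description


    def check_resources(node_count: int, accelerators_per_node: int) -> int:
        """Return the total number of Accelerators, or raise ValueError.

        Assumption: the Accelerator Count is interpreted per node.
        """
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        if node_count == 1:
            # A single node may use anywhere from 1 to 8 Accelerators.
            if not 1 <= accelerators_per_node <= ACCELERATORS_PER_NODE:
                raise ValueError("a single node supports 1 to 8 Accelerators")
        elif accelerators_per_node != ACCELERATORS_PER_NODE:
            # Multi-node jobs use every Accelerator on each node.
            raise ValueError("multi-node jobs use all 8 Accelerators per node")
        return node_count * accelerators_per_node


    print(check_resources(1, 1))  # the quickstart defaults
    print(check_resources(2, 8))  # two full nodes
    ```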

    ### Form Values

    | Field Name              | Value                                                                                                              |
    | ----------------------- | ------------------------------------------------------------------------------------------------------------------ |
    | **Name**                | `nanoGPT-flexai-console`                                                                                           |
    | **Repository URL**      | `https://github.com/flexaihq/nanogpt`                                                                              |
    | **Repository Revision** | `main`                                                                                                             |
    | **Node Count**          | `1`                                                                                                                |
    | **Accelerator Count**   | `1`                                                                                                                |
    | **Entry Point**         | `train.py config/train_shakespeare_char.py --dataset_dir=my_dataset --out_dir=/output-checkpoint --max_iters=1500` |
    | **Datasets**            | Dataset: `nanoGPT-dataset` (from the CLI quickstart), <br /> Mount Directory: `my_dataset`                         |
    | **Cluster**             | *Your organization's designated cluster*                                                                           |
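
    The **Datasets** mount directory and the `--dataset_dir` argument in the **Entry Point** need to agree. The sketch below shows how the two values relate; the helper names `dataset_path` and `build_entry_point` are hypothetical and exist only for illustration.

    ```python
    import posixpath

    INPUT_ROOT = "/input"  # Datasets are mounted under this directory


    def dataset_path(mount_directory: str) -> str:
        """Return the absolute path where a Dataset is mounted."""
        return posixpath.join(INPUT_ROOT, mount_directory.strip("/"))


    def build_entry_point(mount_directory: str, max_iters: int) -> str:
        """Assemble the Entry Point value used in this tutorial."""
        parts = [
            "train.py",
            "config/train_shakespeare_char.py",
            # Same value as the Dataset's mount directory.
            f"--dataset_dir={mount_directory}",
            "--out_dir=/output-checkpoint",
            f"--max_iters={max_iters}",
        ]
        return " ".join(parts)


    print(dataset_path("my_dataset"))  # /input/my_dataset
    print(build_entry_point("my_dataset", 1500))
    ```

    If you change the mount directory in the form, update `--dataset_dir` to match.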

    ### Entry Point script arguments

    The entry point script for this Fine-tuning Job is `train.py`, and it expects the following arguments:

    * `config/train_shakespeare_char.py`: A configuration file, which contains the default Fine-tuning Parameters.
    * `--dataset_dir`: The path within the `/input` directory of the Fine-tuning Runtime where the Dataset files are located.
    * `--out_dir`: The output directory, which will be mounted into the Fine-tuning Runtime as `/output-checkpoint`.
    * `--max_iters`: The maximum number of iterations to run the Fine-tuning script for (optional).

    <Accordion title="Entry Point script arguments details">
      These include any **Environment Settings** and **Hyperparameters** the Fine-tuning script may require. For this tutorial:

      | Parameter                          | Type                | Description                                                                                                                                                                                       |
      | ---------------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
      | `config/train_shakespeare_char.py` | Environment Setting | A positional argument pointing to a configuration file used by nanoGPT's `train.py` script to set the default Fine-tuning Parameters                                                              |
      | `--out_dir=/output-checkpoint`     | Environment Setting | The output directory where the Fine-tuning script will write checkpoint files. To take advantage of FlexAI's Managed Checkpoints feature, this should always be **`/output-checkpoint`**       |
      | `--max_iters=1500`                 | Hyperparameter      | The maximum number of iterations to run the Fine-tuning script for. This is an optional hyperparameter that can be used to tweak the Fine-tuning Job execution                                    |
    </Accordion>

    ***

    After filling out the form, select the **Submit** button to start the Fine-tuning Job. You should get a confirmation message indicating that the Fine-tuning Job creation process has been initiated successfully.

    The **Start a new Fine-tuning job** form will close and you will be redirected to the Fine-tuning Jobs list page, where your newly created Fine-tuning Job appears in the list.

    ***
  </Tab>

  <Tab title="Using the FlexAI CLI">
    The following command creates a Fine-tuning Job and starts it running immediately:

    ```bash
    flexai training run quickstart-fine-tuning-job \
        --dataset nanoGPT-dataset=my_dataset \
        --repository-url https://github.com/flexaihq/nanogpt \
        --checkpoint a1b18a7f-9b85-4c74-91a9-6aca526e8ce4 \
        -- train.py config/train_shakespeare_char.py --dataset_dir=my_dataset --out_dir=/output-checkpoint --max_iters=1500
    ```
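
    Reading the command above, everything after the standalone `--` appears to be the entry point script and its arguments, while everything before it belongs to the FlexAI CLI. The sketch below splits the command on that basis. Treating `--` as the separator is an interpretation of the command shown here, and `split_command` is a hypothetical helper.

    ```python
    import shlex

    command = (
        "flexai training run quickstart-fine-tuning-job "
        "--dataset nanoGPT-dataset=my_dataset "
        "--repository-url https://github.com/flexaihq/nanogpt "
        "--checkpoint a1b18a7f-9b85-4c74-91a9-6aca526e8ce4 "
        "-- train.py config/train_shakespeare_char.py --dataset_dir=my_dataset "
        "--out_dir=/output-checkpoint --max_iters=1500"
    )


    def split_command(text: str):
        """Split a command into CLI tokens and entry point tokens."""
        tokens = shlex.split(text)
        # Only the standalone "--" token matches; flags such as
        # "--dataset" are different strings.
        separator = tokens.index("--")
        return tokens[:separator], tokens[separator + 1:]


    cli_tokens, entry_point_tokens = split_command(command)
    print(cli_tokens[3])          # quickstart-fine-tuning-job
    print(entry_point_tokens[0])  # train.py
    ```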

    <Accordion title="Zooming into the `flexai training run` arguments & flags">
      #### Arguments

      | FlexAI command Argument  | Value                        | Description                     |
      | ------------------------ | ---------------------------- | ------------------------------- |
      | **Fine-tuning Job Name** | `quickstart-fine-tuning-job` | The name of the Fine-tuning Job |

      #### Flags

      | Flag                   | Value                                 | Description                                                                                        |
      | ---------------------- | ------------------------------------- | -------------------------------------------------------------------------------------------------- |
      | **Dataset**            | `nanoGPT-dataset=my_dataset`          | The Dataset name followed by the mount path. The dataset will be accessible at `/input/my_dataset` |
      | **Repository URL**     | `https://github.com/flexaihq/nanogpt` | The URL of the GitHub repository containing the workload's code                                    |
      | **Entry Point Script** | `train.py`                            | The path of the entry point Fine-tuning script as defined by the repository                        |

      #### Entry Point script arguments

      These include any **Environment Settings** and **Hyperparameters** the entry point script may require. Keep in mind that these are specific to the code you're running:

      | Entry point script argument        | Type                | Description                                                                                                                                                                            |
      | ---------------------------------- | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
      | `config/train_shakespeare_char.py` | Environment Setting | A positional argument pointing to a configuration file used by nanoGPT's `train.py` script to set the default runtime Parameters                                                       |
      | `--out_dir=/output-checkpoint`     | Environment Setting | The output directory where the script will write checkpoint files. To take advantage of FlexAI's Managed Checkpoints feature, this should always be **`/output-checkpoint`**       |
      | `--max_iters=1500`                 | Hyperparameter      | The maximum number of iterations to run. This is an optional hyperparameter that can be used to tweak the Workload execution                                                           |
    </Accordion>
  </Tab>
</Tabs>

## Up next

Next you'll learn how to get a Fine-tuning Job's details and monitor its progress.
