# Uploading a Dataset

> Upload and prepare a dataset for your fine-tuning job

## Download the sample dataset files

<Tabs>
  <Tab title="Using the FlexAI Console">
    You can upload files or directories from your local machine to create a new Dataset.

    For this quickstart tutorial, we will use two pre-generated files, `train.bin` and `val.bin`, which you can download to your machine using the links below:

    * [train.bin](https://github.com/flexaihq/nanoGPT/raw/refs/heads/prepare_dataset/data/shakespeare_char/dataset/train.bin): Training dataset
    * [val.bin](https://github.com/flexaihq/nanoGPT/raw/refs/heads/prepare_dataset/data/shakespeare_char/dataset/val.bin): Validation dataset for model evaluation

    ***
  </Tab>

  <Tab title="Using the FlexAI CLI">
    Download the pre-generated training dataset files for this quickstart tutorial:

    ```bash theme={null}
    curl -L --remote-name-all https://github.com/flexaihq/nanoGPT/raw/refs/heads/prepare_dataset/data/shakespeare_char/dataset/{train.bin,val.bin}
    ```

    This downloads two binary files:

    * `train.bin`: Training dataset (Shakespeare character-level data)
    * `val.bin`: Validation dataset for model evaluation
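
    As an optional sanity check, the sketch below compares a downloaded file's size with an expected byte count. `check_size` is a hypothetical helper, not part of the FlexAI CLI. The byte counts in the usage note come from the `flexai dataset inspect` JSON output shown later on this page, so they only apply if the files have not changed since that output was captured.

    ```bash
    # Hypothetical helper (not a FlexAI command): compare a file's size in
    # bytes against an expected value.
    check_size() {
      local file="$1" expected="$2" actual
      if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
      fi
      # wc -c counts bytes; tr strips the padding some platforms add
      actual="$(wc -c < "$file" | tr -d ' ')"
      if [ "$actual" -eq "$expected" ]; then
        echo "ok: $file ($actual bytes)"
      else
        echo "size mismatch: $file is $actual bytes, expected $expected"
        return 1
      fi
    }
    ```

    For example, running `check_size train.bin 2007708 && check_size val.bin 223080` in the download directory prints an `ok:` line for each file whose size matches.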

    ***
  </Tab>
</Tabs>

## Uploading files to the FlexAI Dataset Manager Service

<Tabs>
  <Tab title="Using the FlexAI Console">
    Navigate to the Dataset Manager in the FlexAI Console:

    <Steps>
      <Step title="Visit the Add Dataset section">
        Visit the ["Add Dataset" section of the FlexAI Console](https://console.flex.ai/datasets/new)
      </Step>

      <Step title="Enter a name for your dataset">
        Enter a name for your dataset: `nanoGPT-dataset`

        * Must follow the FlexAI [Resource Naming conventions](/best-practices/resource-naming-conventions/)
      </Step>

      <Step title="Select the Local option">
        Select the "Local" option for "Upload Origin"
      </Step>

      <Step title="Upload Items">
        Select the `+ Upload Item` button to open the "Upload Items" dialog
      </Step>
    </Steps>

    Use the "Select file" option to open a file browser dialog:

    <Steps>
      <Step title="Select both files">
        Select both `train.bin` and `val.bin` files from your local machine

        * Depending on your system and browser, you might need to hold down the Ctrl/Cmd key while selecting multiple files
        * When you select multiple files, your browser might ask you to confirm that you want to allow multiple file selection
        * You can also select files individually if you prefer
      </Step>

      <Step title="Enter destination path">
        In the "Destination Path" field, enter `shakespeare_char`
      </Step>

      <Step title="Add files">
        Select the `Add` button to confirm the file selection and destination mapping
      </Step>
    </Steps>

    The "Upload Items" dialog closes and returns you to the "Add a Dataset" form, where the files you just added are listed under the "Upload Items" section, similar to the list below:

    ```
    shakespeare_char/
    ├── train.bin (1.91MB)
    └── val.bin (217.85KB)
    ```

    Below the file list you will find a `+ Add items` button that reopens the "Upload Items" dialog, in case you want to add more files or directories to the Dataset.

    Finally, select the `Add Dataset` button to start the upload process.

    ***
  </Tab>

  <Tab title="Using the FlexAI CLI">
    Use the `flexai dataset push` command to create and upload your Dataset:

    ```bash theme={null}
    flexai dataset push nanoGPT-dataset \
      --file train.bin=shakespeare_char/train.bin \
      --file val.bin=shakespeare_char/val.bin
    ```

    The command above has the following components:

    | Component        | Value                                  | Description                                                                                                                         |
    | ---------------- | -------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
    | **Dataset Name** | `nanoGPT-dataset`                      | Name for the Dataset in FlexAI. Must follow the FlexAI [Resource Naming conventions](/best-practices/resource-naming-conventions/). |
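    | **File Mapping** | `train.bin=shakespeare_char/train.bin` | Uploads the local file `train.bin` to the path `shakespeare_char/train.bin` inside the Dataset (`<local path>=<Dataset path>`).           |
    | **File Mapping** | `val.bin=shakespeare_char/val.bin`     | Uploads the local file `val.bin` to the path `shakespeare_char/val.bin` inside the Dataset.                                           |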
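
    When a directory holds more than a couple of files, typing one `--file` mapping per file gets tedious. The sketch below prints (but does not run) a `flexai dataset push` command that maps every regular file in a local directory to the same destination folder. It uses only the `--file <local path>=<Dataset path>` form shown above. `print_push_command` is a hypothetical helper, and it assumes file names without spaces or shell metacharacters.

    ```bash
    # Hypothetical helper: print a `flexai dataset push` command for review.
    # Nothing is uploaded; the command is only written to stdout.
    print_push_command() {
      local dataset="$1" src_dir="$2" dest_dir="$3" f
      printf 'flexai dataset push %s' "$dataset"
      for f in "$src_dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory
        [ -f "$f" ] || continue
        # Each mapping is <local path>=<path inside the Dataset>
        printf ' \\\n  --file %s=%s/%s' "$f" "$dest_dir" "$(basename "$f")"
      done
      printf '\n'
    }
    ```

    Calling `print_push_command nanoGPT-dataset . shakespeare_char` in the download directory prints a command equivalent to the one above, with `./train.bin` and `./val.bin` as the local paths. Review the output before pasting it into your terminal.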

    ***
  </Tab>
</Tabs>

## Dataset structure

<Tabs>
  <Tab title="Using the FlexAI Console">
    Once the upload is complete, select the gear icon ⚙️ (labeled *Configure*) in the `Actions` field of the Dataset list page. This opens the Dataset "Details" panel, which shows the Dataset's name, status, creation date, and the list of files that were uploaded as part of the Dataset.

    ### Summary

    * **Name**: nanoGPT-dataset
    * **Status**: Ready
    * **Creation Time**: 8/5/2025, 2:13:07 PM

    ### Details

    ```
    shakespeare_char/
    ├── train.bin (1.91MB)
    └── val.bin (217.85KB)
    ```

    To learn more about the ways your workloads can access Datasets, check out the [Runtime Access](/platform-services/dataset-manager/#runtime-access) section of the Dataset Manager overview page.

    ***
  </Tab>

  <Tab title="Using the FlexAI CLI">
    Check that your dataset was uploaded successfully:

    ```bash theme={null}
    flexai dataset list
    ```

    Expected output:

    ```text theme={null}
     NAME                │ FILES COUNT │ TOTAL SIZE │ STATUS    │ CREATED AT
    ─────────────────────┼─────────────┼────────────┼───────────┼────────────────────────────
     nanoGPT-dataset     │ 2           │ 2.13 MB    │ available │ 2025-08-05 13:13:07 (1h)
    ```

    The status should show `available` when the upload is complete and the dataset is ready for training.
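
    To check the status from a script rather than by eye, one option is to extract the `STATUS` column for a given Dataset name. The sketch below assumes the table layout shown above (columns separated by `│`, with the status in the fourth column) and that the CLI prints the same table when its output is piped. `dataset_status` is a hypothetical helper, not a FlexAI command.

    ```bash
    # Hypothetical helper: read a `flexai dataset list` table on stdin and
    # print the STATUS value for the named Dataset.
    # Returns non-zero if the name is not found.
    dataset_status() {
      local name="$1"
      # Normalize the box-drawing column separator to "|" so awk can split on it
      sed 's/│/|/g' | awk -F'|' -v name="$name" '
        {
          n = $1; s = $4
          # Trim the padding around the NAME and STATUS cells
          gsub(/^[ \t]+|[ \t]+$/, "", n)
          gsub(/^[ \t]+|[ \t]+$/, "", s)
          if (n == name) { print s; found = 1 }
        }
        END { exit(found ? 0 : 1) }
      '
    }
    ```

    For example, `flexai dataset list | dataset_status nanoGPT-dataset` should print `available` once the upload has finished.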

    You can run the `flexai dataset inspect <DATASET_NAME>` command to get more detailed information about your Dataset:

    ```bash theme={null}
    flexai dataset inspect nanoGPT-dataset
    ```

    The output will look similar to the following:

    <Tabs>
      <Tab title="YAML Output">
        ```yaml theme={null}
        kind: Dataset
        metadata:
          name: nanoGPT-dataset
          id: 1b541f62-2faf-4e32-8fd1-a6bc27e26b58
          creatorUserID: 16e289cc-c81b-4a15-91d9-0e2aae00a317
          ownerOrgID: 270a5476-b91a-442f-8a13-852ef7bb5b9c
        spec:
          fromLocalFiles:
            - train.bin
            - val.bin
          storageProvider: ""
          sourcePath: ""
        status:
          status: available
          storageProviderID: 00000000-0000-0000-0000-000000000000
          size: 2.13 MB
          files:
            - path: shakespeare_char/train.bin
              size: 1.91 MB
            - path: shakespeare_char/val.bin
              size: 217.85 KB
          createdAt: 2025-08-05 14:13:07 (57d)
          updatedAt: 2025-08-05 14:13:08 (57d)
          dataSyncs: []
        ```
      </Tab>

      <Tab title="JSON Output">
        ```json theme={null}
        {
          "kind": "Dataset",
          "metadata": {
            "name": "nanoGPT-dataset",
            "id": "1b541f62-2faf-4e32-8fd1-a6bc27e26b58",
            "creatorUserID": "16e289cc-c81b-4a15-91d9-0e2aae00a317",
            "ownerOrgID": "270a5476-b91a-442f-8a13-852ef7bb5b9c"
          },
          "spec": {
            "fromLocalFiles": [
              "train.bin",
              "val.bin"
            ],
            "storageProvider": "",
            "sourcePath": ""
          },
          "status": {
            "status": "available",
            "storageProviderID": "00000000-0000-0000-0000-000000000000",
            "size": 2230788,
            "files": [
              {
                "path": "shakespeare_char/train.bin",
                "size": 2007708
              },
              {
                "path": "shakespeare_char/val.bin",
                "size": 223080
              }
            ],
            "createdAt": "2025-08-05T13:13:07Z",
            "updatedAt": "2025-08-05T13:13:08Z",
            "dataSyncs": []
          }
        }
        ```
      </Tab>
    </Tabs>
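
    The two outputs report the same sizes in different units: the JSON output uses raw byte counts, while the YAML output uses human-readable values. Judging from the numbers on this page, the readable values appear to use 1024-based units rounded to two decimals. That is an inference, not documented CLI behavior. The hypothetical `human_size` helper below sketches the conversion so you can cross-check the two outputs.

    ```bash
    # Hypothetical helper: format a byte count with 1024-based units.
    # The unit base is an assumption inferred from the sample output.
    human_size() {
      awk -v b="$1" 'BEGIN {
        if (b >= 1048576)   printf "%.2f MB\n", b / 1048576
        else if (b >= 1024) printf "%.2f KB\n", b / 1024
        else                printf "%d B\n", b
      }'
    }
    ```

    With the values above, `human_size 2007708` prints `1.91 MB`, `human_size 223080` prints `217.85 KB`, and `human_size 2230788` (the sum of the two file sizes) prints `2.13 MB`, matching the YAML output.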

    ***
  </Tab>
</Tabs>
