Loading a Dataset

Getting the Quickstart's Dataset files

To train nanoGPT, download the pre-generated train.bin and val.bin files for this Quickstart tutorial, then upload them to the FlexAI Cloud Services (FCS) platform using the flexai dataset family of commands.

curl --remote-name-all https://docs.flex.ai/example_data/{train.bin,val.bin}
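Before uploading, it can help to confirm that both files actually arrived. The following is a minimal sketch, not part of the FlexAI tooling: check_files is a hypothetical helper that flags any expected file that is missing or empty.

```shell
# Hypothetical helper (not a FlexAI command): succeed only if every
# file passed as an argument exists and is non-empty.
check_files() {
  missing=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run this in the directory where curl saved the files.
if check_files train.bin val.bin; then
  echo "both files are ready to upload"
else
  echo "re-run the download before uploading"
fi
```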

Uploading the Dataset

The flexai dataset push command creates a new Dataset from the provided local files and pushes it to FlexAI Cloud Services, so it can be attached to a training job later on.

Use the -f/--file flag to specify the path of the target file on your local filesystem, followed by an equals sign (=) and the desired path for the file in the FCS Dataset:

flexai dataset push nanoGPT-dataset \
  --file train.bin=shakespeare_char/train.bin \
  --file val.bin=shakespeare_char/val.bin
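When several files share one destination directory, the local=destination pairs can be generated instead of typed by hand. The sketch below is illustrative only: build_push_command is a hypothetical helper that prints the command as a dry run rather than executing it, and it uses only the --file flag shown above. It assumes simple file names without spaces.

```shell
# Hypothetical helper: print a "flexai dataset push" command that places
# each local file under one destination directory in the Dataset.
# Arguments: <dataset-name> <destination-dir> <local-file>...
build_push_command() {
  dataset_name=$1
  dest_dir=$2
  shift 2
  cmd="flexai dataset push $dataset_name"
  for f in "$@"; do
    # Each pair has the form <local path>=<path inside the Dataset>.
    cmd="$cmd --file $f=$dest_dir/$(basename "$f")"
  done
  printf '%s\n' "$cmd"
}

build_push_command nanoGPT-dataset shakespeare_char train.bin val.bin
# prints: flexai dataset push nanoGPT-dataset --file train.bin=shakespeare_char/train.bin --file val.bin=shakespeare_char/val.bin
```

Printing the command first makes it easy to review the mapping before running it.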

An FCS Dataset structure

Your FCS Dataset files will be made available to your training scripts under the /input/ directory. The FCS Dataset file structure of the example above will look like this:

// FCS dataset:

/
└── input/
    └── nanoGPT-dataset/
        └── shakespeare_char/
            ├── train.bin
            └── val.bin
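Going by the tree above, a file's location inside the training runtime appears to be /input/, then the Dataset name, then the destination path given at upload time. The sketch below encodes that pattern. dataset_runtime_path is a hypothetical helper, and the pattern is inferred from this one example rather than from a documented rule.

```shell
# Hypothetical helper: compose the path at which a Dataset file is
# expected to appear inside the training runtime.
# Assumption (inferred from the tree above): /input/<dataset>/<destination path>
dataset_runtime_path() {
  printf '/input/%s/%s\n' "$1" "$2"
}

dataset_runtime_path nanoGPT-dataset shakespeare_char/train.bin
# prints: /input/nanoGPT-dataset/shakespeare_char/train.bin
```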

Learn more about Uploading Datasets

The Uploading Datasets guide provides a comprehensive overview of the different methods available to upload Datasets to FlexAI.

Listing your Datasets

To list your existing Datasets, use the flexai dataset list command:

flexai dataset list

NAME                | FILES COUNT  | TOTAL SIZE |  STATUS   | AGE
--------------------+--------------+------------+-----------+------
nanoGPT-dataset     | 2            | 2.13 MB    | available | 1m
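In a script, the STATUS column can be extracted to check that a Dataset is available before it is attached to a training job. This is a rough sketch that assumes the output keeps the pipe-separated layout shown above. dataset_status is a hypothetical helper, not a flexai subcommand, and the sample table is fed in through a here-document so the example runs without the CLI.

```shell
# Hypothetical helper: read a "flexai dataset list" table on stdin and
# print the STATUS column for the Dataset named in $1.
# Assumes the pipe-separated layout shown in the sample output.
dataset_status() {
  awk -F'|' -v name="$1" '
    {
      n = $1
      gsub(/^[ \t]+|[ \t]+$/, "", n)      # trim the NAME column
      if (n == name) {
        s = $4
        gsub(/^[ \t]+|[ \t]+$/, "", s)    # trim the STATUS column
        print s
      }
    }
  '
}

# Demonstration using the sample output from this page.
dataset_status nanoGPT-dataset <<'EOF'
NAME                | FILES COUNT  | TOTAL SIZE |  STATUS   | AGE
--------------------+--------------+------------+-----------+------
nanoGPT-dataset     | 2            | 2.13 MB    | available | 1m
EOF
# prints: available

# Against the real CLI, the equivalent would be:
#   flexai dataset list | dataset_status nanoGPT-dataset
```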
