Loading a Dataset

To train nanoGPT, download the pre-generated train.bin and val.bin files for this Quickstart tutorial, then upload them to the FlexAI platform with the flexai dataset family of commands.

Terminal window
curl --remote-name-all https://docs.flex.ai/example_data/{train.bin,val.bin}
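Before uploading, it can help to confirm that both files actually arrived. The following is a minimal sketch and not part of the FlexAI tooling: check_dataset_files is a hypothetical helper that only verifies each named file exists and is non-empty. It does not validate the contents.

```shell
# Hypothetical helper: succeed only if every given file exists and is non-empty.
check_dataset_files() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  echo "all files present"
}
```

After the download, run check_dataset_files train.bin val.bin in the same directory. If your shell does not perform brace expansion, quoting the URL lets curl expand the {train.bin,val.bin} part itself.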

The flexai dataset push command creates a new Dataset from the provided local files and pushes it to FlexAI, so it can be attached to a Training Job later.

Use the -f/--file flag to specify the path of the target file on your local filesystem, followed by an equals sign (=) and the desired path for the file in the FCS Dataset:

Terminal window
flexai dataset push nanoGPT-dataset \
--file train.bin=shakespeare_char/train.bin \
--file val.bin=shakespeare_char/val.bin
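When several files share one destination directory, the SRC=DEST pairs can be generated rather than typed by hand. This is a rough sketch under stated assumptions: build_file_args is a hypothetical helper, not a FlexAI command, and it assumes each file should keep its base name inside the Dataset.

```shell
# Hypothetical helper: print one "--file SRC=DEST" pair per local file,
# placing every file under the same directory inside the Dataset.
# $1 is the destination directory; the remaining arguments are local file paths.
build_file_args() {
  dest_dir=$1
  shift
  for src in "$@"; do
    printf '%s %s=%s/%s\n' --file "$src" "$dest_dir" "$(basename "$src")"
  done
}

build_file_args shakespeare_char train.bin val.bin
# prints:
# --file train.bin=shakespeare_char/train.bin
# --file val.bin=shakespeare_char/val.bin
```

For paths without spaces, the output can be passed through an unquoted command substitution, for example flexai dataset push nanoGPT-dataset $(build_file_args shakespeare_char train.bin val.bin). Paths containing whitespace would need a different approach.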

Your FCS Dataset files will be made available to your training scripts under the /input/ directory. For this example, the FCS Dataset file structure will look like this:

  • /
    • input/
      • nanoGPT-dataset/
        • shakespeare_char/
          • train.bin
          • val.bin
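The layout above implies a simple rule: a file's runtime path is /input/, then the Dataset name, then the destination path given on the right-hand side of --file. A small sketch of that rule follows. dataset_path is a hypothetical helper, and the rule is inferred only from the example structure shown here.

```shell
# Hypothetical helper: compose the runtime path of a Dataset file.
# $1 is the Dataset name; $2 is the destination path used with --file.
dataset_path() {
  printf '/input/%s/%s\n' "$1" "$2"
}

dataset_path nanoGPT-dataset shakespeare_char/train.bin
# prints /input/nanoGPT-dataset/shakespeare_char/train.bin
```

A training script would then read its data from the directory that contains these paths.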

To list your existing Datasets, use the list command:

Terminal window
flexai dataset list
NAME                | FILES COUNT  | TOTAL SIZE | STATUS    | AGE
--------------------+--------------+------------+-----------+------
nanoGPT-dataset     | 2            | 2.13 MB    | available | 1m
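To check from a script whether a Dataset is ready, the STATUS column can be pulled out of this listing. The sketch below assumes the output keeps the pipe-separated layout shown above, with STATUS as the fourth column. dataset_status is a hypothetical helper, and parsing human-readable output like this is inherently fragile.

```shell
# Hypothetical helper: read a "flexai dataset list" table on stdin and print
# the STATUS column for the Dataset whose NAME matches $1.
# Assumes "|" separates columns and STATUS is the fourth column.
dataset_status() {
  awk -F'|' -v name="$1" '
    {
      n = $1
      gsub(/^[ \t]+|[ \t]+$/, "", n)   # trim padding around the name
      if (n == name) {
        s = $4
        gsub(/[ \t]/, "", s)           # strip padding around the status
        print s
      }
    }
  '
}
```

Usage would look like flexai dataset list | dataset_status nanoGPT-dataset, which prints available for the listing above and prints nothing when the name is not found.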