Loading a Dataset
Getting the Quickstart’s Dataset files
To train nanoGPT, download the pre-generated train.bin and val.bin files for this Quickstart tutorial, then upload them to the FlexAI platform using the flexai dataset family of commands.
curl --remote-name-all https://docs.flex.ai/example_data/{train.bin,val.bin}
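Before uploading, it can help to confirm that both files actually arrived, since a failed download may leave a missing or empty file behind. The following is a minimal sketch; `check_files` is a hypothetical helper written for this example, not part of the FlexAI CLI.

```shell
# Hypothetical helper: succeed only if every named file exists and is non-empty.
check_files() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
}

# Check the two files downloaded by the curl command above.
if check_files train.bin val.bin; then
  echo "ready to upload"
fi
```

The `-s` test is true only for files that exist and have a size greater than zero, so it catches both a missing file and a zero-byte one.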
Uploading the Dataset
The flexai dataset push command creates a new Dataset from the provided local files and pushes it to FlexAI, so it can be attached to a Training Job later on.
Use the -f/--file flag to specify the path of the target file on your local filesystem, followed by an equals sign (=) and the desired path for the file in the FCS Dataset:
flexai dataset push nanoGPT-dataset \
  --file train.bin=shakespeare_char/train.bin \
  --file val.bin=shakespeare_char/val.bin
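When a Dataset has more than a couple of files, writing each local=destination pair by hand gets repetitive. The sketch below assembles the same command from a list of local files, using only the `--file` flag shown above. `build_push_cmd` is a hypothetical helper, and it only prints the command rather than running it, so the result can be reviewed first.

```shell
# Hypothetical helper: print (do not run) a `flexai dataset push` command that
# maps each local file to <prefix>/<basename> inside the Dataset.
# The ( ... ) body runs in a subshell, so its variables do not leak out.
build_push_cmd() (
  name="$1"
  prefix="$2"
  shift 2
  cmd="flexai dataset push $name"
  for f in "$@"; do
    cmd="$cmd --file $f=$prefix/$(basename "$f")"
  done
  printf '%s\n' "$cmd"
)

build_push_cmd nanoGPT-dataset shakespeare_char train.bin val.bin
```

For the two Quickstart files this prints the push command shown above on a single line. The sketch assumes file names without spaces; names containing spaces would need quoting before the printed command could be run.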
An FCS Dataset structure
Your FCS Dataset files will be made available to your training scripts under the /input/ directory. For this example, the FCS Dataset file structure will look like this:
/
  input/
    nanoGPT-dataset/
      shakespeare_char/
        train.bin
        val.bin
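The destination path given to `--file` therefore determines where a training script finds each file: the Dataset name and the destination path are appended to /input/. A minimal sketch of that mapping follows; `dataset_file` and the `INPUT_ROOT` variable are hypothetical names introduced here (the override only exists to make local testing possible) and are not FlexAI settings.

```shell
# Root under which Dataset files are made available, per the structure above.
# INPUT_ROOT is a hypothetical override for local testing only.
INPUT_ROOT="${INPUT_ROOT:-/input}"

# Hypothetical helper: print the path of a Dataset file as a training script
# would see it, given the Dataset name and the file's path inside the Dataset.
dataset_file() {
  printf '%s/%s/%s\n' "$INPUT_ROOT" "$1" "$2"
}

dataset_file nanoGPT-dataset shakespeare_char/train.bin
```

With `INPUT_ROOT` unset, the call prints `/input/nanoGPT-dataset/shakespeare_char/train.bin`.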
Listing your Datasets
To list your existing Datasets, use the list command:
flexai dataset list
NAME            | FILES COUNT | TOTAL SIZE | STATUS    | AGE
----------------+-------------+------------+-----------+-----
nanoGPT-dataset | 2           | 2.13 MB    | available | 1m
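In a script, the STATUS column can be read from this output to check whether a Dataset is available. The sketch below assumes the output keeps the pipe-separated layout of the sample above, which may differ between CLI versions; `dataset_status` is a hypothetical helper, not a FlexAI command.

```shell
# Hypothetical helper: read `flexai dataset list` output on stdin and print the
# STATUS column for the Dataset named in $1.
# Assumes the pipe-separated layout shown in the sample output above.
dataset_status() {
  awk -F'|' -v name="$1" '
    { gsub(/^[ \t]+|[ \t]+$/, "", $1) }             # trim the NAME column
    $1 == name { gsub(/[ \t]/, "", $4); print $4 }  # STATUS is the 4th column
  '
}
```

It would be used as `flexai dataset list | dataset_status nanoGPT-dataset`, which for the sample output above prints `available`.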