Uploading a Dataset
Download The Sample Dataset Files
You can upload files or directories from your local machine to create a new Dataset.
For this quickstart tutorial, we will use two pre-generated files, train.bin and val.bin, which you can download to your machine using the links below:
- train.bin: Training dataset
- val.bin: Validation dataset for model evaluation
Download the pre-generated training dataset files for this quickstart tutorial:

    curl --remote-name-all https://docs.flex.ai/example_data/{train.bin,val.bin}

This downloads two binary files:
- train.bin: Training dataset (Shakespeare character-level data)
- val.bin: Validation dataset for model evaluation
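Before uploading, it can help to confirm that both downloads completed. The sketch below compares a file's size in bytes with an expected value. The expected sizes used in the usage note (2007708 and 223080 bytes) are taken from the `flexai dataset inspect` JSON output shown later on this page. The `check_size` helper is a hypothetical name, not part of any FlexAI tooling, and the check assumes the hosted files have not changed since that output was captured.

```shell
#!/usr/bin/env bash
# check_size FILE EXPECTED_BYTES
# Hypothetical helper: succeeds only if FILE exists and is exactly
# EXPECTED_BYTES bytes long.
check_size() {
  local file="$1" expected="$2" actual
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # wc -c gives the byte count portably; the arithmetic expansion
  # strips any padding some platforms add around the number.
  actual=$(( $(wc -c < "$file") ))
  if [ "$actual" -ne "$expected" ]; then
    echo "size mismatch: $file is $actual bytes, expected $expected" >&2
    return 1
  fi
  echo "ok: $file ($actual bytes)"
}
```

From the download directory, `check_size train.bin 2007708 && check_size val.bin 223080` would then report whether both files match.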
Uploading files to the FlexAI Dataset Manager Service
Navigate to the Dataset Manager in the FlexAI Console:
- Visit the “Add Dataset” section of the FlexAI Console
- Enter a name for your dataset: nanoGPT-dataset
  - Must follow the FlexAI Resource Naming conventions
- Select the “Local” option for “Upload Origin”
- Select the + Upload Item button to open the “Upload Items” dialog
Use the “Select file” option to open a file browser dialog:
- Select both train.bin and val.bin files from your local machine
  - Depending on your system and browser, you might need to hold down the Ctrl / Cmd key while selecting multiple files
  - When selecting multiple files, your browser might prompt you with a confirmation message asking you to allow multiple file selection
  - You can also select files individually if you prefer
- In the “Destination Path” field, enter shakespeare_char
- Select the Add button to confirm the file selection and destination mapping
The “Upload Items” dialog will close and you will return to the “Add a Dataset” form, where the files you just added are listed under the “Upload Items” section. The list will look similar to the one below:
shakespeare_char/ # The “Destination Path” you specified
- train.bin 1.91MB
- val.bin 217.85KB
Below the file list you will find a + Add items button that reopens the “Upload Items” dialog, in case you want to add more files or directories to the Dataset.
Finally, select the Add Dataset button to start the upload process.
Use the flexai dataset push command to create and upload your Dataset:

    flexai dataset push nanoGPT-dataset \
      --file train.bin=shakespeare_char/train.bin \
      --file val.bin=shakespeare_char/val.bin

The command above has the following components:
| Component | Value | Description |
|---|---|---|
| Dataset Name | nanoGPT-dataset | Name for the Dataset in FlexAI. Must follow the FlexAI Resource Naming conventions. |
| File Mapping | train.bin=shakespeare_char/train.bin | Maps local file to dataset path |
| File Mapping | val.bin=shakespeare_char/val.bin | Maps local file to dataset path |
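Both file mappings in the table follow the same pattern: local path on the left, destination path inside the Dataset on the right. As a minimal sketch, the hypothetical `build_push_cmd` helper below assembles such a command for any number of local files, placing each one under a single destination directory. It only prints the command so you can review it before running it. It uses only the `--file LOCAL=DEST` form shown above and assumes file names without spaces.

```shell
#!/usr/bin/env bash
# build_push_cmd DATASET DEST_DIR FILE...
# Hypothetical helper: prints (does not run) a push command that maps each
# local file to DEST_DIR/<basename of the file>.
build_push_cmd() {
  local dataset="$1" dest="$2" f
  shift 2
  printf 'flexai dataset push %s' "$dataset"
  for f in "$@"; do
    printf ' --file %s=%s/%s' "$f" "$dest" "$(basename "$f")"
  done
  printf '\n'
}

# Prints the same command as in this tutorial, on a single line.
build_push_cmd nanoGPT-dataset shakespeare_char train.bin val.bin
```

Printing instead of executing keeps the helper safe to experiment with. Once the output looks right, it can be copied into the terminal.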
Dataset structure
Once the upload is complete, you can select the gear icon ⚙️ (labeled as Configure) in the Actions field of the Dataset list page. This opens the Dataset “Details” panel, where you can see the Dataset’s name, status, creation date, and the list of files that were uploaded as part of the Dataset.
Summary
- Name: nanoGPT-dataset
- Status: Ready
- Creation Time: 8/5/2025, 2:13:07 PM
Details
shakespeare_char/
- train.bin 1.91MB
- val.bin 217.85KB
To learn more about the ways your workloads can access Datasets, check out the Runtime Access section of the Dataset Manager overview page.
Check that your dataset was uploaded successfully:

    flexai dataset list

Expected output:

    NAME            │ FILES COUNT │ TOTAL SIZE │ STATUS    │ CREATED AT
    ────────────────┼─────────────┼────────────┼───────────┼─────────────────────────
    nanoGPT-dataset │ 2           │ 2.13 MB    │ available │ 2025-08-05 13:13:07 (1h)

The status should show available when the upload is complete and the dataset is ready for training.
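If you want to check the status from a script rather than by eye, the list output can be filtered. The sketch below is a hypothetical `dataset_status` helper that reads the table on standard input and prints the STATUS column for one Dataset. It assumes the column order shown in the expected output above and that columns are separated by `│` (or a plain `|`). The real CLI output may differ, so treat this as illustrative.

```shell
#!/usr/bin/env bash
# dataset_status NAME
# Hypothetical helper: reads `flexai dataset list` output on stdin and prints
# the STATUS column (4th field) of the row whose NAME column equals NAME.
# Exits with a non-zero status if no such row is found.
dataset_status() {
  awk -v name="$1" -F'│|[|]' '
    {
      # Trim surrounding whitespace from every column.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      if ($1 == name) { print $4; found = 1; exit }
    }
    END { exit !found }
  '
}
```

With the output above, `flexai dataset list | dataset_status nanoGPT-dataset` would print `available`.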
You can run the flexai dataset inspect <DATASET_NAME> command to get more detailed information about your Dataset:

    flexai dataset inspect nanoGPT-dataset

Which will output something like:

    kind: Dataset
    metadata:
      name: nanoGPT-dataset
      id: 1b541f62-2faf-4e32-8fd1-a6bc27e26b58
      creatorUserID: 16e289cc-c81b-4a15-91d9-0e2aae00a317
      ownerOrgId: 270a5476-b91a-442f-8a13-852ef7bb5b9c
    spec:
      fromLocalFiles:
        - train.bin
        - val.bin
      storageProvider: ""
      sourcePath: ""
    status:
      status: available
      storageProviderID: 00000000-0000-0000-0000-000000000000
      size: 2.13 MB
      files:
        - path: shakespeare_char/train.bin
          size: 1.91 MB
        - path: shakespeare_char/val.bin
          size: 217.85 KB
      createdAt: 2025-08-05 14:13:07 (57d)
      updatedAt: 2025-08-05 14:13:08 (57d)
      dataSyncs: []

The equivalent JSON output:

    {
      "kind": "Dataset",
      "metadata": {
        "name": "nanoGPT-dataset",
        "id": "1b541f62-2faf-4e32-8fd1-a6bc27e26b58",
        "creatorUserID": "15e2894c-c81b-4a15-91d5-0e2aae00a317",
        "ownerOrgID": "270a5476-b91a-442f-8a13-852ef7bb5b9c"
      },
      "spec": {
        "fromLocalFiles": [
          "train.bin",
          "val.bin"
        ],
        "storageProvider": "",
        "sourcePath": ""
      },
      "status": {
        "status": "available",
        "storageProviderID": "00000000-0000-0000-0000-000000000000",
        "size": 2230788,
        "files": [
          {
            "path": "shakespeare_char/train.bin",
            "size": 2007708
          },
          {
            "path": "shakespeare_char/val.bin",
            "size": 223080
          }
        ],
        "createdAt": "2025-08-05T13:13:07Z",
        "updatedAt": "2025-08-05T13:13:08Z",
        "dataSyncs": []
      }
    }
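The byte counts in the JSON output and the rounded sizes in the YAML output describe the same files. As a quick worked check, the sketch below sums the two per-file byte counts and formats each value with 1024-based units. That 1024-based convention is an assumption inferred from the numbers on this page, not a documented CLI behavior, and `human_size` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# human_size BYTES
# Hypothetical helper: formats a byte count using 1024-based units
# and two decimal places.
human_size() {
  awk -v b="$1" 'BEGIN {
    split("B KB MB GB TB", unit, " ")
    i = 1
    while (b >= 1024 && i < 5) { b /= 1024; i++ }
    printf "%.2f %s\n", b, unit[i]
  }'
}

train=2007708   # size of shakespeare_char/train.bin in bytes
val=223080      # size of shakespeare_char/val.bin in bytes
total=$(( train + val ))

echo "$total"         # 2230788, the top-level "size" value
human_size "$train"   # 1.91 MB
human_size "$val"     # 217.85 KB
human_size "$total"   # 2.13 MB
```

The results line up with the sizes reported by `flexai dataset list` and `flexai dataset inspect`, which is a simple way to confirm that both files arrived in full.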