Command: dataset
The flexai dataset command manages datasets within FlexAI Cloud Services (FCS).
For information on how datasets are handled by the flexai CLI, see the Uploading Datasets guide.
dataset delete
Deletes a dataset from FlexAI Cloud Services.
flexai dataset delete <dataset_name>
dataset inspect
Returns detailed information about a dataset, including its creation time, metadata, status, and file list. The output is in YAML format by default; pass --json to get JSON instead.
flexai dataset inspect <dataset_name> [--json]
Flags
Flag | Type | Optional / Required | Definition | Example |
---|---|---|---|---|
--json | Flag | Optional | Output the information in JSON format | --json |
Returned information
Field | Description | Data Type |
---|---|---|
kind | The type of resource | String |
metadata | Metadata about the dataset | Object |
metadata.name | The name of the dataset | String |
metadata.id | The unique identifier of the dataset | String (UUID) |
metadata.creatorUserID | The user ID of the dataset creator | String (UUID) |
metadata.ownerOrgID | The organization ID that owns the dataset | String (UUID) |
spec | Dataset contents details | Object |
spec.fromLocalFiles | A list of files included in the dataset | String List |
spec.storageProvider | The name of the Remote Storage Provider connection | String |
spec.sourcePath | The path to the bucket and file or directory on the Remote Storage Provider connection | String |
status | Status information of the dataset | Object |
status.status | The current status of the dataset | String |
status.storageProviderID | The ID of the Remote Storage Provider connection used to upload the dataset | String (UUID) |
status.size | The total size of the dataset | String (File Size) |
status.files | A list of files with their paths and sizes | Object List |
status.files.path | The path of the file within the dataset | String (File Path) |
status.files.size | The size of the file | String (File Size) |
status.createdAt | The timestamp when the dataset was created | String (ISO 8601) |
status.updatedAt | The timestamp when the dataset was last updated | String (ISO 8601) |
Example
kind: Dataset
metadata:
  name: nanoGPT-dataset
  id: 862f794f-a4a0-4c94-8792-21ab324d2ee9
  creatorUserID: bd67af19-4c94-4a57-832e-a1dc042f48be
  ownerOrgID: 270a5476-4c94-442f-8a13-852efabb5b94
spec:
  files:
    - train.bin
    - val.bin
  storageProvider: ""
  sourcePath: ""
status:
  status: available
  size: 2.13 MB
  files:
    - path: shakespeare_char/train.bin
      size: 1.91 MB
    - path: shakespeare_char/val.bin
      size: 217.85 KB
  createdAt: "2024-10-10T11:24:01.244403Z"
  updatedAt: "2024-10-10T11:24:01.341233Z"
dataset list
Lists all the available datasets.
flexai dataset list
Example
flexai dataset list
NAME | FILES COUNT | TOTAL SIZE | STATUS | AGE
-----------------------------+-------------+------------+-----------+------
wikitext-2-raw-v1 | 1 | 63.21 MB | available | 2d
nanoGPT-dataset | 2 | 2.13 MB | available | 18h
train_media_2025-01 | 2059 | 443 GB | available | 2h
dataset push
Pushes a new dataset to FlexAI Cloud Services. Multiple files can be uploaded at once by repeating the --file flag, or by pointing the --file flag at a directory containing the dataset files.
The Uploading Datasets guide provides more information on how to use the dataset push command to upload both local files and files from a Remote Storage Provider.
flexai dataset push <dataset_name> (( --file <source_path> ... | --file <source_path>=<fcs_dataset_path> ...) | [(--storage-provider <storage_provider_name> --source-path <source_path>)])
Arguments
Argument | Description |
---|---|
dataset_name | Resource name. Must follow the FCS resource naming conventions |
Flags
Flag | Type | Optional / Required | Definition | Example |
---|---|---|---|---|
-f / --file | String or Key/Value mapping | Required | Local source path and, optionally, the destination path within the dataset for the uploaded file or directory. See below for more details | -f tdata.tar.gz, -f train.bin=shakespeare_char/train.bin |
--source-path | String | Optional | The path to the bucket and the file or directory to be pushed, in the form <bucket_name>/<path>. Used in conjunction with --storage-provider | my-bucket/datasets/coffee-leaf-diseases--train |
--storage-provider | String | Optional | The name of the Remote Storage Provider connection to use for the dataset upload | aws-storage-conn-eu |
source_path can point to files or directories.
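As a rough illustration of the Remote Storage Provider form of the command, the sketch below wraps it in a small shell function. The function name push_from_remote is hypothetical and not part of the flexai CLI; only the --storage-provider and --source-path flags come from the table above. Because the two flags are used in conjunction, the helper refuses to run unless both values are given.

```shell
# Hypothetical helper (not part of the flexai CLI): create a dataset from a
# path on a Remote Storage Provider connection.
# Usage: push_from_remote <dataset_name> <storage_provider_name> <bucket_name>/<path>
push_from_remote() {
  # --source-path is used in conjunction with --storage-provider,
  # so require the dataset name and both flag values.
  if [ -z "${1:-}" ] || [ -z "${2:-}" ] || [ -z "${3:-}" ]; then
    echo "usage: push_from_remote <dataset_name> <storage_provider_name> <bucket_name>/<path>" >&2
    return 2
  fi
  flexai dataset push "$1" --storage-provider "$2" --source-path "$3"
}
```

With the example values from the flags table, and a hypothetical dataset name, a call would look like: push_from_remote coffee-leaf-train aws-storage-conn-eu my-bucket/datasets/coffee-leaf-diseases--train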
The --file flag
The -f / --file flag can be used in two ways:
- Source path: The --file flag points to a local source path that is either a file or a directory. In this case, the selected file or directory is added to the root of the dataset.
- Key/Value path mapping: The --file flag specifies both a local source path and a destination path within the dataset for the file or directory, using the <source_path>=<fcs_dataset_path> syntax.
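The key/value mapping form can get repetitive when several files share one destination directory. The following is a minimal sketch, assuming a POSIX shell; push_with_prefix is a hypothetical helper, not a flexai subcommand. It turns each local file into a --file <source_path>=<fcs_dataset_path> pair and passes them all to a single dataset push call.

```shell
# Hypothetical helper (not part of the flexai CLI): push local files into a
# dataset under a common destination directory.
# Usage: push_with_prefix <dataset_name> <fcs_dest_dir> <file>...
push_with_prefix() {
  if [ "$#" -lt 3 ]; then
    echo "usage: push_with_prefix <dataset_name> <fcs_dest_dir> <file>..." >&2
    return 2
  fi
  dataset_name=$1
  dest_dir=$2
  shift 2
  # The loop walks the original file list while rotating the positional
  # parameters: append "--file <file>=<dest_dir>/<basename>", then drop the
  # file that was just handled. After the loop only the flag pairs remain.
  for src in "$@"; do
    set -- "$@" --file "$src=$dest_dir/$(basename "$src")"
    shift
  done
  flexai dataset push "$dataset_name" "$@"
}
```

For example, push_with_prefix nanoGPT-dataset shakespeare_char train.bin val.bin would run flexai dataset push nanoGPT-dataset --file train.bin=shakespeare_char/train.bin --file val.bin=shakespeare_char/val.bin. To place files at the root of the dataset instead, the plain source path form (--file train.bin) needs no helper.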
Dataset Statuses
Datasets in FCS can have different statuses, which indicate their availability and readiness for use. The following statuses are used:
Status | Description |
---|---|
available | The dataset is ready for use |
error | The dataset is in an error state and cannot be used |
pending | The dataset is being processed by FCS |
uploading | The dataset is being uploaded to FCS |
syncing | The dataset is being synchronized with the Remote Storage Provider |
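A script that pushes a dataset and then uses it may want to wait until the status is available. The sketch below polls dataset inspect and reads status.status from the default YAML output with awk. The function name wait_for_dataset, the attempt count, and the delay are all assumptions made for illustration, and the awk match relies on the output layout shown in the inspect example; a proper YAML or JSON parser would be more robust.

```shell
# Hypothetical helper (not part of the flexai CLI): wait for a dataset to
# become usable. Returns 0 on "available", 1 on "error", 2 on timeout.
# Usage: wait_for_dataset <dataset_name> [max_attempts] [delay_seconds]
wait_for_dataset() {
  name=$1
  attempts=${2:-30}
  delay=${3:-10}
  state=
  while [ "$attempts" -gt 0 ]; do
    # In the YAML output, the only "status:" line that carries a value is
    # status.status; the top-level "status:" key has nothing after it.
    state=$(flexai dataset inspect "$name" |
      awk '$1 == "status:" && NF == 2 { print $2; exit }')
    case $state in
      available) return 0 ;;
      error)
        echo "dataset $name is in an error state" >&2
        return 1 ;;
    esac
    attempts=$((attempts - 1))
    # Any other status (pending, uploading, syncing) means keep waiting.
    if [ "$attempts" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  echo "timed out waiting for dataset $name (last status: ${state:-unknown})" >&2
  return 2
}
```

A typical call would be wait_for_dataset nanoGPT-dataset 60 5, which checks up to 60 times with 5 seconds between checks.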