
Command: dataset

The flexai dataset command manages datasets within FlexAI Cloud Services (FCS).

For information on how Datasets are handled by the flexai CLI, check the Uploading Datasets guide.

dataset delete

Deletes a dataset from FlexAI Cloud Services.

flexai dataset delete <dataset_name>

dataset inspect

Returns detailed information about a dataset, including its metadata, status, file list, and creation and update times. The output is in YAML format by default.

flexai dataset inspect <dataset_name> [--json]

Flags

Flag | Type | Optional / Required | Definition | Example
--json | Flag | Optional | Output the information in JSON format | --json

Returned information

Field | Description | Data Type
kind | The type of resource | String
metadata | Metadata about the dataset | Object
metadata.name | The name of the dataset | String
metadata.id | The unique identifier of the dataset | String (UUID)
metadata.creatorUserID | The user ID of the dataset creator | String (UUID)
metadata.ownerOrgID | The organization ID that owns the dataset | String (UUID)
spec | Dataset contents details | Object
spec.fromLocalFiles | A list of files included in the dataset | String List
spec.storageProvider | The name of the Remote Storage Provider connection | String
spec.sourcePath | The path to the bucket and file or directory on the Remote Storage Provider connection | String
status | Status information of the dataset | Object
status.status | The current status of the dataset | String
status.storageProviderID | The ID of the Remote Storage Provider connection used to upload the dataset | String (UUID)
status.size | The total size of the dataset | String (File Size)
status.files | A list of files with their paths and sizes | Object List
status.files.path | The path of the file within the dataset | String (File Path)
status.files.size | The size of the file | String (File Size)
status.createdAt | The timestamp when the dataset was created | String (ISO 8601)
status.updatedAt | The timestamp when the dataset was last updated | String (ISO 8601)

Example

kind: Dataset
metadata:
  name: nanoGPT-dataset
  id: 862f794f-a4a0-4c94-8792-21ab324d2ee9
  creatorUserID: bd67af19-4c94-4a57-832e-a1dc042f48be
  ownerOrgID: 270a5476-4c94-442f-8a13-852efabb5b94
spec:
  files:
    - train.bin
    - val.bin
  storageProvider: ""
  sourcePath: ""
status:
  status: available
  size: 2.13 MB
  files:
    - path: shakespeare_char/train.bin
      size: 1.91 MB
    - path: shakespeare_char/val.bin
      size: 217.85 KB
  createdAt: "2024-10-10T11:24:01.244403Z"
  updatedAt: "2024-10-10T11:24:01.341233Z"

dataset list

Lists all the available datasets.

flexai dataset list

Example

flexai dataset list

 NAME                        | FILES COUNT | TOTAL SIZE | STATUS    | AGE
-----------------------------+-------------+------------+-----------+------
 wikitext-2-raw-v1           | 1           | 63.21 MB   | available | 2d
 nanoGPT-dataset             | 2           | 2.13 MB    | available | 18h
 train_media_2025-01         | 2059        | 443 GB     | available | 2h
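
The listing above can be filtered in a script. The following is a minimal sketch, assuming the output keeps the pipe-separated layout shown in the example (two header lines, with STATUS as the fourth column). The function name unavailable_datasets is hypothetical and is not part of the flexai CLI.

```shell
# Hypothetical helper, not part of the flexai CLI.
# Reads `flexai dataset list` output on stdin and prints the NAME of every
# dataset whose STATUS column is anything other than "available".
# Assumes the pipe-separated layout shown in the example above.
unavailable_datasets() {
  awk -F'|' 'NR > 2 && NF >= 4 {
    name = $1
    status = $4
    gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim surrounding whitespace
    gsub(/^[ \t]+|[ \t]+$/, "", status)
    if (status != "available") print name
  }'
}
```

It would be used as: flexai dataset list | unavailable_datasets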

dataset push

Pushes a new dataset to FlexAI Cloud Services. Multiple files can be uploaded at once by repeating the --file flag, or by pointing the --file flag to a directory containing dataset files.

The Uploading Datasets guide provides more information on how to use the dataset push command to upload both local files and files from a Remote Storage Provider.

flexai dataset push <dataset_name> (( --file <source_path> ... | --file <source_path>=<fcs_dataset_path> ...) | [(--storage-provider <storage_provider_name> --source-path <source_path>)])

Arguments

Argument | Description
dataset_name | Resource name. Must follow the FCS resource naming conventions

Flags

Flag | Type | Optional / Required | Definition | Example
-f / --file | String or Key/Value mapping | Required | Local source path and, optionally, the desired destination path of the file or directory within the dataset. See below for more details | -f tdata.tar.gz, -f train.bin=shakespeare_char/train.bin
--source-path | String | Optional | The path to the bucket and file or directory to be pushed, in the form <bucket_name>/<path>. Used in conjunction with --storage-provider | my-bucket/datasets/coffee-leaf-diseases--train
--storage-provider | String | Optional | The name of the Remote Storage Provider connection to use for the dataset upload | aws-storage-conn-eu

source_path can point to files or directories.
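
As an illustration of the --storage-provider and --source-path pair, here is a minimal sketch of a wrapper that checks the <bucket_name>/<path> shape before calling the CLI. The function name push_from_remote and the shape check are assumptions made for this example. The connection name and source path in the usage line are the sample values from the Flags table.

```shell
# Hypothetical helper, not part of the flexai CLI.
# Usage: push_from_remote <dataset_name> <storage_provider_name> <source_path>
push_from_remote() {
  name=$1
  provider=$2
  source_path=$3
  case $source_path in
    ?*/?*) ;;  # looks like <bucket_name>/<path>
    *)
      echo "source path must look like <bucket_name>/<path>" >&2
      return 1
      ;;
  esac
  flexai dataset push "$name" \
    --storage-provider "$provider" \
    --source-path "$source_path"
}
```

For example: push_from_remote coffee-leaf-train aws-storage-conn-eu my-bucket/datasets/coffee-leaf-diseases--train (the dataset name coffee-leaf-train is made up for this example).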

The --file flag

The -f / --file flag can be used in two ways:

  1. Source path: The --file flag points to a local source path that is either a file or a directory. In this case, the selected file or directory is added to the root of the dataset
  2. Key/Value path mapping: The --file flag specifies both a local source path and the destination path of the file or directory within the dataset, using the <source_path>=<fcs_dataset_path> syntax
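
The Key/Value form can be generated for several files at once. The sketch below is one way to do it, assuming that repeated --file flags may each carry a <source_path>=<fcs_dataset_path> mapping, as the synopsis suggests. The function name push_under_prefix is hypothetical.

```shell
# Hypothetical helper, not part of the flexai CLI.
# Usage: push_under_prefix <dataset_name> <prefix> <local_file>...
# Pushes each local file so that it lands at <prefix>/<file name> in the dataset.
push_under_prefix() {
  name=$1
  prefix=$2
  shift 2
  remaining=$#
  # Rewrite the positional parameters in place: each local path is replaced
  # by a "--file <source_path>=<fcs_dataset_path>" pair appended at the end.
  while [ "$remaining" -gt 0 ]; do
    set -- "$@" --file "$1=$prefix/$(basename "$1")"
    shift
    remaining=$((remaining - 1))
  done
  flexai dataset push "$name" "$@"
}
```

Calling push_under_prefix nanoGPT-dataset shakespeare_char data/train.bin data/val.bin would run flexai dataset push nanoGPT-dataset --file data/train.bin=shakespeare_char/train.bin --file data/val.bin=shakespeare_char/val.bin (the local data/ directory is made up for this example).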

Dataset Statuses

Datasets in FCS can have different statuses, which indicate their availability and readiness for use. The following statuses are used:

Status | Description
available | The dataset is ready for use
error | The dataset is in an error state and cannot be used
pending | The dataset is being processed by FCS
uploading | The dataset is being uploaded to FCS
syncing | The dataset is being synchronized with the Remote Storage Provider
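
A script that pushes a dataset may need to wait until it reaches the available status. The following is a minimal polling sketch built on flexai dataset inspect. It assumes the YAML layout shown in the inspect example (a top-level status block containing a nested status key). The function name wait_for_dataset, its max_attempts and delay_seconds parameters, and its return codes are hypothetical choices made for this example.

```shell
# Hypothetical helper, not part of the flexai CLI.
# Usage: wait_for_dataset <dataset_name> [max_attempts] [delay_seconds]
# Returns 0 when available, 1 on error, 2 on an unknown status, 3 on timeout.
wait_for_dataset() {
  name=$1
  attempts=${2:-30}
  delay=${3:-10}
  while [ "$attempts" -gt 0 ]; do
    # Extract status.status from the YAML output (layout is an assumption).
    state=$(flexai dataset inspect "$name" | awk '
      /^status:[ \t]*$/ { in_status = 1; next }
      in_status && /^[ \t]*status:[ \t]*[^ \t]/ {
        sub(/^[ \t]*status:[ \t]*/, "")
        print
        exit
      }')
    case $state in
      available) return 0 ;;
      error)
        echo "dataset $name is in an error state" >&2
        return 1
        ;;
      pending|uploading|syncing) ;;  # still in progress: keep waiting
      *)
        echo "unexpected status: $state" >&2
        return 2
        ;;
    esac
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  echo "timed out waiting for dataset $name" >&2
  return 3
}
```

For example, wait_for_dataset nanoGPT-dataset 60 5 would check up to 60 times, 5 seconds apart.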