The FlexAI Dataset Manager lets you upload, organize, and manage the Datasets used in your AI Training and Fine-tuning Workloads. Whether your data is stored locally or in the cloud, the Dataset Manager streamlines the process of making it available to your Workloads.
The FlexAI Dataset Manager enables you to:
Multiple Upload Sources
Upload from a local machine, AWS S3, Google Cloud Storage, the Hugging Face Hub, and more
Flexible Structure
Support for flat or hierarchical directory structures with custom file organization
Immutable Storage
Datasets are immutable once created, ensuring reproducible training results
Use multiple Datasets on a single Workload
Attach multiple Datasets to a single Training or Fine-tuning Job for complex scenarios.
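Because a Dataset cannot change after it is created, it can help to record a fingerprint of the exact files you uploaded alongside your training results. The sketch below is not part of the FlexAI tooling: `dataset_fingerprint` is a hypothetical helper that hashes every file's relative path, size, and contents with SHA-256 in sorted order, so the same directory always yields the same digest.

```python
import hashlib
import os


def dataset_fingerprint(root: str) -> str:
    """Return a SHA-256 hex digest covering every file under `root`.

    Hypothetical helper, not a FlexAI API. Relative paths are sorted so
    the digest does not depend on the order the filesystem lists files.
    """
    rels = sorted(
        os.path.relpath(os.path.join(dirpath, name), root).replace(os.sep, "/")
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    )
    digest = hashlib.sha256()
    for rel in rels:
        path = os.path.join(root, rel)
        # Hash the path (so renames change the digest) and the size
        # (so file boundaries are unambiguous), then the contents.
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update(os.path.getsize(path).to_bytes(8, "big"))
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

You could, for instance, log `dataset_fingerprint("/input/my-dataset")` at the start of each run and compare it with the value computed on your local copy before upload.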
Upload Datasets directly from your local filesystem:
Upload Datasets directly from cloud storage, without first downloading the data to your local machine:
Visit the Remote Storage Connection Manager section for detailed instructions on setting up and using remote storage connections.
FlexAI does not impose a fixed layout on Datasets: you can organize files in a flat structure or in nested directories.
Here’s an example of a Dataset with a flat structure. It was attached to a Training or Fine-tuning Job under the name my_flat_dataset:
Here’s an example of a Dataset with many nested directories and files. It was attached to a Training or Fine-tuning Job under the name my-dataset:
This example shows how multiple Datasets are organized when attached to a single Training or Fine-tuning Job. Each Dataset gets its own sub-directory under /input/.
Here we have three Datasets attached to the same Job:
/input/
    fineweb/
    ylecun_mnist/
    OpenAssistant-oasst1/
Datasets are mounted as read-only resources in your Runtime Environment:
/input/<dataset_name>/
The Dataset Manager supports any file format your training code can process.
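From your training code, each attached Dataset is an ordinary read-only directory, so standard file APIs are enough to discover its contents. The sketch below assumes the `/input/<dataset_name>/` layout described above; the helper names `dataset_dir` and `files_by_extension` are hypothetical, and `root` is a parameter only so the code can be tried outside a FlexAI Runtime Environment.

```python
from collections import defaultdict
from pathlib import Path

INPUT_ROOT = "/input"  # Mount point for attached Datasets, per the layout above


def dataset_dir(name: str, root: str = INPUT_ROOT) -> Path:
    """Return the mount directory of an attached Dataset, failing early if absent."""
    path = Path(root) / name
    if not path.is_dir():
        raise FileNotFoundError(f"Dataset {name!r} is not mounted at {path}")
    return path


def files_by_extension(name: str, root: str = INPUT_ROOT) -> dict[str, list[Path]]:
    """Group every file in a Dataset, flat or nested, by lowercase extension.

    Files without an extension are grouped under the key "<none>".
    Only reads the directory, so it works on a read-only mount.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(dataset_dir(name, root).rglob("*")):
        if path.is_file():
            groups[path.suffix.lower() or "<none>"].append(path)
    return dict(groups)
```

For example, `files_by_extension("fineweb")` would group the files of the fineweb Dataset from the multi-Dataset example, and you could then pass each group to whichever loader handles that format.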
Optimize upload performance with efficient batch operations:
Maximize storage efficiency and minimize costs:
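One general way to reduce stored volume, independent of any FlexAI feature, is to check a local Dataset for byte-identical duplicate files before uploading it. The hypothetical helper `find_duplicates` below groups files by size first and hashes only same-sized candidates with SHA-256.

```python
import hashlib
import os
from collections import defaultdict


def _sha256(path: str) -> str:
    """Hash a file in 1 MiB chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Return sorted groups of relative paths whose contents are identical.

    Hypothetical helper, not a FlexAI API. Files are bucketed by size
    first, so only files that could possibly match are hashed.
    """
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # A file with a unique size cannot have a duplicate
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_hash[_sha256(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                groups.append(sorted(os.path.relpath(p, root) for p in same))
    return sorted(groups)
```

Since Datasets cannot be modified once created, any such cleanup has to happen on your local copy before the upload.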
Ready to start managing your Datasets? Explore these resources:
Comprehensive guide to uploading Datasets from your local machine
Learn how to upload data from a Cloud Storage provider
Learn how the Remote Storage Connection Manager works and how to set a connection up