The logic behind the process
The process of uploading Datasets from remote sources to FlexAI is split into three steps:
- Storing the Storage Provider access credentials in the FlexAI Secret Manager
- Creating a Storage Provider Connection
- Uploading the Datasets to FlexAI
Storing the required credentials
Credentials are kept in FlexAI’s Secret Manager, which encrypts and stores them securely. You can store strings and even entire files as Secrets, which gives you flexibility in how you manage your credentials.
Creating a Storage Provider Connection
Once your access credentials are stored in a FlexAI Secret, you can create a Storage Provider Connection. A Storage Provider Connection is a FlexAI entity that holds the information needed to connect to a Cloud Storage Service: the type of Storage Provider, the Secret containing the access credentials, and any additional configuration required to establish the connection, which may vary depending on the Storage Provider.
Uploading the Datasets to FlexAI
Uploading Datasets from a remote source to FlexAI is quite similar to uploading them from your local machine. In fact, you use the same command, flexai dataset push. The difference is that instead of using the --file flag to specify a local file path and a destination path in the FlexAI Dataset, you pass the name of the Storage Provider Connection (--storage-provider) and the path to the file or directory in the Remote Storage Service (--source-path).
After pushing, you can review the Dataset with the flexai dataset inspect <dataset_name> command.
You can push multiple Datasets through the same Storage Provider Connection by passing different values for --source-path, or create additional Storage Provider Connections for other providers you have access to.
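The pushes described above can also be scripted. The following is a minimal Python sketch, not an official FlexAI example: it only assembles the command lines and prints them, so it runs without the FlexAI CLI installed. It assumes the Dataset name is passed as a positional argument (mirroring flexai dataset inspect <dataset_name>). The connection name, Dataset names, and source paths are hypothetical placeholders, and the exact format expected by --source-path may depend on your Storage Provider.

```python
import shlex
from typing import Dict, List


def build_push_command(dataset_name: str, connection_name: str, source_path: str) -> List[str]:
    """Assemble a `flexai dataset push` invocation for a remote source.

    Assumption: the Dataset name is a positional argument, as it is in
    `flexai dataset inspect <dataset_name>`. Only the --storage-provider
    and --source-path flags named in this guide are used.
    """
    return [
        "flexai", "dataset", "push", dataset_name,
        "--storage-provider", connection_name,
        "--source-path", source_path,
    ]


# Hypothetical Storage Provider Connection name.
CONNECTION_NAME = "my-s3-connection"

# Hypothetical Dataset names mapped to hypothetical remote paths.
SOURCES: Dict[str, str] = {
    "train-data": "my-bucket/datasets/train",
    "eval-data": "my-bucket/datasets/eval",
}

# One command per Dataset, all reusing the same connection.
commands = [
    build_push_command(name, CONNECTION_NAME, path)
    for name, path in SOURCES.items()
]

for cmd in commands:
    # Print rather than execute, so the sketch has no side effects.
    print(shlex.join(cmd))
```

To actually run the pushes, each assembled list could be handed to subprocess.run(cmd, check=True) on a machine where the FlexAI CLI is installed and authenticated.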
The contents of the <remote_path> you push will be placed into the root of the Dataset. Learn more about a Dataset’s structure in the Dataset Manager section.
Supported Cloud Storage Providers
The FlexAI Remote Storage Connection Manager Service currently supports the following Cloud Storage Providers:
- s3: Amazon S3
- gcs: Google Cloud Storage
- r2: Cloudflare R2
- minio: MinIO
- huggingface: Hugging Face Hub
Working with other Cloud Storage Providers
Some Cloud Storage Providers are not directly supported by FlexAI, but if they provide the option of mounting their storage to a file system path (either on your host machine or a Virtual Machine), you can use the flexai dataset push command to upload your Datasets to FlexAI.
Microsoft Azure
Files stored in Azure Blob Storage can be mounted to a file system path via BlobFuse2.
Once the data is available as a file system path, you can then use the flexai dataset push command to upload your dataset files to make them available to the FlexAI Dataset Manager Service, as described in the Uploading Datasets from your local machine guide. Note that if you decide to use a Virtual Machine without a graphical environment, you can follow the instructions in the CLI on a Virtualized Environment page to authenticate your FlexAI CLI.
Visit Microsoft Learn’s guide on How to mount an Azure Blob Storage container on Linux with BlobFuse2 for more details.