Uploading Datasets from Remote Sources
If you store your Datasets in a Cloud Storage Service, you can upload them directly to FCS Storage Services without first downloading them to your local machine. The transfer uses a server-to-server connection, which saves you time.
The logic behind the process
Uploading Datasets from remote sources to FlexAI takes three steps:
- Storing the Storage Provider access credentials in the FlexAI Secret Store
- Creating a Storage Provider Connection
- Uploading the Datasets to FlexAI
Storing the required credentials
Credentials are kept in FlexAI’s Secret Manager, which encrypts them and stores them securely. A Secret can hold a plain string or an entire file, so you can store credentials in whichever form your Storage Provider issues them.
Creating a Storage Provider Connection
Once the access credentials are stored in a FlexAI Secret, you can create a Storage Provider Connection.
A Storage Provider Connection is a FlexAI entity that holds the information needed to connect to a Cloud Storage Service: the type of Storage Provider, the Secret containing the access credentials, and any additional configuration required to establish the connection, which varies by Storage Provider.
Uploading the Datasets to FlexAI
Uploading Datasets from a remote source to FlexAI works much like uploading them from your local machine; in fact, you use the same command, flexai dataset push. Instead of using the --file flag to specify a local file path and a destination path in the FlexAI Dataset, you pass the name of the Storage Provider Connection (--storage-provider) and the path to the file or directory in the remote Storage Service (--source-path):
flexai dataset push <dataset_name> --storage-provider <storage_provider_connection_name> --source-path <remote_path>
This starts an asynchronous upload, whose progress you can monitor with the flexai dataset inspect <dataset_name> command.
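If you prefer to check on an upload from a script rather than by hand, you can re-run flexai dataset inspect at a fixed interval. The following is a minimal POSIX sh sketch under stated assumptions: it only prints whatever the inspect command returns and does not parse it, since the output format is not described here. The function name watch_dataset, the DRY_RUN switch, and the default interval and repetition count are hypothetical choices, not part of the FlexAI CLI.

```shell
#!/bin/sh
# Hypothetical helper: re-run `flexai dataset inspect` at a fixed interval.
# It only prints what the command returns; it does not parse the output.
set -eu

# Run a command, or just print it when DRY_RUN=1 (handy for checking the script).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# watch_dataset <dataset_name> [interval_seconds] [repetitions]
# The defaults (30 seconds, 10 repetitions) are arbitrary placeholders.
watch_dataset() {
  wd_name="$1"
  wd_interval="${2:-30}"
  wd_times="${3:-10}"
  wd_i=1
  while [ "$wd_i" -le "$wd_times" ]; do
    run flexai dataset inspect "$wd_name"
    # Wait between checks, but not after the last one.
    if [ "$wd_i" -lt "$wd_times" ]; then
      sleep "$wd_interval"
    fi
    wd_i=$((wd_i + 1))
  done
}
```

For example, watch_dataset my-dataset 60 20 (with my-dataset standing in for your Dataset name) prints the inspect output once a minute, 20 times; prefixing the call with DRY_RUN=1 shows the commands without running them.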
You can push multiple Datasets through the same Storage Provider Connection by giving each one a different --source-path, or create additional Storage Provider Connections for other providers you have access to.
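Pushing several Datasets through one Storage Provider Connection amounts to repeating flexai dataset push with a different --source-path each time. The following is a minimal POSIX sh sketch of that loop, using only the flags shown above. The function name push_from_connection, the <dataset_name>=<remote_path> argument format, and the DRY_RUN switch are hypothetical conveniences, and the exact form of a remote path depends on your Storage Provider.

```shell
#!/bin/sh
# Hypothetical helper: push several Datasets through one Storage Provider
# Connection, each from its own --source-path. Only the documented
# `flexai dataset push` flags are used.
set -eu

# Run a command, or just print it when DRY_RUN=1 (handy for checking the script).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# push_from_connection <connection_name> <dataset_name>=<remote_path>...
push_from_connection() {
  pf_connection="$1"
  shift
  for pf_pair in "$@"; do
    # Reject arguments that are not in <dataset_name>=<remote_path> form.
    case "$pf_pair" in
      *=*) ;;
      *) printf 'expected <dataset_name>=<remote_path>, got: %s\n' "$pf_pair" >&2
         return 1 ;;
    esac
    pf_name="${pf_pair%%=*}"  # text before the first "="
    pf_path="${pf_pair#*=}"   # text after the first "="
    run flexai dataset push "$pf_name" \
      --storage-provider "$pf_connection" \
      --source-path "$pf_path"
  done
}
```

For example, push_from_connection my-s3-connection train-set=my-bucket/train eval-set=my-bucket/eval (all placeholder names) starts one upload per Dataset, each of which can then be followed with flexai dataset inspect.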
Supported Cloud Storage Providers
FCS currently supports the following Cloud Storage Providers:
- s3: Amazon S3
- gcs: Google Cloud Storage
- r2: Cloudflare R2
- minio: MinIO
- huggingface: Hugging Face Hub
You can learn more about how to create a Storage Provider Connection for each of these providers in the Creating a Storage Provider Connection guide.
Working with other Cloud Storage Providers
Some Cloud Storage Providers are not directly supported by FlexAI. If a provider lets you mount its storage to a file system path (either on your host machine or on a Virtual Machine), you can still upload your Datasets to FlexAI with the flexai dataset push command.
Microsoft Azure
Files stored in Azure Blob Storage can be mounted to a file system path via BlobFuse2.
Once the data is available at a file system path, use the flexai dataset push command to upload your Datasets to FCS, as described in the Uploading Datasets from your local machine guide. If you use a Virtual Machine without a graphical environment, follow the instructions on the Integration Tips & Recommendations page to authenticate the FlexAI CLI.
Visit Microsoft Learn’s guide on How to mount an Azure Blob Storage container on Linux with BlobFuse2 for more details.
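The mount-then-push workflow for Azure can be combined into a small script. This is a minimal POSIX sh sketch, assuming BlobFuse2 is installed and configured through a config file as in Microsoft’s guide, and that the FlexAI CLI is already authenticated on the machine. The function name push_from_blobfuse2, the DRY_RUN switch, and all example paths are hypothetical. Because the exact syntax of the --file value is documented in the local-upload guide rather than here, the script takes that value as an argument and passes it through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: mount an Azure Blob Storage container with BlobFuse2,
# then upload from the mount point with `flexai dataset push --file`.
set -eu

# Run a command, or just print it when DRY_RUN=1 (handy for checking the script).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# push_from_blobfuse2 <mount_dir> <blobfuse2_config.yaml> <dataset_name> <file_arg>
# <file_arg> is passed to --file unchanged: write it in the form the
# "Uploading Datasets from your local machine" guide describes, with its
# local path pointing under <mount_dir>.
push_from_blobfuse2() {
  pb_mount_dir="$1"
  pb_config="$2"
  pb_dataset="$3"
  pb_file_arg="$4"
  run mkdir -p "$pb_mount_dir"
  # BlobFuse2 reads the storage account, container and auth settings from the config file.
  run blobfuse2 mount "$pb_mount_dir" --config-file="$pb_config"
  run flexai dataset push "$pb_dataset" --file "$pb_file_arg"
}
```

Depending on your FUSE configuration, the blobfuse2 mount step may need to run with sudo.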