
Uploading Datasets from Remote Sources

If you store your Datasets in a Cloud Storage Service, you can upload them directly to FCS Storage Services without first downloading them to your local machine. The transfer uses a "Server to Server" connection, which saves you time.

The logic behind the process

The process of uploading Datasets from remote sources to FCS is split into three steps:

  1. Storing the Storage Provider access credentials in FCS
  2. Creating a Storage Provider Connection
  3. Uploading the Datasets to FCS

Storing the required credentials

Credentials are kept in the FCS Secret Manager, which encrypts and stores them securely. You can store strings or entire files as Secrets, which gives you flexibility in how you manage your credentials.

Creating a Storage Provider Connection

Once the access credentials are stored in an FCS Secret, you can create a Storage Provider Connection.

A Storage Provider Connection is an FCS entity that holds the information needed to connect to a Cloud Storage Service. This includes the type of Storage Provider, the Secret containing the access credentials, and any additional configuration required to establish the connection, which may vary depending on the Storage Provider.

Uploading the Datasets to FCS

Uploading Datasets from a remote source to FCS is similar to uploading them from your local machine, and it uses the same command: flexai dataset push. Instead of using the --file flag to specify a local file path and a destination path in the FCS Dataset, you pass the name of the Storage Provider Connection (--storage-provider) and the path to the file or directory in the Remote Storage Service (--source-path):

flexai dataset push <dataset_name> --storage-provider <storage_provider_connection_name> --source-path <remote_path>

This starts an asynchronous upload. You can monitor its progress with the flexai dataset inspect <dataset_name> command.
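The push-then-inspect sequence can be wrapped in a small shell function. This is a minimal sketch, not part of the FlexAI CLI: the function name push_and_inspect and the FLEXAI_CMD variable are hypothetical helpers introduced here for illustration, and only the two flexai commands described in this section are used.

```shell
# Assumption: FLEXAI_CMD is a hypothetical override introduced for this sketch.
# It defaults to the real CLI; set it to `echo` to print the commands instead.
FLEXAI_CMD="${FLEXAI_CMD:-flexai}"

# push_and_inspect <dataset_name> <storage_provider_connection_name> <remote_path>
# Starts the remote upload, then runs inspect once if the push was accepted.
push_and_inspect() {
  local dataset="$1" connection="$2" remote_path="$3"

  # Start the asynchronous upload; stop here if the command fails.
  "$FLEXAI_CMD" dataset push "$dataset" \
    --storage-provider "$connection" \
    --source-path "$remote_path" || return 1

  # Show the Dataset's current state. The upload may still be in progress,
  # so this command can be re-run later to check again.
  "$FLEXAI_CMD" dataset inspect "$dataset"
}
```

For example, push_and_inspect my-dataset my-connection path/to/data would push from the placeholder path path/to/data and then inspect my-dataset. All three argument values here are placeholders.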

You can push multiple Datasets through the same Storage Provider Connection by using a different value for --source-path each time, or you can create additional Storage Provider Connections for other providers you have access to.
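Reusing one Storage Provider Connection for several Datasets can be sketched as a loop over name=path pairs. The function name push_remote_datasets, the FLEXAI_CMD variable, and the name=path argument convention are all assumptions made for this example. They are not features of the FlexAI CLI.

```shell
# Assumption: FLEXAI_CMD is a hypothetical override introduced for this sketch.
# It defaults to the real CLI; set it to `echo` to print the commands instead.
FLEXAI_CMD="${FLEXAI_CMD:-flexai}"

# push_remote_datasets <connection_name> <dataset_name>=<remote_path>...
# Pushes each Dataset through the same Storage Provider Connection.
push_remote_datasets() {
  local connection="$1"
  shift

  local pair name remote_path
  for pair in "$@"; do
    name="${pair%%=*}"         # text before the first "="
    remote_path="${pair#*=}"   # text after the first "="

    "$FLEXAI_CMD" dataset push "$name" \
      --storage-provider "$connection" \
      --source-path "$remote_path" || return 1
  done
}
```

A call such as push_remote_datasets my-connection train=data/train eval=data/eval would start one upload per pair. The connection name, Dataset names, and remote paths are placeholders.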

Note

The contents of the <remote_path> you push are placed at the root of the Dataset. To learn more about a Dataset's structure, see the Datasets in FCS section of this guide's introduction, under "Uploading Datasets to FCS".

Supported Cloud Storage Providers

FCS currently supports the following Cloud Storage Providers:

Working with other Cloud Storage Providers

Some Cloud Storage Providers are not directly supported by FCS. If a provider lets you mount its storage to a file system path (either on your host machine or on a Virtual Machine), you can still use the flexai dataset push command to upload your Datasets to FCS from that path.

Microsoft Azure

Files stored in Azure Blob Storage can be mounted to a file system path via BlobFuse2.

Once the data is available as a file system path, you can use the flexai dataset push command to upload your Datasets to FCS, as described in the Uploading Datasets from your local machine guide. If you use a Virtual Machine without a graphical environment, follow the instructions on the Integration Tips & Recommendations page to authenticate your FlexAI CLI.

Visit Microsoft Learn's guide on How to mount an Azure Blob Storage container on Linux with BlobFuse2 for more details.