From a Remote Storage Provider

Uploading Datasets from Remote Storage Providers

This page explains how to upload Datasets from some of the Remote Storage Providers supported by FlexAI.

To upload Datasets from Amazon S3 to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your Amazon S3 bucket.

If you need to locate your keys, this entry from the AWS Security Blog may help: How to quickly find and update your access keys […].

You will need the following:

  • Your Amazon S3 Secret Access Key
  • Your Amazon S3 Access Key ID
  • The Amazon S3 region
  • The endpoint URL associated with your Amazon S3 region

  1. Store Your Credentials using the FlexAI Secret Manager

    Store your Amazon S3 Secret Access Key

    To store your Amazon S3 Secret Access Key as a FlexAI Secret, run the following command:

    flexai secret create s3-secret-access-key

    You will be prompted to enter your Amazon S3 Secret Access Key; you can type it or paste it in. After you press Enter, the Secret s3-secret-access-key is created.

  2. Create the Storage Provider Connection

    With the Amazon S3 Secret Access Key stored as a FlexAI Secret, you can create a Storage Provider Connection for Amazon S3 with the flexai storage create command, following this template:

    flexai storage create <storage_provider_connection_name> \
    --provider s3 \
    --region <s3_region> \
    --endpoint <s3_endpoint> \
    --access-key-id <access_key_id> \
    --secret-access-key-name <name_of_the_secret_with_the_secret_access_key>

    The value of --endpoint depends on the region where your Amazon S3 bucket is located. Refer to the official list of Amazon S3 endpoints in the AWS documentation for the correct value.

    A Storage Provider Connection for an Amazon S3 bucket located in the eu-west-1 region, with the endpoint s3.eu-west-1.amazonaws.com and the Access Key ID AKIAIOSFODIN7AAF89GU, would look like this:

    flexai storage create aws-storage-conn-eu \
    --provider s3 \
    --region eu-west-1 \
    --endpoint s3.eu-west-1.amazonaws.com \
    --access-key-id AKIAIOSFODIN7AAF89GU \
    --secret-access-key-name s3-secret-access-key
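
Because the endpoint follows a predictable pattern for most regions, it can be derived from the region name rather than typed by hand. The sketch below defines a hypothetical helper, s3_endpoint_for_region, which is not part of the FlexAI CLI. It assumes the s3.<region>.amazonaws.com pattern used by standard AWS regions and the amazonaws.com.cn suffix used by the China regions; verify the result against the official endpoint list before relying on it.

```shell
# Hypothetical helper (not part of the FlexAI CLI): print the regional
# Amazon S3 endpoint for a given region name.
# Assumption: standard regions use s3.<region>.amazonaws.com and
# China regions (cn-*) use s3.<region>.amazonaws.com.cn.
s3_endpoint_for_region() {
  region="$1"
  if [ -z "$region" ]; then
    echo "usage: s3_endpoint_for_region <region>" >&2
    return 1
  fi
  case "$region" in
    cn-*) printf 's3.%s.amazonaws.com.cn\n' "$region" ;;
    *)    printf 's3.%s.amazonaws.com\n' "$region" ;;
  esac
}

# Example use with the command shown above:
#   flexai storage create aws-storage-conn-eu \
#     --provider s3 \
#     --region eu-west-1 \
#     --endpoint "$(s3_endpoint_for_region eu-west-1)" \
#     --access-key-id AKIAIOSFODIN7AAF89GU \
#     --secret-access-key-name s3-secret-access-key
```

Deriving the endpoint from the region keeps the two flags from drifting apart when the same command is reused for buckets in other regions.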

  3. Upload Datasets from Amazon S3 to FlexAI

    You can now use the newly created aws-storage-conn-eu Storage Provider Connection to upload Datasets from an Amazon S3 bucket directly to FlexAI with the flexai dataset push command, following this template:

    flexai dataset push <dataset_name> \
    --storage-provider aws-storage-conn-eu \
    --source-path <s3_bucket_name>/<s3_object_key>

    For instance, creating a FlexAI Dataset named s3-dataset-audio from an Amazon S3 bucket named data-sets with the object key files/wav-files, using the aws-storage-conn-eu Storage Provider Connection, would look like this:

    flexai dataset push s3-dataset-audio \
    --storage-provider aws-storage-conn-eu \
    --source-path data-sets/files/wav-files
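
The --source-path value above is the bucket name and the object key joined by a single slash. The sketch below defines a hypothetical helper, s3_source_path (not part of the FlexAI CLI), that builds that value and tolerates an s3:// prefix or stray slashes in its inputs. Whether flexai itself accepts an s3:// prefix is not stated here, so the helper assumes the plain <s3_bucket_name>/<s3_object_key> form shown in the template.

```shell
# Hypothetical helper (not part of the FlexAI CLI): build the
# "<s3_bucket_name>/<s3_object_key>" value expected by --source-path.
s3_source_path() {
  bucket="${1#s3://}"   # drop an optional s3:// scheme
  bucket="${bucket%/}"  # drop one trailing slash from the bucket name
  key="${2#/}"          # drop one leading slash from the object key
  printf '%s/%s\n' "$bucket" "$key"
}

# Example use with the command shown above:
#   flexai dataset push s3-dataset-audio \
#     --storage-provider aws-storage-conn-eu \
#     --source-path "$(s3_source_path data-sets files/wav-files)"
```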

  4. Monitor the Dataset Upload Progress

    To monitor the progress of the Dataset upload, use the inspect subcommand of flexai dataset:

    flexai dataset inspect <dataset_name>

    For the example Dataset, the command is:

    flexai dataset inspect s3-dataset-audio
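
For a large upload, you may prefer to re-run the inspect command at intervals instead of typing it repeatedly. The sketch below defines a hypothetical helper, watch_dataset (not part of the FlexAI CLI), that runs flexai dataset inspect a fixed number of times with a pause between runs. It does not parse the output, because the output format is not described here; you read the status yourself. FLEXAI_BIN is an assumed override variable added only so that the loop can be tried without the CLI installed.

```shell
# Hypothetical helper (not part of the FlexAI CLI): run
# "flexai dataset inspect <dataset_name>" repeatedly.
#   $1 = dataset name
#   $2 = number of runs (default 10)
#   $3 = seconds to wait between runs (default 30)
# FLEXAI_BIN is an assumed override for the executable name.
watch_dataset() {
  dataset="$1"
  attempts="${2:-10}"
  interval="${3:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop early if the inspect command itself fails.
    "${FLEXAI_BIN:-flexai}" dataset inspect "$dataset" || return 1
    # Sleep between runs, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example: inspect s3-dataset-audio 20 times, 15 seconds apart.
#   watch_dataset s3-dataset-audio 20 15
```

A bounded number of runs is used so that the loop always ends, even if the upload never finishes.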