AWS S3
A Storage Provider Connection for Amazon S3
To upload Datasets from Amazon S3 to FCS, you first need to create a Storage Provider Connection that holds the necessary information to connect to your Amazon S3 bucket. You will find this entry from the AWS Security Blog useful: How to quickly find and update your access keys [...].
Prerequisites
Before creating a Storage Provider Connection, you need to have the following information at hand:
- Your Amazon S3 Secret Access Key
- Your Amazon S3 Access Key ID
- The Amazon S3 region
- The endpoint URL associated with your Amazon S3 region
Storing your Amazon S3 Secret Access Key
To store your Amazon S3 Secret Access Key as an FCS Secret, run the following command:
flexai secret create s3-secret-access-key
You will be prompted to enter your Amazon S3 Secret Access Key (pasting it in works too). Once you have entered it, press Enter, and the Secret s3-secret-access-key will be created.
Creating the Storage Provider Connection
With the Amazon S3 Secret Access Key stored as an FCS Secret, you can now create a Storage Provider Connection for Amazon S3 using the flexai storage create command, following the template below:
flexai storage create <storage_provider_connection_name> \
--provider s3 \
--region <s3_region> \
--endpoint <s3_endpoint> \
--access-key-id <access_key_id> \
--secret-access-key-name <name_of_the_secret_with_the_secret_access_key>
Note that the value of --endpoint depends on the region where your Amazon S3 bucket is located. You can find the official list of Amazon S3 endpoints here.
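For the standard commercial AWS regions, the regional endpoint generally follows the pattern s3.<region>.amazonaws.com, so it can be derived from the region name. The following is a minimal sketch that assumes this pattern applies to your region (it does not for the China regions, for example, which use the amazonaws.com.cn suffix). The helper name s3_endpoint_for_region is hypothetical and is not part of the flexai CLI.

```shell
# Hypothetical helper (not part of flexai): builds the standard regional
# S3 endpoint for a given region name, assuming the s3.<region>.amazonaws.com pattern.
s3_endpoint_for_region() {
  region="$1"
  if [ -z "$region" ]; then
    echo "usage: s3_endpoint_for_region <region>" >&2
    return 1
  fi
  printf 's3.%s.amazonaws.com\n' "$region"
}

# prints: s3.eu-west-1.amazonaws.com
s3_endpoint_for_region eu-west-1
```

When in doubt, check the result against the official endpoint list rather than relying on the pattern.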
A Storage Provider Connection for an Amazon S3 bucket located in the eu-west-1 region, with the endpoint s3.eu-west-1.amazonaws.com and the Access Key ID AKIAIOSFODIN7AAF89GU, would look like this:
flexai storage create aws-storage-conn-eu \
--provider s3 \
--region eu-west-1 \
--endpoint s3.eu-west-1.amazonaws.com \
--access-key-id AKIAIOSFODIN7AAF89GU \
--secret-access-key-name s3-secret-access-key
After running the command, the Storage Provider Connection aws-storage-conn-eu will be created.
Uploading Datasets from Amazon S3 to FCS
Now you can use your newly created aws-storage-conn-eu Storage Provider Connection to upload Datasets from an Amazon S3 bucket directly to FCS by using the flexai dataset push command, as shown below:
flexai dataset push <dataset_name> \
--storage-provider aws-storage-conn-eu \
--source-path <s3_bucket_name>/<s3_object_key>
This is what it looks like when uploading a Dataset named leaf-pictures-train from an Amazon S3 bucket named my-bucket, with the object key datasets/coffee-leaf-diseases--train, through the aws-storage-conn-eu Storage Provider Connection:
flexai dataset push leaf-pictures-train \
--storage-provider aws-storage-conn-eu \
--source-path my-bucket/datasets/coffee-leaf-diseases--train
After running the command, the Dataset leaf-pictures-train will begin syncing: the contents of the Amazon S3 resource my-bucket/datasets/coffee-leaf-diseases--train are copied asynchronously into the root of the Dataset.
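If you usually refer to your data by an s3:// URI, note that the --source-path examples above use the bare <s3_bucket_name>/<s3_object_key> form. A small sketch of the conversion follows; the helper name s3_uri_to_source_path is hypothetical and not part of the flexai CLI.

```shell
# Hypothetical helper (not part of flexai): turns an s3:// URI into the
# <s3_bucket_name>/<s3_object_key> form shown for --source-path.
s3_uri_to_source_path() {
  uri="$1"
  case "$uri" in
    s3://?*) printf '%s\n' "${uri#s3://}" ;;
    *) echo "not an s3:// URI: $uri" >&2; return 1 ;;
  esac
}

# prints: my-bucket/datasets/coffee-leaf-diseases--train
s3_uri_to_source_path s3://my-bucket/datasets/coffee-leaf-diseases--train
```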
You can use the same Storage Provider Connection to upload multiple Datasets from the same or different Amazon S3 buckets.
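As an illustration of reusing one connection for several Datasets, the sketch below generates one flexai dataset push command per input line. It only prints the commands (a dry run), so you can review them before running them. The function name push_commands, the second Dataset name, and the second bucket are hypothetical values chosen for this example.

```shell
# Connection name taken from the example above.
CONNECTION="aws-storage-conn-eu"

# Hypothetical helper: reads "<dataset_name> <s3_bucket_name>/<s3_object_key>"
# pairs from stdin and prints the matching push command for each pair.
push_commands() {
  while read -r dataset_name source_path; do
    if [ -z "$dataset_name" ]; then continue; fi
    printf 'flexai dataset push %s --storage-provider %s --source-path %s\n' \
      "$dataset_name" "$CONNECTION" "$source_path"
  done
}

# The second line uses made-up names, for illustration only.
push_commands <<'EOF'
leaf-pictures-train my-bucket/datasets/coffee-leaf-diseases--train
leaf-pictures-test my-other-bucket/datasets/coffee-leaf-diseases--test
EOF
```

Once the printed commands look right, they can be piped to sh to execute them.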
Monitoring the Dataset Upload Progress
The progress of the Dataset upload can be monitored by using the inspect subcommand of flexai dataset:
flexai dataset inspect <dataset_name>
For our example, this would look like:
flexai dataset inspect leaf-pictures-train
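Since the copy runs asynchronously, you may want to re-run the inspect command periodically. The sketch below does so a fixed number of times. The function name watch_dataset and the FLEXAI_BIN override are assumptions made for this example, and the sketch makes no attempt to parse the output of inspect, whose format is not described here.

```shell
# Hypothetical helper: re-runs `flexai dataset inspect` a fixed number of times.
# FLEXAI_BIN is an assumed override, handy for dry runs (e.g. FLEXAI_BIN=echo).
watch_dataset() {
  dataset_name="$1"
  attempts="${2:-10}"   # how many times to run inspect
  interval="${3:-30}"   # seconds to wait between runs
  i=1
  while [ "$i" -le "$attempts" ]; do
    "${FLEXAI_BIN:-flexai}" dataset inspect "$dataset_name" || return 1
    if [ "$i" -lt "$attempts" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}
```

For instance, watch_dataset leaf-pictures-train 20 15 would run the inspect command 20 times, 15 seconds apart.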