From a Remote Storage Provider
Uploading Datasets from Remote Storage Providers
Here you will find instructions on how to upload Datasets from some of the Remote Storage Providers supported by FlexAI.
To upload Datasets from Amazon S3 to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your Amazon S3 bucket.
You will find this entry from the AWS Security Blog useful: "How to quickly find and update your access keys […]".
You will need the following:
- Your Amazon S3 Secret Access Key
- Your Amazon S3 Access Key ID
- The Amazon S3 region
- The endpoint URL associated with your Amazon S3 region
- Store Your Credentials using the FlexAI Secret Manager

  Store your Amazon S3 Secret Access Key

  To store your Amazon S3 Secret Access Key as a FlexAI Secret, run the following command:

      flexai secret create s3-secret-access-key

  You will be prompted to enter your Amazon S3 Secret Access Key (you can paste it in). Once you have entered it, press Enter, and the Secret s3-secret-access-key will be created.
- Create the Storage Provider Connection

  With the Amazon S3 Secret Access Key stored as a FlexAI Secret, you can now create a Storage Provider Connection for Amazon S3 using the flexai storage command, following this template:

      flexai storage create <storage_provider_connection_name> \
        --provider s3 \
        --region <s3_region> \
        --endpoint <s3_endpoint> \
        --access-key-id <access_key_id> \
        --secret-access-key-name <name_of_the_secret_with_the_secret_access_key>

  Note that the value of --endpoint depends on the region where your Amazon S3 bucket is located. The official list of Amazon S3 endpoints is available in the AWS documentation.

  A Storage Provider Connection for an Amazon S3 bucket located in the eu-west-1 region, with the endpoint s3.eu-west-1.amazonaws.com and the Access Key ID AKIAIOSFODIN7AAF89GU, would look like this:

      flexai storage create aws-storage-conn-eu \
        --provider s3 \
        --region eu-west-1 \
        --endpoint s3.eu-west-1.amazonaws.com \
        --access-key-id AKIAIOSFODIN7AAF89GU \
        --secret-access-key-name s3-secret-access-key
- Upload Datasets from Amazon S3 to FlexAI

  Now you can use your newly created aws-storage-conn-eu Storage Provider Connection to upload Datasets from an Amazon S3 bucket directly to FlexAI with the flexai dataset push command, following this template:

      flexai dataset push <dataset_name> \
        --storage-provider aws-storage-conn-eu \
        --source-path <s3_bucket_name>/<s3_object_key>

  For instance, creating a FlexAI Dataset named s3-dataset-audio from an Amazon S3 bucket named data-sets with the object key files/wav-files, using the aws-storage-conn-eu Storage Provider Connection, would look like this:

      flexai dataset push s3-dataset-audio \
        --storage-provider aws-storage-conn-eu \
        --source-path data-sets/files/wav-files
- Monitor the Dataset Upload Progress

  You can monitor the progress of the Dataset upload with the inspect subcommand of flexai dataset:

      flexai dataset inspect <dataset_name>

  For our example, this would look like:

      flexai dataset inspect s3-dataset-audio
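The --endpoint value used in the Amazon S3 steps above follows a regular pattern in the eu-west-1 example. The sketch below derives the endpoint from the region and prints the resulting flexai storage create command for review instead of running it. Both helper names are hypothetical, and the s3.<region>.amazonaws.com pattern is an assumption that does not hold for every AWS partition (the China regions, for example, use a .amazonaws.com.cn suffix), so check the official endpoint list before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: derive a regional S3 endpoint from a region name.
# Assumes the standard "s3.<region>.amazonaws.com" pattern; verify the
# result against the official Amazon S3 endpoint list.
s3_endpoint_for_region() {
  printf 's3.%s.amazonaws.com\n' "$1"
}

# Hypothetical helper: print (do not run) the connection command.
# Arguments: connection name, region, access key ID, secret name.
print_s3_connection_cmd() {
  printf 'flexai storage create %s --provider s3 --region %s --endpoint %s --access-key-id %s --secret-access-key-name %s\n' \
    "$1" "$2" "$(s3_endpoint_for_region "$2")" "$3" "$4"
}

# Dry run with the values from the example above.
print_s3_connection_cmd aws-storage-conn-eu eu-west-1 AKIAIOSFODIN7AAF89GU s3-secret-access-key
```

Printing the command first lets you confirm the region and endpoint agree before creating the connection; once it looks right, run the printed command as-is.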
To upload Datasets from Google Cloud Storage (GCS) to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your GCS bucket. Setting up a Storage Provider Connection for GCS leverages Google Cloud Platform’s Service Account JSON key file.
You can find more information on how to create a Service Account JSON key file in the Google Cloud documentation (pick the Console tab for instructions on how to obtain it using the GCP web console).
You will need the following:
- A Google Cloud Service Account JSON key file with sufficient permissions to access the target GCS bucket.
- Store Your Credentials using the FlexAI Secret Manager

  Store Your Google Cloud Service Account Key

  To store your Google Cloud Service Account JSON key file as a FlexAI Secret, run the following command:

      cat <service_account_key.json> | flexai secret create <secret_name> --value-stdin

  For example, if your Google Cloud Service Account JSON key file is named gcp-service-account.json, you can store it as a FlexAI Secret named gcp-sa by running the following command:

      cat gcp-service-account.json | flexai secret create gcp-sa --value-stdin

  This command reads the contents of your Google Cloud Service Account JSON key file and securely stores them in their entirety as a FlexAI Secret named gcp-sa.
- Create the Storage Provider Connection

  With the Google Cloud Service Account JSON key file stored as a FlexAI Secret, you can now create a Storage Provider Connection for GCS using the flexai storage command, following this template:

      flexai storage create <storage_provider_connection_name> \
        --provider gcs \
        --service-account-file-name <secret_with_the_service_account_key_json_file_contents>

  For example, creating a Storage Provider Connection named gcs-conn that reads the Service Account Key JSON details from the gcp-sa FlexAI Secret would look like this:

      flexai storage create gcs-conn \
        --provider gcs \
        --service-account-file-name gcp-sa

  After running the command, the Storage Provider Connection gcs-conn will be created.
- Upload Datasets from Google Cloud Storage to FlexAI

  Now you can use the gcs-conn Storage Provider Connection to upload Datasets from a GCS bucket to FlexAI with the flexai dataset push command, following this template:

      flexai dataset push <dataset_name> \
        --storage-provider <storage_provider_connection_name> \
        --source-path <gcs_bucket_name>/<gcs_object_key>

  For instance, creating a FlexAI Dataset named gcs-dataset-audio from a GCS bucket named data-sets with the object key files/wav-files, using the gcs-conn Storage Provider Connection, would look like this:

      flexai dataset push gcs-dataset-audio \
        --storage-provider gcs-conn \
        --source-path data-sets/files/wav-files

  After you run the command, FlexAI begins syncing the Dataset gcs-dataset-audio by asynchronously copying the contents of the GCS bucket resource data-sets/files/wav-files into the root of the Dataset.
- Monitor the Dataset Upload Progress

  You can monitor the progress of the Dataset upload with the inspect subcommand of flexai dataset:

      flexai dataset inspect <dataset_name>

  For our example, this would look like:

      flexai dataset inspect gcs-dataset-audio
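Because the GCS steps above store the key file's contents verbatim, it can help to sanity-check the file before piping it into flexai secret create. The sketch below is a minimal pre-flight check, not part of the FlexAI CLI: the function name is hypothetical, and it only looks for the "type": "service_account" marker and the client_email and private_key fields that Google Cloud service account key files contain. It does not validate the JSON or the key itself.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a Google Cloud Service Account key file.
# Returns 0 if the file is readable and contains the expected fields.
looks_like_service_account_key() {
  key_file="$1"
  if [ ! -r "$key_file" ]; then
    echo "cannot read: $key_file" >&2
    return 1
  fi
  # Service account key files declare their type explicitly.
  if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"; then
    echo "not a service account key: $key_file" >&2
    return 1
  fi
  # Spot-check two fields that are needed to authenticate.
  for field in client_email private_key; do
    if ! grep -q "\"$field\"" "$key_file"; then
      echo "missing field: $field" >&2
      return 1
    fi
  done
  return 0
}

# Usage (commented out so the sketch has no side effects):
# looks_like_service_account_key gcp-service-account.json &&
#   cat gcp-service-account.json | flexai secret create gcp-sa --value-stdin
```

The check fails fast on the most common mistake, which is pointing at the wrong file (for example, user credentials rather than a service account key).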
To upload Datasets from Hugging Face Hub to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your Hugging Face Hub account.
Setting up a Storage Provider Connection for Hugging Face Hub requires a Hugging Face Access Token. You can find more information on how to create one in the Hugging Face documentation.
- Store Your Hugging Face Access Token using the FlexAI Secret Manager

  To store your Hugging Face Access Token as a FlexAI Secret, run the following command:

      flexai secret create hf-access-token

  You will be prompted to enter your Hugging Face Access Token. Once you have entered it, press Enter, and the Secret hf-access-token will be created.
- Create the Storage Provider Connection

  With the Hugging Face Access Token stored as a FlexAI Secret, you can now create a Storage Provider Connection for Hugging Face Hub using the flexai storage command as shown below:

      flexai storage create hf-conn \
        --provider huggingface \
        --hf-token-name hf-access-token

  After running the command, the Storage Provider Connection hf-conn will be created.
- Upload Datasets from Hugging Face Hub to FlexAI

  Now you can use the hf-conn Storage Provider Connection to upload Datasets from Hugging Face Hub to FlexAI with the flexai dataset push command as shown below:

      flexai dataset push plant-disease-dataset \
        --storage-provider hf-conn \
        --source-path Diginsa/Plant-Disease-Detection-Project/dataset

  This example creates a FlexAI Dataset named plant-disease-dataset from the Hugging Face repository Diginsa/Plant-Disease-Detection-Project using the hf-conn Storage Provider Connection, pulling the contents from the dataset directory in the repo.

  After you run the command, FlexAI begins syncing the Dataset plant-disease-dataset by asynchronously copying the contents from Hugging Face Hub into the root of the Dataset.
- Monitor the Dataset Upload Progress

  You can monitor the progress of the Dataset upload with the inspect subcommand of flexai dataset:

      flexai dataset inspect plant-disease-dataset
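In the Hugging Face Hub example above, the --source-path value combines the repository (Diginsa/Plant-Disease-Detection-Project) with a directory inside it (dataset). The sketch below splits such a value into those two parts so you can check a path before pushing. The helper names are hypothetical, and the split rule (the first two segments are the repository, the remainder is the path inside it) is an assumption inferred only from that one example.

```shell
#!/bin/sh
# Hypothetical helpers that split "<owner>/<repo>/<path_in_repo>".
# Assumption: the first two segments identify the repository.

# Print "<owner>/<repo>".
hf_repo_id() {
  owner="${1%%/*}"      # text before the first slash
  rest="${1#*/}"        # text after the first slash
  repo="${rest%%/*}"    # second segment
  printf '%s/%s\n' "$owner" "$repo"
}

# Print the path inside the repository (empty if none is given).
hf_sub_path() {
  rest="${1#*/}"
  case "$rest" in
    */*) printf '%s\n' "${rest#*/}" ;;
    *)   printf '\n' ;;
  esac
}

# Values from the example above.
hf_repo_id  Diginsa/Plant-Disease-Detection-Project/dataset
hf_sub_path Diginsa/Plant-Disease-Detection-Project/dataset
```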