Create a Remote Storage Connection
Creating a Remote Storage Connection involves two main steps:
- Store your access credentials securely using the FlexAI Secret Manager.
- Create a Remote Storage Connection that uses the stored credentials.
Storing your Access Credentials
Visit the Create Secret 🔗 page in the FlexAI Console to create a new secret.
Note that currently, the FlexAI Console only supports creating secrets with text values. If you need to create a secret with a file value (e.g., a JSON key file for Google Cloud Storage), please use the FlexAI CLI instead.
To upload Datasets from Amazon S3 to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your Amazon S3 bucket.
You will find this entry from the AWS Security Blog useful: How to quickly find and update your access keys […] 🔗.
You will need the following:
- Your Amazon S3 Secret Access Key
- Your Amazon S3 Access Key ID
- The Amazon S3 region
- The endpoint URL associated with your Amazon S3 region
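The endpoint usually follows from the region. As a minimal sketch, assuming a standard AWS commercial region whose regional endpoint has the form s3.<region>.amazonaws.com (the pattern seen in this guide's eu-west-1 example; other partitions, such as the China regions, use a different suffix), a hypothetical helper named s3_endpoint_for_region could derive it. It is not part of the FlexAI CLI, and the result should still be checked against the official endpoint list.

```shell
# Hypothetical helper (not part of the FlexAI CLI): derive the regional S3
# endpoint for a standard AWS commercial region.
s3_endpoint_for_region() {
  region="$1"
  if [ -z "$region" ]; then
    echo "usage: s3_endpoint_for_region <region>" >&2
    return 1
  fi
  printf 's3.%s.amazonaws.com\n' "$region"
}

# Prints: s3.eu-west-1.amazonaws.com
s3_endpoint_for_region eu-west-1
```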
- Store Your Credentials using the FlexAI Secret Manager
  To store your Amazon S3 Secret Access Key as a FlexAI Secret, run the following command:
      flexai secret create s3-secret-access-key
  You will be prompted to enter your Amazon S3 Secret Access Key (pasting it works too). Once you have entered it, press Enter, and the Secret s3-secret-access-key will be created.
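If the key already lives in an environment variable, typing it at the prompt can be avoided. The sketch below rests on two assumptions: that the --value-stdin flag, which this guide shows for file-based secrets, also accepts a piped text value, and that the key is held in AWS_SECRET_ACCESS_KEY. The function name and the FLEXAI_BIN override are hypothetical and exist only to make the example testable with a stub.

```shell
# Sketch: store the S3 secret from the AWS_SECRET_ACCESS_KEY environment
# variable instead of typing it at the prompt.
# Assumptions: --value-stdin accepts a piped text value; FLEXAI_BIN is a
# hypothetical override that lets you substitute a stub for the real CLI.
store_s3_secret_from_env() {
  secret_name="${1:-s3-secret-access-key}"
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "error: AWS_SECRET_ACCESS_KEY is empty or unset" >&2
    return 1
  fi
  # printf avoids the trailing newline that echo would append.
  printf '%s' "$AWS_SECRET_ACCESS_KEY" |
    "${FLEXAI_BIN:-flexai}" secret create "$secret_name" --value-stdin
}
```

Calling store_s3_secret_from_env with no arguments would target the s3-secret-access-key name used in this guide; pass a different name as the first argument to change it.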
To upload Datasets from Google Cloud Storage (GCS) to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your GCS bucket. Setting up a Storage Provider Connection for GCS leverages Google Cloud Platform’s Service Account JSON key file.
You can find more information on how to create a Service Account JSON key file in the Google Cloud documentation 🔗 (Pick the Console tab for instructions on how to obtain it using the GCP web console).
You will need the following:
- A Google Cloud Service Account JSON key file with sufficient permissions to access the target GCS bucket.
- Store Your Credentials using the FlexAI Secret Manager
  To store your Google Cloud Service Account JSON key file as a FlexAI Secret, run the following command:
      cat <service_account_key_file_path> | flexai secret create <secret_name> --value-stdin
  For example, if your Google Cloud Service Account JSON key file is named gcp-service-account.json, you can store it as a FlexAI Secret named gcp-sa by running the following command:
      cat gcp-service-account.json | flexai secret create gcp-sa --value-stdin
  This command reads the contents of your Google Cloud Service Account JSON key file and securely stores them in their entirety as a FlexAI Secret named gcp-sa.
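Before piping a file into the Secret Manager, it can help to confirm that it really is a service account key. The following is a minimal sketch of such a check; check_sa_key_file is a hypothetical helper, and it relies only on the "type": "service_account" field that Google Cloud service account key files contain. It does not verify permissions or that the key is still valid.

```shell
# Sketch: check that a file looks like a service account key before storing it.
# check_sa_key_file is a hypothetical helper, not part of the FlexAI CLI.
check_sa_key_file() {
  key_file="$1"
  if [ ! -r "$key_file" ]; then
    echo "error: cannot read $key_file" >&2
    return 1
  fi
  # Service account key files carry a "type": "service_account" field.
  if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"; then
    echo "error: $key_file does not look like a service account key" >&2
    return 1
  fi
}
```

One way to use it is to gate the upload on the check, for example: check_sa_key_file gcp-service-account.json && cat gcp-service-account.json | flexai secret create gcp-sa --value-stdin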
To upload Datasets from Hugging Face Hub to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your Hugging Face Hub account.
Setting up a Storage Provider Connection for Hugging Face Hub requires a Hugging Face Access Token 🔗.
You can find more information on how to create a Hugging Face Access Token in the Hugging Face documentation 🔗.
- Store Your Hugging Face Access Token using the FlexAI Secret Manager
  To store your Hugging Face Access Token as a FlexAI Secret, run the following command:
      flexai secret create hf_token
  You will be prompted to enter your Hugging Face Access Token. Once you have entered it, press Enter, and the Secret hf_token will be created.
Creating a Remote Storage Connection
Currently, creating Remote Storage Connections via the FlexAI Console is not supported. Please use the FlexAI CLI instead.
- Create the Storage Provider Connection
  With the Amazon S3 Secret Access Key stored as a FlexAI Secret, you can now create a Storage Provider Connection for Amazon S3 using the flexai storage command, following this template:
      flexai storage create <storage_provider_connection_name> \
        --provider s3 \
        --region <s3_region> \
        --endpoint <s3_endpoint> \
        --access-key-id <access_key_id> \
        --secret-access-key-name <name_of_the_secret_with_the_secret_access_key>
  Note that the value of --endpoint depends on the region where your Amazon S3 bucket is located. You can find the official list of Amazon S3 endpoints here 🔗.
  A Remote Storage Connection for an Amazon S3 bucket located in the eu-west-1 region, with the endpoint s3.eu-west-1.amazonaws.com and the Access Key ID AKIAIOSFODIN7AAF89GU, would look like this:
      flexai storage create aws-storage-conn-eu \
        --provider s3 \
        --region eu-west-1 \
        --endpoint s3.eu-west-1.amazonaws.com \
        --access-key-id AKIAIOSFODIN7AAF89GU \
        --secret-access-key-name s3-secret-access-key
- Upload Datasets from Amazon S3 to FlexAI
  Now you can use your newly created aws-storage-conn-eu Storage Provider Connection to upload Datasets from an Amazon S3 bucket directly to FlexAI with the flexai dataset push command, following this template:
      flexai dataset push <dataset_name> \
        --storage-provider aws-storage-conn-eu \
        --source-path <s3_bucket_name>/<s3_object_key>
  For instance, creating a FlexAI Dataset named s3-dataset-audio from an Amazon S3 bucket named data-sets with the object key files/wav-files, using the aws-storage-conn-eu Storage Provider Connection, would look like this:
      flexai dataset push s3-dataset-audio \
        --storage-provider aws-storage-conn-eu \
        --source-path data-sets/files/wav-files
- Monitor the Dataset Upload Progress
  You can monitor the progress of the Dataset upload with the inspect subcommand of flexai dataset:
      flexai dataset inspect <dataset_name>
  For our example, this would be:
      flexai dataset inspect s3-dataset-audio
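The --source-path value in the steps above is written as <s3_bucket_name>/<s3_object_key>, without a scheme. Locations are often copied around as s3:// URIs, so a small normalizer can be handy. This is only a sketch: to_source_path is a hypothetical helper, and whether the CLI itself accepts s3:// URIs is not stated in this guide, so the helper simply converts to the documented form.

```shell
# Hypothetical helper: turn an s3:// URI, or a path written with leading
# slashes, into the <s3_bucket_name>/<s3_object_key> form used by --source-path.
to_source_path() {
  path="$1"
  path="${path#s3://}"   # drop a leading s3:// scheme, if present
  # Strip any leading slashes that remain.
  while [ "${path#/}" != "$path" ]; do
    path="${path#/}"
  done
  printf '%s\n' "$path"
}

# Prints: data-sets/files/wav-files
to_source_path s3://data-sets/files/wav-files
```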
- Create the Storage Provider Connection
  With the Google Cloud Service Account JSON key file stored as a FlexAI Secret, you can now create a Storage Provider Connection for GCS using the flexai storage command, following this template:
      flexai storage create <storage_provider_connection_name> \
        --provider gcs \
        --service-account-file-name <secret_with_the_service_account_key_json_file_contents>
  For example, creating a Storage Provider Connection named gcs-conn that uses the Service Account Key JSON details stored in the gcp-sa FlexAI Secret would look like this:
      flexai storage create gcs-conn \
        --provider gcs \
        --service-account-file-name gcp-sa
  After running the command, the Storage Provider Connection gcs-conn will be created.
- Upload Datasets from Google Cloud Storage to FlexAI
  Now you can use the gcs-conn Storage Provider Connection to upload Datasets from a GCS bucket to FlexAI with the flexai dataset push command, following this template:
      flexai dataset push <dataset_name> \
        --storage-provider <storage_provider_connection_name> \
        --source-path <gcs_bucket_name>/<gcs_object_key>
  For instance, creating a FlexAI Dataset named gcs-dataset-audio from a GCS bucket named data-sets with the object key files/wav-files, using the gcs-conn Storage Provider Connection, would look like this:
      flexai dataset push gcs-dataset-audio \
        --storage-provider gcs-conn \
        --source-path data-sets/files/wav-files
  After running the command, the dataset gcs-dataset-audio will begin syncing: the contents of the GCS bucket resource data-sets/files/wav-files are copied asynchronously into the root of the Dataset.
- Monitor the Dataset Upload Progress
  You can monitor the progress of the Dataset upload with the inspect subcommand of flexai dataset:
      flexai dataset inspect <dataset_name>
  For our example, this would be:
      flexai dataset inspect gcs-dataset-audio
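Because the copy runs asynchronously, the inspect command usually needs to be run more than once. The sketch below repeats it at a fixed interval. The output format of flexai dataset inspect is not described in this guide, so the loop makes no attempt to parse a status and simply prints each snapshot. The poll_dataset function and the FLEXAI_BIN override are hypothetical names introduced for illustration.

```shell
# Sketch: re-run `flexai dataset inspect` at a fixed interval.
# The loop does not parse the output; it only prints each snapshot.
# FLEXAI_BIN is a hypothetical override for substituting a stub.
poll_dataset() {
  dataset_name="$1"
  interval="${2:-30}"   # seconds between checks
  attempts="${3:-10}"   # number of checks before giving up
  i=1
  while [ "$i" -le "$attempts" ]; do
    echo "--- check $i of $attempts ---"
    "${FLEXAI_BIN:-flexai}" dataset inspect "$dataset_name" || return 1
    # Sleep between checks, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

For the example in this section, poll_dataset gcs-dataset-audio 60 5 would check once a minute, five times in total.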
- Create the Storage Provider Connection
  With the Hugging Face Access Token stored as a FlexAI Secret, you can now create a Storage Provider Connection for Hugging Face Hub using the flexai storage command as shown below:
      flexai storage create hf-conn \
        --provider huggingface \
        --hf-token-name hf_token
  After running the command, the Storage Provider Connection hf-conn will be created.
- Bring a Dataset from the Hugging Face Hub to FlexAI
  Now you can use the hf-conn Storage Provider Connection to push a Hugging Face Hub Dataset to the FlexAI Dataset Manager with the flexai dataset push command as shown below:
      flexai dataset push hf-finepdfs \
        --storage-provider hf-conn \
        --source-path HuggingFaceFW/finepdfs
  This example creates a FlexAI Dataset named hf-finepdfs from the Hugging Face repository HuggingFaceFW/finepdfs using the hf-conn Storage Provider Connection, pulling the contents from the dataset directory in the repo.
  After running the command, the dataset hf-finepdfs will begin syncing: the contents are copied asynchronously from Hugging Face Hub into the root of the Dataset.
- Monitor the Dataset Upload Progress
  You can monitor the progress of the Dataset upload with the inspect subcommand of flexai dataset:
      flexai dataset inspect hf-finepdfs
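The Hugging Face steps can also be collected into one script. The sketch below uses only the commands and names that appear in this guide; the run wrapper, the hf_flow function, and the DRY_RUN switch are hypothetical additions. With DRY_RUN left at its default of 1, the commands are printed rather than executed, which makes it easy to review them first.

```shell
# Sketch: the Hugging Face flow from this guide as one script.
# DRY_RUN is a hypothetical switch: when it is 1 (the default here),
# commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

hf_flow() {
  # Each step runs only if the previous one succeeded.
  run flexai secret create hf_token &&
    run flexai storage create hf-conn --provider huggingface --hf-token-name hf_token &&
    run flexai dataset push hf-finepdfs --storage-provider hf-conn --source-path HuggingFaceFW/finepdfs &&
    run flexai dataset inspect hf-finepdfs
}

hf_flow
```

Setting DRY_RUN=0 would execute the commands for real; the first one still prompts for the token interactively, as described earlier.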
Next Steps
Monitoring the push process of Datasets and Checkpoints: