Google Cloud Storage
A Storage Provider Connection for Google Cloud Storage
To upload Datasets from Google Cloud Storage (GCS) to FCS, first create a Storage Provider Connection that holds the information needed to connect to your GCS bucket. A Storage Provider Connection for GCS authenticates with a Google Cloud Platform Service Account JSON key file. The Google Cloud documentation explains how to create a Service Account JSON key file (pick the Console tab for instructions on obtaining it through the GCP web console).
Prerequisites
Before creating a Storage Provider Connection, you need to have the following information at hand:
- A Google Cloud Service Account JSON key file with sufficient permissions to access the target GCS bucket.
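Before storing the key, it can help to confirm that the file you downloaded really is a Service Account key. The following is a minimal sketch, not part of the flexai CLI: check_sa_key is a hypothetical helper name, and the check is a plain text match on the fields a Service Account key file normally contains (type, project_id, private_key, client_email). It does not parse the JSON and it cannot tell you whether the account has sufficient permissions on the bucket.

```shell
# Hypothetical helper: text-level sanity check of a Service Account key file.
# It only looks for the expected field names; it is not a JSON validator.
check_sa_key() {
  key_file=$1
  if [ ! -r "$key_file" ]; then
    echo "cannot read $key_file" >&2
    return 1
  fi
  # A Service Account key declares "type": "service_account".
  if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"; then
    echo "$key_file: \"type\" is not \"service_account\"" >&2
    return 1
  fi
  # The closing quote in the pattern keeps "private_key_id" from matching "private_key".
  for field in project_id private_key client_email; do
    if ! grep -q "\"$field\"" "$key_file"; then
      echo "$key_file: missing field $field" >&2
      return 1
    fi
  done
  echo "ok"
}

# Usage (file name taken from the example in this guide):
#   check_sa_key gcp-service-account.json
```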
Storing Your Google Cloud Service Account Key
To store your Google Cloud Service Account JSON key file as an FCS Secret, run the following command:
cat <service_account_key.json> | flexai secret create <secret_name> --value-stdin
For example, if your Google Cloud Service Account JSON key file is named gcp-service-account.json, you can store it as an FCS Secret named gcp-sa by running the following command:
cat gcp-service-account.json | flexai secret create gcp-sa --value-stdin
This command reads the contents of your Google Cloud Service Account JSON key file and securely stores the entire file as an FCS Secret named gcp-sa.
Creating the Storage Provider Connection
With the Google Cloud Service Account JSON key file stored as an FCS Secret, you can now create a Storage Provider Connection for GCS using the flexai storage create command, as shown below:
flexai storage create <storage_provider_connection_name> \
--provider gcs \
--service-account-file-name <secret_with_the_service_account_key_json_file_contents>
For example, creating a Storage Provider Connection named gcs-conn that uses the Service Account JSON key stored in the gcp-sa FCS Secret would look like this:
flexai storage create gcs-conn \
--provider gcs \
--service-account-file-name gcp-sa
After running the command, the Storage Provider Connection gcs-conn will be created.
Uploading Datasets from Google Cloud Storage to FCS
Now you can use the gcs-conn Storage Provider Connection to upload Datasets from a GCS bucket to FCS by using the flexai dataset push command as shown below:
flexai dataset push <dataset_name> \
--storage-provider <storage_provider_connection_name> \
--source-path <gcs_bucket_name>/<gcs_object_key>
For instance, creating an FCS Dataset named gcs-dataset-audio from a GCS bucket named data-sets with the object key files/wav-files using the gcs-conn Storage Provider Connection would look like this:
flexai dataset push gcs-dataset-audio \
--storage-provider gcs-conn \
--source-path data-sets/files/wav-files
After running the command, the Dataset gcs-dataset-audio starts syncing: the contents of the GCS bucket resource data-sets/files/wav-files are copied asynchronously into the root of the Dataset.
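GCS locations are often written as gs:// URIs, while the --source-path examples in this guide use the bare <gcs_bucket_name>/<gcs_object_key> form. The sketch below converts one into the other. It assumes, based only on the examples shown here, that --source-path expects no scheme prefix; gcs_source_path is a hypothetical helper name and not part of the flexai CLI.

```shell
# Hypothetical helper: turn a gs:// URI into <bucket>/<object key>.
# Input without a gs:// prefix is passed through unchanged.
gcs_source_path() {
  printf '%s\n' "${1#gs://}"
}

# Usage with the example values from this guide:
#   flexai dataset push gcs-dataset-audio \
#     --storage-provider gcs-conn \
#     --source-path "$(gcs_source_path gs://data-sets/files/wav-files)"
```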
Monitoring the Dataset Upload Progress
You can monitor the progress of the Dataset upload with the inspect subcommand of flexai dataset:
flexai dataset inspect <dataset_name>
For our example, this would look like:
flexai dataset inspect gcs-dataset-audio
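Because the copy runs asynchronously, you may want to re-run the inspect command until the upload finishes. The following is a minimal sketch that repeats the command a fixed number of times with a pause between runs. The watch_dataset name and its count and interval arguments are hypothetical, and the loop does not parse the inspect output (its format is not described here), so you still read the status yourself.

```shell
# Hypothetical helper: re-run `flexai dataset inspect` at a fixed interval.
# Arguments: <dataset_name> [count=10] [interval_seconds=30]
watch_dataset() {
  name=$1
  count=${2:-10}
  interval=${3:-30}
  i=1
  while [ "$i" -le "$count" ]; do
    echo "--- inspect $i/$count ---"
    # Stop early if the CLI itself reports an error.
    flexai dataset inspect "$name" || return 1
    # No need to wait after the final run.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Usage: inspect gcs-dataset-audio 20 times, 15 seconds apart.
#   watch_dataset gcs-dataset-audio 20 15
```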