# Step by step: Uploading a Dataset From a Remote Storage Provider

> Step-by-step instructions for uploading datasets from remote storage

This page explains how to upload Datasets to FlexAI from three of the remote storage providers it supports: Amazon S3, Google Cloud Storage (GCS), and Hugging Face Hub.

<Tabs>
  <Tab title="Amazon S3">
    To upload Datasets from Amazon S3 to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your Amazon S3 bucket.

    You will find this entry from the AWS Security Blog useful: [How to quickly find and update your access keys \[...\]](https://aws.amazon.com/blogs/security/how-to-find-update-access-keys-password-mfa-aws-management-console/).

    You will need the following:

    * Your Amazon S3 Secret Access Key
    * Your Amazon S3 Access Key ID
    * The Amazon S3 region
    * The endpoint URL associated with your Amazon S3 region

    <Steps>
      1. Store Your Credentials using the FlexAI Secret Manager

         Store your Amazon S3 Secret Access Key

         To store your Amazon S3 Secret Access Key as a FlexAI Secret, run the following command:

         ```bash theme={null}
         flexai secret create s3-secret-access-key
         ```

         You will be prompted to enter your Amazon S3 Secret Access Key (pasting it in works fine). Press Enter to confirm, and the Secret `s3-secret-access-key` is created.
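
         If you are scripting this step, for example in CI, you may be able to skip the interactive prompt by piping the value in with the `--value-stdin` flag that the Google Cloud Storage instructions use. The sketch below assumes that flag works the same way for this Secret and that the key is already exported as `AWS_SECRET_ACCESS_KEY` (the variable the AWS CLI reads); the helper name `store_s3_secret` is hypothetical.

         ```bash
         # Hypothetical helper: stores the S3 Secret Access Key without the interactive prompt.
         # Assumes `flexai secret create ... --value-stdin` reads the value from stdin,
         # as in the Google Cloud Storage instructions, and that AWS_SECRET_ACCESS_KEY is set.
         store_s3_secret() {
           local secret_name="${1:-s3-secret-access-key}"
           if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
             echo "AWS_SECRET_ACCESS_KEY is not set" >&2
             return 1
           fi
           # printf '%s' avoids appending a trailing newline to the stored value.
           printf '%s' "$AWS_SECRET_ACCESS_KEY" | flexai secret create "$secret_name" --value-stdin
         }

         # Usage:
         # store_s3_secret s3-secret-access-key
         ```

         Because the command references the variable rather than the literal key, the key itself stays out of your shell history.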

      2. Create the Storage Provider Connection

         With the Amazon S3 Secret Access Key stored as a FlexAI Secret, you can now create a Storage Provider Connection for Amazon S3 with the `flexai storage create` command, using the following template:

         ```bash theme={null}
         flexai storage create <storage_provider_connection_name> \
           --provider s3 \
           --region <s3_region> \
           --endpoint <s3_endpoint> \
           --access-key-id <access_key_id> \
           --secret-access-key-name <name_of_the_secret_with_the_secret_access_key>
         ```

         > The value of `--endpoint` depends on the *region* where your Amazon S3 bucket is located. See the official [list of Amazon S3 endpoints](https://docs.aws.amazon.com/general/latest/gr/s3.html).
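
         For the standard AWS commercial regions, the endpoints in that list follow the pattern `s3.<region>.amazonaws.com`, as in the `eu-west-1` example that follows. The hypothetical helper below derives the `--endpoint` value from the region under that assumption; check the official list for any region it does not cover (the China regions, for instance, use the `amazonaws.com.cn` domain).

         ```bash
         # Hypothetical helper: builds the regional Amazon S3 endpoint from a region name.
         # Assumes the s3.<region>.amazonaws.com pattern used by standard commercial regions;
         # China regions (cn-*) use the amazonaws.com.cn domain instead.
         s3_endpoint_for_region() {
           local region="$1"
           if [ -z "$region" ]; then
             echo "usage: s3_endpoint_for_region <region>" >&2
             return 1
           fi
           case "$region" in
             cn-*) echo "s3.${region}.amazonaws.com.cn" ;;
             *)    echo "s3.${region}.amazonaws.com" ;;
           esac
         }

         # Usage: pass the result to `flexai storage create ... --endpoint`.
         # s3_endpoint_for_region eu-west-1   # prints s3.eu-west-1.amazonaws.com
         ```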

         For example, a Storage Provider Connection for an Amazon S3 bucket in the `eu-west-1` region, with the endpoint `s3.eu-west-1.amazonaws.com` and the Access Key ID `AKIAIOSFODIN7AAF89GU`, would look like this:

         ```bash theme={null}
         flexai storage create aws-storage-conn-eu \
           --provider s3 \
           --region eu-west-1 \
           --endpoint s3.eu-west-1.amazonaws.com \
           --access-key-id AKIAIOSFODIN7AAF89GU \
           --secret-access-key-name s3-secret-access-key
         ```

      3. Upload Datasets from Amazon S3 to FlexAI

         You can now use the newly created `aws-storage-conn-eu` Storage Provider Connection to upload Datasets from an Amazon S3 bucket directly to FlexAI with the `flexai dataset push` command:

         ```bash theme={null}
         flexai dataset push <dataset_name> \
           --storage-provider aws-storage-conn-eu \
           --source-path <s3_bucket_name>/<s3_object_key>
         ```

         For instance, creating a FlexAI Dataset named `s3-dataset-audio` from the object key `files/wav-files` in an Amazon S3 bucket named `data-sets`, using the `aws-storage-conn-eu` Storage Provider Connection, would look like this:

         ```bash theme={null}
         flexai dataset push s3-dataset-audio \
           --storage-provider aws-storage-conn-eu \
           --source-path data-sets/files/wav-files
         ```
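
         The examples above give `--source-path` as the bucket name followed by the object key, with no `s3://` prefix. If you copy locations from the AWS console or CLI as `s3://` URIs, a small helper can strip the scheme and any trailing slash. `s3_uri_to_source_path` is a hypothetical convenience sketch, not part of the FlexAI CLI.

         ```bash
         # Hypothetical helper: turns an s3:// URI into the <bucket>/<key> form
         # used for --source-path in the examples above.
         s3_uri_to_source_path() {
           local path="${1#s3://}"   # drop the scheme if present
           path="${path%/}"          # drop a single trailing slash
           printf '%s\n' "$path"
         }

         # Usage:
         # flexai dataset push s3-dataset-audio \
         #   --storage-provider aws-storage-conn-eu \
         #   --source-path "$(s3_uri_to_source_path s3://data-sets/files/wav-files/)"
         ```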

      4. Monitor the Dataset Upload Progress

         To monitor the progress of the Dataset upload, use the `inspect` subcommand of `flexai dataset`:

         ```bash theme={null}
         flexai dataset inspect <dataset_name>
         ```

         For the example above, this would be:

         ```bash theme={null}
         flexai dataset inspect s3-dataset-audio
         ```
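
         The upload may take a while for large Datasets, so you may want to check on it repeatedly. The hypothetical helper `watch_dataset` below just re-runs `flexai dataset inspect` a fixed number of times with a pause in between; it does not parse the output, whose format is not described here. If your system has the `watch` utility, `watch -n 30 flexai dataset inspect s3-dataset-audio` does much the same.

         ```bash
         # Hypothetical helper: runs `flexai dataset inspect` repeatedly so you can follow
         # an upload. It does not interpret the output; stop it early with Ctrl+C.
         watch_dataset() {
           local name="$1"
           local times="${2:-10}"      # number of checks
           local interval="${3:-30}"   # seconds between checks
           local i=1
           while [ "$i" -le "$times" ]; do
             echo "== check $i of $times =="
             flexai dataset inspect "$name" || return
             if [ "$i" -lt "$times" ]; then
               sleep "$interval"
             fi
             i=$((i + 1))
           done
         }

         # Usage: check s3-dataset-audio 10 times, 30 seconds apart.
         # watch_dataset s3-dataset-audio 10 30
         ```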
    </Steps>
  </Tab>

  <Tab title="Google Cloud Storage">
    To upload Datasets from Google Cloud Storage (GCS) to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your GCS bucket. Setting up a Storage Provider Connection for GCS requires a Google Cloud Service Account JSON key file.

    For instructions on creating a Service Account JSON key file, see the [Google Cloud documentation](https://cloud.google.com/iam/docs/keys-create-delete#iam-service-account-keys-create-console); select the Console tab to do it from the Google Cloud console.

    You will need the following:

    * A Google Cloud Service Account JSON key file with sufficient permissions to access the target GCS bucket.
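
    Before storing the key file, a quick local sanity check can catch a wrong or truncated file. The hypothetical helper below confirms that the file is readable, parses as JSON (using Python's standard `json.tool` module, so it assumes `python3` is installed), and declares `"type": "service_account"`, a field that Google Cloud service account key files contain. It does not check the account's permissions on the bucket.

    ```bash
    # Hypothetical helper: sanity-checks a Google Cloud service account JSON key file.
    # Requires python3 for JSON validation; does not verify bucket permissions.
    check_sa_key_file() {
      local key_file="$1"
      if [ ! -r "$key_file" ]; then
        echo "cannot read $key_file" >&2
        return 1
      fi
      if ! python3 -m json.tool "$key_file" > /dev/null 2>&1; then
        echo "$key_file is not valid JSON" >&2
        return 1
      fi
      if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"; then
        echo "$key_file does not look like a service account key" >&2
        return 1
      fi
      echo "$key_file looks like a service account key"
    }

    # Usage:
    # check_sa_key_file gcp-service-account.json
    ```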

    <Steps>
      1. Store Your Credentials using the FlexAI Secret Manager

         Store Your Google Cloud Service Account Key

         To store your Google Cloud Service Account JSON key file as a FlexAI Secret, run the following command:

         ```bash theme={null}
         cat <service_account_key.json> | flexai secret create <secret_name> --value-stdin
         ```

         For example, if your Google Cloud Service Account JSON key file is named `gcp-service-account.json`, you can store it as a FlexAI Secret named `gcp-sa` by running the following command:

         ```bash theme={null}
         cat gcp-service-account.json | flexai secret create gcp-sa --value-stdin
         ```

         This command reads the contents of your Google Cloud Service Account JSON key file and stores the entire file as a FlexAI Secret named `gcp-sa`.

      2. Create the Storage Provider Connection

         With the Google Cloud Service Account JSON key file stored as a FlexAI Secret, you can now create a Storage Provider Connection for GCS with the `flexai storage create` command, using the following template:

         ```bash theme={null}
         flexai storage create <storage_provider_connection_name> \
           --provider gcs \
           --service-account-file-name <secret_with_the_service_account_key_json_file_contents>
         ```

         For example, creating a Storage Provider Connection named `gcs-conn` that uses the Service Account JSON key stored in the `gcp-sa` FlexAI Secret would look like this:

         ```bash theme={null}
         flexai storage create gcs-conn \
           --provider gcs \
           --service-account-file-name gcp-sa
         ```

         After running the command, the Storage Provider Connection `gcs-conn` will be created.

      3. Upload Datasets from Google Cloud Storage to FlexAI

         You can now use the `gcs-conn` Storage Provider Connection to upload Datasets from a GCS bucket to FlexAI with the `flexai dataset push` command:

         ```bash theme={null}
         flexai dataset push <dataset_name> \
           --storage-provider <storage_provider_connection_name> \
           --source-path <gcs_bucket_name>/<gcs_object_key>
         ```

         For instance, creating a FlexAI Dataset named `gcs-dataset-audio` from the object key `files/wav-files` in a GCS bucket named `data-sets`, using the `gcs-conn` Storage Provider Connection, would look like this:

         ```bash theme={null}
         flexai dataset push gcs-dataset-audio \
           --storage-provider gcs-conn \
           --source-path data-sets/files/wav-files
         ```

         After you run the command, FlexAI starts syncing the Dataset `gcs-dataset-audio` asynchronously, copying the contents of the GCS location `data-sets/files/wav-files` into the root of the Dataset.
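
         To see which objects live under a source path, you can list the same location with the Google Cloud CLI. This assumes `gcloud` is installed and authenticated with an account that can read the bucket; the helper name `preview_gcs_source` is hypothetical, and it only rebuilds a `gs://` URL from the `<gcs_bucket_name>/<gcs_object_key>` form used above.

         ```bash
         # Hypothetical helper: lists the objects under a --source-path value in GCS.
         # Assumes the Google Cloud CLI (gcloud) is installed and authenticated.
         preview_gcs_source() {
           local source_path="${1%/}"   # e.g. data-sets/files/wav-files
           if [ -z "$source_path" ]; then
             echo "usage: preview_gcs_source <bucket>/<object_key>" >&2
             return 1
           fi
           # `gcloud storage ls --recursive` also lists objects in nested "folders".
           gcloud storage ls --recursive "gs://${source_path}/"
         }

         # Usage:
         # preview_gcs_source data-sets/files/wav-files
         ```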

      4. Monitor the Dataset Upload Progress

         To monitor the progress of the Dataset upload, use the `inspect` subcommand of `flexai dataset`:

         ```bash theme={null}
         flexai dataset inspect <dataset_name>
         ```

         For the example above, this would be:

         ```bash theme={null}
         flexai dataset inspect gcs-dataset-audio
         ```
    </Steps>
  </Tab>

  <Tab title="Hugging Face">
    To upload Datasets from Hugging Face Hub to FlexAI, you first need to create a Storage Provider Connection that holds the necessary information to connect to your Hugging Face Hub account.

    Setting up a Storage Provider Connection for Hugging Face Hub requires a [Hugging Face Access Token](https://huggingface.co/settings/tokens).

    You can find more information on how to create a Hugging Face Access Token in the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/how-to-authenticate).

    <Steps>
      1. Store Your Hugging Face Access Token using the FlexAI Secret Manager

         To store your Hugging Face Access Token as a FlexAI Secret, run the following command:

         ```bash theme={null}
         flexai secret create hf_token
         ```

         You will be prompted to enter your Hugging Face Access Token. Press Enter to confirm, and the Secret `hf_token` is created.

      2. Create the Storage Provider Connection

         With the Hugging Face Access Token stored as a FlexAI Secret, you can now create a Storage Provider Connection for Hugging Face Hub with the `flexai storage create` command:

         ```bash theme={null}
         flexai storage create hf-conn \
           --provider huggingface \
           --hf-token-name hf_token
         ```

         After running the command, the Storage Provider Connection `hf-conn` will be created.

      3. Upload Datasets from Hugging Face Hub to FlexAI

         Now you can use the `hf-conn` Storage Provider Connection to upload Datasets from Hugging Face Hub to FlexAI by using the `flexai dataset push` command as shown below:

         ```bash theme={null}
         flexai dataset push plant-disease-dataset \
           --storage-provider hf-conn \
           --source-path Diginsa/Plant-Disease-Detection-Project/dataset
         ```

         This example creates a FlexAI Dataset named `plant-disease-dataset` from the `dataset` directory of the Hugging Face repository `Diginsa/Plant-Disease-Detection-Project`, using the `hf-conn` Storage Provider Connection.
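
         In this example, `--source-path` combines the Hugging Face repository ID (`<owner>/<repo>`) with a path inside that repository. Assuming that layout holds generally (the first two segments are the repository ID, the rest is the in-repo path), the hypothetical helper below splits such a value so you can double-check which repository and directory a push will read from.

         ```bash
         # Hypothetical helper: splits a Hugging Face --source-path value of the form
         # <owner>/<repo>/<path/in/repo> into the repository ID and the in-repo path.
         # Assumes the layout used in the example above.
         split_hf_source_path() {
           local value="$1"
           case "$value" in
             */*) ;;
             *) echo "expected <owner>/<repo>[/<path>]" >&2; return 1 ;;
           esac
           local owner="${value%%/*}"   # text before the first slash
           local rest="${value#*/}"     # text after the first slash
           local repo="${rest%%/*}"     # repository name
           local subpath=""
           case "$rest" in
             */*) subpath="${rest#*/}" ;;   # everything after <owner>/<repo>/
           esac
           printf 'repo_id=%s/%s\n' "$owner" "$repo"
           printf 'path=%s\n' "$subpath"
         }

         # Usage:
         # split_hf_source_path Diginsa/Plant-Disease-Detection-Project/dataset
         ```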

         After you run the command, FlexAI starts syncing the Dataset `plant-disease-dataset` asynchronously, copying the contents from Hugging Face Hub into the root of the Dataset.

      4. Monitor the Dataset Upload Progress

         To monitor the progress of the Dataset upload, use the `inspect` subcommand of `flexai dataset`:

         ```bash theme={null}
         flexai dataset inspect plant-disease-dataset
         ```
    </Steps>
  </Tab>
</Tabs>
