Hugging Face

A Storage Provider Connection for Hugging Face

To upload Datasets from Hugging Face to FCS, you first need to create a Storage Provider Connection that uses a Hugging Face access token to provide access to your account. You can generate a Hugging Face access token from your Hugging Face account settings page.

Storing your Hugging Face Token

To store your Hugging Face access token as an FCS Secret, run the following command:

flexai secret create huggingface-token

You will be prompted to enter your Hugging Face access token. After entering it, press Enter, and the Secret huggingface-token will be created.

Creating the Storage Provider Connection

With the Hugging Face access token stored as an FCS Secret, you can now create a Storage Provider Connection for Hugging Face using the flexai storage command as shown below:

flexai storage create <storage_provider_connection_name> \
--provider huggingface \
--hf-token-name <name_of_the_secret_with_the_hugging_face_token>

For example, creating a Storage Provider Connection named huggingface-conn that uses the token stored in the huggingface-token FCS Secret would look like this:

flexai storage create huggingface-conn \
--provider huggingface \
--hf-token-name huggingface-token

After running the command, the Storage Provider Connection huggingface-conn will be created.
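If you create connections from a script, the command above can be wrapped in a small shell function. The following is only a sketch: the function name create_hf_connection and the DRY_RUN variable are illustrative and not part of the flexai CLI. It uses only the flexai storage create flags shown in this guide.

```shell
# Sketch: create a Hugging Face Storage Provider Connection from two arguments.
# create_hf_connection and DRY_RUN are hypothetical helpers, not flexai features.
create_hf_connection() {
  conn_name="$1"    # name of the Storage Provider Connection
  secret_name="$2"  # name of the FCS Secret that holds the token

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "flexai storage create $conn_name --provider huggingface --hf-token-name $secret_name"
  else
    flexai storage create "$conn_name" \
      --provider huggingface \
      --hf-token-name "$secret_name"
  fi
}
```

Setting DRY_RUN=1 before calling create_hf_connection huggingface-conn huggingface-token prints the command so you can review it before running it for real.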

Uploading Datasets from Hugging Face to FCS

Now you can use your newly created huggingface-conn Storage Provider Connection to upload Datasets from Hugging Face directly to FCS by using the flexai dataset push command as shown below:

flexai dataset push <fcs_dataset_name> \
--storage-provider huggingface-conn \
--source-path <huggingface_organization>/<dataset_name>

Note: The FCS Dataset name (<fcs_dataset_name>) must follow the FCS resource naming conventions.

For example, to upload the dataset hosted at https://huggingface.co/Diginsa/Plant-Disease-Detection-Project, run:

flexai dataset push plant_disease_dataset \
--storage-provider huggingface-conn \
--source-path Diginsa/Plant-Disease-Detection-Project

FlexAI will create a new Dataset named plant_disease_dataset that will be asynchronously synced with the contents of https://huggingface.co/Diginsa/Plant-Disease-Detection-Project.
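In the example above, the --source-path value is the part of the Hugging Face URL that follows the site address. A minimal sketch of deriving it in a script is shown here. The function name hf_source_path is illustrative, and it assumes that the source path is always of the form <huggingface_organization>/<dataset_name>, so an optional datasets/ URL segment is dropped.

```shell
# Sketch: derive the --source-path value from a Hugging Face URL.
# hf_source_path is a hypothetical helper; it assumes the source path
# is "<organization>/<name>" with no other URL segments.
hf_source_path() {
  url="$1"
  path="${url#https://huggingface.co/}"  # drop the site prefix
  path="${path#datasets/}"               # drop an optional datasets/ segment
  path="${path%/}"                       # drop a trailing slash, if any
  printf '%s\n' "$path"
}

hf_source_path "https://huggingface.co/Diginsa/Plant-Disease-Detection-Project"
# prints: Diginsa/Plant-Disease-Detection-Project
```

The result can then be passed to flexai dataset push as the --source-path argument.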

Monitoring the Dataset Upload Progress

The progress of the Dataset upload can be monitored by using the inspect subcommand from flexai dataset:

flexai dataset inspect plant_disease_dataset
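Because the sync is asynchronous, you may want to re-run the inspect command periodically. The sketch below repeats it at a fixed interval. The function name watch_dataset and its default interval and attempt count are illustrative. It does not parse the inspect output, since the output format is not described here, so it stops after a fixed number of checks.

```shell
# Sketch: re-run "flexai dataset inspect" at a fixed interval.
# watch_dataset and its defaults are hypothetical; the loop does not
# interpret the command's output and stops after a fixed number of checks.
watch_dataset() {
  name="$1"
  interval="${2:-30}"  # seconds between checks (illustrative default)
  attempts="${3:-10}"  # number of checks before stopping (illustrative default)
  i=1
  while [ "$i" -le "$attempts" ]; do
    echo "--- check $i of $attempts ---"
    flexai dataset inspect "$name"
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

For example, watch_dataset plant_disease_dataset 60 5 would run the inspect command five times, one minute apart.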

You can use the same Storage Provider Connection to upload any other dataset hosted on Hugging Face that you have access to.