Hugging Face
A Storage Provider Connection for Hugging Face
To upload Datasets from Hugging Face to FCS, you first need to create a Storage Provider Connection that uses a Hugging Face access token to access your account. You can generate a Hugging Face access token from your Hugging Face account settings page.
Storing your Hugging Face Token
To store your Hugging Face access token as an FCS Secret, run the following command:
flexai secret create huggingface-token
You will be prompted to enter your Hugging Face access token. Once entered, press Enter, and the Secret huggingface-token will be created.
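Before storing the token, it can help to confirm that it is usable. The sketch below is not part of the flexai CLI, and both function names are hypothetical. It assumes that Hugging Face user access tokens start with hf_ and that the Hugging Face Hub whoami-v2 endpoint accepts the token as a Bearer credential.

```shell
# Hypothetical helpers, not part of the flexai CLI.

# Offline sanity check: the token is non-empty, contains no whitespace,
# and starts with "hf_" followed by at least one character.
# Assumption: Hugging Face user access tokens use the "hf_" prefix.
hf_token_format_ok() {
  case "$1" in
    *[[:space:]]*) return 1 ;;
    hf_?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Online check: asks the Hugging Face Hub which account the token belongs to.
# Exits with status 0 only if the Hub answers with a successful HTTP status.
# Note: the token is passed as a command-line argument to curl here, so it
# may be visible to other users in the process list on shared machines.
hf_token_accepted() {
  curl --fail --silent --output /dev/null \
    --header "Authorization: Bearer $1" \
    https://huggingface.co/api/whoami-v2
}
```

Both functions only report success or failure through their exit status. For example, hf_token_format_ok "$HF_TOKEN" && hf_token_accepted "$HF_TOKEN" succeeds only when both checks pass, where HF_TOKEN is an assumed environment variable holding the token.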
Creating the Storage Provider Connection
With the Hugging Face access token stored as an FCS Secret, you can now create a Storage Provider Connection for Hugging Face using the flexai storage command as shown below:
flexai storage create <storage_provider_connection_name> \
--provider huggingface \
--hf-token-name <name_of_the_secret_with_the_hugging_face_token>
For example, creating a Storage Provider Connection named huggingface-conn that uses the token stored in the huggingface-token FCS Secret would look like this:
flexai storage create huggingface-conn \
--provider huggingface \
--hf-token-name huggingface-token
After running the command, the Storage Provider Connection huggingface-conn will be created.
Uploading Datasets from Hugging Face to FCS
Now you can use your newly created huggingface-conn Storage Provider Connection to upload Datasets from Hugging Face directly to FCS by using the flexai dataset push command as shown below:
flexai dataset push <fcs_dataset_name> \
--storage-provider huggingface-conn \
--source-path <huggingface_organization>/<dataset_name>
The <fcs_dataset_name> must follow the FCS resource naming conventions.
For example, to upload the dataset hosted at https://huggingface.co/Diginsa/Plant-Disease-Detection-Project, run:
flexai dataset push plant_disease_dataset \
--storage-provider huggingface-conn \
--source-path Diginsa/Plant-Disease-Detection-Project
FlexAI will create a new Dataset named plant_disease_dataset that will be asynchronously synced with the contents of https://huggingface.co/Diginsa/Plant-Disease-Detection-Project.
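The value passed to --source-path in the example is the Hugging Face URL with the host removed. As a minimal sketch, the hypothetical helper below (not part of the flexai CLI) derives that <huggingface_organization>/<dataset_name> form from a URL. It also strips a leading datasets/ segment, on the assumption that URLs copied from the Hugging Face dataset pages include one. Whether --source-path accepts any other form is not covered here.

```shell
# Hypothetical helper, not part of the flexai CLI: turn a Hugging Face URL
# into the <huggingface_organization>/<dataset_name> form.
hf_source_path() {
  local path="$1"
  path="${path#https://huggingface.co/}"  # drop the scheme and host
  path="${path#datasets/}"                # drop an optional "datasets/" segment
  path="${path%/}"                        # drop a trailing slash, if any
  printf '%s\n' "$path"
}
```

The result can then be used in place of a hand-typed path, for instance --source-path "$(hf_source_path https://huggingface.co/Diginsa/Plant-Disease-Detection-Project)".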
Monitoring the Dataset Upload Progress
The progress of the Dataset upload can be monitored by using the inspect subcommand of flexai dataset:
flexai dataset inspect plant_disease_dataset
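Because the sync is asynchronous, the inspect command may need to be run several times. The sketch below is one way to do that. The repeat_command helper is hypothetical and not part of the flexai CLI. It simply reruns a command at a fixed interval and makes no assumption about the format of the inspect output, so it does not detect completion on its own.

```shell
# Hypothetical helper, not part of the flexai CLI: run a command a fixed
# number of times, pausing between runs. The command output is printed
# as-is and is not parsed.
#   $1: number of runs
#   $2: pause between runs, in seconds
#   remaining arguments: the command to run
repeat_command() {
  local times="$1" pause_seconds="$2" i=1
  shift 2
  while [ "$i" -le "$times" ]; do
    "$@"
    # Skip the pause after the final run.
    if [ "$i" -lt "$times" ]; then
      sleep "$pause_seconds"
    fi
    i=$((i + 1))
  done
}
```

For example, repeat_command 10 30 flexai dataset inspect plant_disease_dataset would run the inspect command ten times, 30 seconds apart.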
You can use the same Storage Provider Connection to upload any other dataset hosted on Hugging Face that you have access to.