
# Uploading Datasets from your local machine

> Upload dataset files from your local machine to FlexAI

## Uploading files

<Tabs>
  <Tab title="Using the FlexAI Console">
    Navigate to the Dataset Manager in the FlexAI Console:

    <Steps>
      <Step title="Visit the Add Dataset section">
        Visit the ["Add Dataset" section of the FlexAI Console](https://console.flex.ai/datasets/new)
      </Step>

      <Step title="Enter a name for your dataset">
        Enter a name for your dataset, for example `nanoGPT-dataset`
      </Step>

      <Step title="Select the Local option">
        Select the "Local" option for "Upload Origin"
      </Step>

      <Step title="Select the Upload Item button">
        Select the `+ Upload Item` button to open the "Upload Items" dialog
      </Step>
    </Steps>
  </Tab>

  <Tab title="Using the FlexAI CLI">
    If your Datasets are stored on your machine or local network, you can upload them to the FlexAI Dataset Manager by using the FlexAI CLI's `dataset push` subcommand:

    ```bash theme={null}
    flexai dataset push <dataset_name> (--file <source_path>=<path_in_flexai_dataset> ... | --file <source_path> ...)
    ```

    ### The `--file` flag

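    The `dataset push` command's `-f`/`--file` [flag](/cli/reference/dataset/push/#flags) offers a flexible way to upload files to a Dataset, as it can be used in three different ways: uploading individual files to specific destination paths, uploading multiple files to the Dataset's root directory, and uploading the contents of an entire directory.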
  </Tab>
</Tabs>

## One file at a time

<Tabs>
  <Tab title="Using the FlexAI Console">
    Uploading files individually is useful when your source files are in different locations on your local machine and/or you want to set a specific destination path for each of them.

    Let's assume the following file structure on your local machine:

    ```
    ~/
      openwebtext_mini/
        - urlsf_subset00.tar
        - urlsf_subset01.tar
        - test.tar
      sf-wikitext/
        - test-00000-of-00001.parquet
        - train-00000-of-00001.parquet
    ```

    You can upload specific files from the `openwebtext_mini` and `sf-wikitext` directories to a FlexAI Dataset named `text-records-dataset-1`, while also specifying a custom destination path for each of them, as seen below:

    ```
    text-records-dataset-1/
      owt/
        - urlsf_subset00.tar
        - urlsf_subset01.tar
      test/
        - test.tar
        - test-00000-of-00001.parquet
      wikitext/
        - train-00000-of-00001.parquet
    ```

    You can achieve this by going through the following steps iteratively for each file:

    <Steps>
      <Step title="Select the file">
        Use the "Select file" option to open a file browser dialog
      </Step>

      <Step title="Choose the file">
        Select the file you want to upload from your local machine
      </Step>

      <Step title="Enter the destination path">
        In the "Destination Path" field, enter the desired destination path within the Dataset
      </Step>

      <Step title="Add the file">
        Select the `Add` button to confirm the file selection and destination mapping
      </Step>

      <Step title="Repeat for additional files">
        Below the "Upload Items" file list, you will find a `+ Add items` button that opens the "Upload Items" dialog again.
      </Step>

      <Step title="Continue adding files">
        Repeat the steps above for each file you want to upload
      </Step>

      <Step title="Complete the upload">
        Finally, select the `Add Dataset` button to start the upload process.
      </Step>
    </Steps>

    <Note>
      Currently, the FlexAI Console does not offer the ability to set a custom destination file name. If you require that feature, please refer to the 'Using the FlexAI CLI' instructions instead.
    </Note>
  </Tab>

  <Tab title="Using the FlexAI CLI">
    Uploading files individually is useful when your source files are in different locations on your local machine and/or you want to set a specific destination path for each of them.

    Let's assume the following file structure on your local machine:

    ```
    ~/
      openwebtext_mini/
        - urlsf_subset00.tar
        - urlsf_subset01.tar
        - test.tar
      sf-wikitext/
        - test-00000-of-00001.parquet
        - train-00000-of-00001.parquet
    ```

    You can upload specific files from the `openwebtext_mini` and `sf-wikitext` directories to a FlexAI Dataset named `text-records-dataset-1`, while also specifying a custom destination path, including a different file name, for each of them by running the following command from your home directory (`~`):

    ```bash theme={null}
    flexai dataset push text-records-dataset-1 \
      --file openwebtext_mini/urlsf_subset00.tar=owt/0.tar \
      --file openwebtext_mini/urlsf_subset01.tar=owt/1.tar \
      --file openwebtext_mini/test.tar=test/owt.tar \
      --file sf-wikitext/test-00000-of-00001.parquet=test/wikitext.parquet \
      --file sf-wikitext/train-00000-of-00001.parquet=wikitext/train.parquet
    ```

    Here you pick individual files from your local machine and set a specific destination path for each of them: some files come from the local `openwebtext_mini` directory, others from `sf-wikitext`, and in both cases the test files are uploaded to the Dataset's `test` directory, as shown below:

    ```
    text-records-dataset-1/
      owt/
        - 0.tar
        - 1.tar
      test/
        - owt.tar
        - wikitext.parquet
      wikitext/
        - train.parquet
    ```

    Pushing files this way can become cumbersome as the list of files grows. Fortunately, you can automate the process with a script or a loop that uploads multiple files following the patterns that fit your needs. However, if a specific Dataset structure is not a requirement, you can upload files without specifying a destination path.
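    As a minimal sketch of such a loop, the Bash function below builds one `--file <source_path>=<path_in_flexai_dataset>` pair per file and passes them all to a single `flexai dataset push` call. The function name `push_owt_shards` and the `owt/<index>.tar` naming scheme are hypothetical, chosen only to mirror the example above; adapt them to your own layout.

    ```bash theme={null}
    # Hypothetical helper: upload each FILE to owt/<index>.tar inside DATASET.
    # Usage: push_owt_shards DATASET FILE...
    push_owt_shards() {
      local dataset="$1"
      shift
      if [ "$#" -eq 0 ]; then
        echo "push_owt_shards: no files given" >&2
        return 1
      fi

      local args=()
      local i=0
      local src
      for src in "$@"; do
        # Map each local file to a numbered destination path in the Dataset.
        args+=(--file "${src}=owt/${i}.tar")
        i=$((i + 1))
      done

      # A single push with one --file flag per file.
      flexai dataset push "$dataset" "${args[@]}"
    }
    ```

    For example, `push_owt_shards text-records-dataset-1 openwebtext_mini/urlsf_subset*.tar` relies on Bash expanding the glob in sorted order, so `urlsf_subset00.tar` is uploaded as `owt/0.tar` and `urlsf_subset01.tar` as `owt/1.tar`.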
  </Tab>
</Tabs>

## Multiple files, no defined destination path

<Tabs>
  <Tab title="Using the FlexAI Console">
    Considering the example above, you could decide to simply upload the files without specifying a destination path. In that case, each file is placed in the root directory of the FlexAI Dataset.

    Since none of the file names are the same, the files won't overwrite each other, and the resulting FlexAI Dataset structure looks as follows:

    ```
    text-records-dataset-2/
      - urlsf_subset00.tar
      - urlsf_subset01.tar
      - test.tar
      - test-00000-of-00001.parquet
      - train-00000-of-00001.parquet
    ```

    <Note>
      All intermediate directories are disregarded when uploading files without specifying a destination path.
    </Note>

    However, if you don't need to pick individual files for your data workflow, you can use the third method: uploading the contents of an entire directory.
  </Tab>

  <Tab title="Using the FlexAI CLI">
    Considering the example above, you could decide to simply upload the files without specifying a destination path, in which case each file is placed in the root directory of the FlexAI Dataset:

    ```bash theme={null}
    flexai dataset push text-records-dataset-2 \
      --file openwebtext_mini/urlsf_subset00.tar \
      --file openwebtext_mini/urlsf_subset01.tar \
      --file openwebtext_mini/test.tar \
      --file sf-wikitext/test-00000-of-00001.parquet \
      --file sf-wikitext/train-00000-of-00001.parquet
    ```

    Since none of the file names are the same, the files won't overwrite each other, and the resulting FlexAI Dataset structure looks as follows:

    ```
    text-records-dataset-2/
      - urlsf_subset00.tar
      - urlsf_subset01.tar
      - test.tar
      - test-00000-of-00001.parquet
      - train-00000-of-00001.parquet
    ```

    <Note>
      All intermediate directories are disregarded when uploading files without specifying a destination path.
    </Note>
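    Because all files end up in the Dataset's root, two source files that share a name would collide. As a hedged sketch, the hypothetical `push_flat` Bash function below checks the base names of the given files for duplicates and only then calls `flexai dataset push` with one `--file <source_path>` flag per file.

    ```bash theme={null}
    # Hypothetical helper: push FILEs to the root of DATASET, refusing to run
    # if two FILEs share the same base name.
    # Usage: push_flat DATASET FILE...
    push_flat() {
      local dataset="$1"
      shift

      # Strip directories (${f##*/}) and report any base name seen twice.
      local dupes
      dupes="$(for f in "$@"; do printf '%s\n' "${f##*/}"; done | sort | uniq -d)"
      if [ -n "$dupes" ]; then
        printf 'Refusing to push, duplicate file names:\n%s\n' "$dupes" >&2
        return 1
      fi

      local args=()
      local f
      for f in "$@"; do
        args+=(--file "$f")
      done
      flexai dataset push "$dataset" "${args[@]}"
    }
    ```

    For example, `push_flat text-records-dataset-2 openwebtext_mini/* sf-wikitext/*` pushes the five files from the example above, since none of their names repeat.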

    However, if you don't need to pick individual files for your data workflow, you can use the third method: uploading the contents of an entire directory.
  </Tab>
</Tabs>

## Entire directory contents

<Tabs>
  <Tab title="Using the FlexAI Console">
    You can also push the contents of a directory into a Dataset. This is particularly useful when you already have a directory containing the files that make up your dataset.

    Let's assume the following file structure on your local machine:

    ```
    ~/
      my-dataset/
        train/
          - t_1.txt
          - t_2.txt
          - t_3.txt
          deep-text/
            - t_1.txt
            - t_2.txt
        test/
          - test_1.txt
          - test_2.txt
          deep-text_test/
            - t_1.txt
    ```

    > Yes, file names have been deliberately kept similar to show how pushing an entire directory with nested subdirectories is handled (no risk of overwriting!).

    Uploading the contents of the `my-dataset` directory to a Dataset named `text-records-dataset-3` follows the same pattern as before and results in the following FlexAI Dataset structure:

    ```
    text-records-dataset-3/
      train/
        - t_1.txt
        - t_2.txt
        - t_3.txt
        deep-text/
          - t_1.txt
          - t_2.txt
      test/
        - test_1.txt
        - test_2.txt
        deep-text_test/
          - t_1.txt
    ```

    <Note>
      Notice that the source `my-dataset` directory itself is not included in the FlexAI Dataset structure: only its contents are uploaded, and their file structure is preserved.
    </Note>
  </Tab>

  <Tab title="Using the FlexAI CLI">
    The `--file` flag also allows you to push the contents of a directory into a Dataset. This is particularly useful when you already have a directory containing the multiple files that make up your dataset.

    Let's assume the following file structure on your local machine:

    ```
    ~/
      my-dataset/
        train/
          - t_1.txt
          - t_2.txt
          - t_3.txt
          deep-text/
            - t_1.txt
            - t_2.txt
        test/
          - test_1.txt
          - test_2.txt
          deep-text_test/
            - t_1.txt
    ```

    > Yes, file names have been deliberately kept similar to show how pushing an entire directory with nested subdirectories is handled (no risk of overwriting!).

    Uploading the contents of the `my-dataset` directory to a Dataset named `text-records-dataset-3` follows the same pattern as before:

    ```bash theme={null}
    flexai dataset push text-records-dataset-3 --file my-dataset
    ```

    This results in the following FlexAI Dataset structure:

    ```
    text-records-dataset-3/
      train/
        - t_1.txt
        - t_2.txt
        - t_3.txt
        deep-text/
          - t_1.txt
          - t_2.txt
      test/
        - test_1.txt
        - test_2.txt
        deep-text_test/
          - t_1.txt
    ```

    <Note>
      Notice that the source `my-dataset` directory itself is not included in the FlexAI Dataset structure: only its contents are uploaded, and their file structure is preserved.
    </Note>
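    To preview the paths a directory push should produce, you can list the directory's files relative to the directory itself. This is a local sketch based only on the behavior described in the note above; the `preview_dataset_paths` function name is hypothetical, and it also lists hidden files, which may or may not match what the CLI uploads.

    ```bash theme={null}
    # Hypothetical helper: print every file under DIR as a path relative to DIR,
    # i.e. the layout the Dataset is expected to have after `--file DIR`.
    # Usage: preview_dataset_paths DIR
    preview_dataset_paths() {
      # Run in a subshell so the caller's working directory is unchanged.
      (cd -- "$1" && find . -type f | sed 's|^\./||' | LC_ALL=C sort)
    }
    ```

    Running `preview_dataset_paths my-dataset` before `flexai dataset push text-records-dataset-3 --file my-dataset` lets you confirm that paths such as `train/deep-text/t_1.txt` and `train/t_1.txt` stay distinct.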
  </Tab>
</Tabs>

## Troubleshooting

### Pushing large files

Sometimes you may encounter issues when trying to upload large files directly from your computer. These problems are usually caused by network issues. Here are a few things you can try:

* Switch to a wired connection if possible.
* Split the file into smaller parts or chunks that you can then join back together at runtime.
* If the file is stored in a Cloud Storage Service such as Amazon S3 or Google Cloud Storage, you can upload directly to the FlexAI Dataset Manager by creating a [Remote Storage Provider Connection](/platform-services/dataset-manager/from-remote/).
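A minimal sketch of the splitting approach, using the standard `split` and `cat` utilities: the hypothetical `split_for_upload` function cuts a file into fixed-size parts inside a directory that you can then push with `--file <directory>`, and the hypothetical `join_parts` function reassembles them. Where the parts are located inside your workload at runtime depends on how the Dataset is made available to it, so that path is left to you.

```bash theme={null}
# Hypothetical helper: split FILE into SIZE-byte parts named
# OUTDIR/<file name>.part-aa, .part-ab, ... (SIZE defaults to 1G).
# Usage: split_for_upload FILE OUTDIR [SIZE]
split_for_upload() {
  local file="$1"
  local outdir="$2"
  local size="${3:-1G}"
  mkdir -p -- "$outdir"
  split -b "$size" -- "$file" "${outdir}/${file##*/}.part-"
}

# Hypothetical helper: concatenate PREFIX* back into OUTPUT.
# The glob expands in sorted order, matching split's aa, ab, ... suffixes.
# Usage: join_parts PREFIX OUTPUT
join_parts() {
  local prefix="$1"
  local output="$2"
  cat -- "${prefix}"* > "$output"
}
```

For example, `split_for_upload big.tar big-tar-parts && flexai dataset push my-large-dataset --file big-tar-parts` uploads the parts (the dataset name is a placeholder), and `join_parts <parts-location>/big.tar.part- big.tar` rebuilds the original file at runtime.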

#### The process fails when uploading files from a remote machine you're connected to via SSH

If you are trying to upload files from a machine that you're connected to via SSH, the process may fail when the SSH connection is closed. To avoid this, you can use a *terminal multiplexer*, like [`screen`](https://www.gnu.org/software/screen/) or [`tmux`](https://github.com/tmux/tmux/wiki), to keep the process running even after you close the remote session.
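As a sketch with `tmux`, the hypothetical `start_detached_push` Bash function below starts `flexai dataset push` in a detached session and redirects its output to a log file, so the upload keeps running after you disconnect and its output is still available once the session exits. The session name, dataset name, and log path in the usage example are placeholders.

```bash theme={null}
# Hypothetical helper: run `flexai dataset push DATASET --file SRC` inside a
# detached tmux session named SESSION, writing all output to LOG.
# Usage: start_detached_push SESSION DATASET SRC LOG
start_detached_push() {
  local session="$1" dataset="$2" src="$3" log="$4"
  local cmd
  # %q escapes each argument so paths containing spaces survive the shell
  # that tmux uses to run the command.
  cmd="$(printf '%q ' flexai dataset push "$dataset" --file "$src")> $(printf '%q' "$log") 2>&1"
  tmux new-session -d -s "$session" "$cmd"
}
```

For example, run `start_detached_push flexai-upload text-records-dataset-3 my-dataset "$HOME/upload.log"`, disconnect, and later follow the output with `tail -f ~/upload.log` or check whether the session is still alive with `tmux ls`. With `screen`, a comparable detached start is `screen -dmS <session_name> <command>`.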
