This experiment is temporarily disabled.
In this experiment, you will train a causal language model on the wikitext dataset using the GPT-2 model.
You will see that this straightforward process only requires two components: a training script and a dataset. The training script defines the model, sets up and applies the hyperparameters, runs the training loop, and applies the evaluation logic, while the dataset contains the data that will be used to train the model.
Connect to GitHub (if needed)
If you haven’t already connected FlexAI to GitHub, you’ll need to set up a code registry connection first. This will allow FlexAI to pull repositories directly from GitHub using the -u flag in training commands.
Preparing the Dataset
In this experiment, we will use a pre-processed version of the wikitext dataset that has been set up for the GPT-2 model.
If you’d like to reproduce the pre-processing steps yourself, either to use a different dataset or simply to learn more about the process, refer to the Manual Dataset Pre-processing section below.
- Download the dataset:
- Upload the dataset (located in gpt2-tokenized-wikitext/) to FlexAI Storage as a new dataset:
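The upload step might look roughly like the following pseudocode sketch. The subcommand and argument shape are assumptions about the FlexAI CLI, not verified syntax; check the CLI’s help output for the exact form:

```
# Push the local directory to FlexAI Storage under a dataset name (pseudocode)
flexai dataset push <dataset-name> <local-path>
# e.g. a dataset named gpt2-tokenized-wikitext from the gpt2-tokenized-wikitext/ directory
```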
Train the Model
Now, it’s time to train your LLM on the dataset you pushed to FlexAI Storage in the previous step: gpt2-tokenized-wikitext. This experiment uses the GPT-2 model; however, the training script we will use leverages the HuggingFace Transformers Trainer class, which makes it easy to replace GPT-2 with another model compatible with flash-attention.
To start the Training Job, run the following command:
The first line defines the 3 main components required to run a Training Job:
- The Training Job’s name (flexai-experiments-flash-attention).
- The URL of the repository containing the training script (https://github.com/flexaihq/blueprints).
- The name of the dataset to be used (gpt2-tokenized-wikitext).
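Putting these components together, the command has roughly the following shape. This is a pseudocode sketch: only the -u repository flag is mentioned in this guide, so the remaining subcommand and flag names are assumptions:

```
# Pseudocode shape of the training command
flexai training run <job-name> \
    -u <repository-url> \
    --dataset <dataset-name> \
    -- python <path-to-training-script> [script arguments...]
```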
The second line specifies the training script to be executed (code/causal-language-modeling/train.py). After the second line come the script’s arguments, which are passed to the script when it is executed to adjust the Training Job’s hyperparameters or customize its behavior. For instance, --max_train_samples and --max_eval_samples can be used to tweak the sample sizes.
Checking up on the Training Job
You can check the status and life cycle events of your Training Job by running:
Additionally, you can view the logs of your Training Job by running:
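These two checks might look like the following pseudocode sketch; the subcommand names are assumptions about the FlexAI CLI rather than verified syntax:

```
# Pseudocode: inspect the job, then view its logs
flexai training inspect <job-name>
flexai training logs <job-name>
```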
Fetching the Trained Model artifacts
Once the Training Job completes successfully, you will be able to download its output artifacts by running:
This will download a zip file containing the trained model artifacts to your current working directory.
You can now have a look at other Experiments within this repository to explore other use cases and techniques.
Optional Extra Steps
Manual Dataset Pre-processing
To prepare and save the wikitext dataset for the GPT-2 model, run the following command:
The processed dataset will be saved to the directory specified by --tokenized_dataset_save_dir, in this case: gpt2-tokenized-wikitext.
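To see what “pre-processed for GPT-2” means in practice, the core grouping step of causal-LM pre-processing can be sketched in plain Python. The token IDs below are made up and the helper is illustrative, not the actual script: the real pre-processing tokenizes wikitext with the GPT-2 tokenizer and uses a much larger block size (typically 1024 for GPT-2):

```python
# Sketch of the core pre-processing idea: concatenate the token IDs of all
# examples into one stream, then split that stream into fixed-length blocks
# suitable for causal language model training.

def group_texts(tokenized_examples, block_size):
    """Concatenate tokenized examples and chop them into equal-sized blocks."""
    concatenated = [tok for example in tokenized_examples for tok in example]
    # Drop the trailing remainder so every block has exactly block_size tokens.
    total_length = (len(concatenated) // block_size) * block_size
    return [concatenated[i:i + block_size]
            for i in range(0, total_length, block_size)]

# Made-up token IDs standing in for tokenized wikitext lines.
examples = [[10, 11, 12], [13, 14], [15, 16, 17, 18, 19]]
blocks = group_texts(examples, block_size=4)
print(blocks)  # [[10, 11, 12, 13], [14, 15, 16, 17]]
```

For causal language modeling, the labels of each block are simply a copy of its input IDs; the model learns to predict each token from the ones before it.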
Keep in mind that you can use other combinations of datasets and models available on HuggingFace.