notebooks/AI_HW_Summit_2022.ipynb

{ "cells": [ { "cell_type": "markdown", "id": "455917eb", "metadata": {}, "source": [ "![](../readme_logo.png)" ] }, { "cell_type": "markdown", "id": "d1356f1f", "metadata": {}, "source": [ "# Fine-tuning GPT2-XL with 🤗 Optimum Habana\n", "\n", "This notebook shows how to fine-tune GPT2-XL for causal language modeling with Optimum Habana. You can find more information in the [documentation](https://huggingface.co/docs/optimum/habana/index) and in the [package repository](https://github.com/huggingface/optimum-habana).\n", "\n", "Any other model that has been validated for language modeling (see [here](https://huggingface.co/docs/optimum/habana/index)) can be used, like BERT or RoBERTa." ] }, { "cell_type": "markdown", "id": "d9ecd62a", "metadata": {}, "source": [ "## What is Causal Language Modeling?\n", "\n", "Causal language modeling is the task of predicting the token following a sequence of tokens. In this setting, the model **only attends to the left context**, i.e. the tokens to the left of the one being predicted. This training objective is particularly interesting for generation tasks.\n", "\n", "Here is an example of an input that could be used for causal language modeling:\n", "\n", "> This live AI webinar is organized by Habana Labs and Hugging Face and" ] }, { "cell_type": "markdown", "id": "d7a1ee47", "metadata": {}, "source": [ "## Training Script\n", "\n", "We are going to use the `run_clm.py` example script that you can find [here](https://github.com/huggingface/optimum-habana/blob/main/examples/language-modeling/run_clm.py). It performs the following steps:\n", "- download and preprocess the dataset,\n", "- instantiate the model by downloading a pre-trained checkpoint or initializing a new one,\n", "- download a tokenizer,\n", "- train the model,\n", "- evaluate the model.\n", "\n", "It enables you to either **fine-tune** or **pre-train** a model.\n", "\n", "> The only difference from the `run_clm.py` example script of Transformers is that the `Trainer` and `TrainingArguments` classes have been replaced by `GaudiTrainer` and `GaudiTrainingArguments` respectively." ] }, { "cell_type": "markdown", "id": "c82cf40a", "metadata": {}, "source": [ "## Dataset\n", "\n", "The **WikiText** language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.\n", "\n", "It is available on the Hugging Face Hub, and you can find more information about it [here](https://huggingface.co/datasets/wikitext)." ] }, { "cell_type": "markdown", "id": "4022996f", "metadata": {}, "source": [ "## 1. Install Dependencies\n", "\n", "We first install the latest version of Optimum Habana:" ] }, { "cell_type": "code", "execution_count": null, "id": "5edf1a81", "metadata": {}, "outputs": [], "source": [ "!pip install optimum-habana" ] }, { "cell_type": "markdown", "id": "d72dd346", "metadata": {}, "source": [ "Let's also install the other libraries required to run this example:" ] }, { "cell_type": "code", "execution_count": null, "id": "b32c8253", "metadata": {}, "outputs": [], "source": [ "!pip install datasets sentencepiece protobuf scikit-learn evaluate" ] }, { "cell_type": "markdown", "id": "685c94c7", "metadata": {}, "source": [ "## 2. Fine-tuning GPT2-XL on 8 HPUs\n", "\n", "### Training Arguments\n", "\n", "Let's specify the training arguments the same way as in Transformers."
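] }, { "cell_type": "markdown", "id": "e1f2a3b4", "metadata": {}, "source": [ "Before filling these in, let's optionally take a quick look at the WikiText data described above. The training script downloads and preprocesses the dataset on its own, so the next cell is purely for inspection: it uses the `datasets` library installed earlier with the same `wikitext-2-raw-v1` configuration that we will pass to `run_clm.py`. The training arguments follow right after." ] }, { "cell_type": "code", "execution_count": null, "id": "c5d6e7f8", "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset\n", "\n", "\n", "# Load the same dataset and configuration that run_clm.py will use\n", "wikitext = load_dataset(\"wikitext\", \"wikitext-2-raw-v1\", split=\"train\")\n", "print(wikitext)\n", "\n", "# Show the beginning of the first non-empty sample\n", "print(next(text for text in wikitext[\"text\"] if text.strip())[:200])"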
] }, { "cell_type": "code", "execution_count": null, "id": "d97d4e32", "metadata": {}, "outputs": [], "source": [ "training_args = {\n", "    \"output_dir\": \"/tmp/clm_gpt2_xl\",\n", "    \"dataset_name\": \"wikitext\",\n", "    \"dataset_config_name\": \"wikitext-2-raw-v1\",\n", "    \"num_train_epochs\": 1,\n", "    \"per_device_train_batch_size\": 4,\n", "    \"per_device_eval_batch_size\": 4,\n", "    \"gradient_checkpointing\": True,\n", "    \"do_train\": True,\n", "    \"do_eval\": True,\n", "    \"overwrite_output_dir\": True,\n", "    \"use_cache\": False,\n", "}" ] }, { "cell_type": "markdown", "id": "b91f0132", "metadata": {}, "source": [ "Decide below whether you want to run pre-training or fine-tuning:" ] }, { "cell_type": "code", "execution_count": null, "id": "b5385bd6", "metadata": {}, "outputs": [], "source": [ "pretraining = False\n", "model_name = \"gpt2-xl\"\n", "\n", "if pretraining:\n", "    training_args[\"config_name\"] = model_name\n", "    training_args[\"tokenizer_name\"] = model_name\n", "else:\n", "    training_args[\"model_name_or_path\"] = model_name" ] }, { "cell_type": "markdown", "id": "6b7218bb", "metadata": {}, "source": [ "And finally, the Gaudi-related arguments:" ] }, { "cell_type": "code", "execution_count": null, "id": "a2c19e02", "metadata": {}, "outputs": [], "source": [ "training_args[\"use_habana\"] = True  # Whether to use HPUs or not\n", "training_args[\"use_lazy_mode\"] = True  # Whether to use lazy or eager mode\n", "training_args[\"gaudi_config_name\"] = \"Habana/gpt2\"  # Gaudi configuration to use\n", "training_args[\"throughput_warmup_steps\"] = 3  # Discard the first N training steps from the throughput computation" ] }, { "cell_type": "markdown", "id": "3a2d9bba", "metadata": {}, "source": [ "All the existing Gaudi configurations are available [here](https://huggingface.co/habana). You can also create your own Gaudi configuration and upload it to the Hugging Face Hub!" ] }, { "cell_type": "markdown", "id": "1bb8b26b", "metadata": {}, "source": [ "### Running the Script\n", "\n", "We are going to leverage the `DistributedRunner` class to launch distributed training. This could also be done with the [`gaudi_spawn.py`](https://github.com/huggingface/optimum-habana/blob/main/examples/gaudi_spawn.py) script. More information is available [here](https://huggingface.co/docs/optimum/habana/usage_guides/distributed).\n", "\n", "To initialize it, we need to provide the command to execute and the number of devices to use. Since one Gaudi instance has 8 HPUs, we are going to use all of them.\n", "\n", "> **Disclaimer: the run below would fail, which is why it is left commented out!**" ] }, { "cell_type": "code", "execution_count": null, "id": "feaba9f9", "metadata": {}, "outputs": [], "source": [ "from optimum.habana.distributed import DistributedRunner\n", "\n", "\n", "# Build the command to execute\n", "training_args_command_line = \" \".join(f\"--{key} {value}\" for key, value in training_args.items())\n", "command = f\"../examples/language-modeling/run_clm.py {training_args_command_line}\"\n", "\n", "# # Instantiate a distributed runner\n", "# distributed_runner = DistributedRunner(\n", "#     command_list=[command],  # The command(s) to execute\n", "#     world_size=8,  # The number of HPUs\n", "#     use_mpi=True,  # OpenMPI is used for multi-processing\n", "# )\n", "\n", "# # Launch training\n", "# ret_code = distributed_runner.run()" ] }, { "cell_type": "markdown", "id": "db6f1f75", "metadata": {}, "source": [ "This run would fail because the model, its gradients and its optimizer states are too big to fit in the memory of the HPUs... Let's use DeepSpeed to solve this!"
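] }, { "cell_type": "markdown", "id": "a4b5c6d7", "metadata": {}, "source": [ "To see roughly why, here is a back-of-the-envelope estimate. The numbers below are approximations and only cover the model weights, gradients and Adam optimizer states; activations and framework overhead come on top. The ZeRO stage-2 sharding used in the next section keeps a full copy of the weights on each device but spreads the gradients and optimizer states across the 8 workers, which is what brings the per-device footprint back down." ] }, { "cell_type": "code", "execution_count": null, "id": "b8c9d0e1", "metadata": {}, "outputs": [], "source": [ "# Rough, illustrative estimate only: actual memory use also depends on\n", "# activations, precision and framework overhead.\n", "num_params = 1.6e9  # GPT2-XL parameter count (approximate)\n", "hpu_memory_gb = 32  # memory per first-generation Gaudi HPU (approximate)\n", "\n", "# fp32 weights (4 B) + gradients (4 B) + Adam first and second moments (8 B)\n", "full_gb = num_params * (4 + 4 + 8) / 1e9\n", "print(f\"Without sharding: ~{full_gb:.0f} GB per device (vs ~{hpu_memory_gb} GB available)\")\n", "\n", "# ZeRO stage 2 shards the gradients and optimizer states across the 8 workers\n", "zero2_gb = num_params * (4 + (4 + 8) / 8) / 1e9\n", "print(f\"With ZeRO stage 2 on 8 HPUs: ~{zero2_gb:.0f} GB per device\")"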
] }, { "cell_type": "markdown", "id": "02862b85", "metadata": {}, "source": [ "## 3. DeepSpeed for HPUs\n", "\n", "It is possible to use DeepSpeed with HPUs to train larger models! It spreads the optimizer states and gradients across processes so that each device needs less memory.\n", "\n", "Here is how to switch to distributed training with DeepSpeed:\n", "1. Install Habana DeepSpeed.\n", "2. Add one training argument to specify the DeepSpeed configuration to use.\n", "3. Instantiate a new distributed runner.\n", "\n", "Let's install Habana DeepSpeed:" ] }, { "cell_type": "code", "execution_count": null, "id": "6a153780", "metadata": {}, "outputs": [], "source": [ "!pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.21.0" ] }, { "cell_type": "markdown", "id": "08dcd80f", "metadata": {}, "source": [ "We need a DeepSpeed configuration. We are going to use [this ZeRO stage-2 configuration](https://github.com/huggingface/optimum-habana/tree/main/notebooks/configs/deepspeed_zero_2.json)." ] }, { "cell_type": "code", "execution_count": null, "id": "428b9504", "metadata": {}, "outputs": [], "source": [ "training_args[\"deepspeed\"] = \"configs/deepspeed_zero_2.json\"" ] }, { "cell_type": "markdown", "id": "28aa0fd6", "metadata": {}, "source": [ "We now have to instantiate a new distributed runner and run it:" ] }, { "cell_type": "code", "execution_count": null, "id": "b6d116e5", "metadata": {}, "outputs": [], "source": [ "# Build the command to execute\n", "training_args_command_line = \" \".join(f\"--{key} {value}\" for key, value in training_args.items())\n", "command = f\"../examples/language-modeling/run_clm.py {training_args_command_line}\"\n", "\n", "# Instantiate a distributed runner\n", "distributed_runner = DistributedRunner(\n", "    command_list=[command],  # The command(s) to execute\n", "    world_size=8,  # The number of HPUs\n", "    use_deepspeed=True,  # Enable DeepSpeed\n", ")\n", "\n", "# Launch training\n", "ret_code = distributed_runner.run()" ] }, { "cell_type": "markdown", "id": "cc1222ea", "metadata": {}, "source": [ "Let's try the model we just fine-tuned!"
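] }, { "cell_type": "markdown", "id": "d2e3f4a5", "metadata": {}, "source": [ "Before generating, here is a quick optional sanity check: it lists the files that the training run wrote to the output directory (the exact list depends on your Transformers version) and confirms that the HPU device is visible to PyTorch. Importing `habana_frameworks.torch.core` loads the Habana PyTorch bridge; depending on your environment this may already have happened, in which case the import changes nothing." ] }, { "cell_type": "code", "execution_count": null, "id": "e6f7a8b9", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import torch\n", "\n", "# Loading the Habana PyTorch bridge registers the \"hpu\" device with PyTorch\n", "import habana_frameworks.torch.core as htcore  # noqa: F401\n", "\n", "# Files written by the training run\n", "print(sorted(os.listdir(training_args[\"output_dir\"])))\n", "\n", "# The device we will run generation on\n", "print(torch.device(\"hpu\"))"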
] }, { "cell_type": "code", "execution_count": null, "id": "572f2849", "metadata": {}, "outputs": [], "source": [ "import torch\n", "from transformers import GPT2LMHeadModel, GPT2Tokenizer\n", "\n", "\n", "# The sequence to complete\n", "prompt_text = \"This live AI webinar is organized by Habana Labs and Hugging Face and\"\n", "\n", "path_to_model = training_args[\"output_dir\"]  # the folder where everything related to our run was saved\n", "\n", "device = torch.device(\"hpu\")\n", "\n", "# Load the tokenizer and the model\n", "tokenizer = GPT2Tokenizer.from_pretrained(path_to_model)\n", "model = GPT2LMHeadModel.from_pretrained(path_to_model)\n", "model.to(device)\n", "\n", "# Encode the prompt\n", "encoded_prompt = tokenizer.encode(prompt_text, add_special_tokens=False, return_tensors=\"pt\")\n", "encoded_prompt = encoded_prompt.to(device)\n", "\n", "# Generate continuations of the prompt with sampling\n", "output_sequences = model.generate(\n", "    input_ids=encoded_prompt,\n", "    max_length=16 + len(encoded_prompt[0]),\n", "    do_sample=True,\n", "    num_return_sequences=3,\n", ")\n", "\n", "# Remove the batch dimension when returning multiple sequences\n", "if len(output_sequences.shape) > 2:\n", "    output_sequences.squeeze_()\n", "\n", "generated_sequences = []\n", "\n", "for generated_sequence_idx, generated_sequence in enumerate(output_sequences):\n", "    print(f\"=== GENERATED SEQUENCE {generated_sequence_idx + 1} ===\")\n", "    generated_sequence = generated_sequence.tolist()\n", "\n", "    # Decode the generated tokens into text\n", "    text = tokenizer.decode(generated_sequence, clean_up_tokenization_spaces=True)\n", "\n", "    # Keep only the text up to the first period\n", "    text = text[: text.find(\".\")]\n", "\n", "    # Add the prompt at the beginning and drop the decoded copy of the prompt tokens\n", "    total_sequence = prompt_text + text[len(tokenizer.decode(encoded_prompt[0], clean_up_tokenization_spaces=True)) :]\n", "\n", "    generated_sequences.append(total_sequence)\n", "    print(total_sequence)" ] }, { "cell_type": "markdown", "id": "23f74a31", "metadata": {}, "source": [ "And here are the training times and costs for 3 epochs on a Gaudi instance (dl1.24xlarge) and on 4 Nvidia V100s (p3.8xlarge):" ] }, { "cell_type": "code", "execution_count": null, "id": "71284cfa", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "\n", "gaudi_price_per_hour = 13.10904  # on-demand price of an AWS dl1.24xlarge instance\n", "v100_price_per_hour = 12.24  # on-demand price of an AWS p3.8xlarge instance\n", "\n", "print(\n", "    f\"Gaudi (dl1.24xlarge): training time = 630s, cost = {np.round(630 * gaudi_price_per_hour / 3600, 2)}$ ({gaudi_price_per_hour}$/hr)\"\n", ")\n", "print(\n", "    f\"4 x V100 (p3.8xlarge): training time = 858s, cost = {np.round(858 * v100_price_per_hour / 3600, 2)}$ ({v100_price_per_hour}$/hr)\"\n", ")" ] }, { "cell_type": "markdown", "id": "6316f65c", "metadata": {}, "source": [ "We successfully trained GPT2-XL, which has about 1.6 billion parameters.\n", "You can train even bigger models with Gaudi and DeepSpeed, so try it now! More information is available in [the documentation of Optimum Habana](https://huggingface.co/docs/optimum/habana/usage_guides/deepspeed)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }