notebooks/wav2vec2/wav2vec2-inference-checkpoint.ipynb

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "3f2aa9aa", "metadata": {}, "source": [ "# Automatic Speech Recognition (ASR) on IPU using wav2vec - Inference\n", "\n", "This notebook will demonstrate how to perform wav2vec 2.0 inference with PyTorch on Graphcore IPUs. We will use a `wav2vec2-base` model fine-tuned for a connectionist temporal classification (CTC) downstream task using the [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) dataset.\n", "\n", "We will show how to use a wav2vec 2.0 model written in PyTorch from the [🤗 Transformers library](https://huggingface.co/docs/transformers/index) and parallelize it easily using the [🤗 Optimum Graphcore library](https://github.com/huggingface/optimum-graphcore). \n", "\n", "Hugging Face provides convenient access to pre-trained transformer models. The partnership between Hugging Face and Graphcore allows us to run these models on the IPU.\n", "\n", "🤗 models ported to the IPU can be found on the [Graphcore Hugging Face organisation page](https://huggingface.co/Graphcore)." ] }, { "attachments": {}, "cell_type": "markdown", "id": "ec3ab9e8", "metadata": {}, "source": [ "| Domain | Tasks | Model | Datasets | Workflow | Number of IPUs | Execution time |\n", "|---------|-------|-------|----------|----------|--------------|--------------|\n", "| Audio processing | Automatic speech recognition | wav2vec 2.0 | LibriSpeech (librispeech_asr) | Inference | 4 or 16 | 35 sec |\n", "\n", "[![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b091b286", "metadata": {}, "source": [ "## Environment setup\n", "\n", "The best way to run this demo is on Paperspace Gradient's cloud IPUs because everything is already set up for you.\n", "\n", "[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://ipu.dev/3CHite5)\n", "\n", "To run the demo using other IPU hardware, you need to have the Poplar SDK enabled. Refer to the [Getting Started guide](https://docs.graphcore.ai/en/latest/getting-started.html#getting-started) for your system for details on how to enable the Poplar SDK. Also refer to the [Jupyter Quick Start guide](https://docs.graphcore.ai/projects/jupyter-notebook-quick-start/en/latest/index.html) for how to set up Jupyter to be able to run this notebook on a remote IPU machine." ] }, { "attachments": {}, "cell_type": "markdown", "id": "7a04e024", "metadata": {}, "source": [ "In order to improve usability and support for future users, Graphcore would like to collect information about the\n", "applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:\n", "\n", "- User progression through the notebook\n", "- Notebook details: number of cells, code being run and the output of the cells\n", "- Environment details\n", "\n", "You can disable logging at any time by running `%unload_ext graphcore_cloud_tools.notebook_logging.gc_logger` from any cell." ] }, { "attachments": {}, "cell_type": "markdown", "id": "29732272", "metadata": {}, "source": [ "## Dependencies and configuration" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2556cb95", "metadata": {}, "source": [ "Install the dependencies for this notebook." ] }, { "cell_type": "code", "execution_count": null, "id": "a7babe86", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "apt update\n", "apt-get install libsndfile1 -y" ] }, { "cell_type": "code", "execution_count": null, "id": "225d2188", "metadata": {}, "outputs": [], "source": [ "%pip install -r requirements.txt\n", "%load_ext graphcore_cloud_tools.notebook_logging.gc_logger" ] }, { "attachments": {}, "cell_type": "markdown", "id": "656c875f", "metadata": {}, "source": [ "We next import the utilities that will be used later in the notebook: " ] }, { "cell_type": "code", "execution_count": null, "id": "a8e2c575", "metadata": {}, "outputs": [], "source": [ "import logging\n", "from tqdm import tqdm\n", "from dataclasses import dataclass, field\n", "from pathlib import Path\n", "\n", "import torch\n", "import poptorch\n", "\n", "from datasets import load_dataset\n", "from optimum.graphcore import IPUConfig\n", "from optimum.graphcore.modeling_utils import to_pipelined\n", "from transformers import (\n", " AutoModelForCTC,\n", " Wav2Vec2Processor,\n", " HfArgumentParser,\n", ")\n", "from transformers.utils import check_min_version\n", "from transformers.utils.versions import require_version" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c8313f3a", "metadata": {}, "source": [ "Values for machine size and cache directories can be configured through environment variables or directly in the notebook:" ] }, { "cell_type": "code", "execution_count": null, "id": "42e5ab85", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "n_ipu = int(os.getenv(\"NUM_AVAILABLE_IPU\", 4))\n", "executable_cache_dir = os.getenv(\"POPLAR_EXECUTABLE_CACHE_DIR\", \"/tmp/exe_cache/\") + \"/wav2vec2_inference\"\n", "checkpoint_directory = Path(os.getenv(\"CHECKPOINT_DIR\", \"/tmp\")) / \"demo\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "bf3bea6c", "metadata": {}, "source": [ "## Preparing the model\n", "\n", "This notebook uses the model output from the fine-tuning notebook `wav2vec2-fine-tuning-checkpoint.ipynb`, so make sure that you have run that notebook before trying out this notebook. If you do not run the fine-tuning notebook, nor have a output directory, then this notebook will not run.\n", "\n", "As this model does not require optimising, the full `base` inference model can fit on a single IPU. This makes the IPU configuration very simple. The `num_device_iterations` parameter controls how many iterations the IPU performs before returning to the host. For example, if `num_device_iterations` is set to 10, then 10 samples will be sent to the IPU, processed, and sent back as a block of 10. \n", "\n", "We create the pipelined version of the model which makes changes for the IPU version of the model. Finally, we convert the model into `poptorch.inferenceModel`." ] }, { "cell_type": "code", "execution_count": null, "id": "68e28c34", "metadata": {}, "outputs": [], "source": [ "try:\n", " processor = Wav2Vec2Processor.from_pretrained(checkpoint_directory)\n", " model = AutoModelForCTC.from_pretrained(checkpoint_directory)\n", "except:\n", " print(\"Warning! Could not load the fine-tuned checkpoint, falling back on the pre-trained checkpoint: 'facebook/wav2vec2-base'\")\n", " print(\"To generate your custom fine-tuned checkpoint to use in this notebook, run our Wav2Vec2 fine-tuning notebook.\")\n", " checkpoint_directory = \"facebook/wav2vec2-base-960h\"\n", " processor = Wav2Vec2Processor.from_pretrained(checkpoint_directory)\n", " model = AutoModelForCTC.from_pretrained(checkpoint_directory)\n", "\n", "num_device_iterations = 10\n", "ipu_config = IPUConfig(inference_device_iterations=num_device_iterations, executable_cache_dir=executable_cache_dir)\n", "opts = ipu_config.to_options(for_inference=True)\n", "\n", "ipu_model = to_pipelined(model, ipu_config)\n", "ipu_model.parallelize()\n", "\n", "inference_model = poptorch.inferenceModel(ipu_model.half().eval(), options=opts)" ] }, { "cell_type": "code", "execution_count": null, "id": "ce6b371f", "metadata": {}, "outputs": [], "source": [ "model.config" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b14432c0", "metadata": {}, "source": [ "### Compilation\n", "\n", "The sample batch is an example of what a batch could look like. Effectively we are setting the static size for the model input. The first dimension is the product of the `batch_size` and `num_device_iterations` parameters. However, in this case the batch size is just 1. The second dimension is the maximum audio length in samples, and we've set this to 20 seconds.\n", "\n", "The model will then compile for this input size. If the size is changed later the model will recompile." ] }, { "cell_type": "code", "execution_count": null, "id": "96d92113", "metadata": {}, "outputs": [], "source": [ "max_samples = 400000\n", "sample_batch = {\"input_values\": torch.zeros([num_device_iterations, max_samples], dtype=torch.half)}\n", "\n", "inference_model.compile(**sample_batch)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "215ad903", "metadata": {}, "source": [ "### LibriSpeech inference\n", "\n", "We will test the inference capabilities of a fine-tuned model on a portion of the LibriSpeech `test` set. First, download the dataset using the [🤗 Datasets library](https://huggingface.co/docs/datasets/index)." ] }, { "cell_type": "code", "execution_count": null, "id": "206d42ee", "metadata": {}, "outputs": [], "source": [ "ds = load_dataset(\"librispeech_asr\", \"clean\", split=\"test\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f54c7572", "metadata": {}, "source": [ "### Create a batch\n", "\n", "Here we take samples from the LibriSpeech test set and place them into a `zeros` Tensor to create a batch." ] }, { "cell_type": "code", "execution_count": null, "id": "0b50a449", "metadata": {}, "outputs": [], "source": [ "x = torch.zeros([num_device_iterations, max_samples], dtype=torch.half)\n", "\n", "for i in range(num_device_iterations):\n", " input_values = processor(\n", " ds[i][\"audio\"][\"array\"], return_tensors=\"pt\", padding=\"longest\"\n", " ).input_values # Batch size 1\n", " length = input_values.size(1)\n", " x[i, :length] = input_values[0]\n", "\n", "batch = {\"input_values\": x}" ] }, { "attachments": {}, "cell_type": "markdown", "id": "be5d5520", "metadata": {}, "source": [ "## Run inference\n", "\n", "Running the model will perform `num_device_iterations` iterations on the IPU will before returning to the host. This means that all of our logits will be returned at once." ] }, { "cell_type": "code", "execution_count": null, "id": "6fc7160e", "metadata": {}, "outputs": [], "source": [ "output = inference_model(**batch)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9fa60c49", "metadata": {}, "source": [ "### Decode\n", "\n", "The max arg of the logits is taken from every frame of the output, which is a 'greedy decode' strategy. `processor` will then convert the predicted indexes back into text, and the transcripts will be printed." ] }, { "cell_type": "code", "execution_count": null, "id": "ea42f4c8", "metadata": {}, "outputs": [], "source": [ "logits = output[0]\n", "predicted_ids = torch.argmax(logits, dim=-1)\n", "transcription = processor.batch_decode(predicted_ids)" ] }, { "cell_type": "code", "execution_count": null, "id": "dc0e016e", "metadata": {}, "outputs": [], "source": [ "transcription" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b7f98ab7", "metadata": {}, "source": [ "### Release IPUs in use\n", "\n", "The IPython kernel has a lock on the IPUs used in running the model, preventing other users from using them. For example, if you wish to use other notebooks after working your way through this one, it may be necessary to manually run the cell below to release the IPUs you have been using. This will happen by default if using the `Run All` option to run this notebook. More information can be found in the notebook about [managing IPU resources](https://github.com/gradient-ai/Graphcore-HuggingFace/blob/main/useful-tips/managing_ipu_resources.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "id": "f689124f", "metadata": {}, "outputs": [], "source": [ "if inference_model.isAttachedToDevice():\n", " inference_model.detachFromDevice()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2e33c9be", "metadata": {}, "source": [ "## Next steps\n", "\n", "You can try out the notebook on fine-tuning a wav2vec 2.0 checkpoint for automatic speech recognition on IPUs `wav2vec2-fine-tuning-checkpoint.ipynb`.\n", "\n", "Try out the other [IPU-powered Jupyter Notebooks](https://www.graphcore.ai/ipu-jupyter-notebooks) to see how IPUs perform on other tasks. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 5 }

notebooks/wav2vec2/wav2vec2-inference-checkpoint.ipynb (409 lines of code) (raw):