# Automatic Speech Recognition (ASR) on IPU using wav2vec - Inference

This notebook will demonstrate how to perform wav2vec 2.0 inference with PyTorch on Graphcore IPUs. We will use a `wav2vec2-base` model fine-tuned for a connectionist temporal classification (CTC) downstream task using the [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) dataset.

We will show how to use a wav2vec 2.0 model written in PyTorch from the [🤗 Transformers library](https://huggingface.co/docs/transformers/index) and parallelize it easily using the [🤗 Optimum Graphcore library](https://github.com/huggingface/optimum-graphcore).

Hugging Face provides convenient access to pre-trained transformer models. The partnership between Hugging Face and Graphcore allows us to run these models on the IPU.

🤗 models ported to the IPU can be found on the [Graphcore Hugging Face organisation page](https://huggingface.co/Graphcore).

|  Domain | Tasks | Model | Datasets | Workflow |   Number of IPUs   | Execution time |
|---------|-------|-------|----------|----------|--------------|--------------|
| Audio processing | Automatic speech recognition | wav2vec 2.0 | LibriSpeech (librispeech_asr) | Inference | 4 or 16 | 35 sec |

[![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)

## Environment setup

The best way to run this demo is on Paperspace Gradient's cloud IPUs because everything is already set up for you.

[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://ipu.dev/3CHite5)

To run the demo using other IPU hardware, you need to have the Poplar SDK enabled. Refer to the [Getting Started guide](https://docs.graphcore.ai/en/latest/getting-started.html#getting-started) for your system for details on how to enable the Poplar SDK. Also refer to the [Jupyter Quick Start guide](https://docs.graphcore.ai/projects/jupyter-notebook-quick-start/en/latest/index.html) for how to set up Jupyter to be able to run this notebook on a remote IPU machine.

In order to improve usability and support for future users, Graphcore would like to collect information about the
applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:

- User progression through the notebook
- Notebook details: number of cells, code being run and the output of the cells
- Environment details

You can disable logging at any time by running `%unload_ext graphcore_cloud_tools.notebook_logging.gc_logger` from any cell.

## Dependencies and configuration

Install the dependencies for this notebook.

In [None]:
%%bash
apt-get update
apt-get install -y libsndfile1

In [None]:
%pip install -r requirements.txt
%load_ext graphcore_cloud_tools.notebook_logging.gc_logger

We next import the utilities that will be used later in the notebook: 

In [None]:
import logging
from tqdm import tqdm
from dataclasses import dataclass, field
from pathlib import Path

import torch
import poptorch

from datasets import load_dataset
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined
from transformers import (
    AutoModelForCTC,
    Wav2Vec2Processor,
    HfArgumentParser,
)
from transformers.utils import check_min_version
from transformers.utils.versions import require_version

Values for machine size and cache directories can be configured through environment variables or directly in the notebook:

In [None]:
import os

n_ipu = int(os.getenv("NUM_AVAILABLE_IPU", 4))
executable_cache_dir = os.path.join(os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/"), "wav2vec2_inference")
checkpoint_directory = Path(os.getenv("CHECKPOINT_DIR", "/tmp")) / "demo"

## Preparing the model

This notebook uses the model output from the fine-tuning notebook `wav2vec2-fine-tuning-checkpoint.ipynb`, so run that notebook first if you want to use your own fine-tuned checkpoint. If no fine-tuned checkpoint is found in the checkpoint directory, the code below falls back to the pre-trained `facebook/wav2vec2-base-960h` checkpoint.

As this model does not require optimising, the full `base` inference model can fit on a single IPU, which keeps the IPU configuration very simple. The `num_device_iterations` parameter controls how many iterations the IPU performs before returning to the host. For example, with `num_device_iterations` set to 10 and a batch size of 1, 10 samples are sent to the IPU, processed, and returned as a block of 10.
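
The relationship between batch size, device iterations and the number of rows the host supplies can be sketched in plain Python. This is an illustrative sketch only: `rows_per_host_call` and `split_into_device_iterations` are hypothetical helper names, not part of PopTorch or Optimum Graphcore, and no IPU is needed to run it.

```python
from typing import List, Sequence


def rows_per_host_call(batch_size: int, num_device_iterations: int) -> int:
    """Number of input rows the host supplies for one call to the model."""
    return batch_size * num_device_iterations


def split_into_device_iterations(rows: Sequence, batch_size: int) -> List[list]:
    """Group rows into consecutive batches, one batch per device iteration."""
    return [list(rows[i:i + batch_size]) for i in range(0, len(rows), batch_size)]


# Values used in this notebook: a batch size of 1 and 10 device iterations.
rows = rows_per_host_call(batch_size=1, num_device_iterations=10)
chunks = split_into_device_iterations(list(range(rows)), batch_size=1)
print(rows, len(chunks))
```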

We then create the pipelined version of the model with `to_pipelined`, which applies the changes needed to run the model on the IPU, and call `parallelize()`. Finally, we cast the model to half precision and wrap it with `poptorch.inferenceModel`.

In [None]:
try:
    processor = Wav2Vec2Processor.from_pretrained(checkpoint_directory)
    model = AutoModelForCTC.from_pretrained(checkpoint_directory)
except Exception:
    print("Warning! Could not load the fine-tuned checkpoint, falling back on the pre-trained checkpoint: 'facebook/wav2vec2-base-960h'")
    print("To generate your custom fine-tuned checkpoint to use in this notebook, run our Wav2Vec2 fine-tuning notebook.")
    checkpoint_directory = "facebook/wav2vec2-base-960h"
    processor = Wav2Vec2Processor.from_pretrained(checkpoint_directory)
    model = AutoModelForCTC.from_pretrained(checkpoint_directory)

num_device_iterations = 10
ipu_config = IPUConfig(inference_device_iterations=num_device_iterations, executable_cache_dir=executable_cache_dir)
opts = ipu_config.to_options(for_inference=True)

ipu_model = to_pipelined(model, ipu_config)
ipu_model.parallelize()

inference_model = poptorch.inferenceModel(ipu_model.half().eval(), options=opts)

In [None]:
model.config

### Compilation

The sample batch fixes the static input size for the model. The first dimension is the product of the `batch_size` and `num_device_iterations` parameters; because the batch size here is 1, it equals `num_device_iterations`. The second dimension is the maximum audio length in samples. We set it to 400,000 samples, which is 25 seconds of audio at a 16 kHz sampling rate.

The model will then compile for this input size. If the size is changed later, the model will recompile.

In [None]:
max_samples = 400000
sample_batch = {"input_values": torch.zeros([num_device_iterations, max_samples], dtype=torch.half)}

inference_model.compile(**sample_batch)
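
To relate the static input length to audio duration, the conversion can be sketched as below. The 16 kHz sampling rate is an assumption based on the LibriSpeech audio and the wav2vec 2.0 base checkpoints; check `processor.feature_extractor.sampling_rate` for the checkpoint you load. The helper names are hypothetical.

```python
SAMPLING_RATE_HZ = 16_000  # assumption: 16 kHz audio, as in LibriSpeech


def samples_to_seconds(num_samples: int, sampling_rate_hz: int = SAMPLING_RATE_HZ) -> float:
    """Duration in seconds of a clip with `num_samples` samples."""
    return num_samples / sampling_rate_hz


def seconds_to_samples(seconds: float, sampling_rate_hz: int = SAMPLING_RATE_HZ) -> int:
    """Number of samples needed to hold `seconds` of audio."""
    return int(round(seconds * sampling_rate_hz))


max_samples = 400_000
print(samples_to_seconds(max_samples))  # 25.0
print(seconds_to_samples(20))           # 320000
```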

### LibriSpeech inference

We will test the inference capabilities of a fine-tuned model on a portion of the LibriSpeech `test` set. First, download the dataset using the [🤗 Datasets library](https://huggingface.co/docs/datasets/index).

In [None]:
ds = load_dataset("librispeech_asr", "clean", split="test")

### Create a batch

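Here we take the first `num_device_iterations` samples from the LibriSpeech test set and copy each one into a row of a zero-initialised tensor to create a batch. Clips shorter than `max_samples` are therefore padded with zeros on the right.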

In [None]:
x = torch.zeros([num_device_iterations, max_samples], dtype=torch.half)

for i in range(num_device_iterations):
    input_values = processor(
        ds[i]["audio"]["array"], return_tensors="pt", padding="longest"
    ).input_values  # Batch size 1
    length = input_values.size(1)
    x[i, :length] = input_values[0]

batch = {"input_values": x}
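
The loop above assumes every clip fits inside `max_samples`. A minimal sketch of the same zero-padding step with an explicit length check is shown below. It uses plain Python lists so that it runs without PyTorch or an IPU, and `pad_to_length` is a hypothetical helper, not a library function.

```python
from typing import List, Sequence


def pad_to_length(samples: Sequence[float], target_length: int, pad_value: float = 0.0) -> List[float]:
    """Right-pad `samples` with `pad_value` up to `target_length`."""
    if len(samples) > target_length:
        raise ValueError(
            f"clip has {len(samples)} samples, but the static input holds only {target_length}"
        )
    return list(samples) + [pad_value] * (target_length - len(samples))


# Toy clips of different lengths, padded to a common static length of 5.
clips = [[0.1, 0.2, 0.3], [0.4], [0.5, 0.6, 0.7, 0.8, 0.9]]
toy_batch = [pad_to_length(clip, target_length=5) for clip in clips]
for row in toy_batch:
    print(row)
```

Whether to raise, truncate or skip an over-long clip is a design choice; raising is used here only because it makes the size mismatch visible.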

## Run inference

Running the model performs `num_device_iterations` iterations on the IPU before returning to the host. This means that all of our logits are returned at once.

In [None]:
output = inference_model(**batch)
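
The number of logit frames returned per clip follows from the kernel sizes and strides of the convolutional feature encoder. The sketch below assumes the default wav2vec 2.0 base settings and no padding in the convolutions; the values for a loaded checkpoint can be read from `model.config.conv_kernel` and `model.config.conv_stride`. `num_output_frames` is a hypothetical helper.

```python
from typing import Sequence

# Assumption: default wav2vec 2.0 base feature-encoder settings.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(
    num_samples: int,
    kernels: Sequence[int] = CONV_KERNEL,
    strides: Sequence[int] = CONV_STRIDE,
) -> int:
    """Frames produced by a stack of unpadded 1D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        length = (length - kernel) // stride + 1
    return length


print(num_output_frames(400_000))  # 1249
print(num_output_frames(16_000))   # 49
```

Under these assumptions, each row of the padded batch would yield 1249 frames of logits, one vector of vocabulary scores per frame.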

### Decode

The argmax of the logits is taken for every frame of the output, which is a 'greedy decode' strategy. The `processor` then converts the predicted indices back into text, and the transcripts are printed.

In [None]:
logits = output[0]
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

In [None]:
transcription
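
For intuition, greedy CTC decoding can be sketched on a toy example: take the argmax per frame, merge consecutive repeats, then drop the blank token. The vocabulary, the blank index of 0 and the function names are assumptions made for illustration; in the notebook this work is done by `torch.argmax` and `processor.batch_decode`.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,
) -> str:
    """Greedy CTC decode: per-frame argmax, merge repeats, drop blanks."""
    # Greedy step: pick the highest-scoring token index in every frame.
    frame_ids = [max(range(len(scores)), key=lambda i: scores[i]) for scores in frame_scores]
    decoded: List[str] = []
    previous = None
    for token_id in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if token_id != previous and token_id != blank_id:
            decoded.append(vocab[token_id])
        previous = token_id
    return "".join(decoded)


def one_hot(index: int, size: int) -> List[float]:
    """Toy score vector with all the mass on one token."""
    return [1.0 if i == index else 0.0 for i in range(size)]


toy_vocab = ["<blank>", "H", "E", "L", "O"]  # hypothetical vocabulary
toy_frame_ids = [1, 1, 0, 2, 3, 0, 3, 4]     # H H _ E L _ L O
toy_scores = [one_hot(i, len(toy_vocab)) for i in toy_frame_ids]
print(greedy_ctc_decode(toy_scores, toy_vocab))  # HELLO
```

The blank between the two `L` frames is what lets the decoder keep a genuine double letter while still merging the repeated `H`.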

### Release IPUs in use

The IPython kernel has a lock on the IPUs used in running the model, preventing other users from using them. For example, if you wish to use other notebooks after working your way through this one, it may be necessary to manually run the cell below to release the IPUs you have been using. This will happen by default if using the `Run All` option to run this notebook. More information can be found in the notebook about [managing IPU resources](https://github.com/gradient-ai/Graphcore-HuggingFace/blob/main/useful-tips/managing_ipu_resources.ipynb).

In [None]:
if inference_model.isAttachedToDevice():
    inference_model.detachFromDevice()

## Next steps

To fine-tune a wav2vec 2.0 checkpoint for automatic speech recognition on IPUs, try the fine-tuning notebook `wav2vec2-fine-tuning-checkpoint.ipynb`.

Try out the other [IPU-powered Jupyter Notebooks](https://www.graphcore.ai/ipu-jupyter-notebooks) to see how IPUs perform on other tasks. 