# OpenVINO

In this notebook, we will show how to use the OpenVINO toolkit to deploy deep learning models on edge devices and to quantize models to reduce model size and inference latency. We will train a simple CNN on the MNIST dataset, convert it to OpenVINO IR format, and then quantize the model to INT8 precision. Finally, we will compare the size and accuracy of the quantized model with those of the original FP32 model.
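
To make the size argument concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization, assuming a single scale for the whole tensor. NNCF's real scheme differs (for example, it can use per-channel scales), and the helper names `quantize_int8` and `dequantize_int8` are hypothetical. Each FP32 value (4 bytes) becomes one INT8 value (1 byte) plus a shared scale, and the rounding error stays within half a quantization step.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization (illustrative, not NNCF's algorithm)."""
    # One scale maps the largest absolute value onto the INT8 limit 127.
    scale = float(np.abs(w).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    # Map the integers back to approximate FP32 values.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Random stand-in with the same shape as the fc weights in this notebook (10 x 2028).
w = rng.normal(size=(10, 2028)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

print(w.nbytes, q.nbytes)  # 81120 20280
# Rounding error is at most half a step (small tolerance for float32 arithmetic).
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)
```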

## Setup OpenVINO

First, we need to install OpenVINO, NNCF, and PyTorch.

In [None]:
%pip install -q "openvino>=2023.1.0" torch torchvision --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q "nncf>=2.6.0"

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import pathlib
import numpy as np
import openvino as ov
import nncf

## Train Model

Next, define a simple CNN and train it for one epoch on the MNIST dataset.

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 1x28x28 -> 12x26x26 (3x3 kernel, stride 1, no padding)
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=12, kernel_size=3)
        # 12x26x26 -> 12x13x13
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc = nn.Linear(12 * 13 * 13, 10)

    def forward(self, x):
        x = x.view(-1, 1, 28, 28)
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 12 * 13 * 13)
        x = self.fc(x)
        output = F.log_softmax(x, dim=1)  # log-probabilities, as expected by F.nll_loss
        return output


train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32)

device = "cpu"

epochs = 1

model = Net().to(device)
optimizer = optim.Adam(model.parameters())

model.train()

for epoch in range(1, epochs + 1):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        # Log every 100 batches instead of all 1,875
        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

MODEL_DIR = pathlib.Path("./models")
MODEL_DIR.mkdir(exist_ok=True)
torch.save(model.state_dict(), MODEL_DIR / "original_model.p")
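
The input size of `self.fc`, `12 * 13 * 13`, follows from the usual output-size formula `floor((n + 2p - k) / s) + 1`, applied first to the 3x3 convolution (stride 1, no padding) and then to the 2x2 max pool (stride 2). The small pure-Python check below uses a hypothetical helper, `out_size`, to reproduce that number.

```python
def out_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

side = 28                                  # MNIST images are 28x28
side = out_size(side, kernel=3)            # conv1: 3x3, stride 1, no padding -> 26
side = out_size(side, kernel=2, stride=2)  # pool: 2x2, stride 2 -> 13
fc_in = 12 * side * side                   # conv1 has 12 output channels
print(side, fc_in)  # 13 2028
```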

## Convert to OpenVINO IR

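Then, convert the trained PyTorch model to OpenVINO IR format and save it. `ov.save_model` compresses weights to FP16 by default, so `compress_to_fp16=False` is passed here to keep a true FP32 baseline for the size comparison.

In [None]:
model.eval()  # switch to inference mode before tracing
example_input = next(iter(test_loader))[0]
ov_model = ov.convert_model(model, example_input=example_input)
ov.save_model(ov_model, MODEL_DIR / "openvino_ir.xml", compress_to_fp16=False)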

## Quantization

To quantize the model with NNCF, first define a transformation function that converts each batch from the PyTorch data loader into a NumPy array, then pass it together with the loader to NNCF's `Dataset` class to build a calibration dataset. Next, quantize the model with NNCF's `quantize` function. Finally, compile the quantized model, run a quick test inference, and save it in OpenVINO IR format.

In [None]:
def transform_fn(data_item):
    # Each loader item is (images, labels); only the images are model inputs
    images, _ = data_item
    return images.numpy()

calibration_dataset = nncf.Dataset(train_loader, transform_fn)
quantized_model = nncf.quantize(ov_model, calibration_dataset)

# Sanity check: run one test image through the compiled INT8 model
model_int8 = ov.compile_model(quantized_model)
input_fp32 = next(iter(test_loader))[0][0:1].numpy()
res = model_int8(input_fp32)

ov.save_model(quantized_model, MODEL_DIR / "quant_openvino_ir.xml")
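
Conceptually, post-training quantization runs the calibration samples through the model and records the range of each tensor to be quantized; those ranges then determine the INT8 scales and zero points. The NumPy sketch below mimics only that data flow, assuming plain min/max statistics over the output of a `transform_fn`-style function and an asymmetric mapping to `[0, 255]`. It is not NNCF's implementation (the default `nncf.quantize` pipeline also applies bias correction, among other steps), and `collect_min_max` and `asymmetric_params` are hypothetical helpers. The toy batches use random values roughly spanning the range of normalized MNIST pixels.

```python
import numpy as np

def collect_min_max(data_source, transform_fn):
    """Track the global min/max of model inputs over a calibration set."""
    lo, hi = np.inf, -np.inf
    for item in data_source:
        x = transform_fn(item)  # same role as transform_fn in nncf.Dataset
        lo = min(lo, float(x.min()))
        hi = max(hi, float(x.max()))
    return lo, hi

def asymmetric_params(lo, hi, num_levels=256):
    """Scale and zero point that map [lo, hi] onto the integer range [0, 255]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / (num_levels - 1) or 1.0  # guard against an all-zero range
    zero_point = int(round(-lo / scale))
    return scale, zero_point

# Toy "loader": (images, labels) batches shaped like the MNIST batches above.
rng = np.random.default_rng(0)
batches = [(rng.uniform(-0.42, 2.82, size=(32, 1, 28, 28)), None) for _ in range(5)]

lo, hi = collect_min_max(batches, lambda item: item[0])
scale, zero_point = asymmetric_params(lo, hi)
print(lo, hi, scale, zero_point)
```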

## Check Size

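Compare the sizes of the FP32 and INT8 models. An OpenVINO IR consists of an `.xml` file (topology) and a `.bin` file (weights), so the `.bin` files are the ones to compare.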

In [None]:
%ls -lh {MODEL_DIR}
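
As a rough expectation, assuming the `.bin` file is dominated by the parameters: `conv1` has 12·1·3·3 + 12 = 120 parameters and `fc` has 2028·10 + 10 = 20,290, for 20,410 in total. The arithmetic below turns that into byte counts. Note that `ov.save_model` writes FP16 weights unless `compress_to_fp16=False` is passed, and that the INT8 figure is only a lower bound, since scales, zero points, and any tensors left in floating point add overhead.

```python
# Parameter counts for the Net defined above (weights + biases).
conv1_params = 12 * 1 * 3 * 3 + 12    # 120
fc_params = (12 * 13 * 13) * 10 + 10  # 20,290
total = conv1_params + fc_params      # 20,410

fp32_bytes = total * 4  # 4 bytes per FP32 value
fp16_bytes = total * 2  # ov.save_model default (compress_to_fp16=True)
int8_bytes = total * 1  # lower bound if every parameter were stored as INT8
print(total, fp32_bytes, fp16_bytes, int8_bytes)  # 20410 81640 40820 20410
```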

## Check Accuracy

Evaluate the accuracy of the INT8 model and compare it with that of the FP32 model.

In [None]:
def test_ov(model, data_loader):
    """Return the top-1 accuracy (%) of an OpenVINO model on data_loader."""
    compiled_model = ov.compile_model(model)
    correct = 0
    for data, target in data_loader:
        output = torch.tensor(compiled_model(data.numpy())[0])
        pred = output.argmax(dim=1, keepdim=True)  # index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()

    return 100. * correct / len(data_loader.dataset)

acc = test_ov(ov_model, test_loader)
print(f"Accuracy of FP32 model: {acc:.2f}%")

qacc = test_ov(quantized_model, test_loader)
print(f"Accuracy of INT8 model: {qacc:.2f}%")
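
Overall accuracy can hide per-sample differences between the two models. A complementary check, sketched here with NumPy on small hypothetical output arrays, is the percentage of inputs on which the FP32 and INT8 models predict the same class. In this notebook, the arrays would come from stacking `compiled_model(data)[0]` over the test loader for each model; that collection step is not shown, and `top1_accuracy` and `agreement` are hypothetical helpers.

```python
import numpy as np

def top1_accuracy(log_probs: np.ndarray, labels: np.ndarray) -> float:
    """Percentage of rows whose argmax matches the label."""
    return 100.0 * float(np.mean(log_probs.argmax(axis=1) == labels))

def agreement(out_a: np.ndarray, out_b: np.ndarray) -> float:
    """Percentage of rows on which two models predict the same class."""
    return 100.0 * float(np.mean(out_a.argmax(axis=1) == out_b.argmax(axis=1)))

# Toy log-probabilities for 4 samples and 3 classes (stand-ins for FP32 and INT8 outputs).
fp32_out = np.log(np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.3, 0.3, 0.4],
                            [0.6, 0.3, 0.1]]))
int8_out = np.log(np.array([[0.6, 0.3, 0.1],
                            [0.2, 0.7, 0.1],
                            [0.4, 0.3, 0.3],
                            [0.5, 0.4, 0.1]]))
labels = np.array([0, 1, 2, 1])

print(top1_accuracy(fp32_out, labels), top1_accuracy(int8_out, labels), agreement(fp32_out, int8_out))
# 75.0 50.0 75.0
```

Agreement is useful because two models with similar accuracy can still disagree on many individual samples; a high agreement rate suggests quantization changed the model's decisions very little.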