# Run SetFit model with OpenVINO

In this notebook, we'll learn how to export a SetFit model to OpenVINO and run inference with it.

## Setup

If you're running this notebook on Colab or another cloud platform, you will need to install the `setfit` library with its OpenVINO extra. Uncomment the following cell and run it:

In [1]:
#%pip install setfit[openvino] ipywidgets widgetsnbextension

## Load pretrained SetFit model from the Hub

In [2]:
from time import perf_counter
from setfit import SetFitModel

NUM_ITERS = 100  # number of timed inference calls

model_id = "lewtun/my-awesome-setfit-model"
model = SetFitModel.from_pretrained(model_id)

# Run inference
input_text = ["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"]

start = perf_counter()
for i in range(NUM_ITERS):
    preds = model(input_text)
end = perf_counter()

preds, f"{end-start:.2f} seconds"


(array([1, 0]), '1.74 seconds')
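The cell above times `NUM_ITERS` calls in a single loop, so any one-time setup cost may land in the first iterations and inflate the total. A slightly more robust variant runs a few untimed warm-up calls first and reports the mean latency per call. This is a minimal sketch: `benchmark` is a hypothetical helper, not part of `setfit` or OpenVINO, and the default of 5 warm-up calls is an arbitrary choice.

```python
from time import perf_counter
from typing import Any, Callable, Tuple


def benchmark(fn: Callable[[], Any], num_iters: int = 100, warmup: int = 5) -> Tuple[Any, float]:
    """Call `fn` repeatedly; return its last result and mean latency in milliseconds.

    Hypothetical helper for illustration, not part of setfit or OpenVINO.
    """
    result = None
    # Warm-up calls are not timed, so one-time costs do not skew the mean
    for _ in range(warmup):
        result = fn()

    start = perf_counter()
    for _ in range(num_iters):
        result = fn()
    elapsed = perf_counter() - start

    # Convert total seconds into mean milliseconds per call
    return result, elapsed / num_iters * 1000.0
```

In this notebook it could wrap either backend, e.g. `benchmark(lambda: model(input_text), NUM_ITERS)` here and `benchmark(lambda: compiled_model(inputs_dict)[output], NUM_ITERS)` later, so that both are measured the same way.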

## Export to OpenVINO

In this step, the model is exported to the OpenVINO Intermediate Representation (IR). During the export, the model body (the Transformer) and the head are merged into a single model, so one network maps the tokenized text to predicted labels.

`setfit` also provides a pure PyTorch implementation of `SetFitModel`, in which the head is a dense layer instead of a `scikit-learn` classifier. This allows end-to-end backpropagation and gives finer-grained control over the training process. The model loaded above uses the default `scikit-learn` head; to use the PyTorch variant instead, load a pretrained model with `use_differentiable_head=True` and specify the number of classes for the head.
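Conceptually, the merged model chains three steps: the Transformer produces token embeddings, these are pooled into one sentence embedding, and the head maps that embedding to a class label. The NumPy sketch below illustrates that chain for a linear head with one weight row per class. It assumes mean pooling over non-padding tokens (common for Sentence Transformers bodies, but an assumption here), and `mean_pool` and `merged_forward` are hypothetical stand-ins, not the exported graph.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def merged_forward(token_embeddings, attention_mask, coef, intercept):
    """Hypothetical merged body + head: pooling followed by a linear classifier.

    coef: (num_classes, hidden); intercept: (num_classes,).
    Returns the predicted class index for each input.
    """
    sentence_embeddings = mean_pool(token_embeddings, attention_mask)
    logits = sentence_embeddings @ coef.T + intercept
    return logits.argmax(axis=1)
```

In the real export the Transformer that produces `token_embeddings` is part of the same graph, which is why the IR is fed tokenizer outputs rather than embeddings.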

In [3]:
from setfit import SetFitModel
from setfit.exporters.openvino import export_to_openvino

# Export the model body and head together as a single OpenVINO IR model
output_path = "model.xml"
export_to_openvino(model, output_path)

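An OpenVINO IR model is stored as a pair of files: the `.xml` file describes the network topology and the `.bin` file with the same base name holds the weights. When `core.read_model` is given only the `.xml` path, it looks for the matching `.bin` next to it, so the two files should be kept together. The sketch below checks that both exist; `ir_files` is a hypothetical helper written for illustration.

```python
from pathlib import Path
from typing import Tuple


def ir_files(xml_path: str) -> Tuple[Path, Path]:
    """Return the (.xml, .bin) pair of an OpenVINO IR model.

    Hypothetical helper: it only verifies that both files are present.
    """
    xml = Path(xml_path)
    weights = xml.with_suffix(".bin")  # weights share the base name of the .xml
    missing = [str(p) for p in (xml, weights) if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Incomplete OpenVINO IR, missing: {missing}")
    return xml, weights
```

In this notebook, `ir_files(output_path)` would be expected to return the paths of `model.xml` and `model.bin` after the export.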


## Run model with OpenVINO
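Now we run the model with the OpenVINO Python API and compare the results. The elapsed inference time is measured over `NUM_ITERS` iterations, as for the PyTorch model above.

>**Note:** The exported model does not include the tokenizer, so the text must be tokenized before inference. As a result, the OpenVINO timing below covers only the model call, whereas the `SetFitModel` timing above also includes tokenization.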
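In the next cell, `padding=True` pads every sequence in the batch to the length of the longest one and returns an `attention_mask` in which 1 marks a real token and 0 marks padding, so the model can ignore the padded positions. A pure-Python sketch of that step follows; `pad_batch` is a hypothetical helper, right-padding is assumed (the default for most tokenizers), and the pad id 0 and the token ids in the example are illustrative only (the real values come from the tokenizer).

```python
from typing import Dict, List


def pad_batch(batch_ids: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Right-pad token id sequences to the longest one and build the attention mask."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


print(pad_batch([[101, 2023, 102], [101, 102]]))
# → {'input_ids': [[101, 2023, 102], [101, 102, 0]], 'attention_mask': [[1, 1, 1], [1, 1, 0]]}
```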

In [4]:
from transformers import AutoTokenizer
import openvino.runtime as ov

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(
    input_text,
    padding=True,
    truncation=True,
    return_attention_mask=True,
    return_token_type_ids=True,
    return_tensors="np",
)

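# The compiled model accepts a dict mapping input names to NumPy arrays
inputs_dict = dict(inputs)

# Read the IR from disk and compile it for the CPU device
core = ov.Core()
ov_model = core.read_model(output_path)
compiled_model = core.compile_model(ov_model, "CPU")
output = compiled_model.output(0)  # first output port of the model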

start = perf_counter()
for i in range(NUM_ITERS):
    ov_preds = compiled_model(inputs_dict)[output]
end = perf_counter()

ov_preds, f"{end-start:.2f} seconds"


(array([1, 0]), '1.25 seconds')
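Both runs return the same labels, `[1, 0]`. From the reported totals over `NUM_ITERS = 100` iterations, the mean latency is about 17.4 ms per call for `SetFitModel` and 12.5 ms for OpenVINO, roughly a 1.39x speedup. Since the OpenVINO loop excludes tokenization while the `SetFitModel` loop includes it, the two numbers are not strictly comparable, and timings will vary with hardware. A small sketch of the agreement check and the arithmetic, using the values printed above (the variable names are illustrative):

```python
import numpy as np

NUM_ITERS = 100
# Values copied from the two cell outputs above
setfit_preds, setfit_total_s = np.array([1, 0]), 1.74
openvino_preds, openvino_total_s = np.array([1, 0]), 1.25

# Both backends should predict the same labels
assert np.array_equal(setfit_preds, openvino_preds)

setfit_ms = setfit_total_s / NUM_ITERS * 1000  # mean latency per call
openvino_ms = openvino_total_s / NUM_ITERS * 1000
speedup = setfit_total_s / openvino_total_s

print(f"SetFitModel: {setfit_ms:.1f} ms/call, OpenVINO: {openvino_ms:.1f} ms/call, speedup: {speedup:.2f}x")
```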