notebooks/sentence-transformers/getting-started.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sentence Transformers on AWS Inferentia with Optimum Neuron\n", "\n", "This guide explains how to convert, load, and use [Sentence Transformers (SBERT)](https://www.sbert.net/) models on AWS Inferentia2 using Optimum Neuron, enabling efficient calculation of embeddings. Sentence Transformers are powerful models for generating sentence embeddings. You can use this Sentence Transformers to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similarities, semantic search, or paraphrase mining.\n", "\n", "_Note: Currently only text models are supported, we are working on vision support for CLIP._\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile Sentence Transformers models for AWS Inferentia2 \n", "\n", "First, you need to convert your Sentence Transformers model to a format compatible with AWS Inferentia2. You can compile Sentence Transformers models with Optimum Neuron using the `optimum-cli` or `NeuronModelForSentenceTransformers` class. Below you will find an example for both approaches. We have to make sure `sentence-transformers` is installed. That's only needed for exporting the model. \n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install sentence-transformers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we will use the `NeuronModelForSentenceTransformers`, which can convert any Sntence Transformers model to a format compatible with AWS Inferentia2 or load already converted models. \n When exporting models with the `NeuronModelForSentenceTransformers` you need to set `export=True` and define the input shape and batch size. The input shape is defined by the `sequence_length` and the batch size by `batch_size`. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from optimum.neuron import NeuronModelForSentenceTransformers\n", "\n", "\n", "# Sentence Transformers model from HuggingFace\n", "model_id = \"BAAI/bge-small-en-v1.5\"\n", "input_shapes = {\"batch_size\": 1, \"sequence_length\": 384} # mandatory shapes\n", "\n", "# Load Transformers model and export it to AWS Inferentia2\n", "model = NeuronModelForSentenceTransformers.from_pretrained(model_id, export=True, **input_shapes)\n", "\n", "# Save model to disk\n", "model.save_pretrained(\"bge_emb_inf2/\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we will use the `optimum-cli` to convert the model. Similar to the `NeuronModelForSentenceTransformers` we need to define our input shape and batch size. The input shape is defined by the `sequence_length` and the batch size by `batch_size`. The `optimum-cli` will automatically convert the model to a format compatible with AWS Inferentia2 and save it to the specified output directory.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!optimum-cli export neuron -m BAAI/bge-small-en-v1.5 --sequence_length 384 --batch_size 1 --task feature-extraction bge_emb_inf2/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load compiled Sentence Transformers model and run inference\n", "\n", "Once we have a compiled Sentence Transformers model, which we either exported ourselves or is available on the Hugging Face Hub, we can load it and run inference. For loading the model we can use the `NeuronModelForSentenceTransformers` class, which is an abstraction layer for the `SentenceTransformer` class. The `NeuronModelForSentenceTransformers` class will automatically pad the input to the specified `sequence_length` and run inference on AWS Inferentia2." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoTokenizer\n", "\n", "from optimum.neuron import NeuronModelForSentenceTransformers\n", "\n", "\n", "model_id_or_path = \"bge_emb_inf2/\"\n", "tokenizer_id = \"BAAI/bge-small-en-v1.5\"\n", "\n", "# Load model and tokenizer\n", "model = NeuronModelForSentenceTransformers.from_pretrained(model_id_or_path)\n", "tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)\n", "\n", "# Run inference\n", "prompt = \"I like to eat apples\"\n", "encoded_input = tokenizer(prompt, return_tensors='pt')\n", "outputs = model(**encoded_input)\n", "\n", "token_embeddings = outputs.token_embeddings\n", "sentence_embedding = outputs.sentence_embedding\n", "\n", "print(f\"token embeddings: {token_embeddings.shape}\") # torch.Size([1, 7, 384])\n", "print(f\"sentence_embedding: {sentence_embedding.shape}\") # torch.Size([1, 384])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Production Usage\n", "\n", "For deploying these models in a production environment, refer to the [Amazon SageMaker Blog](https://www.philschmid.de/inferentia2-embeddings).\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 2 }

notebooks/sentence-transformers/getting-started.ipynb (154 lines of code) (raw):