notebooks/introduction_to_optimum_graphcore.ipynb

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "86c80779", "metadata": {}, "source": [ "# Introduction to 🤗 Optimum Graphcore: BERT-Large Fine-tuning on IPU\n", "\n", "<p align=\"center\">\n", " <img src=\"https://github.com/huggingface/optimum-graphcore/blob/main/readme_logo.png?raw=true\" />\n", "</p>\n", "\n", "## 🤗 Optimum Graphcore\n", "\n", "🤗 Optimum Graphcore is the interface between the [🤗 Transformers library](https://huggingface.co/docs/transformers/index) and [Graphcore IPUs](https://www.graphcore.ai/products/ipu).\n", "It provides a set of tools enabling parallelization and loading of models on IPUs, training and fine-tuning on all the tasks already supported by 🤗 Transformers while being compatible with the 🤗 Hub and every model available on it out of the box.\n", "\n", "🤗 Optimum Graphcore was designed with one goal in mind: make training and evaluation straightforward for any 🤗 Transformers user while leveraging the complete power of IPUs.\n", "\n", "\n", "## What is an Intelligence Processing Unit (IPU)?\n", "Quote from the Hugging Face [blog post](https://huggingface.co/blog/graphcore#what-is-an-intelligence-processing-unit):\n", ">IPUs are the processors that power Graphcore’s IPU-POD data center compute systems. This new type of processor is designed to support the very specific computational requirements of AI and machine learning. Characteristics such as fine-grained parallelism, low-precision arithmetic, and the ability to handle sparsity have been built into the silicon.\n", "\n", "> Instead of adopting a SIMD/SIMT architecture like GPUs, Graphcore’s IPU uses a massively parallel, MIMD architecture, with ultra-high bandwidth memory placed adjacent to the processor cores, right on the silicon die.\n", "\n", "> This design delivers high performance and new levels of efficiency, whether running today’s most popular models, such as BERT and EfficientNet, or exploring next-generation AI applications." ] }, { "attachments": {}, "cell_type": "markdown", "id": "7c456f52", "metadata": {}, "source": [ "## About this notebook \n", "\n", "This notebook will demonstrate how to fine-tune a pre-trained BERT model with PyTorch on the Graphcore IPU-POD4 system using Optimum Graphcore. We will use a BERT-Large model and fine-tune on the SQuADv1 Question/Answering task.\n", "\n", "We will show how to take a BERT model written in PyTorch from the Hugging Face Transformers library and run it on Graphcore IPUs using Optimum Graphcore." ] }, { "attachments": {}, "cell_type": "markdown", "id": "b3f1c67d", "metadata": {}, "source": [ "| Domain | Tasks | Model | Datasets | Workflow | Number of IPUs | Execution time |\n", "|---------|-------|-------|----------|----------|--------------|--------------|\n", "| Natural language processing | Question answering | bert-large-uncased | SQUADv1 | Fine-tuning| | |\n", "\n", "[![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1f86a3e1", "metadata": {}, "source": [ "## Background\n", "\n", "\n", "### BERT\n", "\n", "BERT fine-tuning is when you train a BERT model on a supervised learning task on a relatively small amount of data, by using starting weights obtained from pre-training on a large, generic text corpus. Pre-training of BERT requires a lot of unlabelled data (for instance all of Wikipedia + thousands of books) and a lot of compute. 
It is expensive and time-consuming, but after pre-training, BERT will have learned an extremely good language model that can be fine-tuned on downstream tasks with a small amount of labelled data, achieving great results.\n", "\n", "\n", "![bert.png](images/bert.png)\n", "\n", "\n", "In this notebook, we will fine-tune BERT (pre-trained on IPUs with the Wikipedia dataset) on a question answering task called SQuAD. Then we will perform inference on the accompanying validation dataset.\n", "\n", "### What is SQuAD?\n", "\n", "The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.\n", "\n", "From https://rajpurkar.github.io/SQuAD-explorer/\n", "\n", "Basically you train a model to take a question and read a passage of text and predict the start and end positions of where that answer lies in the passage. The image below shows an example from the dataset:\n", "\n", "(Source: [Rajpurkar GitHub](https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/Normans.html))\n", "\n", "For the case of SQuADv1, there are no unanswerable questions in the dataset." ] }, { "attachments": {}, "cell_type": "markdown", "id": "4407959b", "metadata": {}, "source": [ "## Environment setup\n", "\n", "The best way to run this demo is on Paperspace Gradient's cloud IPUs because everything is already set up for you.\n", "\n", "[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://ipu.dev/3CExwVy)\n", "\n", "To run the demo using other IPU hardware, you need to have the Poplar SDK enabled. Refer to the [Getting Started guide](https://docs.graphcore.ai/en/latest/getting-started.html#getting-started) for your system for details on how to enable the Poplar SDK. Also refer to the [Jupyter Quick Start guide](https://docs.graphcore.ai/projects/jupyter-notebook-quick-start/en/latest/index.html) for how to set up Jupyter to be able to run this notebook on a remote IPU machine." ] }, { "attachments": {}, "cell_type": "markdown", "id": "74a9a1cc", "metadata": {}, "source": [ "## Dependencies and configuration\n", "\n", "In order to improve usability and support for future users, Graphcore would like to collect information about the\n", "applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:\n", "\n", "- User progression through the notebook\n", "- Notebook details: number of cells, code being run and the output of the cells\n", "- Environment details\n", "\n", "You can disable logging at any time by running `%unload_ext graphcore_cloud_tools.notebook_logging.gc_logger` from any cell." ] }, { "attachments": {}, "cell_type": "markdown", "id": "52fe5072", "metadata": {}, "source": [ "Install the dependencies for this notebook." 
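, "\n", "\n", "After the installation cell below has finished, you can optionally run a quick sanity check like the following (a minimal sketch; it simply confirms that the IPU software stack is importable in this kernel):\n", "\n", "```python\n", "# Optional check that the install succeeded (assumes a working Poplar SDK environment).\n", "import poptorch\n", "from optimum.graphcore import IPUConfig, IPUTrainer\n", "\n", "print(\"PopTorch version:\", poptorch.__version__)\n", "```"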
] }, { "cell_type": "code", "execution_count": null, "id": "e087c904", "metadata": {}, "outputs": [], "source": [ "%pip install \"optimum-graphcore==0.7\"\n", "%pip install graphcore-cloud-tools[logger]@git+https://github.com/graphcore/graphcore-cloud-tools\n", "%load_ext graphcore_cloud_tools.notebook_logging.gc_logger" ] }, { "attachments": {}, "cell_type": "markdown", "id": "28345f06", "metadata": {}, "source": [ "Values for machine size and cache directories can be configured through environment variables or directly in the notebook:" ] }, { "cell_type": "code", "execution_count": null, "id": "eb1764d0", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "n_ipu = int(os.getenv(\"NUM_AVAILABLE_IPU\", 4))\n", "executable_cache_dir = os.getenv(\"POPLAR_EXECUTABLE_CACHE_DIR\", \"/tmp/exe_cache/\") + \"/introduction_to_optimum_graphcore\"" ] }, { "cell_type": "code", "execution_count": null, "id": "cefc3902", "metadata": {}, "outputs": [], "source": [ "# Import standard packages\n", "import transformers\n", "import torch\n", "import torch.nn as nn\n", "import numpy as np\n", "from tqdm.notebook import trange, tqdm\n", "from datasets import load_dataset, load_metric\n", "import time\n", "from pathlib import Path\n", "\n", "# To run on the IPU we import the Optimum Graphcore classes\n", "from optimum.graphcore import IPUConfig, IPUTrainer, IPUTrainingArguments" ] }, { "cell_type": "code", "execution_count": null, "id": "ac4440fd", "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a7415744", "metadata": {}, "source": [ "## Get the data\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "368087c6", "metadata": {}, "source": [ "We use the 🤗 `datasets` package to automatically download the SQuAD dataset:" ] }, { "cell_type": "code", "execution_count": null, "id": "53501b52", "metadata": {}, "outputs": [], "source": [ "datasets = load_dataset(\"squad\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a1db88a6", "metadata": {}, "source": [ "The SQuAD dataset consists of pre-defined training and validation splits." ] }, { "cell_type": "code", "execution_count": null, "id": "a8269c84", "metadata": {}, "outputs": [], "source": [ "datasets" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7d5eeda4", "metadata": {}, "source": [ "Each row in the data consists of a passage of text - `context` - a question about the passage - `question` - and the answer(s) to the question - `answers`. The latter consists of the text in the passage and the start position in the text.\n", "\n", "Here is an example row:" ] }, { "cell_type": "code", "execution_count": null, "id": "35d928f2", "metadata": {}, "outputs": [], "source": [ "datasets[\"train\"][10016]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d3b1eca0", "metadata": {}, "source": [ "**How do we preprocess this data so that we can train a deep learning model on it?**\n", "\n", "We need to `tokenize` the text to turn it from words into numbers. This is done using `transformers.BertTokenizer`. 
Let's use this to tokenize a shortened version of the example above:" ] }, { "cell_type": "code", "execution_count": null, "id": "05b9d67d", "metadata": {}, "outputs": [], "source": [ "from squad_preprocessing import tokenizer" ] }, { "cell_type": "code", "execution_count": null, "id": "dce8edb5", "metadata": {}, "outputs": [], "source": [ "example = {\"context\": \"Institutes of technology in Venezuela were developed in the 1950s\",\n", " \"question\": \"When were Institutes of technology developed?\"}\n", "tokenized_example = tokenizer(\n", " example[\"question\"],\n", " example[\"context\"],\n", " truncation=\"only_second\",\n", " max_length=32,\n", " stride=16,\n", " return_overflowing_tokens=True,\n", " return_offsets_mapping=True,\n", " padding=\"max_length\",\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "aa946365", "metadata": { "scrolled": true }, "outputs": [], "source": [ "tokenized_example.keys()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8aa909d4", "metadata": {}, "source": [ "Let's look at the `input_ids`:" ] }, { "cell_type": "code", "execution_count": null, "id": "e2c1babc", "metadata": {}, "outputs": [], "source": [ "tokenized_example.input_ids[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "803b71d1", "metadata": {}, "outputs": [], "source": [ "tokenizer.decode(tokenized_example.input_ids[0])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "73aa1fbf", "metadata": {}, "source": [ "As you can see in the decoded version, the question is placed at the start, followed by a `[SEP]` token, then the context, followed by padding if required." ] }, { "cell_type": "code", "execution_count": null, "id": "ba329d1e", "metadata": {}, "outputs": [], "source": [ "from squad_preprocessing import prepare_train_features, prepare_validation_features, tokenizer" ] }, { "cell_type": "code", "execution_count": null, "id": "176caef7", "metadata": {}, "outputs": [], "source": [ "train_dataset = datasets[\"train\"].map(\n", " prepare_train_features,\n", " batched=True,\n", " num_proc=1,\n", " remove_columns=datasets[\"train\"].column_names,\n", " load_from_cache_file=True,\n", ")\n", "\n", "# Create validation features from dataset\n", "validation_features = datasets[\"validation\"].map(\n", " prepare_validation_features,\n", " batched=True,\n", " num_proc=1,\n", " remove_columns=datasets[\"validation\"].column_names,\n", " load_from_cache_file=True,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "febf80ea", "metadata": {}, "source": [ "## Get the BERT model from `transformers`" ] }, { "attachments": {}, "cell_type": "markdown", "id": "feb95706", "metadata": {}, "source": [ "Create the model on the host. We can use `from_pretrained` to load pre-trained checkpoints from the Hugging Face Hub." ] }, { "cell_type": "code", "execution_count": null, "id": "61f65ec9", "metadata": {}, "outputs": [], "source": [ "model = transformers.BertForQuestionAnswering.from_pretrained(\"Graphcore/bert-large-uncased\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c4976351", "metadata": {}, "source": [ "**Now we are ready to use Optimum!**\n", "\n", "To run the model efficiently we use pipelined execution, which requires specifying which layers to put on each IPU. With Optimum Graphcore we do not have to do this by hand: the `IPUTrainer` that we create later converts the model to its pipelined counterpart and splits the layers across the IPUs according to the `IPUConfig`.\n", "\n", "The trainer also calls the `.half()` method to cast all the model weights to half-precision (FP16), while `.train()` sets the PyTorch model to training mode.\n", "\n", "If you are unfamiliar with training in half precision on IPUs, then our tutorial on [Half and Mixed Precision in PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/mixed_precision) can serve as a quick introduction." ] }, { "attachments": {}, "cell_type": "markdown", "id": "8b46eea9", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "## How `optimum-graphcore` runs models on IPUs\n", "\n", "`optimum-graphcore` will run the model on IPUs using both **pipelining** and **data parallelism** in order to maximise hardware use.\n", "\n", "### Parallelism through pipelining\n", "\n", "The model layers are split over 4 IPUs. We then use [*pipeline parallelism*](https://docs.graphcore.ai/projects/tf-model-parallelism/en/latest/pipelining.html) over the IPUs with gradient accumulation. We subdivide the compute batch into micro-batches that pass through the pipeline in the forward pass and then come back again in the backwards pass, accumulating gradients for the parameters as they go.\n", "\n", "A complete pipeline step has a ramp-up phase at the start and a ramp-down phase at the end. Increasing the gradient accumulation factor increases the total batch size and also increases the pipeline efficiency, and therefore throughput, because the proportion of time spent in the ramp-up/down phases is reduced. \n", "\n", "![pipelining.png](images/pipelining.png)\n", "\n", "### Partitioning the model\n", "\n", "BERT Large has 24 transformer layers, which we will split over our 4 IPUs. The position and word embeddings and the first three encoder layers sit on IPU0, while each of the remaining three IPUs holds seven transformer layers. This partition is specified in `IPUConfig` with the `layers_per_ipu` parameter.\n", "\n", "![bert-pipelining.png](images/bert-pipelining.png)\n", "\n", "\n", "### Data parallelism\n", "\n", "An IPU-POD4 contains 4 IPUs and our pipeline is 4 IPUs long, so we cannot replicate the pipeline. If we were running on an IPU-POD16, then we could utilise replication by feeding four different micro-batches to the device, which quadruples the effective mini-batch size. We call this configuration a \"4x4 pipeline\".\n", "\n", "\n", "### Recomputation checkpoints\n", "\n", "We can make more efficient use of the valuable In-Processor-Memory by saving only selected activation inputs and recomputing the rest. This lets us balance memory savings (we do not store every activation) against FLOP expenditure (we do not have to recompute every activation). \n", "\n", "<img src=\"images/recomputation.png\" width=\"800\" />\n", "\n", "Source: [TensorFlow Model Parallelism: Recomputation](https://docs.graphcore.ai/projects/tf-model-parallelism/en/latest/pipelining.html#recomputation)\n", "\n", "Checkpoints are automatically placed between each pipeline stage. In addition to these automatic checkpoints, we add one at the end of every transformer layer, which leads to better performance.\n", "\n", "### Replicated tensor sharding of optimizer state\n", "\n", "If we were using multiple replicas, we could also distribute our optimizer state across them to reduce local memory usage, a method called [on-chip replicated tensor sharding](https://docs.graphcore.ai/projects/graphcore-glossary/en/latest/index.html#term-Replicated-tensor-sharding). To utilise this feature, you must be on an IPU-POD16 system. 
\n", "\n", "> To further improve memory availability we also have the option to store tensors in the IPU-POD16 Streaming Memory at the cost of increased communications.\n", "\n", "![rts.png](images/rts.png)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "376d4a70", "metadata": {}, "source": [ "## Running with Optimum Graphcore\n", "\n", "To use `optimum-graphcore`, there are three main classes you need to know about:\n", "- `IPUTrainer`: the trainer class that takes care of compiling the model to run on IPUs. It also takes care of performing training and evaluation.\n", "- `IPUTrainingArguments`: the parameters for how the model will be trained by the trainer.\n", "- `IPUConfig`: the class that specifies attributes and configuration parameters to compile and put the model on the device.\n", "\n", "The `IPUTrainer` class is very similar to the [🤗 Transformers Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) class, and adapting a script that currently uses the `Trainer` class to make it work with IPUs will mostly consist of simply swapping the `Trainer` class with the `IPUTrainer` class.\n", "\n", "The `IPUTrainingArguments` class is also very similar to the [🤗 Transformers TrainingArguments](https://huggingface.co/docs/transformers/v4.20.1/en/main_classes/trainer#transformers.TrainingArguments) class with a few extra arguments for IPUs. " ] }, { "cell_type": "code", "execution_count": null, "id": "f06484c4", "metadata": {}, "outputs": [], "source": [ "ipu_config = IPUConfig.from_pretrained(\"Graphcore/bert-large-ipu\",\n", " executable_cache_dir = executable_cache_dir)" ] }, { "cell_type": "code", "execution_count": null, "id": "c6e31859", "metadata": {}, "outputs": [], "source": [ "ipu_config" ] }, { "attachments": {}, "cell_type": "markdown", "id": "77f5b256", "metadata": {}, "source": [ "`device_iterations` is the number of batches the device should run before returning to the user. Increasing `device_iterations` can be more efficient because the loop runs on the IPU directly, reducing overhead costs. Please see the [PopTorch documentation](https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/batching.html?highlight=device%20iterations#poptorch-options-deviceiterations) for more information on this parameter." 
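, "\n", "\n", "To see how the different batching factors combine, here is a small worked example using the values set in the next cell (a sketch for illustration only; it assumes the loaded `ipu_config` exposes a `device_iterations` attribute):\n", "\n", "```python\n", "# Samples contributing to one weight update (the global batch size used below):\n", "micro_batch = 1          # per_device_train_batch_size\n", "replication = 1          # a single 4-IPU pipeline fills the IPU-POD4\n", "grad_accumulation = 256  # micro-batches accumulated per optimizer step\n", "print(micro_batch * replication * grad_accumulation)  # 256\n", "\n", "# Samples fetched from the dataloader for each round trip to the device:\n", "print(micro_batch * replication * grad_accumulation * ipu_config.device_iterations)\n", "```"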
] }, { "cell_type": "code", "execution_count": null, "id": "12936f32", "metadata": {}, "outputs": [], "source": [ "global_batch_size = 256\n", "per_device_train_batch_size = 1\n", "per_device_eval_batch_size = 2\n", "replication_factor = 1\n", "gradient_accumulation = int(global_batch_size / per_device_train_batch_size / replication_factor)" ] }, { "cell_type": "code", "execution_count": null, "id": "d9193054", "metadata": {}, "outputs": [], "source": [ "training_args = IPUTrainingArguments(output_dir=\"/tmp/outputs\",\n", " do_train=True,\n", " do_eval=True,\n", " per_device_train_batch_size=per_device_train_batch_size,\n", " per_device_eval_batch_size=per_device_eval_batch_size,\n", " gradient_accumulation_steps=gradient_accumulation,\n", " learning_rate=2e-4,\n", " num_train_epochs=2,\n", " logging_steps=25,\n", " dataloader_num_workers=32,\n", " resume_from_checkpoint=True,\n", " pad_on_batch_axis=True,\n", " n_ipu=n_ipu,\n", " save_strategy=\"epoch\",\n", " report_to=\"none\",\n", " )" ] }, { "attachments": {}, "cell_type": "markdown", "id": "96a57d24", "metadata": {}, "source": [ "## Training loop" ] }, { "cell_type": "code", "execution_count": null, "id": "d041a185", "metadata": {}, "outputs": [], "source": [ "from squad_preprocessing import PadCollate" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3812e1da", "metadata": {}, "source": [ "Now we create the `IPUTrainer` from `optimum-graphcore` to train our model on the IPU:" ] }, { "cell_type": "code", "execution_count": null, "id": "d4abf64f", "metadata": {}, "outputs": [], "source": [ "trainer = IPUTrainer(model=model,\n", " ipu_config=ipu_config,\n", " args=training_args, \n", " train_dataset=train_dataset,\n", " eval_dataset=validation_features,\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "336f77b2", "metadata": { "scrolled": true }, "outputs": [], "source": [ "trainer.train(resume_from_checkpoint=False)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fe14c2ff", "metadata": {}, "source": [ "After training, we save the model weights to disk." ] }, { "cell_type": "code", "execution_count": null, "id": "9efa9c3b", "metadata": {}, "outputs": [], "source": [ "trainer.save_model()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f7359abf", "metadata": {}, "source": [ "## Validation\n", "\n", "We will now take the model we just trained and run validation on the SQuAD validation dataset. For evaluation, the model runs on an inference pipeline configured by the same `IPUConfig`, replicated to make use of the available IPUs." ] }, { "attachments": {}, "cell_type": "markdown", "id": "2f44bdee", "metadata": {}, "source": [ "We loop over all the validation data examples and get the `raw_predictions` for the start and end positions of where the answer to the question lies in the text passage for each one."
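, "\n", "\n", "Once the prediction cell below has run, the logits for a single feature can be decoded into a text span roughly as follows (a simplified sketch; the full post-processing used afterwards also handles overflowing windows, invalid spans and score ranking):\n", "\n", "```python\n", "import numpy as np\n", "\n", "start_logits = eval_output.predictions[0]  # one row per validation feature\n", "end_logits = eval_output.predictions[1]\n", "i = 0  # look at the first feature\n", "start = int(np.argmax(start_logits[i]))\n", "end = int(np.argmax(end_logits[i]))\n", "span_ids = validation_features[i][\"input_ids\"][start:end + 1]\n", "print(tokenizer.decode(span_ids))\n", "```"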
] }, { "cell_type": "code", "execution_count": null, "id": "7d11df40", "metadata": {}, "outputs": [], "source": [ "eval_output = trainer.predict(validation_features)" ] }, { "cell_type": "code", "execution_count": null, "id": "8bc99abf", "metadata": {}, "outputs": [], "source": [ "from datasets import load_metric\n", "from squad_preprocessing import postprocess_qa_predictions" ] }, { "cell_type": "code", "execution_count": null, "id": "24948756", "metadata": {}, "outputs": [], "source": [ "raw_predictions = []\n", "raw_predictions.append(eval_output.predictions[0].astype(float))\n", "raw_predictions.append(eval_output.predictions[1].astype(float))" ] }, { "cell_type": "code", "execution_count": null, "id": "3addecf3", "metadata": {}, "outputs": [], "source": [ "raw_predictions[0].shape" ] }, { "cell_type": "code", "execution_count": null, "id": "867850e7", "metadata": {}, "outputs": [], "source": [ "validation_features" ] }, { "attachments": {}, "cell_type": "markdown", "id": "01e9b7a3", "metadata": {}, "source": [ "We now post-process the raw predictions to get the best valid answer prediction for each question." ] }, { "cell_type": "code", "execution_count": null, "id": "ab0fde6f", "metadata": {}, "outputs": [], "source": [ "final_predictions = postprocess_qa_predictions(datasets[\"validation\"],\n", " validation_features,\n", " raw_predictions)" ] }, { "cell_type": "code", "execution_count": null, "id": "b4193f0b", "metadata": {}, "outputs": [], "source": [ "metric = load_metric(\"squad\")\n", "formatted_predictions = [{\"id\": k, \"prediction_text\": v}\n", " for k, v in final_predictions.items()]\n", "references = [{\"id\": ex[\"id\"], \"answers\": ex[\"answers\"]}\n", " for ex in datasets[\"validation\"]]\n", "metrics = metric.compute(predictions=formatted_predictions, references=references)\n", "print(metrics)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b0c9b39d", "metadata": {}, "source": [ "We obtained a good validation score for SQuADv1.\n", "\n", "| BERT-Large | Exact Match | F1 Score |\n", "|----------------------------------------|:-----------:|:--------:|\n", "| Reference (Devlin et al., 2018) | 84.1 | 90.9 |\n", "| IPU-POD16 with IPU pre-trained weights | 84.5 | 91.0 |" ] }, { "attachments": {}, "cell_type": "markdown", "id": "37dba3b4", "metadata": {}, "source": [ "## Inference\n", "\n", "We can now use our fine-tuned model to answer questions. Let's start by defining a task:" ] }, { "cell_type": "code", "execution_count": null, "id": "83d30fb9", "metadata": {}, "outputs": [], "source": [ "# Define task\n", "question = \"What speed-up can one expect from using sequence packing for training BERT on IPU?\"\n", "answer_text = \"We find that at sequence length 512 padding tokens represent in excess of 50% of the Wikipedia \" \\\n", " \"dataset used for pretraining BERT (Bidirectional Encoder Representations from Transformers). \" \\\n", " \"Therefore by removing all padding we achieve a 2x speed-up in terms of sequences/sec. \" \\\n", " \"To exploit this characteristic of the dataset, \" \\\n", " \"we develop and contrast two deterministic packing algorithms.\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "833efdcd", "metadata": {}, "source": [ "Let's get the model inputs ready and create our model. 
We'll import the weights from the pre-trained, fine-tuned BERT model from the previous sections:" ] }, { "cell_type": "code", "execution_count": null, "id": "961bb39f", "metadata": {}, "outputs": [], "source": [ "# Apply the tokenizer to the input text, treating them as a text-pair.\n", "input_encoding = tokenizer.encode_plus((question, answer_text))\n", "\n", "# Extract inputs, add batch dimension\n", "input_tensor = torch.tensor(input_encoding[\"input_ids\"]).unsqueeze(0)\n", "attention_tensor= torch.tensor(input_encoding[\"attention_mask\"]).unsqueeze(0)\n", "token_types=torch.tensor(input_encoding[\"token_type_ids\"]).unsqueeze(0)\n", " \n", "# Get model and load the fine-tuned weights\n", "model = transformers.BertForQuestionAnswering.from_pretrained(\"/tmp/outputs\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4e0c40b5", "metadata": {}, "source": [ "Optionally, instead of using the fine-tuned weights we saved in the previous section, you can download fine-tuned weights from the [Graphcore organisation on the Hugging Face Model Hub](https://huggingface.co/Graphcore). " ] }, { "cell_type": "code", "execution_count": null, "id": "bfc678bb", "metadata": {}, "outputs": [], "source": [ "# model = transformers.BertForQuestionAnswering.from_pretrained(\"Graphcore/bert-large-uncased-squad11\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ae26ae76", "metadata": {}, "source": [ "We can now solve the task and print the answer to the question:" ] }, { "cell_type": "code", "execution_count": null, "id": "69901089", "metadata": {}, "outputs": [], "source": [ "# Solve task\n", "outputs = model(input_tensor, attention_tensor, token_types)\n", "\n", "# Extract answer\n", "answer_start, answer_stop = outputs.start_logits.argmax(), outputs.end_logits.argmax()\n", "answer_ids = input_tensor.squeeze()[answer_start:answer_stop + 1]\n", "answer_tokens = tokenizer.convert_ids_to_tokens(answer_ids, skip_special_tokens=True)\n", "answer = tokenizer.convert_tokens_to_string(answer_tokens)\n", "\n", "# Print results\n", "print(f\"Question: {question}\")\n", "print(f\"Answer: {answer}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "49961906", "metadata": {}, "source": [ "## Sharing your model with the Hugging Face community" ] }, { "attachments": {}, "cell_type": "markdown", "id": "72fc6d47", "metadata": {}, "source": [ "We can share our model on the 🤗 Models Hub and leverage the 🤗 inference API for downstream tasks." ] }, { "cell_type": "code", "execution_count": null, "id": "ea7921a0", "metadata": {}, "outputs": [], "source": [ "# Make sure you have git-lfs and huggingface-hub\n", "!apt-get update && apt-get upgrade -y && apt-get install -y git git-lfs \n", "# !pip install -y huggingface-hub " ] }, { "attachments": {}, "cell_type": "markdown", "id": "461996db", "metadata": {}, "source": [ "You can share your model with the 🤗 community. You do this by completing the following steps:\n", "\n", "1. Store your authentication token from the 🤗 website. [Sign up to 🤗](https://huggingface.co/join) if you haven't already.\n", "2. Execute the following cell and input your username and authentication token." ] }, { "cell_type": "code", "execution_count": null, "id": "8793ce79", "metadata": {}, "outputs": [], "source": [ "from huggingface_hub import notebook_login\n", "\n", "notebook_login()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "584603e2", "metadata": {}, "source": [ "Now you can upload your model to the Hugging Face Hub. 
Uncomment the code in the cell below, and specify an identifier made up of your 🤗 username and a name for your model:" ] }, { "cell_type": "code", "execution_count": null, "id": "57e47822", "metadata": {}, "outputs": [], "source": [ "# Upload the checkpoint to Hugging Face Model Hub.\n", "\n", "# model.push_to_hub(\"<hf-username>/<name-of-model>\")\n", "# tokenizer.push_to_hub(\"<hf-username>/<name-of-model>\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "03526605", "metadata": {}, "source": [ "Other users can then load your model with the identifier \"<hf-username>/<name-of-model>\", for instance:\n", "\n", "```python\n", "from transformers import AutoModelForQuestionAnswering\n", "\n", "model = AutoModelForQuestionAnswering.from_pretrained(\"<hf-username>/<name-of-model>\")\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "id": "471f01d0", "metadata": {}, "source": [ "## Next steps\n", "\n", "Try out the other [IPU-powered Jupyter Notebooks](https://www.graphcore.ai/ipu-jupyter-notebooks) to see how IPUs perform on other tasks." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } } }
Loss: 0.791 - Throughput: 602.960 seq/s: 100%", "layout": "IPY_MODEL_91c171a6bb0946b48a1e318a546671d9", "max": 346, "style": "IPY_MODEL_e4b2f87a8b01461a8cf6478cc98091d2", "value": 346 } }, "a2cc96a7f2be4924b4eb3f2c5ca441b3": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "a2ceecf934a648858b7e86f5d0ca1548": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "a35d9b8305bb40a2a8a90f7e32bae8d8": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "a416cc948ae445088d67e79fdef5de7f": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "a53b252f9ec04ebfac3ac8de73610a31": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "a575a994ef5a4b9998bc540532de6473": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_56545676fdee435a9862ac1da85e0618", "IPY_MODEL_c027c549f513403a8dd0e39497b0c3cc", "IPY_MODEL_9a64ac9a4c324ae789053bfbdac23803" ], "layout": "IPY_MODEL_aa45bce80ba04b66b99c79249bf05c0b" } }, "a5b48da86e624d08b6b37d37ec2938b8": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_32efc833332242d2b9bdc5732f3f83e2", "IPY_MODEL_1434848ba8e34924a81900027b727f3f", "IPY_MODEL_6b5bc74901b34b15ae9b91bccd5f707d" ], "layout": "IPY_MODEL_0181aea86cf2426a8df2ef59eb006163" } }, "a6fd50dcdd8c4b3b99acb647debbf585": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "description": "Epoch: 1 - Step: 345 - Loss: 0.701 - Throughput: 576.474 seq/s: 100%", "layout": "IPY_MODEL_0920a4e34496461681c9376f086a4f02", "max": 346, "style": "IPY_MODEL_76d1147989b0470c888d003311de0685", "value": 346 } }, "a833ce9bfc554ef7ae2c462e896b1e47": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "aa45bce80ba04b66b99c79249bf05c0b": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "acffceb0aa434c169979520f44582137": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_a53b252f9ec04ebfac3ac8de73610a31", "style": "IPY_MODEL_14adb7851e6f49a0a5692bbb67365c9b", "value": "Epochs: 100%" } }, "ad16ce91257f44eb8628fbac9eade9da": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "description": "Epochs: 100%", "layout": "IPY_MODEL_d174c9c67fec40f0bed694784a211a09", "max": 3, "style": "IPY_MODEL_0872f45312754a6dbe556f9bdac4d8f4", "value": 3 } }, "aee35c3dde87426aaee859d7dab30775": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "af1ce28acb6a4508834c321dc27345d3": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "initial" } }, 
"b1ff12dbafba44d792d197a66f9741b7": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "b3fdedd4fa4d42ccb734e5476b46f461": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "b4fd01884fde493999a6d2a7be7a9140": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_ad16ce91257f44eb8628fbac9eade9da", "IPY_MODEL_ff9adb8c41ce4e97bf8245f9dab30430" ], "layout": "IPY_MODEL_ec1436ebadf94df9a952badc66d9e51d" } }, "b72f3358acb34d979d11623b3407e5b3": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "b9e96a1b01da4c4ea96d325506b5a191": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "bb2ef00251204bddaaecac5555b9e81f": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "bb3aa322a3fb4027ac477afb4e12a637": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_e47ee37994894906a0c9ca66c6917a5c", "style": "IPY_MODEL_cc53dd8973d9476d85280ce0e40bfbfa", "value": "100%" } }, "bbcc0cb21037499b8e8a4b7adf0b38e9": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_efc5029196ca434b869a31ae9c239b57", "style": "IPY_MODEL_88bad03b52904e7285df98b1f90268b3", "value": " 113/113 [02:25&lt;00:00, 9.70it/s]" } }, "bc08f9d70ff948d5a43998903a474237": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "bd184564c54d475bb65c7cc429f284d4": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "bded3a538b6440d78dd6a0b8700ae561": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_8ea22108461f4b14a0604b3998a15d2b", "style": "IPY_MODEL_3c65035318444b85bf2ca4602daec6b2", "value": "Epoch: 0 - Step: 345 - Loss: 0.741 - Throughput: 572.293 seq/s: 100%" } }, "c027c549f513403a8dd0e39497b0c3cc": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "danger", "layout": "IPY_MODEL_7269ab27965a4204a63d42dc3bc659ce", "max": 3, "style": "IPY_MODEL_1d7981b404744ba4b791c98922818cd3" } }, "c19d37afbe1245d0b7eeded45c3dc8ba": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "initial" } }, "c38574303a314857af145c9314ce9f70": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "description": "Step: 112 - throughput: 3309.235 samples/s: 100%", "layout": "IPY_MODEL_57ced115d7474cd5b2bc902eb5d61f64", "max": 113, "style": "IPY_MODEL_0b8bfb3536424f1c8093ad658dd2349c", "value": 113 } }, "c3dd127febf143c9857bead90b7a7595": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_1261d257152a40a18368f595466aac1f", "IPY_MODEL_32bfe2d3c6fd4bc6b7c1ecc8b9f754d4" ], "layout": 
"IPY_MODEL_e54f4dab80634ab3ba228a78337fe16c" } }, "c440a5eef70547fdbdd428f07aa3ee68": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "c5a251e538f844e7a2e202c652b3c235": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "c681ea503e094c149100da1cbf572358": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "c701f026d5cb415c90f45b15a19f8b0d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "description": "100%", "layout": "IPY_MODEL_3e4f986cad4244f195cac0ce9a86f761", "max": 10570, "style": "IPY_MODEL_4892ddbb6757414fba7450c4d2895183", "value": 10570 } }, "c7b137b7e85d44f0aff6d434cbbf7df5": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_c5a251e538f844e7a2e202c652b3c235", "style": "IPY_MODEL_573939d58d194a30b18ff15aed870bec", "value": " 10570/10570 [00:21&lt;00:00, 496.49it/s]" } }, "c7f5d6085c264e66aca41b2941d66c08": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "c8b69b41ac3946d49f9ded898ac17637": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "description": "Epoch: 0 - Step: 345 - Loss: 0.778 - Throughput: 579.933 seq/s: 100%", "layout": "IPY_MODEL_a833ce9bfc554ef7ae2c462e896b1e47", "max": 346, "style": "IPY_MODEL_394806f1ef1549019f5e98749e4942c8", "value": 346 } }, "c99a5d9125da4d7a80e14059376f754b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "description": "Epoch: 1 - Step: 345 - Loss: 0.664 - Throughput: 570.012 seq/s: 100%", "layout": "IPY_MODEL_e69da6300bf54664935e0fb187d7ccf1", "max": 346, "style": "IPY_MODEL_f8f79b3f429447499b402d770b144d1e", "value": 346 } }, "ca6ab92a0199442eb8c40b1c78995799": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_6a55285da8ba42a5b42433a9641f9688", "IPY_MODEL_5688c41f7eaf437081e532ba6c98d6eb", "IPY_MODEL_bbcc0cb21037499b8e8a4b7adf0b38e9" ], "layout": "IPY_MODEL_e0df011be6ec4456a68130c56f513f99" } }, "caa86e7c770c4b65979cb298694ecc3c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "cc53dd8973d9476d85280ce0e40bfbfa": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "cc9f73329ad44784936f9e39175e2762": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "cca7d495a9e24bcc9e2818500e465b91": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "ccd71a510a9949659751a3620d296894": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": 
"IPY_MODEL_e258e63ad0794bb88171a81134dd8b64", "style": "IPY_MODEL_dfa55558f3914ce0bb4c01ca468b43b6", "value": " 346/346 [03:12&lt;00:00, 1.74it/s]" } }, "cf662aa184e043cb9616489a749f9d41": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "d174c9c67fec40f0bed694784a211a09": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "d4613b1f9f4c4682ad7dd0bd99b64ccc": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "d582c554820a44918b0cd2fa8009bce2": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "d660db84f39b4ca693057292e4d6e2a8": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "d660f1b593e74ac695623b27234cfd14": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "dfa55558f3914ce0bb4c01ca468b43b6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "dfe6bcd0588846ed980238cce0ce3f23": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "e0df011be6ec4456a68130c56f513f99": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "e1e5b2d4470c49478b1de0f8b8d512a4": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_3f50e999fa794c6fbdc6b4911051d16c", "style": "IPY_MODEL_24d0269086554bc7a14231dd7d03481a", "value": " 2/2 [00:00&lt;00:00, 110.21it/s]" } }, "e1fe817426624b48a5a87ba446def36d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_37954b48eb164ffe8053637d740ec665", "IPY_MODEL_066fa664116849098018e9baa066417d", "IPY_MODEL_1344574ddf854e83a8f85ff6c1b69251" ], "layout": "IPY_MODEL_76537c65be544568b91176fe115fdf4e" } }, "e258e63ad0794bb88171a81134dd8b64": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "e47ee37994894906a0c9ca66c6917a5c": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "e4b2f87a8b01461a8cf6478cc98091d2": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "initial" } }, "e54f4dab80634ab3ba228a78337fe16c": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "e69da6300bf54664935e0fb187d7ccf1": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "e6cdeccedbeb4ddca7cab4f0658850d0": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "ec1436ebadf94df9a952badc66d9e51d": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "ecdf7f81055c4e15ba236473d045f570": { "model_module": 
"@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "ee8982c161d248d2b03cd7e64d09dcf6": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "ef0f5be231f94dbeba93712230d0ad87": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_bc08f9d70ff948d5a43998903a474237", "style": "IPY_MODEL_7b6c6d3d77474913b0edc9f4db18f92b", "value": " 346/346 [04:56&lt;00:00, 1.17it/s]" } }, "ef246036865e45c3ba3637dc1ebbe561": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "efc5029196ca434b869a31ae9c239b57": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "f03846bf22224128a81aaf092477426f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_2380d82578cc45a5ba4cf481f59354fd", "style": "IPY_MODEL_feba0c0d24794b558734331478c2d2e5", "value": " 346/346 [06:32&lt;00:00, 1.14s/it]" } }, "f262cd7b1a3c40bbbad84197e64de588": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "f4666e5976f847538a3eb3de5de251dd": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "f5f7d6d815b64646bd467d53bed0365c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "initial" } }, "f8bbcce113a647cb999fe5e8017e5e5f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_bb3aa322a3fb4027ac477afb4e12a637", "IPY_MODEL_765e2d80436446dd9c3bbc1e68a7923e", "IPY_MODEL_8bb6baf370674ce4aa7bd827e01eec43" ], "layout": "IPY_MODEL_4f41a49093244e499f19edd0b4ce07b9" } }, "f8f79b3f429447499b402d770b144d1e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "initial" } }, "fcd7540c0bf04eaa8be606396c899256": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_37efcdb2a015414397f9d7551c31bd69", "IPY_MODEL_1f111acf18bf4dc4a9728746be834514", "IPY_MODEL_c7b137b7e85d44f0aff6d434cbbf7df5" ], "layout": "IPY_MODEL_4fffcf0014b243e389a16493818ae436" } }, "fce00d252d75456b9199a4313faf0963": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "fe16a84150d94088a7748459d74e7a57": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "feba0c0d24794b558734331478c2d2e5": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "ff9adb8c41ce4e97bf8245f9dab30430": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_3515d1d1982240f792fa7f4b8c9a4bca", "style": "IPY_MODEL_6fb85aeae69f490da819efd2624435d2", "value": " 3/3 [09:42&lt;00:00, 194.24s/it]" } } }, 
"version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }