notebooks/115_introducing_contrastive_search.ipynb

{ "cells": [ { "cell_type": "markdown", "id": "9174a376", "metadata": {}, "source": [ "# Contrastive Search\n", "\n", "This is a companion notebook to the [Hugging Face guest blog post entry about contrastive search](https://huggingface.co/blog/introducing-csearch). We have added the same tutorial for PyTorch and TensorFlow -- feel free to pick your framework of choice! 🤗" ] }, { "cell_type": "markdown", "id": "7443d643", "metadata": {}, "source": [ "# PyTorch Tutorial:" ] }, { "cell_type": "markdown", "id": "5f9af90e", "metadata": {}, "source": [ "## 1. Environment Installation:" ] }, { "cell_type": "code", "execution_count": null, "id": "1d4b723e", "metadata": {}, "outputs": [], "source": [ "! pip install torch\n", "! pip install \"transformers>=4.24.0\"" ] }, { "cell_type": "markdown", "id": "437cca73", "metadata": {}, "source": [ "## 2. Problems of Existing Decoding Methods:" ] }, { "cell_type": "markdown", "id": "8a749495", "metadata": {}, "source": [ "### 2.1. Deteriminstic Methods:" ] }, { "cell_type": "code", "execution_count": 2, "id": "ed8ebd1f", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "DeepMind Company is a leading AI research company, with a focus on deep learning and deep learning-based systems.\n", "\n", "The company's research is focused on the development of deep learning-based systems that can learn from large amounts of data, and that can be used to solve real-world problems.\n", "\n", "DeepMind's research is also used by the UK government to develop new technologies for the UK's National Health Service.\n", "\n", "DeepMind's research is also used by the UK government to develop new technologies for the UK's National Health Service.\n", "\n", "DeepMind's research is also used by the UK government to develop new technologies\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "import torch\n", "from transformers import AutoTokenizer, GPT2LMHeadModel\n", "tokenizer = AutoTokenizer.from_pretrained('gpt2-large')\n", "input_ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids\n", "model = GPT2LMHeadModel.from_pretrained('gpt2-large')\n", "\n", "output = model.generate(input_ids, max_length=128)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "d2ce1307", "metadata": {}, "source": [ "### 2.2. Stochastic Methods:" ] }, { "cell_type": "code", "execution_count": 14, "id": "5d7f091d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. 
Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "DeepMind Company is a leading provider of AI-based research, development, and delivery of AI solutions for security, infrastructure, machine learning, communications, and so on.\"\n", "\n", "'AI is not journalism'\n", "\n", "Worse still was the message its researchers hoped would reach the world's media — that it was not really research, but rather a get-rich-quick scheme to profit from living forces' ignorance.\n", "\n", "\"The thing is, we know that people don't consciously assess the value of the others' information. They understand they will get the same on their own.\"\n", "\n", "One example? Given the details of today\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "import torch\n", "from transformers import AutoTokenizer, GPT2LMHeadModel\n", "tokenizer = AutoTokenizer.from_pretrained('gpt2-large')\n", "input_ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids\n", "model = GPT2LMHeadModel.from_pretrained('gpt2-large')\n", "\n", "torch.manual_seed(0)\n", "output = model.generate(input_ids, do_sample=True, max_length=128, top_p=0.95, top_k=0)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "67834ef1", "metadata": {}, "source": [ "## 3. Contrastive Search:" ] },
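{ "cell_type": "markdown", "id": "b3e1f2a9", "metadata": {}, "source": [ "Before running the example, here is a quick recap of the decoding rule (see the blog post linked above for the full details). At each step $t$, contrastive search picks the token $v$ among the $k$ most probable candidates $V^{(k)}$ that maximizes\n", "\n", "$$x_t = \\underset{v \\in V^{(k)}}{\\operatorname{argmax}} \\Big\\{ (1-\\alpha) \\, p_\\theta(v \\mid x_{<t}) - \\alpha \\, \\max_{1 \\le j \\le t-1} s\\big(h_v, h_{x_j}\\big) \\Big\\}$$\n", "\n", "where the first term is the model confidence and the second term, the degeneration penalty, is the maximum cosine similarity between the candidate's hidden state $h_v$ and the hidden states of the preceding context. In `generate`, `penalty_alpha` corresponds to $\\alpha$ and `top_k` to $k$; with `penalty_alpha=0.0`, contrastive search reduces to greedy search." ] }, { "cell_type": "markdown", "id": "3c24e571", "metadata": {}, "source": [ "### 3.1. Generating Text with Contrastive Search:" ] }, { "cell_type": "code", "execution_count": 4, "id": "d0cb0994", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "DeepMind Company is a leader in artificial intelligence (AI). We have a long history of working with companies such as Google, Facebook, Amazon, and Microsoft to build products that improve people's lives, and today we are excited to announce that DeepMind's AlphaGo program has won the game of Go, becoming the first program to defeat a professional Go player.\n", "\n", "The victory is a testament to the power of deep learning, and to the incredible work of our research team, which has been at the forefront of AI research for the past five years. AlphaGo is one of the most advanced Go programs ever created, and its performance is an important step towards the goal of human-level AI.\n", "\n", "\"This is the culmination of a decade of hard work,\" said Andy Ng, co-founder and CTO of DeepMind. \"We are thrilled to have achieved this milestone and look forward to continuing to develop AI that can be used in a wide range of applications and to help people live better lives.\"\n", "\n", "DeepMind's work on Go began in 2010, when it began to train a neural network to play Go using millions of games played by top Go players around the world. Since then, the team has refined the algorithm, adding more and more layers of reinforcement learning to make it better at recognizing patterns and making decisions based on those patterns. 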
In the past year and a half, the team has made significant progress in the game, winning a record-tying 13 games in a row to move into the top four of the world rankings.\n", "\n", "\"The game of Go is a complex game in which players have to be very careful not to overextend their territory, and this is something that we have been able to improve over and over again,\" said Dr. Demis Hassabis, co-founder and Chief Scientific Officer of DeepMind. \"We are very proud of our team's work, and we hope that it will inspire others to take the next step in their research and apply the same techniques to other problems.\"\n", "\n", "In addition to the win in Go, DeepMind has also developed an AI system that can learn to play a number of different games, including poker, Go, and chess. This AI system, called Tarsier, was developed in partnership with Carnegie Mellon University and the University of California, Berkeley, and is being used to teach computer vision and machine learning to identify objects in images and recognize speech in natural language. Tarsier has been trained to play the game of Go and other games on a\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "import torch\n", "from transformers import GPT2Tokenizer, GPT2LMHeadModel\n", "model_name = 'gpt2-large'\n", "tokenizer = GPT2Tokenizer.from_pretrained(model_name)\n", "model = GPT2LMHeadModel.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id)\n", "model.eval()\n", "\n", "# prepare the prefix\n", "prefix_text = r'DeepMind Company is'\n", "input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids\n", "\n", "# generate the result with contrastive search\n", "output = model.generate(input_ids, penalty_alpha=0.6, top_k=4, max_length=512)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "adc3ebe0", "metadata": {}, "source": [ "## 4. More Generated Examples:" ] }, { "cell_type": "markdown", "id": "515f4a4d", "metadata": {}, "source": [ "### 4.1. Example One - GPT-2:" ] }, { "cell_type": "code", "execution_count": 12, "id": "c5581e4e", "metadata": {}, "outputs": [], "source": [ "# Load the language model and prepare the prefix text:\n", "import torch\n", "from transformers import AutoTokenizer, GPT2LMHeadModel\n", "tokenizer = AutoTokenizer.from_pretrained('gpt2-large')\n", "model = GPT2LMHeadModel.from_pretrained('gpt2-large')\n", "prefix_text = r\"In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\"\n", "input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids" ] }, { "cell_type": "markdown", "id": "7079d5b0", "metadata": {}, "source": [ "#### 4.1.1. Generating Text with Greedy Search:" ] }, { "cell_type": "code", "execution_count": 6, "id": "52ff51d5", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. 
Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n", "\n", "The researchers, led by Dr. David R. Williams of the University of California, Santa Cruz, discovered the unicorns in the Andes Mountains of Peru. The area is known for its unique geology and is home to a number of rare species of animals.\n", "\n", "The researchers found the unicorns in the Andes Mountains of Peru.\n", "\n", "\"We were surprised to find that the unicorns were able to communicate with each other,\" Williams said. \"We were also surprised to find that they were able to communicate in English.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "\n", "\"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes,\" Williams said. \"They were also the first people to use the Andes Mountains as a place to hunt and gather food.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "\n", "\"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes,\" Williams said. \"They were also the first people to use the Andes Mountains as a place to hunt and gather food.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "\n", "\"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes,\" Williams said. \"They were also the first people to use the Andes Mountains as a place to hunt and gather food.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "\n", "\"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes,\" Williams said. \"They were also the first people to use the Andes Mountains as a place to hunt and gather food.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "output = model.generate(input_ids, max_length=512)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "07f411e8", "metadata": {}, "source": [ "#### 4.1.2. Generating Text with Nucleus Sampling:" ] }, { "cell_type": "code", "execution_count": 13, "id": "284e5ff5", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. 
Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. The study was published in the Journal of Zoology in March 2016.\n", "\n", "Polygynous mammals such as unicorns have remained largely unknown to science. Professor Gustavo Giacota, from the University of Oxford who led the study, said that they had been documented as far as Eastern Siberia in Russia, but had only been seen a handful of times in the Gobi Desert.\n", "\n", "Tiny animals with pale and shiny coats live in the presence of human beings and are hardly likely to be victims of any cruelty. However, there is some evidence of the condition occurring in both humans and animals in remote regions, which might have similarities to \"black moles\" that coexist on the skin.\n", "\n", "It is thought that Unicorns could be inside themselves, that they have different scents depending on their current environment, or just fall out and there are plenty of legends of how they have survived. Experts speculate that the moths and other animals could be remnants of the Yezidi Isis and Charon, which literally is both the word which means great bird, and the Greek word for sound. It is said that the Isis and Charon taught their young the use of voice in the form of calling out to others.\n", "\n", "The scientists think that it could be ancient folklore that has survived and is no longer attributed to a real entity.\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "torch.manual_seed(0)\n", "output = model.generate(input_ids, do_sample=True, max_length=512, top_p=0.95, top_k=0)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "0af9b61d", "metadata": {}, "source": [ "#### 4.1.3. Generating Text with Contrastive Search:" ] }, { "cell_type": "code", "execution_count": 8, "id": "83ea2c55", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n", "\n", "According to the BBC, a team of scientists led by Dr David MacKay, from the University of Bristol, spent two years searching for the unicorn herd, which they discovered during a survey of the area.\n", "\n", "\"It's a very rare find,\" MacKay told the BBC. 
\"There are a few in the Himalayas, but this is the first time we've been able to find one in such a remote area.\"\n", "\n", "The team was surprised to find a herd of unicorns living in a region that has been known to be a hotbed of poaching, with many of the animals poached for their horns, which are used in traditional Chinese medicine to treat everything from rheumatism to cancer.\n", "\n", "\"We knew that the area was rich in rhino horn, but we had no idea how many there were, or what they were doing there,\" MacKay said. \"This is an area of high poaching pressure, and we wanted to find out what was going on.\"\n", "\n", "In order to do so, the team used GPS collars to track the animals as they moved around the mountain and the surrounding area. The GPS data was then compared with information gathered from local villagers, who had a wealth of information about the animals' movements, including where they were eating, what they were doing at night, and how much time they spent in the mountains each day.\n", "\n", "After analyzing the data, the team determined that the herd consisted of at least three species of unicorns, including a male and two females. One of the females was the mother of the male, and the other two were her daughters. All three had the same horn color, which is believed to be a sign of purity in the animal kingdom.\n", "\n", "While the discovery is exciting, it's not the first time scientists have discovered an animal that speaks English. Last year, scientists discovered a species of porcupine that can be heard by humans, and has been dubbed \"Porcupine Man\" for his ability to converse with the human race.\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "output = model.generate(input_ids, max_length=512, penalty_alpha=0.6, top_k=4)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "69f6fd3e", "metadata": {}, "source": [ "### 4.2. Example Two - OPT:" ] }, { "cell_type": "code", "execution_count": 15, "id": "172d023b", "metadata": {}, "outputs": [], "source": [ "# Load the language model and prepare the prefix text:\n", "import torch\n", "from transformers import AutoTokenizer, OPTForCausalLM\n", "model_name = r'facebook/opt-1.3b'\n", "tokenizer = AutoTokenizer.from_pretrained(model_name)\n", "model = OPTForCausalLM.from_pretrained(model_name)\n", "\n", "prefix_text = r\"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.\"\n", "input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids" ] }, { "cell_type": "markdown", "id": "70925bab", "metadata": {}, "source": [ "#### 4.2.1. Generating Text with Greedy Search:" ] }, { "cell_type": "code", "execution_count": 16, "id": "66ed9773", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We show that the residual learning framework can be used to train deep neural networks that are significantly more difficult to train than those used previously. 
We also show that the residual learning framework can be used to train deep neural networks that are significantly more difficult to train than those used previously.\n", "\n", "The paper presents a new residual learning framework for deep neural networks that is based on the concept of residuals. The residuals are the residuals of the network that are not used in the training process. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the network that are not used in the training process. The residuals are then used to train the network. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the network that are not used in the training process. The residuals are then used to train the network. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "output = model.generate(input_ids, max_length=256)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "8894ed1a", "metadata": {}, "source": [ "#### 4.2.2. Generating Text with Nucleus Sampling:" ] }, { "cell_type": "code", "execution_count": 17, "id": "2059c323", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. The theory focuses on several aspects of learning, including the dynamics of replicative and non-replicative aspects of learning. This framework emphasizes learning by entropy. New randomized algorithms enable training networks with residual learning, so that deep networks can be deployed as reliably and as efficiently as their more conventional counterparts.\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "torch.manual_seed(0)\n", "output = model.generate(input_ids, do_sample=True, max_length=256, top_p=0.95, top_k=0)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "98f17c7e", "metadata": {}, "source": [ "#### 4.2.3. Generating Text with Contrastive Search:" ] }, { "cell_type": "code", "execution_count": 18, "id": "63bac3b8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.\n", "\n", "In this paper, we propose a model-based residual learning (MBRL) framework that is based on neural networks trained on data that is sparse in terms of dimensionality (e.g., 1, 2, 3, etc.). 
The network parameters are chosen such that there is a high probability of convergence, i.e., the number of iterations is large enough to minimize the variance of the residuals. This is achieved by training the network on a set of training data, in which the data is sparse in terms of dimensionality, and then discarding the nonparametric part of the data after training is complete.\n", "\n", "We show that MBRL outperforms other methods for deep reinforcement learning (RL) and deep convolutional neural networks (CNNs) by a factor of at least 2. In addition, we show that, compared to CNNs, MBRL performs better in two-dimensional (2D) and three-dimensional (3D) cases.\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "output = model.generate(input_ids, max_length=256, penalty_alpha=0.6, top_k=6)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "1dc282a1", "metadata": {}, "source": [ "# TensorFlow\n", "\n", "⚠️ The TensorFlow version of Contrastive Search is not yet released -- it will be part of v4.25. Until then, you can access it by installing the dev version. \n", "\n", "To make up for this inconvenience on the TF front, we're delighted to announce that Contrastive Search is XLA-compatible, meaning that you can compile your generation call to XLA to get significant speedups 🔥 See our [TF XLA Generation blog post](https://huggingface.co/blog/tf-xla-generate) for more information; a short example is sketched in section 3.2 below." ] }, { "cell_type": "markdown", "id": "722e2d5a", "metadata": {}, "source": [ "## 1. Environment Installation:" ] }, { "cell_type": "code", "execution_count": null, "id": "f828c3bb", "metadata": {}, "outputs": [], "source": [ "! pip install tensorflow\n", "# Contrastive Search for TensorFlow will be released in v4.25.0. Meanwhile, you can install the dev version:\n", "# ! pip install \"transformers>=4.25.0\"\n", "! pip install git+https://github.com/huggingface/transformers.git" ] }, { "cell_type": "markdown", "id": "4c211eb0", "metadata": {}, "source": [ "## 2. Problems of Existing Decoding Methods:" ] }, { "cell_type": "markdown", "id": "d984779d", "metadata": {}, "source": [ "### 2.1. Deterministic Methods:" ] }, { "cell_type": "code", "execution_count": 19, "id": "5474b78c", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "All model checkpoint layers were used when initializing TFGPT2LMHeadModel.\n", "\n", "All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2-large.\n", "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.\n", "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. 
Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "DeepMind Company is a leading AI research company, with a focus on deep learning and deep learning-based systems.\n", "\n", "The company's research is focused on the development of deep learning-based systems that can learn from large amounts of data, and that can be used to solve real-world problems.\n", "\n", "DeepMind's research is also used by the UK government to develop new technologies for the UK's National Health Service.\n", "\n", "DeepMind's research is also used by the UK government to develop new technologies for the UK's National Health Service.\n", "\n", "DeepMind's research is also used by the UK government to develop new technologies\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "import tensorflow as tf\n", "from transformers import AutoTokenizer, TFGPT2LMHeadModel\n", "tokenizer = AutoTokenizer.from_pretrained('gpt2-large')\n", "input_ids = tokenizer('DeepMind Company is', return_tensors='tf').input_ids\n", "model = TFGPT2LMHeadModel.from_pretrained('gpt2-large')\n", "\n", "output = model.generate(input_ids, max_length=128)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "08d32600", "metadata": {}, "source": [ "### 2.2. Stochastic Methods:" ] }, { "cell_type": "code", "execution_count": 20, "id": "46479e78", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "All model checkpoint layers were used when initializing TFGPT2LMHeadModel.\n", "\n", "All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2-large.\n", "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.\n", "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "DeepMind Company is the Google on Earth platform, a communication platform designed to improve the tools humanity can use to get an understanding of the human brain.\n", "\n", "In May it released AlphaGo, a artificial intelligence program that defeated the world chess champion last month at the Go Global Chess Championship.\n", "\n", "With AlphaGo's victory, three others defeated top players, including two of the World Go champions, at the world championship.\n", "\n", "\"Neural networks are a next-generation generative neural network model of human thought and cognition,\" Ma said. 
\"When using neural networks, one cannot just rely on a few hundred training examples as this\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "import tensorflow as tf\n", "from transformers import AutoTokenizer, TFGPT2LMHeadModel\n", "tokenizer = AutoTokenizer.from_pretrained('gpt2-large')\n", "input_ids = tokenizer('DeepMind Company is', return_tensors='tf').input_ids\n", "model = TFGPT2LMHeadModel.from_pretrained('gpt2-large')\n", "\n", "tf.random.set_seed(0)\n", "output = model.generate(input_ids, do_sample=True, max_length=128, top_p=0.95, top_k=0)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "24793791", "metadata": {}, "source": [ "## 3. Contrastive Search:" ] }, { "cell_type": "markdown", "id": "a0187968", "metadata": {}, "source": [ "### 3.1. Generating Text with Contrastive Search:" ] }, { "cell_type": "code", "execution_count": 21, "id": "efaabbb6", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "All model checkpoint layers were used when initializing TFGPT2LMHeadModel.\n", "\n", "All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2-large.\n", "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "DeepMind Company is a leader in artificial intelligence (AI). We have a long history of working with companies such as Google, Facebook, Amazon, and Microsoft to build products that improve people's lives, and today we are excited to announce that DeepMind's AlphaGo program has won the game of Go, becoming the first program to defeat a professional Go player.\n", "\n", "The victory is a testament to the power of deep learning, and to the incredible work of our research team, which has been at the forefront of AI research for the past five years. AlphaGo is one of the most advanced Go programs ever created, and its performance is an important step towards the goal of human-level AI.\n", "\n", "\"This is the culmination of a decade of hard work,\" said Andy Ng, co-founder and CTO of DeepMind. \"We are thrilled to have achieved this milestone and look forward to continuing to develop AI that can be used in a wide range of applications and to help people live better lives.\"\n", "\n", "DeepMind's work on Go began in 2010, when it began to train a neural network to play Go using millions of games played by top Go players around the world. Since then, the team has refined the algorithm, adding more and more layers of reinforcement learning to make it better at recognizing patterns and making decisions based on those patterns. In the past year and a half, the team has made significant progress in the game, winning a record-tying 13 games in a row to move into the top four of the world rankings.\n", "\n", "\"The game of Go is a complex game in which players have to be very careful not to overextend their territory, and this is something that we have been able to improve over and over again,\" said Dr. Demis Hassabis, co-founder and Chief Scientific Officer of DeepMind. 
\"We are very proud of our team's work, and we hope that it will inspire others to take the next step in their research and apply the same techniques to other problems.\"\n", "\n", "In addition to the win in Go, DeepMind has also developed an AI system that can learn to play a number of different games, including poker, Go, and chess. This AI system, called Tarsier, was developed in partnership with Carnegie Mellon University and the University of California, Berkeley, and is being used to teach computer vision and machine learning to identify objects in images and recognize speech in natural language. Tarsier has been trained to play the game of Go and other games on a\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "import tensorflow\n", "from transformers import GPT2Tokenizer, TFGPT2LMHeadModel\n", "model_name = 'gpt2-large'\n", "tokenizer = GPT2Tokenizer.from_pretrained(model_name)\n", "model = TFGPT2LMHeadModel.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id)\n", "\n", "# prepare the prefix\n", "prefix_text = r'DeepMind Company is'\n", "inputs = tokenizer(prefix_text, return_tensors='tf').input_ids\n", "\n", "# generate the result with contrastive search\n", "output = model.generate(input_ids, penalty_alpha=0.6, top_k=4, max_length=512)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "bdc43804", "metadata": {}, "source": [ "## 4. More Generated Examples:" ] }, { "cell_type": "markdown", "id": "ffbe19fd", "metadata": {}, "source": [ "### 4.1. Example One - GPT-2:" ] }, { "cell_type": "code", "execution_count": 22, "id": "3403ca7a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "All model checkpoint layers were used when initializing TFGPT2LMHeadModel.\n", "\n", "All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2-large.\n", "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.\n" ] } ], "source": [ "# Load the language model and prepare the prefix text:\n", "import tensorflow as tf\n", "from transformers import AutoTokenizer, TFGPT2LMHeadModel\n", "tokenizer = AutoTokenizer.from_pretrained('gpt2-large')\n", "model = TFGPT2LMHeadModel.from_pretrained('gpt2-large')\n", "prefix_text = r\"In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\"\n", "input_ids = tokenizer(prefix_text, return_tensors='tf').input_ids" ] }, { "cell_type": "markdown", "id": "acc61421", "metadata": {}, "source": [ "#### 4.1.1. Generating Text with Greedy Search:" ] }, { "cell_type": "code", "execution_count": 23, "id": "84e21ff3", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. 
Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n", "\n", "The researchers, led by Dr. David R. Williams of the University of California, Santa Cruz, discovered the unicorns in the Andes Mountains of Peru. The area is known for its unique geology and is home to a number of rare species of animals.\n", "\n", "The researchers found the unicorns in the Andes Mountains of Peru.\n", "\n", "\"We were surprised to find that the unicorns were able to communicate with each other,\" Williams said. \"We were also surprised to find that they were able to communicate in English.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "\n", "\"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes,\" Williams said. \"They were also the first people to use the Andes Mountains as a place to hunt and gather food.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "\n", "\"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes,\" Williams said. \"They were also the first people to use the Andes Mountains as a place to hunt and gather food.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "\n", "\"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes,\" Williams said. \"They were also the first people to use the Andes Mountains as a place to hunt and gather food.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "\n", "\"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes,\" Williams said. \"They were also the first people to use the Andes Mountains as a place to hunt and gather food.\"\n", "\n", "The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "output = model.generate(input_ids, max_length=512)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "28037d33", "metadata": {}, "source": [ "#### 4.1.2. Generating Text with Nucleus Sampling:" ] }, { "cell_type": "code", "execution_count": 24, "id": "561a8adb", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. 
Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n", "\n", "The findings of the study, published in the journal Current Biology, are a result of an attempt to track down the pinniped unicorns by placing itself in the head of a Ramaolin Indian bear bear, in order to tease out information about their behavior. The aim of the study was to calculate the levels of fertility, which is some of the most mysterious phenomena in nature, says Prof. Alfredo Marino, not only about unicorns but also about human life.\n", "\n", "The study happened in 2000, when Marino was conducting a Ph.D. thesis on the genetics of the tribe.\n", "\n", "Confronted with a very huge body of results by this research, Marino, to this day, knows nothing about them. \"I still remember when the results were announced.\" Sigh. So he searched for a mystery, for a new point of view to impart the wisdom of his time, what was good and what was bad about what had been a status quo.\n", "\n", "And so, the interview began.\n", "\n", "The name Muir is written all over this report. The pinniped, a mythical creature with 8 head or tails, was sacred to the Indian tribes of South America. Only among them could a mighty pinniped ride on a steed of fire. All other tribes considered the pinniped symbol of bad luck. There is, Marino says, a great deal of gossiping about the pinniped in Indian circles.\n", "\n", "\"It wasn't even a scientific curiosity. The main point was that this [prize] really justified the experiment.\" Each animal which received one of the pinnipeds was given, to him, the status of valedictorian of the tribe. You can see what the titles \"priest\" and \"astrologer\" looked like in the pictures here.\n", "\n", "Colonization was not in the cards for most of the world's cultures. There are gaps in human civilization. Even the link between the chicken and the pillion – to produce a pillion, you have to do a chicken dance: in India, a lapa katchi – chicken and lapa – get you royalty.\n", "\n", "Muni (part of Papua New Guinea) is quite interesting in this context. Lots of tribes who lived in\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "tf.random.set_seed(0)\n", "output = model.generate(input_ids, do_sample=True, max_length=512, top_p=0.95, top_k=0)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "f8bd54ed", "metadata": {}, "source": [ "#### 4.1.3. Generating Text with Contrastive Search:" ] }, { "cell_type": "code", "execution_count": 25, "id": "fc3af8db", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. 
Please pass your input's `attention_mask` to obtain reliable results.\n", "Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n", "\n", "According to the BBC, a team of scientists led by Dr David MacKay, from the University of Bristol, spent two years searching for the unicorn herd, which they discovered during a survey of the area.\n", "\n", "\"It's a very rare find,\" MacKay told the BBC. \"There are a few in the Himalayas, but this is the first time we've been able to find one in such a remote area.\"\n", "\n", "The team was surprised to find a herd of unicorns living in a region that has been known to be a hotbed of poaching, with many of the animals poached for their horns, which are used in traditional Chinese medicine to treat everything from rheumatism to cancer.\n", "\n", "\"We knew that the area was rich in rhino horn, but we had no idea how many there were, or what they were doing there,\" MacKay said. \"This is an area of high poaching pressure, and we wanted to find out what was going on.\"\n", "\n", "In order to do so, the team used GPS collars to track the animals as they moved around the mountain and the surrounding area. The GPS data was then compared with information gathered from local villagers, who had a wealth of information about the animals' movements, including where they were eating, what they were doing at night, and how much time they spent in the mountains each day.\n", "\n", "After analyzing the data, the team determined that the herd consisted of at least three species of unicorns, including a male and two females. One of the females was the mother of the male, and the other two were her daughters. All three had the same horn color, which is believed to be a sign of purity in the animal kingdom.\n", "\n", "While the discovery is exciting, it's not the first time scientists have discovered an animal that speaks English. Last year, scientists discovered a species of porcupine that can be heard by humans, and has been dubbed \"Porcupine Man\" for his ability to converse with the human race.\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "output = model.generate(input_ids, max_length=512, penalty_alpha=0.6, top_k=4)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "c3897c15", "metadata": {}, "source": [ "### 4.2. 
Example Two - OPT:" ] }, { "cell_type": "code", "execution_count": 27, "id": "36fff202", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "All model checkpoint layers were used when initializing TFOPTForCausalLM.\n", "\n", "All the layers of TFOPTForCausalLM were initialized from the model checkpoint at facebook/opt-1.3b.\n", "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFOPTForCausalLM for predictions without further training.\n" ] } ], "source": [ "# Load the language model and prepare the prefix text:\n", "import tensorflow as tf\n", "from transformers import AutoTokenizer, TFOPTForCausalLM\n", "model_name = r'facebook/opt-1.3b'\n", "tokenizer = AutoTokenizer.from_pretrained(model_name)\n", "model = TFOPTForCausalLM.from_pretrained(model_name)\n", "\n", "prefix_text = r\"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.\"\n", "input_ids = tokenizer(prefix_text, return_tensors='tf').input_ids" ] }, { "cell_type": "markdown", "id": "d29ccf0c", "metadata": {}, "source": [ "#### 4.2.1. Generating Text with Greedy Search:" ] }, { "cell_type": "code", "execution_count": 28, "id": "886a30f0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We show that the residual learning framework can be used to train deep neural networks that are significantly more difficult to train than those used previously. We also show that the residual learning framework can be used to train deep neural networks that are significantly more difficult to train than those used previously.\n", "\n", "The paper presents a new residual learning framework for deep neural networks that is based on the concept of residuals. The residuals are the residuals of the network that are not used in the training process. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the network that are not used in the training process. The residuals are then used to train the network. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the network that are not used in the training process. The residuals are then used to train the network. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "output = model.generate(input_ids, max_length=256)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "ba5d9c77", "metadata": {}, "source": [ "#### 4.2.2. 
Generating Text with Nucleus Sampling:" ] }, { "cell_type": "code", "execution_count": 29, "id": "b90cc19f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. Experimental results show that our method does in fact achieve deep networks with a lower training time and less error.\n", "\n", "Although there are numerous cross-validation methods available to improve the reproducibility of one-shot experiments with no sample replication, the underlying mechanism behind these methods is usually poorly understood. In this paper we report an agglutinin-like cell sorting (ASYSC) method that achieves significant pooled and reproducible results for phage display experiments with a distinctive comparative advantage.\n", "\n", "Overall, visual recall performance was unchanged in young children over 10 years, when the model is first trained. Here, we show how teaching technology and retention of basic concepts can influence a child’s visual recall, and how children’s activities could be adapted for retention or revision by using computer-mediated learning (CML). Our approach intends to reduce the accessibility for children to materials, thereby increasing the retention of basic concepts.\n", "\n", "BACKGROUND: Low response on cognitive task has been observed in some young children, mostly due to confusion. The purpose of this study is to find out if this difficulty can\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "tf.random.set_seed(0)\n", "output = model.generate(input_ids, do_sample=True, max_length=256, top_p=0.95, top_k=0)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] }, { "cell_type": "markdown", "id": "1c90077e", "metadata": {}, "source": [ "#### 4.2.3. Generating Text with Contrastive Search:" ] }, { "cell_type": "code", "execution_count": 30, "id": "b040a5a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output:\n", "----------------------------------------------------------------------------------------------------\n", "Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.\n", "\n", "In this paper, we propose a model-based residual learning (MBRL) framework that is based on neural networks trained on data that is sparse in terms of dimensionality (e.g., 1, 2, 3, etc.). The network parameters are chosen such that there is a high probability of convergence, i.e., the number of iterations is large enough to minimize the variance of the residuals. This is achieved by training the network on a set of training data, in which the data is sparse in terms of dimensionality, and then discarding the nonparametric part of the data after training is complete.\n", "\n", "We show that MBRL outperforms other methods for deep reinforcement learning (RL) and deep convolutional neural networks (CNNs) by a factor of at least 2. 
In addition, we show that, compared to CNNs, MBRL performs better in two-dimensional (2D) and three-dimensional (3D) cases.\n", "----------------------------------------------------------------------------------------------------\n" ] } ], "source": [ "output = model.generate(input_ids, max_length=256, penalty_alpha=0.6, top_k=6)\n", "print(\"Output:\\n\" + 100 * '-')\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))\n", "print(\"\" + 100 * '-')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.10.4 ('hf': venv)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "84f3c8774ca1c74eb574ae1655a273850a12d5dbb694801a64998ecbefff8fe7" } } }, "nbformat": 4, "nbformat_minor": 5 }