{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Flan-T5 is (probably) all you need\n",
"\n",
"Ultra-large language models like GPT3 and GPT4 are undoubtedly impressive and represent important advances in the field of generative AI. However, models of this scale aren't necessary to perform many NLP workloads in production today. Many commercial applications simply don't require such a broad range of generative capabilities and using them may be AI overkill. Here are a few of the issues with using (sometimes) excessive models:\n",
"- You pay more. These models must be executed on additional GPUs which is reflected in the bill from your cloud compute provider.\n",
"- The most advanced GPU-based systems, which are required to run these models (A100 and H100), are in high demand and short supply, increasing the cost further.\n",
"- There is a greater environmental impact due to the high power requirements to both operate and cool these additional GPUs.\n",
"\n",
"In December 2022 Google published [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416) in which they perform extensive fine-tuning for a broad collection of tasks across a variety of models (PaLM, T5, U-PaLM). Part of this publication was the release of Flan-T5 checkpoints, \"which achieve strong few-shot performance\" with relatively modest parameter counts \"even compared to much larger models\" like the largest members of the GPT family.\n",
"\n",
"In this notebook we will demonstrate how you can use Flan-T5 on readily available IPU-POD4s (Flan-T5-Large) and IPU-POD16s (Flan-T5-XL) for common NLP workloads. We shall do this by considering the following questions:\n",
"- How good is Flan-T5, really?\n",
"- How do I run Flan-T5 on IPUs?\n",
"- What can I use Flan-T5 for?\n",
"- Why would I move up to Flan-T5-XL?\n",
"\n",
"| Domain | Tasks | Model | Datasets | Workflow | Number of IPUs | Execution time |\n",
"|---------|-------|-------|----------|----------|--------------|--------------|\n",
"| NLP | Few-shot text generation | Flan-T5 | N/A | inference | recommended: 16 (min: 4) | 6mn (15mn) |"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment setup\n",
"\n",
"The best way to run this demo is on Paperspace Gradient's cloud IPUs because everything is already set up for you.\n",
"\n",
"[](https://ipu.dev/AWzcPN)\n",
"\n",
"To run the demo using other IPU hardware, you need to have the Poplar SDK enabled and a PopTorch wheel installed. Refer to the [Getting Started guide for your system](https://docs.graphcore.ai/en/latest/getting-started.html) for details on how to do this. Also refer to the Jupyter Quick Start guide for how to set up Jupyter to be able to run this notebook on a remote IPU machine."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## How good is Flan-T5, really?\n",
"\n",
"Let's start by looking at some numbers from the paper:\n",
"\n",
"<img src=\"images/t5_vs_flan_t5.png\" style=\"width: 640px;\"/>\n",
"\n",
"> Part of table 5 from [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)\n",
"\n",
"These results are astounding. Notice that:\n",
"- Flan-T5 performs ~2x better than T5 in MMLU, BBH & MGSM\n",
"- In TyDiQA we even see the emergence of new abilities\n",
"- Flan-T5-Large is better than all previous variants of T5 (even XXL)\n",
"\n",
"This establishes Flan-T5 as an entirely different beast to the T5 you may know. Now let's see how Flan-T5-Large and Flan-T5-XL compare to other models in the MMLU benchmark:\n",
"\n",
"| Rank | Model | Average (%) | Parameters (Billions) |\n",
"|------|-------|-------------|-----------------------|\n",
"| 22 | GPT-3 (finetuned) | 53.9 | 175 |\n",
"| 23 | GAL 120B (zero-shot) | 52.6 | 120 |\n",
"| 24 | Flan-T5-XL | 52.4 | 3 |\n",
"| 25 | Flan-PaLM 8B | 49.3 | 8 |\n",
"| 30 | Flan-T5-XL (CoT) | 45.5 | 3 |\n",
"| 31 | Flan-T5-Large | 45.1 | 0.8 |\n",
"| 33 | GPT-3 (few-shot, k=5) | 43.9 | 175 |\n",
"| 35 | Flan-PaLM 8B (CoT) | 41.3 | 8 |\n",
"| 36 | Flan-T5-Large (CoT) | 40.5 | 0.8 |\n",
"| 38 | LLaMA 7B (few-shot, k=5) | 35.1 | 7 |\n",
"\n",
"> Part of the MMLU leaderboard from [Papers With Code](https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu) (CoT = Chain of Thought)\n",
"\n",
"Noting that Flan-T5 had MMLU held out from training, this table shows that:\n",
"- Flan-T5-Large and Flan-T5-XL (with 0.8B and 3B parameters respectively) perform similarly to other models with significantly more parameters, for example GPT-3 (175B parameters) and Galactica (120B parameters).\n",
"- GPT-3 needs to be fine-tuned for the benchmark task in order to beat Flan-T5-XL.\n",
"- Flan-T5 also outperforms smaller versions of more recent LLMs like PaLM and LLaMA (while also being multiple times smaller)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## How do I run Flan-T5 on IPUs?\n",
"\n",
"Since the Flan-T5 checkpoints are available on Hugging Face, you can use Graphcore's Hugging Face integration ([🤗 Optimum Graphcore](https://github.com/huggingface/optimum-graphcore)) to easily run Flan-T5 with a standard inference pipeline.\n",
"\n",
"If you already have an existing Hugging Face based application that you'd like to try on IPUs, then it's as simple as:\n",
"\n",
"```diff\n",
"->>> from transformers import pipeline\n",
"+>>> from optimum.graphcore import pipeline\n",
"\n",
"->>> text_generator = pipeline(\"text2text-generation\", model=\"google/flan-t5-large\")\n",
"+>>> text_generator = pipeline(\"text2text-generation\", model=\"google/flan-t5-large\", ipu_config=\"Graphcore/t5-large-ipu\")\n",
">>> text_generator(\"Please solve the following equation: x^2 - 9 = 0\")\n",
"[{'generated_text': '3'}]\n",
"```\n",
"\n",
"Now let's define a text generator of our own to use in the rest of this notebook. First, make sure your Python virtual environment has the latest version of [🤗 Optimum Graphcore](https://github.com/huggingface/optimum-graphcore) installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install \"optimum-graphcore==0.7\"\n",
"%pip install \"graphcore-cloud-tools[logger] @ git+https://github.com/graphcore/graphcore-cloud-tools\"\n",
"%load_ext graphcore_cloud_tools.notebook_logging.gc_logger"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The location of the cache directories can be configured through environment variables or directly in the notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"executable_cache_dir=os.getenv(\"POPLAR_EXECUTABLE_CACHE_DIR\", \"./exe_cache/\")\n",
"num_available_ipus=int(os.getenv(\"NUM_AVAILABLE_IPU\", 4))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's import `pipeline` from `optimum.graphcore` and create our Flan-T5 pipeline for the appropriate number of IPUs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from optimum.graphcore import pipeline, IPUConfig\n",
"\n",
"size = {4: \"large\", 16: \"xl\"}\n",
"flan_t5 = pipeline(\n",
" \"text2text-generation\",\n",
" model=f\"google/flan-t5-{size[num_available_ipus]}\",\n",
" ipu_config=IPUConfig.from_pretrained(\n",
" f\"Graphcore/t5-{size[num_available_ipus]}-ipu\",\n",
" executable_cache_dir=executable_cache_dir,\n",
" ),\n",
" max_input_length=896,\n",
")"
]
},
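{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The first call to the pipeline triggers graph compilation (or loads a cached executable from `executable_cache_dir`), so it is noticeably slower than every call after it. The timing sketch below is illustrative and optional, not part of the original demo:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"# The first call compiles the model (or loads a cached executable);\n",
"# subsequent calls run at normal inference speed.\n",
"start = time.time()\n",
"flan_t5(\"Hello, how are you?\")\n",
"print(f\"First call took {time.time() - start:.1f}s (includes compilation or cache load)\")"
]
},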
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's ask it some random questions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"questions = [\n",
" \"Solve the following equation for x: x^2 - 9 = 0\",\n",
" \"At what temperature does nitrogen freeze?\",\n",
" \"In order to reduce symptoms of asthma such as tightness in the chest, wheezing, and difficulty breathing, what do you recommend?\",\n",
" \"Which country is home to the tallest mountain in the world?\"\n",
"]\n",
"for out in flan_t5(questions):\n",
" print(out)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note that some of these answers may be wrong, information retrieval from the model itself is not the purpose of Flan-T5. However, if you use Flan-T5-XL they are less likely to be wrong (come back to this notebook with an IPU-POD16 to see the difference!)"
]
},
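{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As with standard Hugging Face pipelines, keyword arguments for generation are forwarded to the underlying `generate` call, so you can adjust decoding per request. This is a hedged sketch rather than a guarantee: because IPU executables are compiled ahead of time, changing generation settings may trigger a recompilation depending on your optimum-graphcore release:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative only: cap the length of the generated answer.\n",
"out = flan_t5(\"Describe the water cycle in one sentence.\", max_length=64)\n",
"print(out[0][\"generated_text\"])"
]
},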
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## What can I use Flan-T5 for?\n",
"\n",
"Flan-T5 has been fine-tuned on thousands of different tasks across hundreds of datasets. So no matter what your task might be, it's worth seeing if Flan-T5 can meet your requirements. Here we will demonstrate a few of the common ones:\n",
"\n",
"### Sentiment Analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sentiment_analysis = (\n",
" \"Review: It gets too hot, the battery only can last 4 hours. Sentiment: Negative\\n\"\n",
" \"Review: Nice looking phone. Sentiment: Positive\\n\"\n",
" \"Review: Sometimes it freezes and you have to close all the open pages and then reopen where you were. Sentiment: Negative\\n\"\n",
" \"Review: Wasn't that impressed, went back to my old phone. Sentiment:\"\n",
")\n",
"flan_t5(sentiment_analysis)[0][\"generated_text\"]"
]
},
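{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the prompt is just a string, the few-shot examples can be factored out into a reusable helper. The function below is an illustrative sketch (the helper name and the sample review are ours, not part of the original demo):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper: reuse the few-shot examples as a fixed prompt prefix.\n",
"few_shot_prefix = (\n",
"    \"Review: It gets too hot, the battery only can last 4 hours. Sentiment: Negative\\n\"\n",
"    \"Review: Nice looking phone. Sentiment: Positive\\n\"\n",
")\n",
"\n",
"def classify_sentiment(review: str) -> str:\n",
"    return flan_t5(f\"{few_shot_prefix}Review: {review} Sentiment:\")[0][\"generated_text\"]\n",
"\n",
"classify_sentiment(\"The screen cracked after a week.\")"
]
},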
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Advanced Named Entity Recognition\n",
"\n",
"> The following snippets are adapted from the Wikipedia pages corresponding to each mentioned company"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"advanced_ner = \"\"\"Microsoft Corporation is a company that makes computer software and video games. Bill Gates and Paul Allen founded the company in 1975\n",
"[Company]: Microsoft, [Founded]: 1975, [Founders]: Bill Gates, Paul Allen\n",
"\n",
"Amazon.com, Inc., known as Amazon , is an American online business and cloud computing company. It was founded on July 5, 1994 by Jeff Bezos\n",
"[Company]: Amazon, [Founded]: 1994, [Founders]: Jeff Bezos\n",
"\n",
"Apple Inc. is a multinational company that makes personal computers, mobile devices, and software. Apple was started in 1976 by Steve Jobs and Steve Wozniak.\"\"\"\n",
"flan_t5(advanced_ner)[0][\"generated_text\"]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question Answering\n",
"\n",
"> The following snippet came from the [squad](https://huggingface.co/datasets/squad) dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"context = 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24-10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50.'\n",
"question = \"Which NFL team represented the AFC at Super Bowl 50?\"\n",
"# The correct answer is Denver Broncos\n",
"flan_t5(f\"{context} {question}\")[0]['generated_text']"
]
},
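{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the pipeline also accepts a list of prompts, several questions about the same context can be answered in one batched call. A minimal sketch (the extra questions are illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Batch multiple questions against the same context in a single call.\n",
"more_questions = [\n",
"    \"Who won Super Bowl 50?\",\n",
"    \"Where was the game played?\",\n",
"]\n",
"for out in flan_t5([f\"{context} {q}\" for q in more_questions]):\n",
"    print(out[\"generated_text\"])"
]
},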
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Intent Classification"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"intent_classification = \"\"\"[Text]: I really need to get a gym membership, I'm exhausted.\n",
"[Intent]: get gym membership\n",
"\n",
"[Text]: What do I need to make a carbonara?\n",
"[Intent]: cook carbonara\n",
"\n",
"[Text]: I need all these documents sorted and filed by Monday.\n",
"[Intent]:\"\"\"\n",
"flan_t5([intent_classification])[0][\"generated_text\"]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Summarization\n",
"\n",
"> The following snippets came from the [xsum](https://huggingface.co/datasets/xsum) dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"summarization=\"\"\"\n",
"Document: Firstsource Solutions said new staff will be based at its Cardiff Bay site which already employs about 800 people.\n",
"The 300 new jobs include sales and customer service roles working in both inbound and outbound departments.\n",
"The company's sales vice president Kathryn Chivers said: \"Firstsource Solutions is delighted to be able to continue to bring new employment to Cardiff.\"\n",
"Summary: Hundreds of new jobs have been announced for a Cardiff call centre.\n",
"\n",
"Document: The visitors raced into a three-goal first-half lead at Hampden.\n",
"Weatherson opened the scoring with an unstoppable 15th-minute free-kick, and he made it 2-0 in the 27th minute.\n",
"Matt Flynn made it 3-0 six minutes later with a fine finish.\n",
"Queen's pulled a consolation goal back in stoppage time through John Carter.\n",
"Summary: Peter Weatherson netted a brace as Annan recorded only their second win in eight matches.\n",
"\n",
"Document: Officers searched properties in the Waterfront Park and Colonsay View areas of the city on Wednesday.\n",
"Detectives said three firearms, ammunition and a five-figure sum of money were recovered.\n",
"A 26-year-old man who was arrested and charged appeared at Edinburgh Sheriff Court on Thursday.\n",
"Summary:\n",
"\"\"\"\n",
"flan_t5(summarization)[0][\"generated_text\"]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Text Classification"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text_classification_1 = \"\"\"A return ticket is better value than a single.\n",
"topic: travel cost\n",
"\n",
"You can start from the basic stitches, and go from there.\n",
"topic: learning knitting\n",
"\n",
"The desk which I bought yesterday is very big.\n",
"topic: furniture size\n",
"\n",
"George Washington was president of the United States from 1789 to 1797.\n",
"topic:\"\"\"\n",
"flan_t5(text_classification_1)[0][\"generated_text\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"text_classification_2 = \"\"\"FLAN-T5 was released in the paper Scaling Instruction-Finetuned Language Models - it is an enhanced version of T5 that has been finetuned in a mixture of tasks.\n",
"keywords: released, enhanced, finetuned\n",
"\n",
"The IPU, or Intelligence Processing Unit, is a highly flexible, easy-to-use parallel processor designed from the ground up for AI workloads.\n",
"keywords: processor, AI\n",
"\n",
"Paperspace is the platform for AI developers. providing the speed and scale needed to take AI models from concept to production.\n",
"keywords:\"\"\"\n",
"\n",
"flan_t5(text_classification_2)[0][\"generated_text\"]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Why would I move up to Flan-T5-XL?\n",
"\n",
"As we saw earlier when looking at the results from the paper, Flan-T5-XL is roughly 40% (on average) better than Flan-T5-Large across its validation tasks. Therefore when deciding if Flan-T5-XL is worth the cost for you, ask yourself the following questions:\n",
"- Does my data need greater linguistic understanding for the task to be performed?\n",
"- Is my task too complicated for a model as small as Flan-T5-Large and too easy for a model as large as GPT-3?\n",
"- Does my task require longer output sequences that Flan-T5-XL is needed to generate?\n",
"\n",
"To demonstrate, let's now look at an example of a task where the answer to all of the above questions is yes. Let's say you have a customer service AI that you use to answer basic questions in order to reduce the workload of your customer service personnel. This needs:\n",
"- Strong linguistic ability to both parse and generate medium-sized chunks of text\n",
"- An LLM that is able to learn well from context, but doesn't have all of human history embedded in its parameters\n",
"- The ability to produce multiple-sentence responses, but not much longer than this\n",
"\n",
"Looking at the code below, we see some context about Graphcore provided in the input, as well as a primer for a conversational response from the model. As you can see from the example conversation (read the example conversation before executing the code block), Flan-T5-XL was able to understand the information provided in the context and provide useful and natural answers to the questions it was asked."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Starting session\n",
"[Customer]: What is an IPU?\n",
"[Virtual Assistant]: The Intelligence Processing Unit (IPU) is a computer chip that is used to process artificial intelligence.\n",
"[Customer]: Who makes it?\n",
"[Virtual Assistant]: Graphcore is the manufacturer of the IPU.\n",
"[Customer]: Can I use them?\n",
"[Virtual Assistant]: Yes, I'm sure you can.\n",
"[Customer]: Where?\n",
"[Virtual Assistant]: The IPU is available on Paperspace, Gcore and Graphcloud.\n",
"Ending session\n"
]
}
],
"source": [
"import os\n",
"from IPython.display import clear_output\n",
"\n",
"class ChatBot:\n",
" def __init__(self, model, context) -> None:\n",
" self.example_prompts = os.getenv(\"EXAMPLE_PROMPTS\")\n",
" self.model = model\n",
" self.initial_context = context\n",
" self.context = self.initial_context\n",
" self.user, self.persona = [x.split(\":\")[0] for x in context.split(\"\\n\")[-2:]]\n",
"\n",
" def input(self):\n",
" if self.example_prompts is None:\n",
" return input()\n",
" if isinstance(self.example_prompts, str):\n",
" self.example_prompts = self.example_prompts.splitlines(keepends=True)\n",
" if self.example_prompts:\n",
" return self.example_prompts.pop(0)\n",
" return \"\"\n",
"\n",
" def ask(self, question):\n",
" question += \".\" if question[-1] not in [\".\", \"?\", \"!\"] else \"\"\n",
" x = f\"{self.context}\\n{self.user}: {question}\\n{self.persona}: \"\n",
" y = self.model(x)\n",
" response = y[0][\"generated_text\"]\n",
" self.context = f\"{x}{response}\"\n",
" return response\n",
"\n",
" def session(self):\n",
" print(\"Starting session\", flush=True)\n",
" prompt = self.input()\n",
" while prompt != \"\":\n",
" if prompt == \"reset\":\n",
" clear_output()\n",
" print(\"Starting session\", flush=True)\n",
" self.context = self.initial_context\n",
" prompt = self.input()\n",
" print(f\"{self.user.title()}: {prompt}\", flush=True)\n",
" answer = self.ask(prompt)\n",
" print(f\"{self.persona.title()}: {answer}\", flush=True)\n",
" prompt = self.input()\n",
" print(\"Ending session\", flush=True)\n",
"\n",
"\n",
"context = f\"\"\"This is a conversation between a [customer] and a [virtual assistant].\n",
"The [virtual assistant] works at Graphcore. Here is some informaton about Graphcore:\n",
"- Graphcore is located in Bristol.\n",
"- Graphcore invented the intelligence processing unit (IPU). It is purpose built for AI applications.\n",
"- The currently available IPU models are: Classic IPU, Bow IPU, C600.\n",
"- IPUs are available on: Paperspace, Gcore Cloud and Graphcloud.\n",
"\n",
"[virtual assistant]: Hello, welcome to Graphcore, how can I help you today?\n",
"[customer]: I'd like to ask some questions about your company.\n",
"[virtual assistant]: Ok, I can help you with that.\"\"\"\n",
"chatbot = ChatBot(flan_t5, context)\n",
"chatbot.session()"
]
},
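{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Because every exchange is appended to `self.context`, a long session will eventually exceed the pipeline's `max_input_length` (896 tokens here). A hedged sketch for starting over without rebuilding the object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Discard the accumulated dialogue and return to the initial context.\n",
"chatbot.context = chatbot.initial_context"
]
},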
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"flan_t5.model.detachFromDevice()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In summary, the answers to the questions we posed in the introduction are:\n",
"- Q: How good is Flan-T5, really?\n",
" - A: Twice as good as T5 and on par with GPT-3 according to the MMLU benchmark.\n",
"- Q: How do I run Flan-T5 on IPUs?\n",
" - A: Change one import and add one keyword argument to your `pipeline` instantiation.\n",
"- Q: What can I use Flan-T5 for?\n",
" - A: Given its wide variety of fine-tuned tasks, almost anything.\n",
"- Q: Why would I move up to Flan-T5-XL?\n",
" - A: For an approximately 40% performance increase over Flan-T5-Large, enabling more demanding tasks.\n",
"\n",
"Did you know? We offer a notebook similar to this one, where you can use Flan-T5 XL and XXL to perform inference on an IPU-POD4. It can be found in `doing-more-with-flan-t5/Flan-T5-generative-inference.ipynb`.\n",
"\n",
"And if you're interested in fine-tuning Flan-T5 XL and XXL on a custom dataset, check out `doing-more-with-flan-t5/Flan-T5-textual-entailment-fine-tuning.ipynb`.\n",
"\n",
"If you'd like to continue exploring NLP on the IPU, take a look at our [GPT-J Fine-Tuning](https://www.graphcore.ai/posts/fine-tuned-gpt-j-a-cost-effective-alternative-to-gpt-4-for-nlp-tasks) blog and [corresponding notebook](https://ipu.dev/0uiXjd).\n",
"\n",
"If you'd like to learn more about how we got T5 to work properly in Float16, see our [technical blog](https://www.graphcore.ai/posts/running-flan-t5-xl-inference-in-float16-for-ipu-how-we-did-it) on the subject."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "3.2.0+1277_poptorch",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}