notebooks/eval-t511-xl-xsum.ipynb

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Copyright 2022 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluating T5 1.1 XL model fined tuned for abstractive summarization\n", "\n", "\n", "---\n", "\n", "\n", "This notebook demonstrates how to evaluate a T5X XL model fine-tuned for text summarization on the abstractive summarization task. The model is evaluated on the `test` split of the [XSum dataset](https://www.tensorflow.org/datasets/catalog/xsum).\n", "\n", "During fine-tuning, evaluations are run on the `validation` split of a dataset. As running evaluations on the full split may be computationally intensive you often limit the number of evaluation steps during fine tuning. This is how the example job that fine tunes T5 XL on XSum (`finetune-t511-xl-xsum.ipynb` is configured.\n", "\n", "After fine-tuning is completed, you can run evaluations on the full `test` split to obtain the test performance metrics. You can run evaluations using a few checkpoints that showed the best performance on the `validation` split. For example, you could run evaluations \n", "on the 1006000 and 1007000 splits created during the following fine tuning run.\n", "\n", "![Metrics](../images/metrics.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports and initialization\n", "\n", "Please refer to the [environment setup](../README.md) section in the README \n", "file to setup the development environment and install the required libraries \n", "before importing them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# reloads modules automatically before executing any code/script\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import time\n", "from datetime import datetime\n", "import pandas as pd\n", "\n", "import utils\n", "\n", "# import vertex ai sdk for python\n", "from google.cloud import aiplatform as vertex_ai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure environment settings\n", "\n", "- **`PROJECT_ID`:** Google Cloud Project ID\n", "- **`REGION`:** [Region](https://cloud.google.com/vertex-ai/docs/general/locations) \n", " to be used for Vertex AI operations throughout the rest of this notebook\n", "- **`BUCKET`:** Google Cloud Storage bucket name to be used by vertex AI for \n", " any operations such as to stage the code, save any artifacts generated etc.\n", "- **`TFDS_DATA_DIR`:** GCS location of the prestaged datasets. \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Project definitions\n", "PROJECT_ID = '<YOUR PROJECT ID>' # Change to your project id.\n", "REGION = '<YOUR REGION>' # Change to your region.\n", "\n", "# Bucket definitions\n", "BUCKET = '<YOUR BUCKET NAME>' # Change to your bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Project definitions\n", "PROJECT_ID = 'jk-mlops-dev' # Change to your project id.\n", "REGION = 'us-central1' # Change to your region.\n", "\n", "# Bucket definitions\n", "BUCKET = 'jk-t5x-staging' # Change to your bucket.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure dataset location\n", "\n", "Configure the Cloud Storage location where the TFDS XSum dataset is \n", "staged. If you followed the the [environment setup]() instructions the dataset is in `gs://<YOUR BUCKET>/datasets`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TFDS_DATA_DIR = f'gs://{BUCKET}/datasets'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure custom training container image\n", "\n", "In this example, you use the base T5X custom training container." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# configure the custom container image name\n", "IMAGE_NAME = 't5x-base' # Change to your image name\n", "IMAGE_URI = f'gcr.io/{PROJECT_ID}/{IMAGE_NAME}'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Validate image exists in the Container Registry" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! gcloud container images describe $IMAGE_URI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure experiment settings\n", "\n", "Choose a Vertex AI experiment to store the evaluation results. If you have an existing experiment that contains a run with fine tuned checkpoints you want to evaluate, set `EXPERIMENT_NAME` to that experiment's name. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "EXPERIMENT_NAME = '<YOUR EXPERIMENT NAME>' # Change to your experiment name\n", "\n", "EXPERIMENT_WORKSPACE = f'gs://{BUCKET}/experiments/{EXPERIMENT_NAME}'\n", "EXPERIMENT_RUNS = f'{EXPERIMENT_WORKSPACE}/runs'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize Vertex AI SDK for Python\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vertex_ai.init(\n", " project=PROJECT_ID,\n", " location=REGION,\n", " staging_bucket=EXPERIMENT_WORKSPACE,\n", " experiment=EXPERIMENT_NAME\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run model evaluation job" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define the job's gin file\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "JOB_GIN_FILE = '../configs/eval_t511_xl_xsum.gin'\n", "\n", "!cat {JOB_GIN_FILE}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default evaluation configuration as defined in `t5x/t5x/configs/runs/eval.gin` does not put any constraints on the length of input and target features. Their dimensions are computed by looking for the longest sequences in the data split used for evaluation. In the preprocessed `test` split of the XSum dataset, the longest sequence is 15861 tokens so the feature dimension of input batches would be set to 15861. This can lead to out of memory errors during evaluation. To mitigate this, the maximum length of inputs and targets is set to 2048 and 64 respectively. These numbers were set based on the statistics of the `test` split. The 99th percentile of inputs is 1988 and the 99th percentile of targets is 54.\n", "\n", "This configuration has been tested on a v2-32 TPU slice using a 4-way model parallelism." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select the checkpoint for evaluation\n", "\n", "If your experiment contains a previous run(s) you can retrieve a location of the checkpoint from the run's record. It is recommended to review TensorBoard logs generated during the run to select a checkpoint or checkpoints for full evaluation. \n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get all experiment runs and run directories\n", "utils.get_all_experiment_run_directories(EXPERIMENT_NAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE**: Model checkpoint path is a sub-folder under the `RUN_DIR` with prefix `checkpoint_`. There could be multiple checkpoint folders. Pick the model checkpoint you would like to evaluate the model with. For example:\n", "```\n", "! gsutil ls -r $RUN_DIR\n", " gs://{BUCKET_NAME}/experiments/{EXPERIMENT_NAME}/runs/{RUN_ID}/checkpoint_1000000/\n", " gs://{BUCKET_NAME}/experiments/{EXPERIMENT_NAME}/runs/{RUN_ID}/checkpoint_1005000/\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CHECKPOINT_PATH = '<YOUR CHECKPOINT PATH>' # Change to the checkpoint path on GCS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure Vertex AI CustomJob \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "EVAL_RUN_NAME = f'<YOUR RUN NAME>' # Change to your run name for the custom job\n", "EVAL_RUN_ID = f'{EXPERIMENT_NAME}-{EVAL_RUN_NAME}-{datetime.now().strftime(\"%Y%m%d%H%M\")}'\n", "EVAL_RUN_DIR = f'{EXPERIMENT_RUNS}/{EVAL_RUN_ID}'\n", "RUN_MODE = 'eval'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "GIN_FILES = [JOB_GIN_FILE]\n", "GIN_OVERWRITES = [\n", " 'USE_CACHED_TASKS=False',\n", " f'CHECKPOINT_PATH=\"{CHECKPOINT_PATH}\"',\n", " f'EVAL_OUTPUT_DIR=\"{EVAL_RUN_DIR}\"'\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Log local variables defined for any troubleshooting" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for key in [\n", " \"PROJECT_ID\", \"REGION\", \"BUCKET\",\n", " \"IMAGE_NAME\", \"IMAGE_URI\", \n", " \"EXPERIMENT_NAME\", \"EXPERIMENT_WORKSPACE\", \"EXPERIMENT_RUNS\", \n", " \"TFDS_DATA_DIR\", \"GIN_FILES\", \"GIN_OVERWRITES\", \n", " \"EVAL_RUN_NAME\", \"EVAL_RUN_ID\", \"EVAL_RUN_DIR\", \"RUN_MODE\",\n", " \"CHECKPOINT_PATH\"]:\n", " print(f\"{key}={eval(key)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Configure Cloud TPU topology. Double check if your [region](https://cloud.google.com/vertex-ai/docs/general/locations#accelerators) supports the configured TPU topology." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MACHINE_TYPE = 'cloud-tpu'\n", "ACCELERATOR_TYPE = 'TPU_V2'\n", "ACCELERATOR_COUNT = 32" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the custom job spec" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "job = utils.create_t5x_custom_job(\n", " display_name=EVAL_RUN_ID,\n", " machine_type=MACHINE_TYPE,\n", " accelerator_type=ACCELERATOR_TYPE,\n", " accelerator_count=ACCELERATOR_COUNT,\n", " image_uri=IMAGE_URI,\n", " run_mode=RUN_MODE,\n", " gin_files=GIN_FILES,\n", " model_dir=EVAL_RUN_DIR,\n", " tfds_data_dir=TFDS_DATA_DIR,\n", " gin_overwrites=GIN_OVERWRITES\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "job.job_spec" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit the custom job to Vertex AI and track the experiment\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "utils.submit_and_track_t5x_vertex_job(\n", " custom_job=job,\n", " job_display_name=EVAL_RUN_ID,\n", " run_name=EVAL_RUN_ID,\n", " experiment_name=EXPERIMENT_NAME,\n", " execution_name=EVAL_RUN_ID,\n", " tfds_data_dir=TFDS_DATA_DIR,\n", " model_dir=EVAL_RUN_DIR,\n", " vertex_ai=vertex_ai,\n", " run_mode=RUN_MODE\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore and log metrics\n", "\n", "After the job is completed, you can explore the metrics and log them to the \n", "experiment. \n", "\n", "Run the below utility function to parse the run logs and push the metrics to Vertex AI Experiments." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "GCS_EVAL_DIR = os.path.join(EVAL_RUN_DIR, 'inference_eval')\n", "\n", "results = utils.parse_and_log_eval_metrics(\n", " summary_dir=GCS_EVAL_DIR,\n", " run_name=EVAL_RUN_ID,\n", " vertex_ai=vertex_ai\n", ")\n", "results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "environment": { "kernel": "python3", "name": "common-cpu.m93", "type": "gcloud", "uri": "gcr.io/deeplearning-platform-release/base-cpu:m93" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" }, "vscode": { "interpreter": { "hash": "802ea0518c7535cb908ec450b955e980eb8525a80af866d337675ca6fae56b98" } } }, "nbformat": 4, "nbformat_minor": 4 }

notebooks/eval-t511-xl-xsum.ipynb (509 lines of code) (raw):