07_training/07a_ingest.ipynb

{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 72 }, "id": "hiQ6zAoYhyaA", "outputId": "0acee878-1207-42c3-9bee-a594acd44365" }, "outputs": [ { "data": { "text/markdown": [ "\n", "<table class=\"tfo-notebook-buttons\" align=\"left\">\n", " <td>\n", " <a target=\"_blank\" href=\"https://console.cloud.google.com/ai-platform/notebooks/deploy-notebook?name=Writing an efficient ingest Loop&url=https%3A%2F%2Fgithub.com%2FGoogleCloudPlatform%2Fpractical-ml-vision-book%2Fblob%2Fmaster%2F06_preprocessing%2F07a_ingest.ipynb&download_url=https%3A%2F%2Fgithub.com%2FGoogleCloudPlatform%2Fpractical-ml-vision-book%2Fraw%2Fmaster%2F06_preprocessing%2F07a_ingest.ipynb\">\n", " <img src=\"https://raw.githubusercontent.com/GoogleCloudPlatform/practical-ml-vision-book/master/logo-cloud.png\"/> Run in AI Platform Notebook</a>\n", " </td>\n", " </td>\n", " <td>\n", " <a target=\"_blank\" href=\"https://colab.research.google.com/github/GoogleCloudPlatform/practical-ml-vision-book/blob/master/06_preprocessing/07a_ingest.ipynb\">\n", " <img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", " </td>\n", " <td>\n", " <a target=\"_blank\" href=\"https://github.com/GoogleCloudPlatform/practical-ml-vision-book/blob/master/06_preprocessing/07a_ingest.ipynb\">\n", " <img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n", " </td>\n", " <td>\n", " <a href=\"https://raw.githubusercontent.com/GoogleCloudPlatform/practical-ml-vision-book/master/06_preprocessing/07a_ingest.ipynb\">\n", " <img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n", " </td>\n", "</table>\n" ], "text/plain": [ "<IPython.core.display.Markdown object>" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import Markdown as md\n", "\n", "### change to reflect your notebook\n", "_nb_loc = \"07_training/07a_ingest.ipynb\"\n", "_nb_title = \"Writing an efficient ingest Loop\"\n", "\n", "### no need to change any of this\n", "_nb_safeloc = _nb_loc.replace('/', '%2F')\n", "md(\"\"\"\n", "<table class=\"tfo-notebook-buttons\" align=\"left\">\n", " <td>\n", " <a target=\"_blank\" href=\"https://console.cloud.google.com/ai-platform/notebooks/deploy-notebook?name={1}&url=https%3A%2F%2Fgithub.com%2FGoogleCloudPlatform%2Fpractical-ml-vision-book%2Fblob%2Fmaster%2F{2}&download_url=https%3A%2F%2Fgithub.com%2FGoogleCloudPlatform%2Fpractical-ml-vision-book%2Fraw%2Fmaster%2F{2}\">\n", " <img src=\"https://raw.githubusercontent.com/GoogleCloudPlatform/practical-ml-vision-book/master/logo-cloud.png\"/> Run in AI Platform Notebook</a>\n", " </td>\n", " </td>\n", " <td>\n", " <a target=\"_blank\" href=\"https://colab.research.google.com/github/GoogleCloudPlatform/practical-ml-vision-book/blob/master/{0}\">\n", " <img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", " </td>\n", " <td>\n", " <a target=\"_blank\" href=\"https://github.com/GoogleCloudPlatform/practical-ml-vision-book/blob/master/{0}\">\n", " <img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n", " </td>\n", " <td>\n", " <a href=\"https://raw.githubusercontent.com/GoogleCloudPlatform/practical-ml-vision-book/master/{0}\">\n", " <img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n", " </td>\n", "</table>\n", 
"\"\"\".format(_nb_loc, _nb_title, _nb_safeloc))" ] }, { "cell_type": "markdown", "metadata": { "id": "a8HQYsAtC0Fv" }, "source": [ "# Efficient Ingest\n", "\n", "In this notebook, we speed the ingest of training/evaluation data into the model." ] }, { "cell_type": "markdown", "metadata": { "id": "5UOm2etrwYCs" }, "source": [ "## Enable GPU and set up helper functions\n", "\n", "This notebook and pretty much every other notebook in this repository\n", "will run faster if you are using a GPU.\n", "On Colab:\n", "- Navigate to Edit→Notebook Settings\n", "- Select GPU from the Hardware Accelerator drop-down\n", "\n", "On Cloud AI Platform Notebooks:\n", "- Navigate to https://console.cloud.google.com/ai-platform/notebooks\n", "- Create an instance with a GPU or select your instance and add a GPU\n", "\n", "Next, we'll confirm that we can connect to the GPU with tensorflow:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ugGJcxKAwhc2", "outputId": "8e946159-46cf-4aba-f53e-622e9ea8adee" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TensorFlow version2.3.1\n", "Built with GPU support? Yes!\n", "There are 2 GPUs\n", "Found GPU at: /device:GPU:0\n" ] } ], "source": [ "import tensorflow as tf\n", "print('TensorFlow version' + tf.version.VERSION)\n", "print('Built with GPU support? ' + ('Yes!' if tf.test.is_built_with_cuda() else 'Noooo!'))\n", "print('There are {} GPUs'.format(len(tf.config.experimental.list_physical_devices(\"GPU\"))))\n", "device_name = tf.test.gpu_device_name()\n", "if device_name != '/device:GPU:0':\n", " raise SystemError('GPU device not found')\n", "print('Found GPU at: {}'.format(device_name))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Original code\n", "\n", "This is the original code, from [../06_preprocessing/06e_colordistortion.ipynb](../06_preprocessing/06e_colordistortion.ipynb)\n", "\n", "We have a few variations of creating a preprocessed dataset." 
] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pylab as plt\n", "import numpy as np\n", "import tensorflow as tf\n", "import tensorflow_hub as hub\n", "import os\n", "# Load compressed models from tensorflow_hub\n", "os.environ['TFHUB_MODEL_LOAD_FORMAT'] = 'COMPRESSED' \n", "\n", "from tensorflow.data.experimental import AUTOTUNE\n", "\n", "IMG_HEIGHT = 448 # note *twice* what we used to have\n", "IMG_WIDTH = 448\n", "IMG_CHANNELS = 3\n", "CLASS_NAMES = 'daisy dandelion roses sunflowers tulips'.split()\n", "\n", "def training_plot(metrics, history):\n", " f, ax = plt.subplots(1, len(metrics), figsize=(5*len(metrics), 5))\n", " for idx, metric in enumerate(metrics):\n", " ax[idx].plot(history.history[metric], ls='dashed')\n", " ax[idx].set_xlabel(\"Epochs\")\n", " ax[idx].set_ylabel(metric)\n", " ax[idx].plot(history.history['val_' + metric]);\n", " ax[idx].legend([metric, 'val_' + metric])\n", " \n", "class _Preprocessor: \n", " def __init__(self):\n", " # nothing to initialize\n", " pass\n", " \n", " def read_from_tfr(self, proto):\n", " feature_description = {\n", " 'image': tf.io.VarLenFeature(tf.float32),\n", " 'shape': tf.io.VarLenFeature(tf.int64),\n", " 'label': tf.io.FixedLenFeature([], tf.string, default_value=''),\n", " 'label_int': tf.io.FixedLenFeature([], tf.int64, default_value=0),\n", " }\n", " rec = tf.io.parse_single_example(\n", " proto, feature_description\n", " )\n", " shape = tf.sparse.to_dense(rec['shape'])\n", " img = tf.reshape(tf.sparse.to_dense(rec['image']), shape)\n", " label_int = rec['label_int']\n", " return img, label_int\n", " \n", " def read_from_jpegfile(self, filename):\n", " # same code as in 05_create_dataset/jpeg_to_tfrecord.py\n", " img = tf.io.read_file(filename)\n", " img = tf.image.decode_jpeg(img, channels=IMG_CHANNELS)\n", " img = tf.image.convert_image_dtype(img, tf.float32)\n", " return img\n", " \n", " def preprocess(self, img):\n", " return tf.image.resize_with_pad(img, IMG_HEIGHT, IMG_WIDTH)\n", "\n", "def create_preproc_dataset_plain(pattern):\n", " preproc = _Preprocessor()\n", " trainds = tf.data.TFRecordDataset(\n", " [filename for filename in tf.io.gfile.glob(pattern)],\n", " compression_type='GZIP'\n", " ).map(preproc.read_from_tfr).map(\n", " lambda img, label: (preproc.preprocess(img), label)\n", " ) \n", " return trainds\n", "\n", "# note: addition of AUTOTUNE to the map() calls\n", "def create_preproc_dataset_parallelmap(pattern):\n", " preproc = _Preprocessor()\n", " def _preproc_img_label(img, label):\n", " return (preproc.preprocess(img), label)\n", " trainds = (\n", " tf.data.TFRecordDataset(\n", " [filename for filename in tf.io.gfile.glob(pattern)],\n", " compression_type='GZIP'\n", " )\n", " .map(preproc.read_from_tfr, num_parallel_calls=AUTOTUNE)\n", " .map(_preproc_img_label, num_parallel_calls=AUTOTUNE)\n", " )\n", " return trainds\n", "\n", "# note: splits the files into two halves and interleaves datasets\n", "def create_preproc_dataset_interleave(pattern, num_parallel=None):\n", " preproc = _Preprocessor()\n", " files = [filename for filename in tf.io.gfile.glob(pattern)]\n", " if len(files) > 1:\n", " print(\"Interleaving the reading of {} files.\".format(len(files)))\n", " def _create_half_ds(x):\n", " if x == 0:\n", " half = files[:(len(files)//2)]\n", " else:\n", " half = files[(len(files)//2):]\n", " return tf.data.TFRecordDataset(half,\n", " compression_type='GZIP')\n", " trainds = tf.data.Dataset.range(2).interleave(\n", " _create_half_ds, 
num_parallel_calls=AUTOTUNE)\n", " else:\n", " trainds = tf.data.TFRecordDataset(files,\n", " compression_type='GZIP')\n", " def _preproc_img_label(img, label):\n", " return (preproc.preprocess(img), label)\n", " \n", " trainds = (trainds\n", " .map(preproc.read_from_tfr, num_parallel_calls=num_parallel)\n", " .map(_preproc_img_label, num_parallel_calls=num_parallel)\n", " )\n", " return trainds\n", "\n", "def create_preproc_image(filename):\n", " preproc = _Preprocessor()\n", " img = preproc.read_from_jpegfile(filename)\n", " return preproc.preprocess(img)\n", "\n", "class RandomColorDistortion(tf.keras.layers.Layer):\n", " def __init__(self, contrast_range=[0.5, 1.5], \n", " brightness_delta=[-0.2, 0.2], **kwargs):\n", " super(RandomColorDistortion, self).__init__(**kwargs)\n", " self.contrast_range = contrast_range\n", " self.brightness_delta = brightness_delta\n", " \n", " def call(self, images, training=None):\n", " if not training:\n", " return images\n", " \n", " contrast = np.random.uniform(\n", " self.contrast_range[0], self.contrast_range[1])\n", " brightness = np.random.uniform(\n", " self.brightness_delta[0], self.brightness_delta[1])\n", " \n", " images = tf.image.adjust_contrast(images, contrast)\n", " images = tf.image.adjust_brightness(images, brightness)\n", " images = tf.clip_by_value(images, 0, 1)\n", " return images" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Speeding up the reading of data\n", "\n", "To try it out, we'll simply read through the data several times and compute some quantity on the images." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def loop_through_dataset(ds, nepochs):\n", " lowest_mean = tf.constant(1.)\n", " for epoch in range(nepochs):\n", " thresh = np.random.uniform(0.3, 0.7) # random threshold\n", " count = 0\n", " sumsofar = tf.constant(0.)\n", " for (img, label) in ds:\n", " # mean of channel values > thresh\n", " mean = tf.reduce_mean(tf.where(img > thresh, img, 0))\n", " sumsofar = sumsofar + mean\n", " count = count + 1\n", " if count%100 == 0:\n", " print('.', end='')\n", " mean = sumsofar/count\n", " print(mean)\n", " if mean < lowest_mean:\n", " lowest_mean = mean\n", " return lowest_mean" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "PATTERN_SUFFIX, NUM_EPOCHS = '-0000[01]-*', 2 # 2 files, 2 epochs\n", "#PATTERN_SUFFIX, NUM_EPOCHS = '-*', 20 # 16 files, 20 epochs" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...tf.Tensor(0.22762734, shape=(), dtype=float32)\n", "...tf.Tensor(0.21017572, shape=(), dtype=float32)\n", "CPU times: user 7.03 s, sys: 500 ms, total: 7.53 s\n", "Wall time: 7.99 s\n" ] }, { "data": { "text/plain": [ "<tf.Tensor: shape=(), dtype=float32, numpy=0.21017572>" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "ds = create_preproc_dataset_plain(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ")\n", "loop_through_dataset(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...tf.Tensor(0.14244522, shape=(), dtype=float32)\n", "...tf.Tensor(0.18533988, shape=(), dtype=float32)\n", "CPU times: user 7.93 s, sys: 375 ms, total: 8.3 s\n", "Wall time: 5.94 s\n" ] }, { "data": { "text/plain": [ "<tf.Tensor: shape=(), dtype=float32, 
numpy=0.14244522>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "# parallel map\n", "ds = create_preproc_dataset_parallelmap(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ")\n", "loop_through_dataset(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "...tf.Tensor(0.12316246, shape=(), dtype=float32)\n", "...tf.Tensor(0.15402032, shape=(), dtype=float32)\n", "CPU times: user 7.86 s, sys: 497 ms, total: 8.35 s\n", "Wall time: 5.29 s\n" ] }, { "data": { "text/plain": [ "<tf.Tensor: shape=(), dtype=float32, numpy=0.12316246>" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "# with interleave\n", "ds = create_preproc_dataset_interleave(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX,\n", " num_parallel=None\n", ")\n", "loop_through_dataset(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "...tf.Tensor(0.18058446, shape=(), dtype=float32)\n", "...tf.Tensor(0.14600855, shape=(), dtype=float32)\n", "CPU times: user 7.99 s, sys: 443 ms, total: 8.44 s\n", "Wall time: 5.23 s\n" ] }, { "data": { "text/plain": [ "<tf.Tensor: shape=(), dtype=float32, numpy=0.14600855>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "# with interleave and parallel maps\n", "ds = create_preproc_dataset_interleave(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX,\n", " num_parallel=AUTOTUNE\n", ")\n", "loop_through_dataset(ds, NUM_EPOCHS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When I did this, this is what I got:\n", "\n", "| Method | CPU time | Wall time |\n", "| ---------------------- | ----------- | ------------ |\n", "| Plain | 7.53s | 7.99s |\n", "| Parallel Map | 8.30s | 5.94s |\n", "| Interleave | 8.35s | 5.29s |\n", "| Interleave+Parallel Map| 8.44s | 5.23s |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ML model\n", "\n", "The computation above was pretty cheap, involving merely adding up all the pixel values.\n", "What happens if we add a bit more computation (gradient calculation, etc.)?"
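] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find out, we train the smallest reasonable Keras model (flatten the pixels, then a single softmax layer), so that any timing differences still come from ingest rather than from the model's math. If you prefer to compare the variants programmatically instead of with `%%time` cells, a timing harness might look like the following sketch (a hypothetical helper, not part of the book's code; it reuses `train_simple_model`, which is defined in the next cell):\n", "\n", "```python\n", "import time\n", "\n", "# Hypothetical harness: time one short training run per pipeline variant.\n", "for make_ds in [create_preproc_dataset_plain,\n", "                create_preproc_dataset_parallelmap]:\n", "    ds = make_ds('gs://practical-ml-vision-book-data/flowers_tfr/train'\n", "                 + PATTERN_SUFFIX).batch(1)\n", "    start = time.perf_counter()\n", "    train_simple_model(ds, NUM_EPOCHS)\n", "    print('{}: {:.1f}s'.format(make_ds.__name__,\n", "                               time.perf_counter() - start))\n", "```"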
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def train_simple_model(ds, nepochs):\n", " model = tf.keras.Sequential([\n", " tf.keras.layers.Flatten(\n", " input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS)),\n", " #tf.keras.layers.Dense(32, activation='relu'),\n", " tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax')\n", " ])\n", " model.compile(optimizer=tf.keras.optimizers.Adam(),\n", " loss=tf.keras.losses.SparseCategoricalCrossentropy(\n", " from_logits=False),\n", " metrics=['accuracy'])\n", " model.fit(ds, epochs=nepochs)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/2\n", "359/359 [==============================] - 4s 11ms/step - loss: 384.5950 - accuracy: 0.2841\n", "Epoch 2/2\n", "359/359 [==============================] - 4s 11ms/step - loss: 260.8144 - accuracy: 0.3983\n", "CPU times: user 9.12 s, sys: 796 ms, total: 9.91 s\n", "Wall time: 9.39 s\n" ] } ], "source": [ "%%time\n", "ds = create_preproc_dataset_plain(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").batch(1)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/2\n", "359/359 [==============================] - 4s 10ms/step - loss: 378.1755 - accuracy: 0.2646\n", "Epoch 2/2\n", "359/359 [==============================] - 4s 10ms/step - loss: 277.4931 - accuracy: 0.4067\n", "CPU times: user 9.97 s, sys: 718 ms, total: 10.7 s\n", "Wall time: 8.17 s\n" ] } ], "source": [ "%%time\n", "# parallel map\n", "ds = create_preproc_dataset_parallelmap(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").batch(1)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "359/359 [==============================] - 3s 9ms/step - loss: 359.4355 - accuracy: 0.3008\n", "Epoch 2/2\n", "359/359 [==============================] - 3s 9ms/step - loss: 296.0292 - accuracy: 0.3928\n", "CPU times: user 9.7 s, sys: 825 ms, total: 10.5 s\n", "Wall time: 7.54 s\n" ] } ], "source": [ "%%time\n", "# with interleave\n", "ds = create_preproc_dataset_interleave(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX,\n", " num_parallel=None\n", ").batch(1)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "359/359 [==============================] - 3s 9ms/step - loss: 403.3262 - accuracy: 0.2423\n", "Epoch 2/2\n", "359/359 [==============================] - 3s 8ms/step - loss: 260.8356 - accuracy: 0.4290\n", "CPU times: user 9.6 s, sys: 728 ms, total: 10.3 s\n", "Wall time: 7.17 s\n" ] } ], "source": [ "%%time\n", "# with interleave and parallel mpas\n", "ds = create_preproc_dataset_interleave(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX,\n", " num_parallel=AUTOTUNE\n", ").batch(1)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We note that the improvement remains:\n", "\n", "| Method | CPU time | Wall time |\n", "| 
-----------------------| ----------- | ------------ |\n", "| Plain | 9.91s | 9.39s |\n", "| Parallel Map | 10.7s | 8.17s |\n", "| Interleave | 10.5s | 7.54s |\n", "| Interleave+Parallel Map| 10.3s | 7.17s |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Speeding up the handling of data" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# alias to the more efficient one\n", "def create_preproc_dataset(pattern):\n", " return create_preproc_dataset_interleave(pattern, num_parallel=AUTOTUNE)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "359/359 [==============================] - 3s 9ms/step - loss: 387.4389 - accuracy: 0.2813\n", "Epoch 2/2\n", "359/359 [==============================] - 3s 9ms/step - loss: 279.8046 - accuracy: 0.3760\n", "CPU times: user 10.5 s, sys: 899 ms, total: 11.4 s\n", "Wall time: 8.09 s\n" ] } ], "source": [ "%%time\n", "# add prefetching\n", "ds = create_preproc_dataset(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").prefetch(AUTOTUNE).batch(1)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "45/45 [==============================] - 3s 60ms/step - loss: 113.6279 - accuracy: 0.2702\n", "Epoch 2/2\n", "45/45 [==============================] - 3s 56ms/step - loss: 120.0435 - accuracy: 0.2758\n", "CPU times: user 8.97 s, sys: 590 ms, total: 9.56 s\n", "Wall time: 6.9 s\n" ] } ], "source": [ "%%time\n", "# Add batching of different sizes\n", "ds = create_preproc_dataset(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").prefetch(AUTOTUNE).batch(8)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "23/23 [==============================] - 3s 110ms/step - loss: 97.3240 - accuracy: 0.2507\n", "Epoch 2/2\n", "23/23 [==============================] - 2s 105ms/step - loss: 41.4950 - accuracy: 0.3900\n", "CPU times: user 8.97 s, sys: 939 ms, total: 9.9 s\n", "Wall time: 6.7 s\n" ] } ], "source": [ "%%time\n", "# Add batching of different sizes\n", "ds = create_preproc_dataset(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").prefetch(AUTOTUNE).batch(16)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "12/12 [==============================] - 2s 187ms/step - loss: 121.8208 - accuracy: 0.2423\n", "Epoch 2/2\n", "12/12 [==============================] - 2s 185ms/step - loss: 51.0977 - accuracy: 0.3259\n", "CPU times: user 8.78 s, sys: 902 ms, total: 9.68 s\n", "Wall time: 6.37 s\n" ] } ], "source": [ "%%time\n", "# Add batching of different sizes\n", "ds = create_preproc_dataset(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").prefetch(AUTOTUNE).batch(32)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "12/12 [==============================] - 3s 213ms/step - loss: 106.7971 - accuracy: 0.2423\n", "Epoch 2/2\n", "12/12 [==============================] - 1s 47ms/step - loss: 66.2376 - accuracy: 0.3203\n", "CPU times: user 5.16 s, sys: 1 s, total: 6.16 s\n", "Wall time: 4.36 s\n" ] } ], "source": [ "%%time\n", "# add caching: always do this optimization last.\n", "ds = create_preproc_dataset(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").cache().batch(32)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "12/12 [==============================] - 2s 192ms/step - loss: 126.8334 - accuracy: 0.2340\n", "Epoch 2/2\n", "12/12 [==============================] - 1s 45ms/step - loss: 51.4512 - accuracy: 0.3287\n", "CPU times: user 5.01 s, sys: 756 ms, total: 5.76 s\n", "Wall time: 4.04 s\n" ] } ], "source": [ "%%time\n", "# add caching: always do this optimization last.\n", "ds = create_preproc_dataset(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").prefetch(AUTOTUNE).cache().batch(32)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interleaving the reading of 2 files.\n", "Epoch 1/2\n", "12/12 [==============================] - 2s 208ms/step - loss: 148.2066 - accuracy: 0.2507\n", "Epoch 2/2\n", "12/12 [==============================] - 1s 44ms/step - loss: 71.6666 - accuracy: 0.3064\n", "CPU times: user 4.95 s, sys: 692 ms, total: 5.65 s\n", "Wall time: 4.19 s\n" ] } ], "source": [ "%%time\n", "# add caching: always do this optimization last.\n", "ds = create_preproc_dataset(\n", " 'gs://practical-ml-vision-book-data/flowers_tfr/train' + PATTERN_SUFFIX\n", ").cache().prefetch(AUTOTUNE).batch(32)\n", "train_simple_model(ds, NUM_EPOCHS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adding to the previous table:\n", "\n", "| Method | CPU time | Wall time |\n", "| -----------------------| ----------- | ------------ |\n", "| Plain | 9.91s | 9.39s |\n", "| Parallel Map | 10.7s | 8.17s |\n", "| Interleave | 10.5s | 7.54s |\n", "| Interleave+Parallel Map| 10.3s | 7.17s |\n", "| Interleave + Parallel, and then adding: | - | - |\n", "| Prefetch | 11.4s | 8.09s |\n", "| Batch size 8 | 9.56s | 6.90s |\n", "| Batch size 16 | 9.90s | 6.70s |\n", "| Batch size 32 | 9.68s | 6.37s |\n", "| Interleave + Parallel + batchsize 32, and then adding: | - | - |\n", "| Cache | 6.16s | 4.36s |\n", "| Prefetch + Cache | 5.76s | 4.04s |\n", "| Cache + Prefetch | 5.65s | 4.19s |\n", "\n", "So, the best option is:\n", "<pre>\n", "ds = create_preproc_dataset_interleave(pattern, num_parallel=AUTOTUNE).prefetch(AUTOTUNE).cache().batch(32)\n", "</pre>" ] }, { "cell_type": "markdown", "metadata": { "id": "Duu8mX3iXANE" }, "source": [ "## License\n", "Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [ "5UOm2etrwYCs" ], "name": "07a_ingest.ipynb", "provenance": [], "toc_visible": true }, "environment": { "name": "tf2-2-3-gpu.2-3.m59", "type": "gcloud", "uri": "gcr.io/deeplearning-platform-release/tf2-2-3-gpu.2-3:m59" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }