self-paced-labs/ai-platform-qwikstart/ai_platform_qwik_start.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## AI Platform: Qwik Start\n",
"\n",
"This lab gives you an introductory, end-to-end experience of training and prediction on AI Platform. The lab will use a census dataset to:\n",
"\n",
"- Create a TensorFlow 2.x training application and validate it locally.\n",
"- Run your training job on a single worker instance in the cloud.\n",
"- Deploy a model to support prediction.\n",
"- Request an online prediction and see the response."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1: Get your training data\n",
"\n",
"The relevant data files, adult.data and adult.test, are hosted in a public Cloud Storage bucket.\n",
"\n",
"You can read the files directly from Cloud Storage or copy them to your local environment. For this lab you will download the samples for local training, and later upload them to your own Cloud Storage bucket for cloud training.\n",
"\n",
"Run the following command to download the data to a local file directory and set variables that point to the downloaded data files:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"mkdir data\n",
"gsutil -m cp gs://cloud-samples-data/ml-engine/census/data/* data/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"export TRAIN_DATA=$(pwd)/data/adult.data.csv\n",
"export EVAL_DATA=$(pwd)/data/adult.test.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inspect what the data looks like by looking at the first couple of rows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"head data/adult.data.csv"
]
},
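{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can also inspect the data with pandas. The following cell is a minimal sketch that assumes the download above succeeded and that the column names match the UCI census dataset; it loads the training CSV and prints its shape and label distribution:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Column names for the census data, in the same order as the CSV columns.\n",
"csv_columns = [\n",
"    'age', 'workclass', 'fnlwgt', 'education', 'education_num',\n",
"    'marital_status', 'occupation', 'relationship', 'race', 'gender',\n",
"    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',\n",
"    'income_bracket'\n",
"]\n",
"\n",
"# The files have no header row; '?' marks missing values.\n",
"raw_train_df = pd.read_csv('data/adult.data.csv', names=csv_columns, na_values='?')\n",
"print(raw_train_df.shape)\n",
"print(raw_train_df['income_bracket'].value_counts())"
]
},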
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Run a local training job\n",
"\n",
"A local training job loads your Python training program and starts a training process in an environment that's similar to that of a live Cloud AI Platform cloud training job.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2.1: Create files to hold the Python program\n",
"\n",
"To do that, let's create three files. The first, called util.py, will contain utility methods for cleaning and preprocessing the data, as well as performing any feature engineering needed by transforming and normalizing the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"mkdir -p trainer\n",
"touch trainer/__init__.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile trainer/util.py\n",
"from __future__ import absolute_import\n",
"from __future__ import division\n",
"from __future__ import print_function\n",
"\n",
"import os\n",
"from six.moves import urllib\n",
"import tempfile\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import tensorflow as tf\n",
"\n",
"# Storage directory\n",
"DATA_DIR = os.path.join(tempfile.gettempdir(), 'census_data')\n",
"\n",
"# Download options.\n",
"DATA_URL = (\n",
" 'https://storage.googleapis.com/cloud-samples-data/ai-platform/census'\n",
" '/data')\n",
"TRAINING_FILE = 'adult.data.csv'\n",
"EVAL_FILE = 'adult.test.csv'\n",
"TRAINING_URL = '%s/%s' % (DATA_URL, TRAINING_FILE)\n",
"EVAL_URL = '%s/%s' % (DATA_URL, EVAL_FILE)\n",
"\n",
"# These are the features in the dataset.\n",
"# Dataset information: https://archive.ics.uci.edu/ml/datasets/census+income\n",
"_CSV_COLUMNS = [\n",
" 'age', 'workclass', 'fnlwgt', 'education', 'education_num',\n",
" 'marital_status', 'occupation', 'relationship', 'race', 'gender',\n",
" 'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',\n",
" 'income_bracket'\n",
"]\n",
"\n",
"# This is the label (target) we want to predict.\n",
"_LABEL_COLUMN = 'income_bracket'\n",
"\n",
"# These are columns we will not use as features for training. There are many\n",
"# reasons not to use certain attributes of data for training. Perhaps their\n",
"# values are noisy or inconsistent, or perhaps they encode bias that we do not\n",
"# want our model to learn. For a deep dive into the features of this Census\n",
"# dataset and the challenges they pose, see the Introduction to ML Fairness\n",
"# Notebook: https://colab.research.google.com/github/google/eng-edu/blob\n",
"# /master/ml/cc/exercises/intro_to_fairness.ipynb\n",
"UNUSED_COLUMNS = ['fnlwgt', 'education', 'gender']\n",
"\n",
"_CATEGORICAL_TYPES = {\n",
" 'workclass': pd.api.types.CategoricalDtype(categories=[\n",
" 'Federal-gov', 'Local-gov', 'Never-worked', 'Private', 'Self-emp-inc',\n",
" 'Self-emp-not-inc', 'State-gov', 'Without-pay'\n",
" ]),\n",
" 'marital_status': pd.api.types.CategoricalDtype(categories=[\n",
" 'Divorced', 'Married-AF-spouse', 'Married-civ-spouse',\n",
" 'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'\n",
" ]),\n",
" 'occupation': pd.api.types.CategoricalDtype([\n",
" 'Adm-clerical', 'Armed-Forces', 'Craft-repair', 'Exec-managerial',\n",
" 'Farming-fishing', 'Handlers-cleaners', 'Machine-op-inspct',\n",
" 'Other-service', 'Priv-house-serv', 'Prof-specialty', 'Protective-serv',\n",
" 'Sales', 'Tech-support', 'Transport-moving'\n",
" ]),\n",
" 'relationship': pd.api.types.CategoricalDtype(categories=[\n",
" 'Husband', 'Not-in-family', 'Other-relative', 'Own-child', 'Unmarried',\n",
" 'Wife'\n",
" ]),\n",
" 'race': pd.api.types.CategoricalDtype(categories=[\n",
" 'Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other', 'White'\n",
" ]),\n",
" 'native_country': pd.api.types.CategoricalDtype(categories=[\n",
" 'Cambodia', 'Canada', 'China', 'Columbia', 'Cuba', 'Dominican-Republic',\n",
" 'Ecuador', 'El-Salvador', 'England', 'France', 'Germany', 'Greece',\n",
" 'Guatemala', 'Haiti', 'Holand-Netherlands', 'Honduras', 'Hong',\n",
" 'Hungary',\n",
" 'India', 'Iran', 'Ireland', 'Italy', 'Jamaica', 'Japan', 'Laos',\n",
" 'Mexico',\n",
" 'Nicaragua', 'Outlying-US(Guam-USVI-etc)', 'Peru', 'Philippines',\n",
" 'Poland',\n",
" 'Portugal', 'Puerto-Rico', 'Scotland', 'South', 'Taiwan', 'Thailand',\n",
" 'Trinadad&Tobago', 'United-States', 'Vietnam', 'Yugoslavia'\n",
" ]),\n",
" 'income_bracket': pd.api.types.CategoricalDtype(categories=[\n",
" '<=50K', '>50K'\n",
" ])\n",
"}\n",
"\n",
"\n",
"def _download_and_clean_file(filename, url):\n",
" \"\"\"Downloads data from url, and makes changes to match the CSV format.\n",
"\n",
" The CSVs may use spaces after the comma delimters (non-standard) or include\n",
" rows which do not represent well-formed examples. This function strips out\n",
" some of these problems.\n",
"\n",
" Args:\n",
" filename: filename to save url to\n",
" url: URL of resource to download\n",
" \"\"\"\n",
" temp_file, _ = urllib.request.urlretrieve(url)\n",
" with tf.io.gfile.GFile(temp_file, 'r') as temp_file_object:\n",
" with tf.io.gfile.GFile(filename, 'w') as file_object:\n",
" for line in temp_file_object:\n",
" line = line.strip()\n",
" line = line.replace(', ', ',')\n",
" if not line or ',' not in line:\n",
" continue\n",
" if line[-1] == '.':\n",
" line = line[:-1]\n",
" line += '\\n'\n",
" file_object.write(line)\n",
" tf.io.gfile.remove(temp_file)\n",
"\n",
"\n",
"def download(data_dir):\n",
" \"\"\"Downloads census data if it is not already present.\n",
"\n",
" Args:\n",
" data_dir: directory where we will access/save the census data\n",
" \"\"\"\n",
" tf.io.gfile.makedirs(data_dir)\n",
"\n",
" training_file_path = os.path.join(data_dir, TRAINING_FILE)\n",
" if not tf.io.gfile.exists(training_file_path):\n",
" _download_and_clean_file(training_file_path, TRAINING_URL)\n",
"\n",
" eval_file_path = os.path.join(data_dir, EVAL_FILE)\n",
" if not tf.io.gfile.exists(eval_file_path):\n",
" _download_and_clean_file(eval_file_path, EVAL_URL)\n",
"\n",
" return training_file_path, eval_file_path\n",
"\n",
"\n",
"def preprocess(dataframe):\n",
" \"\"\"Converts categorical features to numeric. Removes unused columns.\n",
"\n",
" Args:\n",
" dataframe: Pandas dataframe with raw data\n",
"\n",
" Returns:\n",
" Dataframe with preprocessed data\n",
" \"\"\"\n",
" dataframe = dataframe.drop(columns=UNUSED_COLUMNS)\n",
"\n",
" # Convert integer valued (numeric) columns to floating point\n",
" numeric_columns = dataframe.select_dtypes(['int64']).columns\n",
" dataframe[numeric_columns] = dataframe[numeric_columns].astype('float32')\n",
"\n",
" # Convert categorical columns to numeric\n",
" cat_columns = dataframe.select_dtypes(['object']).columns\n",
" dataframe[cat_columns] = dataframe[cat_columns].apply(lambda x: x.astype(\n",
" _CATEGORICAL_TYPES[x.name]))\n",
" dataframe[cat_columns] = dataframe[cat_columns].apply(lambda x: x.cat.codes)\n",
" return dataframe\n",
"\n",
"\n",
"def standardize(dataframe):\n",
" \"\"\"Scales numerical columns using their means and standard deviation to get\n",
" z-scores: the mean of each numerical column becomes 0, and the standard\n",
" deviation becomes 1. This can help the model converge during training.\n",
"\n",
" Args:\n",
" dataframe: Pandas dataframe\n",
"\n",
" Returns:\n",
" Input dataframe with the numerical columns scaled to z-scores\n",
" \"\"\"\n",
" dtypes = list(zip(dataframe.dtypes.index, map(str, dataframe.dtypes)))\n",
" # Normalize numeric columns.\n",
" for column, dtype in dtypes:\n",
" if dtype == 'float32':\n",
" dataframe[column] -= dataframe[column].mean()\n",
" dataframe[column] /= dataframe[column].std()\n",
" return dataframe\n",
"\n",
"\n",
"def load_data():\n",
" \"\"\"Loads data into preprocessed (train_x, train_y, eval_y, eval_y)\n",
" dataframes.\n",
"\n",
" Returns:\n",
" A tuple (train_x, train_y, eval_x, eval_y), where train_x and eval_x are\n",
" Pandas dataframes with features for training and train_y and eval_y are\n",
" numpy arrays with the corresponding labels.\n",
" \"\"\"\n",
" # Download Census dataset: Training and eval csv files.\n",
" training_file_path, eval_file_path = download(DATA_DIR)\n",
"\n",
" # This census data uses the value '?' for missing entries. We use\n",
" # na_values to\n",
" # find ? and set it to NaN.\n",
" # https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv\n",
" # .html\n",
" train_df = pd.read_csv(training_file_path, names=_CSV_COLUMNS,\n",
" na_values='?')\n",
" eval_df = pd.read_csv(eval_file_path, names=_CSV_COLUMNS, na_values='?')\n",
"\n",
" train_df = preprocess(train_df)\n",
" eval_df = preprocess(eval_df)\n",
"\n",
" # Split train and eval data with labels. The pop method copies and removes\n",
" # the label column from the dataframe.\n",
" train_x, train_y = train_df, train_df.pop(_LABEL_COLUMN)\n",
" eval_x, eval_y = eval_df, eval_df.pop(_LABEL_COLUMN)\n",
"\n",
" # Join train_x and eval_x to normalize on overall means and standard\n",
" # deviations. Then separate them again.\n",
" all_x = pd.concat([train_x, eval_x], keys=['train', 'eval'])\n",
" all_x = standardize(all_x)\n",
" train_x, eval_x = all_x.xs('train'), all_x.xs('eval')\n",
"\n",
" # Reshape label columns for use with tf.data.Dataset\n",
" train_y = np.asarray(train_y).astype('float32').reshape((-1, 1))\n",
" eval_y = np.asarray(eval_y).astype('float32').reshape((-1, 1))\n",
"\n",
" return train_x, train_y, eval_x, eval_y"
]
},
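{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before moving on, you can optionally sanity-check the utility module. The sketch below assumes the `trainer/util.py` file written above is importable from the current working directory; it loads the preprocessed data and prints the resulting shapes and dtypes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from trainer import util\n",
"\n",
"# load_data() downloads the files (if needed), preprocesses them, and returns\n",
"# feature DataFrames plus numpy label arrays.\n",
"train_x, train_y, eval_x, eval_y = util.load_data()\n",
"\n",
"print('train_x:', train_x.shape, 'train_y:', train_y.shape)\n",
"print('eval_x:', eval_x.shape, 'eval_y:', eval_y.shape)\n",
"\n",
"# Categorical columns are now integer category codes; numeric columns are float32.\n",
"print(train_x.dtypes)"
]
},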
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second file, called model.py, defines the input function and the model architecture. In this example, we use tf.data API for the data pipeline and create the model using the Keras Sequential API. We define a DNN with an input layer and 3 additonal layers using the Relu activation function. Since the task is a binary classification, the output layer uses the sigmoid activation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile trainer/model.py\n",
"from __future__ import absolute_import\n",
"from __future__ import division\n",
"from __future__ import print_function\n",
"\n",
"import tensorflow as tf\n",
"\n",
"\n",
"def input_fn(features, labels, shuffle, num_epochs, batch_size):\n",
" \"\"\"Generates an input function to be used for model training.\n",
"\n",
" Args:\n",
" features: numpy array of features used for training or inference\n",
" labels: numpy array of labels for each example\n",
" shuffle: boolean for whether to shuffle the data or not (set True for\n",
" training, False for evaluation)\n",
" num_epochs: number of epochs to provide the data for\n",
" batch_size: batch size for training\n",
"\n",
" Returns:\n",
" A tf.data.Dataset that can provide data to the Keras model for training or\n",
" evaluation\n",
" \"\"\"\n",
" if labels is None:\n",
" inputs = features\n",
" else:\n",
" inputs = (features, labels)\n",
" dataset = tf.data.Dataset.from_tensor_slices(inputs)\n",
"\n",
" if shuffle:\n",
" dataset = dataset.shuffle(buffer_size=len(features))\n",
"\n",
" # We call repeat after shuffling, rather than before, to prevent separate\n",
" # epochs from blending together.\n",
" dataset = dataset.repeat(num_epochs)\n",
" dataset = dataset.batch(batch_size)\n",
" return dataset\n",
"\n",
"\n",
"def create_keras_model(input_dim, learning_rate):\n",
" \"\"\"Creates Keras Model for Binary Classification.\n",
"\n",
" The single output node + Sigmoid activation makes this a Logistic\n",
" Regression.\n",
"\n",
" Args:\n",
" input_dim: How many features the input has\n",
" learning_rate: Learning rate for training\n",
"\n",
" Returns:\n",
" The compiled Keras model (still needs to be trained)\n",
" \"\"\"\n",
" Dense = tf.keras.layers.Dense\n",
" model = tf.keras.Sequential(\n",
" [\n",
" Dense(100, activation=tf.nn.relu, kernel_initializer='uniform',\n",
" input_shape=(input_dim,)),\n",
" Dense(75, activation=tf.nn.relu),\n",
" Dense(50, activation=tf.nn.relu),\n",
" Dense(25, activation=tf.nn.relu),\n",
" Dense(1, activation=tf.nn.sigmoid)\n",
" ])\n",
"\n",
" # Custom Optimizer:\n",
" # https://www.tensorflow.org/api_docs/python/tf/train/RMSPropOptimizer\n",
" optimizer = tf.keras.optimizers.RMSprop(lr=learning_rate)\n",
"\n",
" # Compile Keras model\n",
" model.compile(\n",
" loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])\n",
" return model"
]
},
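{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm the architecture described above, you can optionally build the model and print a layer summary. This sketch assumes `trainer/model.py` was written by the previous cell and that TensorFlow 2.x is installed in the notebook environment; the 15 CSV columns minus the label and the three unused columns leave 11 input features:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from trainer import model\n",
"\n",
"# Build the compiled Keras model for the 11 preprocessed census features.\n",
"demo_model = model.create_keras_model(input_dim=11, learning_rate=0.01)\n",
"demo_model.summary()"
]
},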
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last file, called task.py, trains on data loaded and preprocessed in util.py. Using the tf.distribute.MirroredStrategy() scope, it is possible to train on a distributed fashion. The trained model is then saved in a TensorFlow SavedModel format."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile trainer/task.py\n",
"from __future__ import absolute_import\n",
"from __future__ import division\n",
"from __future__ import print_function\n",
"\n",
"import argparse\n",
"import os\n",
"\n",
"from . import model\n",
"from . import util\n",
"\n",
"import tensorflow as tf\n",
"\n",
"\n",
"def get_args():\n",
" \"\"\"Argument parser.\n",
"\n",
" Returns:\n",
" Dictionary of arguments.\n",
" \"\"\"\n",
" parser = argparse.ArgumentParser()\n",
" parser.add_argument(\n",
" '--job-dir',\n",
" type=str,\n",
" required=True,\n",
" help='local or GCS location for writing checkpoints and exporting '\n",
" 'models')\n",
" parser.add_argument(\n",
" '--num-epochs',\n",
" type=int,\n",
" default=20,\n",
" help='number of times to go through the data, default=20')\n",
" parser.add_argument(\n",
" '--batch-size',\n",
" default=128,\n",
" type=int,\n",
" help='number of records to read during each training step, default=128')\n",
" parser.add_argument(\n",
" '--learning-rate',\n",
" default=.01,\n",
" type=float,\n",
" help='learning rate for gradient descent, default=.01')\n",
" parser.add_argument(\n",
" '--verbosity',\n",
" choices=['DEBUG', 'ERROR', 'FATAL', 'INFO', 'WARN'],\n",
" default='INFO')\n",
" args, _ = parser.parse_known_args()\n",
" return args\n",
"\n",
"\n",
"def train_and_evaluate(args):\n",
" \"\"\"Trains and evaluates the Keras model.\n",
"\n",
" Uses the Keras model defined in model.py and trains on data loaded and\n",
" preprocessed in util.py. Saves the trained model in TensorFlow SavedModel\n",
" format to the path defined in part by the --job-dir argument.\n",
"\n",
" Args:\n",
" args: dictionary of arguments - see get_args() for details\n",
" \"\"\"\n",
"\n",
" train_x, train_y, eval_x, eval_y = util.load_data()\n",
"\n",
" # dimensions\n",
" num_train_examples, input_dim = train_x.shape\n",
" num_eval_examples = eval_x.shape[0]\n",
"\n",
" # Create the Keras Model\n",
" keras_model = model.create_keras_model(\n",
" input_dim=input_dim, learning_rate=args.learning_rate)\n",
"\n",
" # Pass a numpy array by passing DataFrame.values\n",
" training_dataset = model.input_fn(\n",
" features=train_x.values,\n",
" labels=train_y,\n",
" shuffle=True,\n",
" num_epochs=args.num_epochs,\n",
" batch_size=args.batch_size)\n",
"\n",
" # Pass a numpy array by passing DataFrame.values\n",
" validation_dataset = model.input_fn(\n",
" features=eval_x.values,\n",
" labels=eval_y,\n",
" shuffle=False,\n",
" num_epochs=args.num_epochs,\n",
" batch_size=num_eval_examples)\n",
"\n",
" # Setup Learning Rate decay.\n",
" lr_decay_cb = tf.keras.callbacks.LearningRateScheduler(\n",
" lambda epoch: args.learning_rate + 0.02 * (0.5 ** (1 + epoch)),\n",
" verbose=True)\n",
"\n",
" # Setup TensorBoard callback.\n",
" tensorboard_cb = tf.keras.callbacks.TensorBoard(\n",
" os.path.join(args.job_dir, 'keras_tensorboard'),\n",
" histogram_freq=1)\n",
"\n",
" # Train model\n",
" keras_model.fit(\n",
" training_dataset,\n",
" steps_per_epoch=int(num_train_examples / args.batch_size),\n",
" epochs=args.num_epochs,\n",
" validation_data=validation_dataset,\n",
" validation_steps=1,\n",
" verbose=1,\n",
" callbacks=[lr_decay_cb, tensorboard_cb])\n",
"\n",
" export_path = os.path.join(args.job_dir, 'keras_export')\n",
" tf.keras.models.save_model(keras_model, export_path)\n",
" print('Model exported to: {}'.format(export_path))\n",
"\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
" strategy = tf.distribute.MirroredStrategy()\n",
" with strategy.scope():\n",
" args = get_args()\n",
" tf.compat.v1.logging.set_verbosity(args.verbosity)\n",
" train_and_evaluate(args)\n"
]
},
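{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `LearningRateScheduler` callback in task.py adds a decaying term on top of the base learning rate at every epoch. As a quick illustration of how that schedule behaves, the sketch below evaluates the same formula for the first few epochs, assuming the default learning rate of 0.01:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"base_lr = 0.01  # default --learning-rate in task.py\n",
"\n",
"# Same formula as the lambda passed to LearningRateScheduler in task.py.\n",
"for epoch in range(5):\n",
"    lr = base_lr + 0.02 * (0.5 ** (1 + epoch))\n",
"    print('epoch {}: learning rate {:.5f}'.format(epoch, lr))"
]
},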
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2.2: Run a training job locally using the Python training program\n",
"\n",
"**NOTE** When you run the same training job on AI Platform later in the lab, you'll see that the command is not much different from the above.\n",
"\n",
"Specify an output directory and set a MODEL_DIR variable to hold the trained model, then run the training job locally by running the following command (by default, verbose logging is turned off. You can enable it by setting the --verbosity tag to DEBUG):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"MODEL_DIR=output\n",
"gcloud ai-platform local train \\\n",
" --module-name trainer.task \\\n",
" --package-path trainer/ \\\n",
" --job-dir $MODEL_DIR \\\n",
" -- \\\n",
" --train-files $TRAIN_DATA \\\n",
" --eval-files $EVAL_DATA \\\n",
" --train-steps 1000 \\\n",
" --eval-steps 100"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check if the output has been written to the output folder:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"ls output/keras_export/"
]
},
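{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also optionally load the exported SavedModel back into Keras to confirm it is servable. This sketch assumes the local training run above completed and wrote its export to `output/keras_export/`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"# Load the SavedModel produced by the local training run and inspect it.\n",
"local_model = tf.keras.models.load_model('output/keras_export/')\n",
"local_model.summary()"
]
},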
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2.3: Prepare input for prediction\n",
"\n",
"To receive valid and useful predictions, you must preprocess input for prediction in the same way that training data was preprocessed. In a production system, you may want to create a preprocessing pipeline that can be used identically at training time and prediction time.\n",
"\n",
"For this exercise, use the training package's data-loading code to select a random sample from the evaluation data. This data is in the form that was used to evaluate accuracy after each epoch of training, so it can be used to send test predictions without further preprocessing.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the following snippet of code to preprocess the raw data from the adult.test.csv file. Here, we are grabbing 5 examples to run predictions on:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from trainer import util\n",
"_, _, eval_x, eval_y = util.load_data()\n",
"\n",
"prediction_input = eval_x.sample(5)\n",
"prediction_targets = eval_y[prediction_input.index]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the numerical representation of the features by printing the preprocessed data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(prediction_input)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that categorical fields, like occupation, have already been converted to integers (with the same mapping that was used for training). Numerical fields, like age, have been scaled to a z-score. Some fields have been dropped from the original data.\n",
"\n",
"Export the prediction input to a newline-delimited JSON file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"with open('test.json', 'w') as json_file:\n",
" for row in prediction_input.values.tolist():\n",
" json.dump(row, json_file)\n",
" json_file.write('\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inspect the .json file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"cat test.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2.4: Use your trained model for prediction\n",
"\n",
"Once you've trained your TensorFlow model, you can use it for prediction on new data. In this case, you've trained a census model to predict income category given some information about a person.\n",
"\n",
"Run the following command to run prediction on the test.json file we created above:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** If you get a \"Bad magic number in .pyc file\" error, go to the terminal and run:\n",
"> cd ../../usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/\n",
"\n",
"> sudo rm *.pyc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"gcloud ai-platform local predict \\\n",
" --model-dir output/keras_export/ \\\n",
" --json-instances ./test.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the model's last layer uses a sigmoid function for its activation, outputs between 0 and 0.5 represent negative predictions **(\"<=50K\")** and outputs between 0.5 and 1 represent positive ones **(\">50K\")**."
]
},
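{
"cell_type": "markdown",
"metadata": {},
"source": [
"To relate those probabilities back to the examples sampled in Step 2.3, the optional sketch below runs the same five rows through the locally exported model in Python and applies the 0.5 threshold. It assumes `prediction_input` and `prediction_targets` are still defined and that local training wrote `output/keras_export/`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"local_model = tf.keras.models.load_model('output/keras_export/')\n",
"\n",
"# Sigmoid outputs are in [0, 1]; 0.5 or above means the model predicts '>50K'.\n",
"probs = local_model.predict(prediction_input.values)\n",
"predicted = (probs >= 0.5).astype('float32')\n",
"\n",
"for prob, pred, target in zip(probs.ravel(), predicted.ravel(), prediction_targets.ravel()):\n",
"    print('probability={:.3f}  predicted={}  actual={}'.format(prob, pred, target))"
]
},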
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3: Run your training job in the cloud\n",
"\n",
"Now that you've validated your model by running it locally, you will now get practice training using Cloud AI Platform.\n",
"\n",
"**Note:** The initial job request will take several minutes to start, but subsequent jobs run more quickly. This enables quick iteration as you develop and validate your training job.\n",
"\n",
"First, set the following variables:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"export PROJECT=$(gcloud config list project --format \"value(core.project)\")\n",
"echo \"Your current GCP Project Name is: \"${PROJECT}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"PROJECT = \"YOUR_PROJECT_NAME\" # Replace with your project name\n",
"BUCKET_NAME=PROJECT+\"-aiplatform\"\n",
"REGION=\"your-region\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"PROJECT\"] = PROJECT\n",
"os.environ[\"BUCKET_NAME\"] = BUCKET_NAME\n",
"os.environ[\"REGION\"] = REGION\n",
"os.environ[\"TFVERSION\"] = \"2.11\"\n",
"os.environ[\"PYTHONVERSION\"] = \"3.7\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3.1: Set up a Cloud Storage bucket\n",
"\n",
"The AI Platform services need to access Cloud Storage (GCS) to read and write data during model training and batch prediction.\n",
"\n",
"Create a bucket using BUCKET_NAME as the name for the bucket and copy the data into it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"if ! gsutil ls | grep -q gs://${BUCKET_NAME}; then\n",
" gsutil mb -l ${REGION} gs://${BUCKET_NAME}\n",
"fi\n",
"gsutil cp -r data gs://$BUCKET_NAME/data"
]
},
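{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can confirm the upload from Python instead of `gsutil ls`. This sketch assumes the copy above succeeded and that TensorFlow's Cloud Storage filesystem support is available in this environment, which lets `tf.io.gfile` list `gs://` paths directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"# List the census CSV files that were just copied into the bucket.\n",
"for path in tf.io.gfile.glob('gs://{}/data/*'.format(BUCKET_NAME)):\n",
"    print(path)"
]
},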
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set the TRAIN_DATA and EVAL_DATA variables to point to the files:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"export TRAIN_DATA=gs://$BUCKET_NAME/data/adult.data.csv\n",
"export EVAL_DATA=gs://$BUCKET_NAME/data/adult.test.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use gsutil again to copy the JSON test file test.json to your Cloud Storage bucket:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"gsutil cp test.json gs://$BUCKET_NAME/data/test.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set the TEST_JSON variable to point to that file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"export TEST_JSON=gs://$BUCKET_NAME/data/test.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Go back to the lab instructions and check your progress by testing the completed tasks:**\n",
"\n",
"**- \"Set up a Google Cloud Storage\".**\n",
"\n",
"**- \"Upload the data files to your Cloud Storage bucket\".**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3.2: Run a single-instance trainer in the cloud\n",
"\n",
"With a validated training job that runs in both single-instance and distributed mode, you're now ready to run a training job in the cloud. For this example, we will be requesting a single-instance training job.\n",
"\n",
"Use the default BASIC scale tier to run a single-instance training job. The initial job request can take a few minutes to start, but subsequent jobs run more quickly. This enables quick iteration as you develop and validate your training job.\n",
"\n",
"Select a name for the initial training run that distinguishes it from any subsequent training runs. For example, we can use date and time to compose the job id.\n",
"\n",
"Specify a directory for output generated by AI Platform by setting an OUTPUT_PATH variable to include when requesting training and prediction jobs. The OUTPUT_PATH represents the fully qualified Cloud Storage location for model checkpoints, summaries, and exports. You can use the BUCKET_NAME variable you defined in a previous step. It's a good practice to use the job name as the output directory.\n",
"\n",
"Run the following command to submit a training job in the cloud that uses a single process. This time, set the --verbosity tag to DEBUG so that you can inspect the full logging output and retrieve accuracy, loss, and other metrics. The output also contains a number of other warning messages that you can ignore for the purposes of this sample:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"JOB_ID=census_$(date -u +%y%m%d_%H%M%S)\n",
"OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_ID\n",
"gcloud ai-platform jobs submit training $JOB_ID \\\n",
" --job-dir $OUTPUT_PATH \\\n",
" --runtime-version $TFVERSION \\\n",
" --python-version $PYTHONVERSION \\\n",
" --module-name trainer.task \\\n",
" --package-path trainer/ \\\n",
" --region $REGION \\\n",
" -- \\\n",
" --train-files $TRAIN_DATA \\\n",
" --eval-files $EVAL_DATA \\\n",
" --train-steps 1000 \\\n",
" --eval-steps 100 \\\n",
" --verbosity DEBUG"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set an environment variable with the jobId generated above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"JOB_ID\"] = \"YOUR_JOB_ID\" # Replace with your job id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can monitor the progress of your training job by watching the logs on the command line by running:\n",
"\n",
"`gcloud ai-platform jobs stream-logs $JOB_ID`\n",
"\n",
"Or monitor it in the Console at `AI Platform > Jobs`. Wait until your AI Platform training job is done. It is finished when you see a green check mark by the jobname in the Cloud Console, or when you see the message Job completed successfully from the Cloud Shell command line."
]
},
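{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer to poll the job state from Python instead of streaming logs, the optional sketch below uses the Google API client library for the AI Platform Training REST API. It assumes the `google-api-python-client` package is installed in this environment and that the `PROJECT` and `JOB_ID` environment variables were set above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from googleapiclient import discovery\n",
"\n",
"ml = discovery.build('ml', 'v1')\n",
"job_name = 'projects/{}/jobs/{}'.format(os.environ['PROJECT'], os.environ['JOB_ID'])\n",
"\n",
"# The state is one of QUEUED, PREPARING, RUNNING, SUCCEEDED, FAILED, or CANCELLED.\n",
"job = ml.projects().jobs().get(name=job_name).execute()\n",
"print(job['state'])"
]
},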
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Wait for the job to complete before proceeding to the next step.\n",
"Go back to the lab instructions and check your progress by testing the completed task:**\n",
"\n",
"**- \"Run a single-instance trainer in the cloud\".**\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3.3: Deploy your model to support prediction\n",
"\n",
"By deploying your trained model to AI Platform to serve online prediction requests, you get the benefit of scalable serving. This is useful if you expect your trained model to be hit with many prediction requests in a short period of time.\n",
"\n",
"**Note:** You will get `Using endpoint [https://ml.googleapis.com/]` output after running the next cells. If you try to open that link, you will see `404` error message. **You have to ignore it and move forward.**\n",
"\n",
"Create an AI Platform model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"MODEL_NAME\"] = \"census\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"gcloud ai-platform models create $MODEL_NAME --regions=$REGION"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set the environment variable MODEL_BINARIES to the full path of your exported trained model binaries `$OUTPUT_PATH/keras_export/`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You'll deploy this trained model.\n",
"\n",
"Run the following command to create a version v1 of your model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_ID\n",
"MODEL_BINARIES=$OUTPUT_PATH/keras_export/\n",
"gcloud ai-platform versions create v1 \\\n",
"--model $MODEL_NAME \\\n",
"--origin $MODEL_BINARIES \\\n",
"--runtime-version $TFVERSION \\\n",
"--python-version $PYTHONVERSION \\\n",
"--region=global"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It may take several minutes to deploy your trained model. When done, you can see a list of your models using the models list command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"gcloud ai-platform models list --region=global"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Go back to the lab instructions and check your progress by testing the completed tasks:**\n",
"\n",
"**- \"Create an AI Platform model\".**\n",
"\n",
"**- \"Create a version v1 of your model\".**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3.4: Send an online prediction request to your deployed model\n",
"\n",
"You can now send prediction requests to your deployed model. The following command sends a prediction request using the test.json.\n",
"\n",
"The response includes the probabilities of each label **(>50K and <=50K)** based on the data entry in test.json, thus indicating whether the predicted income is greater than or less than 50,000 dollars."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"gcloud ai-platform predict \\\n",
"--model $MODEL_NAME \\\n",
"--version v1 \\\n",
"--json-instances ./test.json \\\n",
"--region global"
]
},
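{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative to the `gcloud` command above, you can optionally call the online prediction REST API from Python. This sketch assumes the `google-api-python-client` package is installed and reuses the deployed model name, version `v1`, and the `prediction_input` DataFrame from Step 2.3:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from googleapiclient import discovery\n",
"\n",
"ml = discovery.build('ml', 'v1')\n",
"version_name = 'projects/{}/models/{}/versions/v1'.format(\n",
"    os.environ['PROJECT'], os.environ['MODEL_NAME'])\n",
"\n",
"# Each instance is one row of preprocessed features, in the same format as test.json.\n",
"instances = prediction_input.values.tolist()\n",
"response = ml.projects().predict(name=version_name, body={'instances': instances}).execute()\n",
"print(response['predictions'])"
]
},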
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** AI Platform supports batch prediction, too, but it's not included in this lab. See the documentation for more info."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Go back to the lab instructions to answer some multiple choice questions to reinforce your uncerstanding of some of these lab's concepts.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Congratulations!\n",
"\n",
"In this lab you've learned how to train a TensorFlow model both locally and on AI Platform, how to prepare data for prediction and to perform predictions both locally and in the Cloud AI Platform."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}