<h1> Hyper-parameter tuning </h1>

In this notebook, you will learn how to carry out hyper-parameter tuning.

This notebook takes several hours to run.

<h2> Environment variables for project and bucket </h2>

Change the cell below to reflect your Project ID and bucket name. See Lab 3a for setup instructions.

In [None]:
import os
PROJECT = 'cloud-training-demos'    # CHANGE THIS
REGION = 'us-central1' # Choose an available region for Cloud MLE from https://cloud.google.com/ml-engine/docs/regions.
BUCKET = 'cloud-training-demos-ml' # REPLACE WITH YOUR BUCKET NAME. Use a regional bucket in the region you selected.

In [None]:
# for bash
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION

In [None]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

<h1> 1. Command-line parameters to task.py </h1>

Note the command-line parameters that task.py accepts. Any of them is a candidate for hyper-parameter tuning.

In [None]:
!grep -A 2 add_argument taxifare/trainer/task.py
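
For readers without the repository at hand, the following is a minimal sketch of how such flags might be declared with argparse. The flag names are taken from the hyperparam.yaml file in section 4; the defaults, the help strings, and the `parse_args` helper are assumptions for illustration and are not copied from the actual task.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the tunable flags (hypothetical helper, not the real task.py)."""
    parser = argparse.ArgumentParser()
    # Defaults below are placeholders chosen for this sketch.
    parser.add_argument('--train_batch_size', type=int, default=512,
                        help='Batch size used during training')
    parser.add_argument('--nbuckets', type=int, default=10,
                        help='Number of buckets used for discretization')
    parser.add_argument('--hidden_units', type=str, default='128 32',
                        help='Space-separated hidden layer sizes')
    args = parser.parse_args(argv)
    # A categorical value such as "256 128 16" arrives as one string,
    # so split it into a list of integers.
    args.hidden_units = [int(n) for n in args.hidden_units.split()]
    return args


args = parse_args(['--train_batch_size=128', '--hidden_units=256 128 16'])
print(args.train_batch_size, args.nbuckets, args.hidden_units)
```

The tuning service passes each trial's values as ordinary command-line flags, which is why every tunable parameter needs a matching argument.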

<h1> 2. Evaluation metric </h1>

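We add an evaluation metric for the tuning service to optimize. It can be any objective function we choose, as long as its name matches the hyperparameterMetricTag in the configuration in section 4 (rmse in this notebook).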

In [None]:
!grep -A 5 add_eval_metrics taxifare/trainer/model.py
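
The tag rmse used in section 4 suggests the metric is a root-mean-square error. The sketch below shows that quantity in plain Python so the objective is concrete. It is only an illustration: the real model.py computes its metric inside TensorFlow, and the `rmse` function here is a hypothetical stand-in.

```python
import math


def rmse(labels, predictions):
    """Root-mean-square error between two equal-length sequences."""
    if len(labels) != len(predictions) or len(labels) == 0:
        raise ValueError('labels and predictions must be non-empty and equal in length')
    squared_errors = [(p - y) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up fares and predictions, used only to exercise the function.
print(rmse([10.0, 12.0, 8.0], [11.0, 12.0, 6.0]))
```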

<h1> 3. Make sure outputs do not clobber each other </h1>

We append the trial number to the output directory so that each trial writes to its own location.

In [None]:
!grep -A 5 "trial" taxifare/trainer/task.py
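
One way to obtain the trial number is to read it from the TF_CONFIG environment variable, which the service sets for each trial. The sketch below assumes the trial id is found under the `task` / `trial` keys of that JSON; `trial_output_dir` is a hypothetical helper, not the code in task.py.

```python
import json
import os


def trial_output_dir(output_dir, environ=os.environ):
    """Return output_dir with the trial id appended, if one is set."""
    tf_config = json.loads(environ.get('TF_CONFIG', '{}'))
    trial = str(tf_config.get('task', {}).get('trial', ''))
    if not trial:
        # Not running under hyper-parameter tuning: leave the path unchanged.
        return output_dir
    return output_dir.rstrip('/') + '/' + trial


print(trial_output_dir('gs://my-bucket/taxi_trained',
                       {'TF_CONFIG': '{"task": {"trial": "7"}}'}))
```

Outside a tuning job the variable is absent or has no trial entry, so the same code works for ordinary training runs.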

<h1> 4. Create hyper-parameter configuration </h1>

The file specifies the search region in parameter space. Cloud MLE searches intelligently within these constraints rather than trying every possible value, and it stops after at most maxTrials trials.

In [None]:
%%writefile hyperparam.yaml
trainingInput:
  scaleTier: STANDARD_1
  hyperparameters:
    goal: MINIMIZE
    maxTrials: 30
    maxParallelTrials: 3
    hyperparameterMetricTag: rmse
    params:
    - parameterName: train_batch_size
      type: INTEGER
      minValue: 64
      maxValue: 512
      scaleType: UNIT_LOG_SCALE
    - parameterName: nbuckets
      type: INTEGER
      minValue: 10
      maxValue: 20
      scaleType: UNIT_LINEAR_SCALE
    - parameterName: hidden_units
      type: CATEGORICAL
      categoricalValues: ["128 32", "256 128 16", "64 64 64 8"]       
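
To see what the two scaleType settings mean, the sketch below maps a parameter value onto the unit interval either linearly or logarithmically. This reflects a common reading of UNIT_LINEAR_SCALE and UNIT_LOG_SCALE; the service's internal search is not shown in this notebook, and `to_unit_interval` is a hypothetical helper for illustration only.

```python
import math


def to_unit_interval(value, min_value, max_value, scale_type):
    """Map value from [min_value, max_value] onto [0, 1]."""
    if scale_type == 'UNIT_LINEAR_SCALE':
        return (value - min_value) / (max_value - min_value)
    if scale_type == 'UNIT_LOG_SCALE':
        # Log scaling requires strictly positive bounds.
        span = math.log(max_value) - math.log(min_value)
        return (math.log(value) - math.log(min_value)) / span
    raise ValueError('unsupported scale type: %s' % scale_type)


# Batch sizes that double each time are evenly spaced on the log scale.
for batch_size in (64, 128, 256, 512):
    print(batch_size, round(to_unit_interval(batch_size, 64, 512, 'UNIT_LOG_SCALE'), 3))
```

A log scale suits train_batch_size because the difference between 64 and 128 matters about as much as the difference between 256 and 512, whereas nbuckets covers a narrow range where linear spacing is adequate.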

<h1> 5. Run the training job </h1>

Just add --config to the usual training command.

In [None]:
%%bash
OUTDIR=gs://${BUCKET}/taxifare/ch4/taxi_trained
JOBNAME=lab4a_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${PWD}/taxifare/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=STANDARD_1 \
   --runtime-version=1.4 \
   --config=hyperparam.yaml \
   -- \
   --train_data_paths="gs://$BUCKET/taxifare/ch4/taxi_preproc/train*" \
   --eval_data_paths="gs://${BUCKET}/taxifare/ch4/taxi_preproc/valid*"  \
   --output_dir=$OUTDIR \
   --train_steps=5000

<h1> 6. Train chosen model on full dataset </h1>

Look at the last section of the <a href="feateng.ipynb">feature engineering notebook</a>.  The extra parameters are based on hyper-parameter tuning.
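
Choosing the model means finding the trial with the best objective value. The sketch below assumes the job description has already been loaded into a Python dict (for example from the JSON output of `gcloud ml-engine jobs describe`) and that finished trials appear under `trainingOutput.trials` with a `finalMetric.objectiveValue` field. The `best_trial` helper is hypothetical, and the numbers in `example` are made-up placeholders, not results from this job.

```python
def best_trial(job_description, goal='MINIMIZE'):
    """Return the finished trial with the best objective value."""
    trials = job_description.get('trainingOutput', {}).get('trials', [])
    # Trials that have not reported a metric are skipped.
    finished = [t for t in trials if 'finalMetric' in t]
    if not finished:
        raise ValueError('no finished trials found')

    def objective(trial):
        return trial['finalMetric']['objectiveValue']

    if goal == 'MINIMIZE':
        return min(finished, key=objective)
    return max(finished, key=objective)


# Placeholder data with invented values, only to exercise the function.
example = {
    'trainingOutput': {
        'trials': [
            {'trialId': '1',
             'finalMetric': {'objectiveValue': 9.5},
             'hyperparameters': {'nbuckets': '12', 'train_batch_size': '128',
                                 'hidden_units': '128 32'}},
            {'trialId': '2',
             'finalMetric': {'objectiveValue': 8.75},
             'hyperparameters': {'nbuckets': '18', 'train_batch_size': '256',
                                 'hidden_units': '64 64 64 8'}},
            {'trialId': '3',
             'hyperparameters': {'nbuckets': '15', 'train_batch_size': '64',
                                 'hidden_units': '256 128 16'}},
        ]
    }
}

winner = best_trial(example)
print(winner['trialId'], winner['hyperparameters'])
```

The hyperparameters of the winning trial are the values to pass as extra flags when training on the full dataset.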

Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.