{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "4f3CKqFUqL2-",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Neural Network"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Learning Objectives:**\n",
" * Use the `DNNRegressor` class in TensorFlow to predict median housing price"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data is based on 1990 census data from California. This data is at the city block level, so these features reflect the total number of rooms in that block, or the total number of people who live on that block, respectively.\n",
"<p>\n",
"Let's use a set of features to predict house value."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6TjLjL9IU80G"
},
"source": [
"## Set Up\n",
"In this first cell, we'll load the necessary libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Ensure the right version of Tensorflow is installed.\n",
"!pip freeze | grep tensorflow==2.5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import shutil\n",
"import numpy as np\n",
"import pandas as pd\n",
"import tensorflow as tf\n",
"\n",
"tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)\n",
"pd.options.display.max_rows = 10\n",
"pd.options.display.float_format = '{:.1f}'.format"
]
},
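{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (an optional addition), we can confirm that the TensorFlow version we just imported matches the one we checked for above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the imported TensorFlow version as a sanity check\n",
"print(tf.__version__)"
]
},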
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ipRyUHjhU80Q"
},
"source": [
"Next, we'll load our data set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"https://storage.googleapis.com/ml_universities/california_housing_train.csv\", sep=\",\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "HzzlSs3PtTmt",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"## Examine the data\n",
"\n",
"It's a good idea to get to know your data a little bit before you work with it.\n",
"\n",
"We'll print out a quick summary of a few useful statistics on each column.\n",
"\n",
"This will include things like mean, standard deviation, max, min, and various quantiles."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
},
"test": {
"output": "ignore",
"timeout": 600
}
},
"colab_type": "code",
"id": "gzb10yoVrydW",
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This data is at the city block level, so these features reflect the total number of rooms in that block, or the total number of people who live on that block, respectively. Let's create a different, more appropriate feature. Because we are predicing the price of a single house, we should try to make all our features correspond to a single house as well"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df['num_rooms'] = df['total_rooms'] / df['households']\n",
"df['num_bedrooms'] = df['total_bedrooms'] / df['households']\n",
"df['persons_per_house'] = df['population'] / df['households']\n",
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.drop(['total_rooms', 'total_bedrooms', 'population', 'households'], axis = 1, inplace = True)\n",
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Lr6wYl2bt2Ep",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"## Build a neural network model\n",
"\n",
"In this exercise, we'll be trying to predict `median_house_value`. It will be our label (sometimes also called a target). We'll use the remaining columns as our input features.\n",
"\n",
"To train our model, we'll first use the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/LinearRegressor) interface. Then, we'll change to DNNRegressor\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"featcols = {\n",
" colname : tf.feature_column.numeric_column(colname) \\\n",
" for colname in 'housing_median_age,median_income,num_rooms,num_bedrooms,persons_per_house'.split(',')\n",
"}\n",
"# Bucketize lat, lon so it's not so high-res; California is mostly N-S, so more lats than lons\n",
"featcols['longitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('longitude'),\n",
" np.linspace(-124.3, -114.3, 5).tolist())\n",
"featcols['latitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('latitude'),\n",
" np.linspace(32.5, 42, 10).tolist())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"featcols.keys()"
]
},
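{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the bucketization above concrete, the next cell (an optional addition) prints the boundary values used for the longitude and latitude buckets; these are the same `np.linspace` calls as in the feature-column definitions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the bucket boundaries used for longitude and latitude\n",
"print('longitude boundaries:', np.linspace(-124.3, -114.3, 5).tolist())\n",
"print('latitude boundaries:', np.linspace(32.5, 42, 10).tolist())"
]
},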
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Split into train and eval\n",
"msk = np.random.rand(len(df)) < 0.8\n",
"traindf = df[msk]\n",
"evaldf = df[~msk]\n",
"\n",
"SCALE = 100000\n",
"BATCH_SIZE= 100\n",
"OUTDIR = './housing_trained'\n",
"train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(x = traindf[list(featcols.keys())],\n",
" y = traindf[\"median_house_value\"] / SCALE,\n",
" num_epochs = None,\n",
" batch_size = BATCH_SIZE,\n",
" shuffle = True)\n",
"eval_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(x = evaldf[list(featcols.keys())],\n",
" y = evaldf[\"median_house_value\"] / SCALE, # note the scaling\n",
" num_epochs = 1, \n",
" batch_size = len(evaldf), \n",
" shuffle=False)"
]
},
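{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the 80/20 split above is random, the exact sizes vary from run to run. As a quick check (an optional addition), let's see how many examples landed in each set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check the sizes of the random 80/20 train/eval split\n",
"print('train examples: {}, eval examples: {}'.format(len(traindf), len(evaldf)))"
]
},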
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Linear Regressor\n",
"def train_and_evaluate(output_dir, num_train_steps):\n",
" myopt = tf.compat.v1.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate\n",
" estimator = tf.compat.v1.estimator.LinearRegressor(\n",
" model_dir = output_dir, \n",
" feature_columns = featcols.values(),\n",
" optimizer = myopt)\n",
" \n",
" #Add rmse evaluation metric\n",
" def rmse(labels, predictions):\n",
" pred_values = tf.cast(predictions['predictions'],tf.float64)\n",
" return {'rmse': tf.compat.v1.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}\n",
" estimator = tf.compat.v1.estimator.add_metrics(estimator,rmse)\n",
" \n",
" train_spec=tf.estimator.TrainSpec(\n",
" input_fn = train_input_fn,\n",
" max_steps = num_train_steps)\n",
" eval_spec=tf.estimator.EvalSpec(\n",
" input_fn = eval_input_fn,\n",
" steps = None,\n",
" start_delay_secs = 1, # start evaluating after N seconds\n",
" throttle_secs = 10, # evaluate every N seconds\n",
" )\n",
" tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)\n",
"\n",
"# Run training \n",
"shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time\n",
"train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) / BATCH_SIZE) "
]
},
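{
"cell_type": "markdown",
"metadata": {},
"source": [
"To judge whether the linear model's RMSE is any good, it helps to have a trivial baseline. The next cell (an optional addition) computes the RMSE we'd get by always predicting the training-set mean house value; a trained model should beat this comfortably."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Naive baseline: always predict the training-set mean house value\n",
"mean_value = traindf['median_house_value'].mean()\n",
"baseline_rmse = np.sqrt(((evaldf['median_house_value'] - mean_value) ** 2).mean())\n",
"print('Baseline RMSE: {:.0f}'.format(baseline_rmse))"
]
},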
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# DNN Regressor\n",
"def train_and_evaluate(output_dir, num_train_steps):\n",
" myopt = tf.compat.v1.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate\n",
" estimator = tf.compat.v1.estimator.DNNRegressor(model_dir = output_dir,\n",
" hidden_units = [100, 50, 20],\n",
" feature_columns = featcols.values(),\n",
" optimizer = myopt,\n",
" dropout = 0.1)\n",
" \n",
" #Add rmse evaluation metric\n",
" def rmse(labels, predictions):\n",
" pred_values = tf.cast(predictions['predictions'],tf.float64)\n",
" return {'rmse': tf.compat.v1.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}\n",
" estimator = tf.compat.v1.estimator.add_metrics(estimator,rmse)\n",
" \n",
" train_spec=tf.estimator.TrainSpec(\n",
" input_fn = train_input_fn,\n",
" max_steps = num_train_steps)\n",
" eval_spec=tf.estimator.EvalSpec(\n",
" input_fn = eval_input_fn,\n",
" steps = None,\n",
" start_delay_secs = 1, # start evaluating after N seconds\n",
" throttle_secs = 10, # evaluate every N seconds\n",
" )\n",
" tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)\n",
"\n",
"# Run training \n",
"shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time\n",
"tf.compat.v1.summary.FileWriterCache.clear() \n",
"train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) / BATCH_SIZE) "
]
},
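{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, as an optional addition, we can rebuild the `DNNRegressor` from the checkpoint it wrote to `OUTDIR` and generate a few predictions. This is a minimal sketch: it assumes the checkpoint from the cell above is still present, and it multiplies the outputs by `SCALE` to get back to dollar values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rebuild the trained DNNRegressor from its checkpoint and predict on a few eval examples\n",
"trained = tf.compat.v1.estimator.DNNRegressor(model_dir = OUTDIR,\n",
"                                              hidden_units = [100, 50, 20],\n",
"                                              feature_columns = featcols.values())\n",
"predict_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(x = evaldf[list(featcols.keys())].head(),\n",
"                                                                 shuffle = False)\n",
"for pred in trained.predict(predict_input_fn):\n",
"  print('predicted median_house_value: {:.0f}'.format(pred['predictions'][0] * SCALE))"
]
}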
],
"metadata": {
"colab": {
"default_view": {},
"name": "first_steps_with_tensor_flow.ipynb",
"provenance": [],
"version": "0.3.2",
"views": {}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}