data_analytics/sample.ipynb (227 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### 「%%bigquery」に続いてSQLを記述するとBigQueryにクエリを投げることができます\n", "\n", "例えば、WebUIから実行した「重複なしでバイクステーションの数をカウントする」クエリは以下のように実行します" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bigquery\n", "SELECT\n", " COUNT(DISTINCT station_id) as cnt\n", "FROM\n", " `bigquery-public-data.new_york.citibike_stations`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "同じように、WebUIから実行した各種クエリを実行してみます。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 営業しているバイクステーション" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bigquery\n", "SELECT\n", " COUNT(station_id) as cnt\n", "FROM\n", " `bigquery-public-data.new_york.citibike_stations`\n", "WHERE\n", " is_installed = TRUE\n", " AND is_renting = TRUE\n", " AND is_returning = TRUE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### ユーザーの課金モデル" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bigquery\n", "SELECT\n", " usertype,\n", " gender,\n", " COUNT(gender) AS cnt\n", "FROM\n", " `bigquery-public-data.new_york.citibike_trips`\n", "GROUP BY\n", " usertype,\n", " gender\n", "ORDER BY\n", " cnt DESC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### バイクの借り方の傾向" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bigquery\n", "SELECT\n", " start_station_name,\n", " end_station_name,\n", " COUNT(end_station_name) AS cnt\n", "FROM\n", " `bigquery-public-data.new_york.citibike_trips`\n", "GROUP BY\n", " start_station_name,\n", " end_station_name\n", "ORDER BY\n", " cnt DESC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 結果の解釈(一例)\n", "\n", "- Central Parkの南に地下鉄の駅がある\n", " - 観光客がCentral Parkの観光に利用している\n", "- 12 Ave & W 40 St => West St & Chambers St\n", " - 通勤での利用(居住区からオフィス街への移動)\n", "- 南北方面ではなく東西方面の移動が多い\n", " - 地下鉄は南北方向に駅がある\n", " - NY在住者は自転車で東西方向に移動して、南北方向に地下鉄を利用する傾向がある" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "単純にBigQueryに対してクエリを実行するだけではなく、データの簡易的な可視化などの機能も提供されます。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 利用者の調査" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "最も利用者が多いstart_station_name=\"Central Park S & 6 Ave\", end_station_name=\"Central Park S & 6 Ave\"の利用時間を調査します。\n", "%%bigqueryコマンドに続いて変数名を渡すことで、BigQueryの結果をpandasのDataFrameとして保存することができます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bigquery utilization_time\n", "SELECT\n", " starttime, stoptime, \n", " TIMESTAMP_DIFF(stoptime, starttime, MINUTE) as minute,\n", " usertype, birth_year, gender\n", "FROM\n", " `bigquery-public-data.new_york.citibike_trips`\n", "WHERE\n", " start_station_name = 'Central Park S & 6 Ave' and end_station_name = 'Central Park S & 6 Ave'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# utilization_timeの中身の確認\n", "utilization_time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Pythonによるデータ可視化\n", "\n", "データの概要を掴むためにヒストグラム(データのばらつきを確認するための図)を描きます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 必要となるライブラリのインポート及び警告が表示されないような設定\n", "import pandas as pd\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "# ヒストグラムの描画\n", "utilization_time['minute'].hist(bins=range(0,100,2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "30分程度の利用が最も多いことが確認されました。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 2 }