data_analytics/sample.ipynb (227 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 「%%bigquery」に続いてSQLを記述するとBigQueryにクエリを投げることができます\n",
"\n",
"例えば、WebUIから実行した「重複なしでバイクステーションの数をカウントする」クエリは以下のように実行します"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bigquery\n",
"SELECT\n",
" COUNT(DISTINCT station_id) as cnt\n",
"FROM\n",
" `bigquery-public-data.new_york.citibike_stations`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"同じように、WebUIから実行した各種クエリを実行してみます。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 営業しているバイクステーション"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bigquery\n",
"SELECT\n",
" COUNT(station_id) as cnt\n",
"FROM\n",
" `bigquery-public-data.new_york.citibike_stations`\n",
"WHERE\n",
" is_installed = TRUE\n",
" AND is_renting = TRUE\n",
" AND is_returning = TRUE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### ユーザーの課金モデル"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bigquery\n",
"SELECT\n",
" usertype,\n",
" gender,\n",
" COUNT(gender) AS cnt\n",
"FROM\n",
" `bigquery-public-data.new_york.citibike_trips`\n",
"GROUP BY\n",
" usertype,\n",
" gender\n",
"ORDER BY\n",
" cnt DESC"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### バイクの借り方の傾向"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bigquery\n",
"SELECT\n",
" start_station_name,\n",
" end_station_name,\n",
" COUNT(end_station_name) AS cnt\n",
"FROM\n",
" `bigquery-public-data.new_york.citibike_trips`\n",
"GROUP BY\n",
" start_station_name,\n",
" end_station_name\n",
"ORDER BY\n",
" cnt DESC"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 結果の解釈(一例)\n",
"\n",
"- Central Parkの南に地下鉄の駅がある\n",
" - 観光客がCentral Parkの観光に利用している\n",
"- 12 Ave & W 40 St => West St & Chambers St\n",
" - 通勤での利用(居住区からオフィス街への移動)\n",
"- 南北方面ではなく東西方面の移動が多い\n",
" - 地下鉄は南北方向に駅がある\n",
" - NY在住者は自転車で東西方向に移動して、南北方向に地下鉄を利用する傾向がある"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"単純にBigQueryに対してクエリを実行するだけではなく、データの簡易的な可視化などの機能も提供されます。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 利用者の調査"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"最も利用者が多いstart_station_name=\"Central Park S & 6 Ave\", end_station_name=\"Central Park S & 6 Ave\"の利用時間を調査します。\n",
"%%bigqueryコマンドに続いて変数名を渡すことで、BigQueryの結果をpandasのDataFrameとして保存することができます。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bigquery utilization_time\n",
"SELECT\n",
" starttime, stoptime, \n",
" TIMESTAMP_DIFF(stoptime, starttime, MINUTE) as minute,\n",
" usertype, birth_year, gender\n",
"FROM\n",
" `bigquery-public-data.new_york.citibike_trips`\n",
"WHERE\n",
" start_station_name = 'Central Park S & 6 Ave' and end_station_name = 'Central Park S & 6 Ave'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# utilization_timeの中身の確認\n",
"utilization_time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Pythonによるデータ可視化\n",
"\n",
"データの概要を掴むためにヒストグラム(データのばらつきを確認するための図)を描きます。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 必要となるライブラリのインポート及び警告が表示されないような設定\n",
"import pandas as pd\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# ヒストグラムの描画\n",
"utilization_time['minute'].hist(bins=range(0,100,2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"30分程度の利用が最も多いことが確認されました。"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}