{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "sight_ja",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": [
"U3-_Eikt-oTG",
"mTl1sJ3M4kqy",
"M7GpxF04k6lM",
"niWh2mIV5D0a",
"JpZZKscToBWl",
"Ac9T8ZpPz8vN",
"sYtn3ZVFoBWm",
"8YOYOEsWoBWs",
"ehUCOj1ioBWv",
"O6ZKISFloBW1",
"zYtWMAyro6iC",
"wtxvNpt_1NK_",
"NGlpjteyo6iD",
"1tXCiGJ2o6iP",
"HNfYoxwUo6iT",
"OJP4MAlPC6RQ",
"nXoa_PTKo6iY",
"kCKosft1p0Ty",
"lceRBq1L33g5",
"LC7ibzCVbqWf",
"tJDkpudSrnYN",
"w-TwW2Zgaexp",
"ukao_ygiegu5",
"y_nlyVkKS1RM",
"hrc7HffP4s5Q",
"Og_HRE254xBH",
"en-GamGnTBL3",
"3QZBvBTnUCgm",
"g52NOQInZoyA",
"h83QxnCDQ9Zd",
"IrGRKq91RBcj",
"yhlxKts3kKAJ",
"W1pf0qzVfYKC",
"ko1cS32OfIpL"
],
"machine_shape": "hm",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/gcp-getting-started-lab-jp/blob/master/machine_learning/cloud_ai_building_blocks/sight_ja.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5PxyE_pB2xRs",
"colab_type": "text"
},
"source": [
"\n",
"---\n",
"```\n",
"Copyright 2019 Google LLC\n",
"\n",
"Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"you may not use this file except in compliance with the License.\n",
"You may obtain a copy of the License at\n",
"\n",
" https://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
"Unless required by applicable law or agreed to in writing, software\n",
"distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"See the License for the specific language governing permissions and\n",
"limitations under the License.\n",
"```\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3pjEx-t2C66B",
"colab_type": "text"
},
"source": [
"## Prerequisites\n",
"\n",
"1. Create a [GCP project](https://console.cloud.google.com/cloud-resource-manager).\n",
"2. [Enable billing](https://support.google.com/cloud/answer/6293499#enable-billing).\n",
"3. Create an [API Key](https://console.cloud.google.com/apis/credentials).\n",
"4. Enable the [Cloud Vision API and the Cloud Video Intelligence API](https://console.cloud.google.com/flows/enableapi?apiid=vision.googleapis.com,videointelligence.googleapis.com).\n",
"\n"
]
},
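{
"cell_type": "markdown",
"metadata": {
"id": "setup-gcloud-note",
"colab_type": "text"
},
"source": [
"If you prefer the command line, here is a minimal sketch of step 4 using the gcloud CLI; `your-project-id` is a placeholder for your own project ID, and the command assumes gcloud is installed and authenticated."
]
},
{
"cell_type": "code",
"metadata": {
"id": "setup-gcloud-code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Sketch: enable both APIs from the command line.\n",
"# Replace 'your-project-id' with your own GCP project ID.\n",
"!gcloud services enable vision.googleapis.com videointelligence.googleapis.com \\\n",
"    --project=your-project-id"
],
"execution_count": 0,
"outputs": []
},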
{
"cell_type": "markdown",
"metadata": {
"id": "SoTLXRKlve2l",
"colab_type": "text"
},
"source": [
"## Enter your Google Cloud API credentials\n",
"\n",
"To call Google Cloud APIs through the REST interface, we use an [API Key](https://console.cloud.google.com/apis/credentials). Copy your API Key from the Google Cloud Console."
]
},
{
"cell_type": "code",
"metadata": {
"id": "zt_6WSfwvjAa",
"colab_type": "code",
"colab": {}
},
"source": [
"import getpass\n",
"APIKEY = getpass.getpass()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "U3-_Eikt-oTG",
"colab_type": "text"
},
"source": [
"## Try the Cloud Vision API!\n",
"\n",
"The **Cloud Vision API** analyzes still images: it can identify the objects an image contains, extract text (OCR), and find matching images on the web. This notebook runs the following four kinds of detection:\n",
"- **Object localization** - detects the names and positions of objects in an image\n",
"- **Face detection** - detects faces in an image and predicts emotions from facial expressions\n",
"- **Logo detection** - detects well-known brand logos in an image\n",
"- **Landmark detection** - identifies the name and location of a famous sight from its image"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mTl1sJ3M4kqy",
"colab_type": "text"
},
"source": [
"### Set up the Cloud Vision API\n",
"\n",
"Use the API Discovery Service to discover the **Cloud Vision API**. The Cloud Vision REST API is documented [here](https://cloud.google.com/vision/docs/reference/rest/)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "GGBLPTMi_-Zb",
"colab_type": "code",
"colab": {}
},
"source": [
"from googleapiclient.discovery import build\n",
"\n",
"vision_service = build('vision', 'v1p3beta1', developerKey=APIKEY)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "M7GpxF04k6lM",
"colab_type": "text"
},
"source": [
"### Object localization\n",
"Object localization identifies which regions of an image contain which objects (everyday \"things\" such as cars or apples)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Zj9iTmgSx7ac",
"colab_type": "text"
},
"source": [
"#### Upload an image to the Colab environment\n",
"\n",
"Run the next cell to upload objects.jpg from your local machine."
]
},
{
"cell_type": "code",
"metadata": {
"id": "cGd8n7eAySDG",
"colab_type": "code",
"colab": {}
},
"source": [
"from google.colab import files\n",
"uploaded = files.upload()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "9TrSiTqeyfKl",
"colab_type": "text"
},
"source": [
"If you upload the file again, it is saved under a different name instead of overwriting the existing one. To delete the file, run the following command."
]
},
{
"cell_type": "code",
"metadata": {
"id": "wqikTUu2a2oP",
"colab_type": "code",
"colab": {}
},
"source": [
"# !rm ./objects.jpg"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "cTDww6cDcJM8",
"colab_type": "text"
},
"source": [
"#### Build the request message\n",
"\n",
"The API needs the image data and the kind of detection to run. Since we are calling the REST API, we put both into the HTTP request payload as JSON, specifying the image data and the detection type. You can alternatively pass a GCS (Google Cloud Storage) path to the image, but here we embed the file contents directly in the payload. Contents embedded this way must be Base64 encoded."
]
},
{
"cell_type": "code",
"metadata": {
"id": "szxZ-TtzAh64",
"colab_type": "code",
"colab": {}
},
"source": [
"from base64 import b64encode\n",
"\n",
"with open('objects.jpg', 'rb') as image_file:\n",
" my_image = {\n",
" 'content': b64encode(image_file.read()).decode('utf-8')\n",
" }"
],
"execution_count": 0,
"outputs": []
},
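{
"cell_type": "markdown",
"metadata": {
"id": "gcs-image-note",
"colab_type": "text"
},
"source": [
"For reference, a minimal sketch of the GCS alternative mentioned above: instead of embedding Base64 content, you point the API at an object in Cloud Storage via `source.imageUri`. The bucket path below is a placeholder."
]
},
{
"cell_type": "code",
"metadata": {
"id": "gcs-image-code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Sketch: reference an image stored in GCS instead of sending its bytes.\n",
"# 'gs://your-bucket/objects.jpg' is a placeholder, not a real object.\n",
"gcs_image = {\n",
"    'source': {'imageUri': 'gs://your-bucket/objects.jpg'}\n",
"}"
],
"execution_count": 0,
"outputs": []
},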
{
"cell_type": "code",
"metadata": {
"id": "ae1fwijCDCIn",
"colab_type": "code",
"colab": {}
},
"source": [
"my_features = [\n",
" {'type':'OBJECT_LOCALIZATION', 'model':'builtin/stable'}\n",
"]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "O4SsQLZtgP0S",
"colab_type": "code",
"colab": {}
},
"source": [
"my_body={\n",
" 'requests': [\n",
" {'image': my_image, 'features': my_features}\n",
" ]\n",
"}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "odHDnUnLgnCU",
"colab_type": "text"
},
"source": [
"#### Send the request message"
]
},
{
"cell_type": "code",
"metadata": {
"id": "YUO3SyVT_-cc",
"colab_type": "code",
"colab": {}
},
"source": [
"response = vision_service.images().annotate(body=my_body).execute()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZLhgn3O_lxeo",
"colab_type": "text"
},
"source": [
"#### Visualize the detection results\n",
"\n",
"Object localization returns the name of each detected object, a bounding box (its coordinates within the image), and a confidence score. Let's use that information to draw what was detected. Running the next cell defines ```highlight_objects```, a helper function that highlights the detected objects."
]
},
{
"cell_type": "code",
"metadata": {
"id": "3hqK0NGtW2dH",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"#@title Define highlight_objects, a function that highlights detected objects\n",
"\n",
"from PIL import Image, ImageDraw\n",
"\n",
"def highlight_objects(image_file, objects):\n",
"  image = Image.open(image_file)\n",
"  draw = ImageDraw.Draw(image, \"RGBA\")\n",
"\n",
"  # Image dimensions, used to scale the normalized vertices.\n",
"  width, height = image.size\n",
"\n",
"  for obj in objects:\n",
"    n_vertex_lt = tuple(obj['boundingPoly']['normalizedVertices'][0].values())\n",
"    n_vertex_rb = tuple(obj['boundingPoly']['normalizedVertices'][2].values())\n",
"\n",
"    vertex_lt = (int(n_vertex_lt[0] * width), int(n_vertex_lt[1] * height))\n",
"    vertex_rb = (int(n_vertex_rb[0] * width), int(n_vertex_rb[1] * height))\n",
"\n",
"    # Bounding box.\n",
"    draw.rectangle(xy=(vertex_lt, vertex_rb), outline='red')\n",
"\n",
"    # Label and confidence score.\n",
"    draw.text(xy=(vertex_lt[0], vertex_lt[1]-10),\n",
"              text=obj['name'] + ':' + str(format(obj['score'], '.3f')),\n",
"              fill='red')\n",
"  display(image)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "geNUPM2IXsDL",
"colab_type": "code",
"colab": {}
},
"source": [
"highlight_objects('objects.jpg', response['responses'][0]['localizedObjectAnnotations'])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "RVUDyr4MdGmQ",
"colab_type": "text"
},
"source": [
"#### Inspect the response message\n",
"\n",
"The response message comes back as JSON. Let's see what it contains."
]
},
{
"cell_type": "code",
"metadata": {
"id": "cazp4UksmSlw",
"colab_type": "code",
"colab": {}
},
"source": [
"response['responses'][0]['localizedObjectAnnotations']"
],
"execution_count": 0,
"outputs": []
},
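{
"cell_type": "markdown",
"metadata": {
"id": "object-summary-note",
"colab_type": "text"
},
"source": [
"As a quick sketch based on the fields shown above, the next cell flattens the annotations into name/score pairs, which is handy when you only need the labels."
]
},
{
"cell_type": "code",
"metadata": {
"id": "object-summary-code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Sketch: summarize each detected object as a name/score pair.\n",
"for obj in response['responses'][0]['localizedObjectAnnotations']:\n",
"    print('{:<20} {:.3f}'.format(obj['name'], obj['score']))"
],
"execution_count": 0,
"outputs": []
},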
{
"cell_type": "markdown",
"metadata": {
"id": "niWh2mIV5D0a",
"colab_type": "text"
},
"source": [
"### Face detection\n",
"Face detection finds faces in an image and predicts emotions from their expressions.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Cyfg32iZ5vyh",
"colab_type": "text"
},
"source": [
"#### Prepare an image in the Colab environment\n",
"\n",
"Running the next cell defines `take_photo`, a function for taking a photo with your webcam."
]
},
{
"cell_type": "code",
"metadata": {
"id": "fFcrHzt458eR",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"#@title Running this cell defines \"take_photo\", a function for taking a photo\n",
"\n",
"from IPython.display import HTML, Image\n",
"from google.colab.output import eval_js\n",
"from base64 import b64decode\n",
"\n",
"VIDEO_HTML = \"\"\"\n",
"<video autoplay\n",
" width=600 height=450></video>\n",
"<script>\n",
"var video = document.querySelector('video')\n",
"navigator.mediaDevices.getUserMedia({ video: true })\n",
" .then(stream=> video.srcObject = stream)\n",
" \n",
"var data = new Promise(resolve=>{\n",
" video.onclick = ()=>{\n",
" var canvas = document.createElement('canvas')\n",
" var [w,h] = [video.offsetWidth, video.offsetHeight]\n",
" canvas.width = w\n",
" canvas.height = h\n",
" canvas.getContext('2d')\n",
" .drawImage(video, 0, 0, w, h)\n",
" video.srcObject.getVideoTracks()[0].stop()\n",
" video.replaceWith(canvas)\n",
" resolve(canvas.toDataURL('image/jpeg', %f))\n",
" }\n",
"})\n",
"</script>\n",
"\"\"\"\n",
"def take_photo(filename='photo.jpg', quality=0.8):\n",
" display(HTML(VIDEO_HTML % quality))\n",
" data = eval_js(\"data\")\n",
" binary = b64decode(data.split(',')[1])\n",
" with open(filename, 'wb') as f:\n",
" f.write(binary)\n",
" return len(binary)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "aZT3YGYi6BET",
"colab_type": "code",
"colab": {}
},
"source": [
"take_photo(filename='my_face.jpg')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "8eslL4tb6Qz6",
"colab_type": "text"
},
"source": [
"#### Build the request message\n",
"Here we use the photo taken above."
]
},
{
"cell_type": "code",
"metadata": {
"id": "tqLm2iUN6bU0",
"colab_type": "code",
"colab": {}
},
"source": [
"from base64 import b64encode\n",
"\n",
"with open('my_face.jpg', 'rb') as image_file:\n",
" my_image = {\n",
" 'content': b64encode(image_file.read()).decode('utf-8')\n",
" }"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "xntTzeG06dl6",
"colab_type": "code",
"colab": {}
},
"source": [
"my_features = [\n",
" {'type':'FACE_DETECTION', 'model':'builtin/stable'},\n",
"]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "h-xJGNvq6iDq",
"colab_type": "code",
"colab": {}
},
"source": [
"my_body={\n",
" 'requests': [\n",
" {'image': my_image, 'features': my_features}\n",
" ]\n",
"}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "CPtSgYjd6jf0",
"colab_type": "text"
},
"source": [
"#### Send the request message"
]
},
{
"cell_type": "code",
"metadata": {
"id": "g8h30kDs6qRv",
"colab_type": "code",
"colab": {}
},
"source": [
"response = vision_service.images().annotate(body=my_body).execute()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "DH673J8a6urd",
"colab_type": "text"
},
"source": [
"#### Visualize the detection results\n",
"Here too, the response includes a bounding box for each detected face, so let's draw it. Running the next cell defines `highlight_faces`, a helper function that highlights the detected faces."
]
},
{
"cell_type": "code",
"metadata": {
"id": "zXheXxmx6-nk",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"#@title Define highlight_faces, a function that highlights detected faces\n",
"\n",
"from PIL import Image, ImageDraw\n",
"\n",
"def highlight_faces(image_file, faces):\n",
"  image = Image.open(image_file)\n",
"  draw = ImageDraw.Draw(image, \"RGBA\")\n",
"\n",
"  for face in faces:\n",
"    vertex_lt = tuple(face['boundingPoly']['vertices'][0].values())\n",
"    vertex_rb = tuple(face['boundingPoly']['vertices'][2].values())\n",
"\n",
"    # Bounding box.\n",
"    draw.rectangle(xy=(vertex_lt, vertex_rb), outline='red')\n",
"\n",
"    # Detection confidence.\n",
"    draw.text(xy=(vertex_lt[0], vertex_lt[1]-10),\n",
"              text=str(format(face['detectionConfidence'], '.3f')),\n",
"              fill='red')\n",
"  display(image)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "oQoVDoFh7BuD",
"colab_type": "code",
"colab": {}
},
"source": [
"highlight_faces('my_face.jpg', response['responses'][0]['faceAnnotations'])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "BHJ97o4c8f_G",
"colab_type": "text"
},
"source": [
"Display the emotions predicted from the facial expression."
]
},
{
"cell_type": "code",
"metadata": {
"id": "1QMFqYB-7IcM",
"colab_type": "code",
"colab": {}
},
"source": [
"face_response = response['responses'][0]['faceAnnotations']\n",
"\n",
"keys = ['angerLikelihood', 'joyLikelihood', 'sorrowLikelihood',\n",
" 'surpriseLikelihood', 'headwearLikelihood']\n",
"\n",
"for key in keys:\n",
" print(key, \"==>\", face_response[0][key])"
],
"execution_count": 0,
"outputs": []
},
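{
"cell_type": "markdown",
"metadata": {
"id": "likelihood-ranking-note",
"colab_type": "text"
},
"source": [
"The likelihood fields come back as enum strings (VERY_UNLIKELY through VERY_LIKELY) rather than numbers. Here is a small sketch for picking the strongest emotion; the numeric ordering below is our own convention, not part of the API."
]
},
{
"cell_type": "code",
"metadata": {
"id": "likelihood-ranking-code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Sketch: rank the likelihood enums to find the most likely emotion.\n",
"# The numeric values are our own convention, not part of the API.\n",
"likelihood_order = {'UNKNOWN': 0, 'VERY_UNLIKELY': 1, 'UNLIKELY': 2,\n",
"                    'POSSIBLE': 3, 'LIKELY': 4, 'VERY_LIKELY': 5}\n",
"\n",
"emotions = ['angerLikelihood', 'joyLikelihood',\n",
"            'sorrowLikelihood', 'surpriseLikelihood']\n",
"strongest = max(emotions, key=lambda k: likelihood_order[face_response[0][k]])\n",
"print('Strongest emotion:', strongest)"
],
"execution_count": 0,
"outputs": []
},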
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "JpZZKscToBWl"
},
"source": [
"### Logo detection\n",
"\n",
"Logo detection finds logos (of well-known brands, for example) in an image."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ac9T8ZpPz8vN",
"colab_type": "text"
},
"source": [
"#### Upload an image to the Colab environment\n",
"\n",
"Run the next cell to upload googleplex.jpg from your local machine."
]
},
{
"cell_type": "code",
"metadata": {
"id": "dINKXDES0YZP",
"colab_type": "code",
"colab": {}
},
"source": [
"from google.colab import files\n",
"uploaded = files.upload()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "RKz03uDT0cqm",
"colab_type": "text"
},
"source": [
"If you upload the file again, it is saved under a different name instead of overwriting the existing one. To delete the file, run the following command."
]
},
{
"cell_type": "code",
"metadata": {
"id": "jVPsFFn00e3G",
"colab_type": "code",
"colab": {}
},
"source": [
"# !rm ./googleplex.jpg"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "sYtn3ZVFoBWm"
},
"source": [
"#### Build the request message\n",
"Here we use the **googleplex.jpg** image. The request is the same as before; only the image data and the `type` differ."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "7qewKSHQoBWn",
"colab": {}
},
"source": [
"from base64 import b64encode\n",
"\n",
"with open('googleplex.jpg', 'rb') as image_file:\n",
" my_image = {\n",
" 'content': b64encode(image_file.read()).decode('utf-8')\n",
" }"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "ZCYBWJkLoBWp",
"colab": {}
},
"source": [
"my_features = [\n",
" {'type':'LOGO_DETECTION', 'model':'builtin/stable'}\n",
"]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "lMz2F64RoBWq",
"colab": {}
},
"source": [
"my_body={\n",
" 'requests': [\n",
" {'image': my_image, 'features': my_features}\n",
" ]\n",
"}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8YOYOEsWoBWs"
},
"source": [
"#### Send the request message"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "bxcARlIooBWt",
"colab": {}
},
"source": [
"response = vision_service.images().annotate(body=my_body).execute()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ehUCOj1ioBWv"
},
"source": [
"#### Visualize the detection results\n",
"Here too, the response includes a bounding box for each detected logo, so let's draw it. Running the next cell defines `highlight_logos`, a helper function that highlights the detected logos."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"cellView": "form",
"id": "f1DteF_AoBWw",
"colab": {}
},
"source": [
"#@title Define highlight_logos, a function that highlights detected logos\n",
"\n",
"from PIL import Image, ImageDraw\n",
"\n",
"def highlight_logos(image_file, objects):\n",
"  image = Image.open(image_file)\n",
"  draw = ImageDraw.Draw(image, \"RGBA\")\n",
"\n",
"  for obj in objects:\n",
"    vertex_lt = tuple(obj['boundingPoly']['vertices'][0].values())\n",
"    vertex_rb = tuple(obj['boundingPoly']['vertices'][2].values())\n",
"\n",
"    # Bounding box.\n",
"    draw.rectangle(xy=(vertex_lt, vertex_rb), outline='red')\n",
"\n",
"    # Label and confidence score.\n",
"    draw.text(xy=(vertex_lt[0], vertex_lt[1]-10),\n",
"              text=obj['description'] + ':' + str(format(obj['score'], '.3f')),\n",
"              fill='red')\n",
"  display(image)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "wYBCOmiKoBWx",
"colab": {}
},
"source": [
"highlight_logos('googleplex.jpg', response['responses'][0]['logoAnnotations'])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "O6ZKISFloBW1"
},
"source": [
"#### Inspect the response message\n",
"\n",
"The format is almost the same as for object localization. `mid` is an ID in the [Google Knowledge Graph](https://en.wikipedia.org/wiki/Knowledge_Graph)."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "p6ekawcZoBW_",
"colab": {}
},
"source": [
"response['responses'][0]['logoAnnotations']"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "zYtWMAyro6iC"
},
"source": [
"### Landmark detection\n",
"Landmark detection identifies famous sights and places."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wtxvNpt_1NK_",
"colab_type": "text"
},
"source": [
"#### Upload an image to the Colab environment\n",
"\n",
"Run the next cell to upload osaka.jpg from your local machine."
]
},
{
"cell_type": "code",
"metadata": {
"id": "RApuunnn1X8A",
"colab_type": "code",
"colab": {}
},
"source": [
"from google.colab import files\n",
"uploaded = files.upload()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "2r4d8wkF1cVf",
"colab_type": "text"
},
"source": [
"If you upload the file again, it is saved under a different name instead of overwriting the existing one. To delete the file, run the following command."
]
},
{
"cell_type": "code",
"metadata": {
"id": "AQ6-PM421e2n",
"colab_type": "code",
"colab": {}
},
"source": [
"# !rm ./osaka.jpg"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "NGlpjteyo6iD"
},
"source": [
"#### Build the request message"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "lfOACs6Eo6iE",
"colab": {}
},
"source": [
"from base64 import b64encode\n",
"\n",
"with open('osaka.jpg', 'rb') as image_file:\n",
" my_image = {\n",
" 'content': b64encode(image_file.read()).decode('utf-8')\n",
" }"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "bZVbXbn7o6iK",
"colab": {}
},
"source": [
"my_features = [\n",
" {'type':'LANDMARK_DETECTION', 'model':'builtin/stable'}\n",
"]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "mMgApzGao6iN",
"colab": {}
},
"source": [
"my_body={\n",
" 'requests': [\n",
" {'image': my_image, 'features': my_features}\n",
" ]\n",
"}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "1tXCiGJ2o6iP"
},
"source": [
"#### Send the request message"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "Vll8EhhIo6iQ",
"colab": {}
},
"source": [
"response = vision_service.images().annotate(body=my_body).execute()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "HNfYoxwUo6iT"
},
"source": [
"#### Visualize the detection results\n",
"Here too, the response includes a bounding box for each detected landmark, so let's draw it. Running the next cell defines `highlight_landmarks`, a helper function that highlights the detected landmarks."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"cellView": "form",
"id": "Mz4fyFaFo6iU",
"colab": {}
},
"source": [
"#@title Define highlight_landmarks, a function that highlights detected landmarks\n",
"\n",
"from PIL import Image, ImageDraw\n",
"\n",
"def highlight_landmarks(image_file, objects):\n",
"  image = Image.open(image_file)\n",
"  draw = ImageDraw.Draw(image, \"RGBA\")\n",
"\n",
"  for obj in objects:\n",
"    vertex_lt = tuple(obj['boundingPoly']['vertices'][0].values())\n",
"    vertex_rb = tuple(obj['boundingPoly']['vertices'][2].values())\n",
"\n",
"    # Bounding box.\n",
"    draw.rectangle(xy=(vertex_lt, vertex_rb), outline='red')\n",
"\n",
"    # Label and confidence score.\n",
"    draw.text(xy=(vertex_lt[0], vertex_lt[1]-10),\n",
"              text=obj['description'] + ':' + str(format(obj['score'], '.3f')),\n",
"              fill='red')\n",
"  display(image)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "ikE_9ONZo6iW",
"colab": {}
},
"source": [
"highlight_landmarks('osaka.jpg', response['responses'][0]['landmarkAnnotations'])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "OJP4MAlPC6RQ",
"colab_type": "text"
},
"source": [
"#### One more thing!\n",
"Let's plot the detected landmark's location on a map with the Maps API. The map is rendered with the [Google Maps Static API](https://console.cloud.google.com/apis/library/static-maps-backend.googleapis.com), which must be enabled separately."
]
},
{
"cell_type": "code",
"metadata": {
"id": "n6RxbWQM_K-F",
"colab_type": "code",
"colab": {}
},
"source": [
"from IPython.core.display import HTML\n",
"\n",
"latlng = response['responses'][0]['landmarkAnnotations'][0]['locations'][0]['latLng']\n",
"\n",
"html = \"\"\"\n",
"<img src=\"https://maps.googleapis.com/maps/api/staticmap?center={},{}&zoom=14&size=680x300&key={}\">\n",
"\"\"\".format(latlng['latitude'], latlng['longitude'], APIKEY)\n",
"display(HTML(html))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "nXoa_PTKo6iY"
},
"source": [
"#### Inspect the response message"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "cA8FNuBGo6ir",
"colab": {}
},
"source": [
"response['responses'][0]['landmarkAnnotations']"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "kCKosft1p0Ty",
"colab_type": "text"
},
"source": [
"#### Tips\n",
"\n",
"You can also request multiple detection types in a single call, as the cells below show."
]
},
{
"cell_type": "code",
"metadata": {
"id": "mxG3Kheap2SM",
"colab_type": "code",
"colab": {}
},
"source": [
"# Specify multiple features in the my_features part of the request, like this:\n",
"my_features = [\n",
" {'type':'OBJECT_LOCALIZATION', 'model':'builtin/stable'},\n",
" {'type':'LOGO_DETECTION', 'model':'builtin/stable'},\n",
" {'type':'LANDMARK_DETECTION', 'model':'builtin/stable'}\n",
"]"
],
"execution_count": 0,
"outputs": []
},
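{
"cell_type": "markdown",
"metadata": {
"id": "multi-feature-note",
"colab_type": "text"
},
"source": [
"A minimal sketch of sending the combined request. This reuses `my_image` from the landmark section (osaka.jpg); each requested feature contributes its own annotation key to the response."
]
},
{
"cell_type": "code",
"metadata": {
"id": "multi-feature-code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Sketch: one request that runs all three detections at once.\n",
"my_body = {\n",
"    'requests': [\n",
"        {'image': my_image, 'features': my_features}\n",
"    ]\n",
"}\n",
"response = vision_service.images().annotate(body=my_body).execute()\n",
"\n",
"# One annotation list per requested feature.\n",
"print(list(response['responses'][0].keys()))"
],
"execution_count": 0,
"outputs": []
},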
{
"cell_type": "markdown",
"metadata": {
"id": "lceRBq1L33g5",
"colab_type": "text"
},
"source": [
"## Try the Cloud Video Intelligence API!\n",
"\n",
"The **Cloud Video Intelligence API** detects objects, shot changes, and more in video."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LC7ibzCVbqWf",
"colab_type": "text"
},
"source": [
"### Set up the Cloud Video Intelligence API\n",
"Use the API Discovery Service to discover the **Cloud Video Intelligence API**."
]
},
{
"cell_type": "code",
"metadata": {
"id": "AP2o8eDwbpGo",
"colab_type": "code",
"colab": {}
},
"source": [
"import time\n",
"from googleapiclient.discovery import build\n",
"\n",
"video_service = build('videointelligence', 'v1', developerKey=APIKEY)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "tJDkpudSrnYN",
"colab_type": "text"
},
"source": [
"### Check the input video"
]
},
{
"cell_type": "code",
"metadata": {
"id": "nB619HzZCVll",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"#@title Define record_video, a function for recording a video\n",
"\n",
"# Install required libraries and packages\n",
"!apt-get -qq update\n",
"!apt-get -qq install -y ffmpeg\n",
"!pip install ffmpeg-python\n",
"\n",
"# Define record_video\n",
"import base64\n",
"import ffmpeg\n",
"import subprocess\n",
"import google.colab\n",
"from io import BytesIO\n",
"\n",
"def record_video(file_id):\n",
" # Record webm file from Colaboratory.\n",
" video = google.colab._message.blocking_request(\n",
" 'user_media',\n",
" {\n",
" 'audio': True,\n",
" 'video': True,\n",
" 'duration': -1\n",
" },\n",
" timeout_sec=600)\n",
"\n",
" # Convert web file into in_memory file.\n",
" mfile = BytesIO(base64.b64decode(video[video.index(',')+1:]))\n",
"\n",
" # Store webm file locally.\n",
" print('Generating {}.webm'.format(file_id))\n",
"\n",
" with open('{0}.webm'.format(file_id), 'wb') as f:\n",
" mfile.seek(0)\n",
" f.write(mfile.read())\n",
"\n",
" print('Converting {0}.webm to {0}.mp4'.format(file_id))\n",
" !ffmpeg -y -i {file_id}.webm -r 29.97 {file_id}.mp4 \n",
"\n",
" print('Done')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "LwVpO1u3I5tP",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Define `resume_video()`, a helper function for playing back a video\n",
"import io\n",
"import base64\n",
"from IPython.display import HTML\n",
"\n",
"def resume_video(path_to_mp4):\n",
" video = io.open(path_to_mp4, 'rb').read()\n",
" encoded = base64.b64encode(video)\n",
" return HTML(data=\"\"\"\n",
" <video width=\"640\" height=\"360\" controls>\n",
" <source src=\"data:video/mp4;base64,{0}\" type=\"video/mp4\" />\n",
" </video>\"\"\".format(encoded.decode('ascii')))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "letSd1_kCjbj",
"colab_type": "code",
"colab": {}
},
"source": [
"# Record a video\n",
"record_video('sample')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-SABWY_MllQb",
"colab_type": "code",
"colab": {}
},
"source": [
"resume_video('sample.mp4')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "w-TwW2Zgaexp",
"colab_type": "text"
},
"source": [
"### Get the results from the Video Intelligence API"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ukao_ygiegu5",
"colab_type": "text"
},
"source": [
"#### Execution options for the Video Intelligence API"
]
},
{
"cell_type": "code",
"metadata": {
"id": "wfWrHcLhIyii",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Read the video file and get its base64-encoded contents\n",
"\n",
"from base64 import b64encode\n",
"\n",
"path_to_video = \"sample.mp4\" #@param {type:\"string\"}\n",
"with open(path_to_video, 'rb') as video_file:\n",
" input_content = b64encode(video_file.read()).decode()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jZ5GHjQKxf_y",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Specify the region (mind the read permission on input_uri)\n",
"\n",
"location_id = \"us-east1\" #@param [\"us-east1\", \"us-west1\", \"europe-west1\", \"asia-east1\"]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "FD-QDCYk5DWK",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Select which features to enable\n",
"\n",
"class Features():\n",
" def __init__(self):\n",
" pass\n",
" \n",
" def get(self):\n",
" return [k for k, v in self.__dict__.items() if v == True]\n",
" \n",
"features = Features()\n",
"features.FEATURE_UNSPECIFIED = False #@param {type:\"boolean\"}\n",
"features.LABEL_DETECTION = True #@param {type:\"boolean\"}\n",
"features.SHOT_CHANGE_DETECTION = False #@param {type:\"boolean\"}\n",
"features.EXPLICIT_CONTENT_DETECTION = False #@param {type:\"boolean\"}\n",
"features.SPEECH_TRANSCRIPTION = False #@param {type:\"boolean\"}\n",
"features.TEXT_DETECTION = True #@param {type:\"boolean\"}\n",
"features.OBJECT_TRACKING = True #@param {type:\"boolean\"}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ydhsnbCGwJ-Q",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Settings for label analysis (LABEL_DETECTION) and object tracking (OBJECT_TRACKING)\n",
"\n",
"class LabelDetectionConfig():\n",
" def __init__(self):\n",
" pass\n",
" def get(self):\n",
" return self.__dict__\n",
" \n",
"label_detection_config = LabelDetectionConfig()\n",
"label_detection_config.labelDetectionMode = \"FRAME_MODE\" #@param [\"LABEL_DETECTION_MODE_UNSPECIFIED\", \"SHOT_MODE\", \"FRAME_MODE\", \"SHOT_AND_FRAME_MODE\"]\n",
"label_detection_config.stationaryCamera = False #@param {type: \"boolean\"}\n",
"label_detection_config.model = \"builtin/stable\" #@param [\"builtin/stable\", \"builtin/latest\"]\n",
"label_detection_config.frameConfidenceThreshold = 0.7 #@param {type:\"slider\", min:0.1, max:0.9, step:0.1}\n",
"label_detection_config.videoConfidenceThreshold = 0.7 #@param {type:\"slider\", min:0.1, max:0.9, step:0.1}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "UVkUAZuucf35",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Settings for shot change analysis (SHOT_CHANGE_DETECTION)\n",
"\n",
"class ShotChangeDetectionConfig():\n",
" def __init__(self):\n",
" pass\n",
" def get(self):\n",
" return self.__dict__\n",
" \n",
"shot_change_detection_config = ShotChangeDetectionConfig()\n",
"shot_change_detection_config.model = \"builtin/stable\" #@param [\"builtin/stable\", \"builtin/latest\"]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "7KOTO1XVeWcZ",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Settings for explicit content analysis (EXPLICIT_CONTENT_DETECTION)\n",
"\n",
"class ExplicitContentDetectionConfig():\n",
" def __init__(self):\n",
" pass\n",
" def get(self):\n",
" return self.__dict__\n",
" \n",
"explicit_content_detection_config = ExplicitContentDetectionConfig()\n",
"explicit_content_detection_config.model = \"builtin/stable\" #@param [\"builtin/stable\", \"builtin/latest\"]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "3KIe-GC500ds",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Settings for speech transcription (SPEECH_TRANSCRIPTION)\n",
"\n",
"class SpeechTranscriptionConfig():\n",
" def __init__(self):\n",
" pass\n",
" def get(self):\n",
" return self.__dict__\n",
"\n",
"speech_transcription_config = SpeechTranscriptionConfig()\n",
"speech_transcription_config.languageCode = \"en_US\" #@param [\"ja_JP\", \"en_US\"]\n",
"speech_transcription_config.maxAlternatives = 1 #@param {type:\"slider\", min:0, max:30, step:1}\n",
"speech_transcription_config.filterProfanity = False #@param {type: \"boolean\"}\n",
"speech_transcription_config.enableAutomaticPunctuation = False #@param {type: \"boolean\"}\n",
"speech_transcription_config.enableSpeakerDiarization = False #@param {type: \"boolean\"}\n",
"speech_transcription_config.enableWordConfidence = False #@param {type: \"boolean\"}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "yZC2y5BB00gH",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Settings for text detection (TEXT_DETECTION)\n",
"\n",
"class TextDetectionConfig():\n",
" def __init__(self):\n",
" pass\n",
" def get(self):\n",
" return self.__dict__\n",
" \n",
"class LanguageHints():\n",
" def __init__(self):\n",
" pass\n",
" def get(self):\n",
" return [k for k, v in self.__dict__.items() if v == True]\n",
" \n",
"text_detection_config = TextDetectionConfig()\n",
"\n",
"\n",
"language_hints = LanguageHints()\n",
"language_hints.ja_JP = True #@param {type: \"boolean\"}\n",
"language_hints.en_US = False #@param {type: \"boolean\"}\n",
"\n",
"text_detection_config.languageHints = language_hints.get()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "yTHojUr900m3",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"#@title Define `create_request()`, a function that builds the request\n",
"\n",
"class Request():\n",
" def __init__(self):\n",
" pass\n",
" \n",
" def get(self):\n",
" return self.__dict__\n",
" \n",
"\n",
"class VideoContext():\n",
" def __init__(self):\n",
" pass\n",
" \n",
" def get(self):\n",
" return self.__dict__\n",
"\n",
"def create_video_context():\n",
" vc = VideoContext()\n",
" list_features = features.get()\n",
" if 'LABEL_DETECTION' in list_features or 'OBJECT_TRACKING' in list_features:\n",
" vc.labelDetectionConfig = label_detection_config.get()\n",
" if 'SHOT_CHANGE_DETECTION' in list_features:\n",
" vc.shotChangeDetectionConfig = shot_change_detection_config.get()\n",
" if 'EXPLICIT_CONTENT_DETECTION' in list_features:\n",
" vc.explicitContentDetectionConfig = explicit_content_detection_config.get()\n",
" if 'SPEECH_TRANSCRIPTION' in list_features:\n",
" vc.speechTranscriptionConfig = speech_transcription_config.get()\n",
" if 'TEXT_DETECTION' in list_features:\n",
" vc.textDetectionConfig = text_detection_config.get()\n",
" return vc.__dict__\n",
"\n",
"def create_request():\n",
" request = Request()\n",
" request.inputContent = input_content\n",
" request.features = features.get()\n",
" request.videoContext = create_video_context()\n",
" request.locationId = location_id\n",
" return request.__dict__"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "y_nlyVkKS1RM",
"colab_type": "text"
},
"source": [
"#### Create the request message"
]
},
{
"cell_type": "code",
"metadata": {
"id": "DODrLEToAf8D",
"colab_type": "code",
"colab": {}
},
"source": [
"create_request()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "hrc7HffP4s5Q"
},
"source": [
"#### Get the response message"
]
},
{
"cell_type": "code",
"metadata": {
"id": "WNtou1fOCYXT",
"colab_type": "code",
"colab": {}
},
"source": [
"import time\n",
"\n",
"# videos().annotate() is asynchronous: it returns a long-running operation.\n",
"response = video_service.videos().annotate(body=create_request()).execute()\n",
"\n",
"# Poll the operation until the 'done' field appears in its status.\n",
"while True:\n",
"  output = video_service.projects().locations().operations().get(\n",
"      name=response['name']).execute()\n",
"  try:\n",
"    if output['done']:\n",
"      break\n",
"  except KeyError:\n",
"    print(output['metadata'])\n",
"    time.sleep(10)\n",
"\n",
"video_intelligence_res = output"
],
"execution_count": 0,
"outputs": []
},
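{
"cell_type": "markdown",
"metadata": {
"id": "annotation-keys-note",
"colab_type": "text"
},
"source": [
"Which keys appear under `annotationResults` depends on the features you enabled above. A quick sketch to see what actually came back before the following sections pick the results apart:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "annotation-keys-code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Sketch: list the annotation keys returned for each result entry.\n",
"for e in video_intelligence_res['response']['annotationResults']:\n",
"    print(list(e.keys()))"
],
"execution_count": 0,
"outputs": []
},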
{
"cell_type": "markdown",
"metadata": {
"id": "Og_HRE254xBH",
"colab_type": "text"
},
"source": [
"### Object tracking"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "en-GamGnTBL3",
"colab_type": "text"
},
"source": [
"#### Create a video annotated with the object-tracking results"
]
},
{
"cell_type": "code",
"metadata": {
"id": "TaJ7ZYizal7z",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Define `create_annotated_video()`, which overlays the object-tracking results on the video\n",
"\n",
"!wget -N https://noto-website-2.storage.googleapis.com/pkgs/NotoSansCJKjp-hinted.zip\n",
"!unzip -o NotoSansCJKjp-hinted.zip -d notosans\n",
"\n",
"import cv2\n",
"import numpy as np\n",
"from collections import defaultdict\n",
"from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN\n",
"from PIL import Image, ImageDraw, ImageFont\n",
"\n",
"def get_image_from_array(img_arr_bgr, width, height):\n",
" img_arr_rgb = cv2.cvtColor(img_arr_bgr, cv2.COLOR_BGR2RGB)\n",
" img = Image.fromarray(img_arr_rgb.astype('uint8')) \n",
" img = img.resize((width, height), Image.ANTIALIAS)\n",
" return img\n",
"\n",
"def get_frames_from_video(path_to_file, width, height):\n",
" \"\"\"\n",
" Args:\n",
" path_to_file: path to a video file.\n",
" \"\"\"\n",
" ret = {}\n",
" \n",
" cap = cv2.VideoCapture(path_to_file) \n",
" curr_msec = 0.\n",
" prev_msec = 0.\n",
" time_resolution_msec = 0\n",
" \n",
" ret['frames'] = []\n",
" while(cap.isOpened()):\n",
" success, frame = cap.read()\n",
" if success:\n",
" curr_msec = cap.get(cv2.CAP_PROP_POS_MSEC)\n",
" diff_msec = curr_msec - prev_msec\n",
" prev_msec = curr_msec\n",
"\n",
" diff_msec = Decimal(diff_msec).quantize(\n",
" Decimal('.001'), rounding=ROUND_HALF_UP)\n",
" \n",
" if time_resolution_msec < diff_msec:\n",
" time_resolution_msec = diff_msec\n",
" \n",
" frame_dict = {}\n",
" frame_dict['image'] = get_image_from_array(frame, width, height)\n",
" ret['frames'].append(frame_dict)\n",
" else:\n",
" break\n",
" \n",
" cap.release()\n",
" \n",
" ret['time_resolution_msec'] = time_resolution_msec\n",
" ret['width'] = width\n",
" ret['height'] = height\n",
" return ret\n",
"\n",
"def convert_secstr_to_intmsec(str_val):\n",
" return int(float(str_val.replace('s', '')) * 1000)\n",
"\n",
"def merge_object_annotations(base_dict, object_annotations, annotate_fps=8):\n",
" time_resolution_msec = base_dict['time_resolution_msec']\n",
" entities = []\n",
"\n",
" # insert bboxes.\n",
" for i, object_annotation in enumerate(object_annotations):\n",
" if not 'description' in object_annotation['entity']:\n",
" continue\n",
"\n",
" confidence = object_annotation['confidence']\n",
" \n",
" entity = '{:03}_{}'.format(i, object_annotation['entity']['description'])\n",
" entities.append(entity)\n",
" \n",
" for anno_frame in object_annotation['frames']:\n",
" bbox = anno_frame['normalizedBoundingBox']\n",
" time_offset_msec = convert_secstr_to_intmsec(anno_frame['timeOffset'])\n",
" \n",
" index = int(time_offset_msec / time_resolution_msec)\n",
" base_frame_dict = base_dict['frames'][index]\n",
" \n",
" if 'object_annotations' not in base_frame_dict:\n",
" base_frame_dict['object_annotations'] = {}\n",
" \n",
" if entity not in base_frame_dict['object_annotations']:\n",
" base_frame_dict['object_annotations'][entity] = {}\n",
" base_entity_dict = base_frame_dict['object_annotations'][entity]\n",
" base_entity_dict['bbox'] = bbox\n",
" base_entity_dict['confidence'] = confidence\n",
" base_entity_dict['time_offset'] = time_offset_msec\n",
"\n",
"\n",
" interpolate_object_annotations(base_dict, entities, annotate_fps)\n",
" return\n",
"\n",
"def interpolate_object_annotations(base_dict, entities, annotate_fps=8):\n",
" # interpolate bboxes between key frames.\n",
" max_annotation_duration = 1000 // annotate_fps\n",
"\n",
" for entity in entities:\n",
" cache = None\n",
" annotation_duration = 0\n",
"\n",
" for base_frame_dict in base_dict['frames']:\n",
" \n",
" if 'object_annotations_i' not in base_frame_dict:\n",
" base_frame_dict['object_annotations_i'] = {}\n",
" \n",
" if 'object_annotations' in base_frame_dict:\n",
" if entity in base_frame_dict['object_annotations']:\n",
" cache = base_frame_dict['object_annotations'][entity]\n",
" annotation_duration = 0\n",
"\n",
" base_frame_dict['object_annotations_i'][entity] = cache\n",
" \n",
" if annotation_duration < max_annotation_duration:\n",
" annotation_duration += int(base_dict['time_resolution_msec'])\n",
" else:\n",
" cache = None\n",
" return\n",
"\n",
"def create_annotated_images(base_dict):\n",
" width = base_dict['width']\n",
" height = base_dict['height']\n",
" fnt = ImageFont.truetype('notosans/NotoSansCJKjp-Regular.otf', 10)\n",
" \n",
" for frame in base_dict['frames']:\n",
" image = frame['image']\n",
" draw = ImageDraw.Draw(image, \"RGBA\")\n",
" \n",
" object_annotations = frame['object_annotations_i']\n",
" for entity, entity_info in object_annotations.items():\n",
" if entity_info:\n",
" confidence = entity_info['confidence']\n",
" bbox = entity_info['bbox']\n",
" \n",
" try:\n",
" t = bbox['top']\n",
" l = bbox['left']\n",
" r = bbox['right']\n",
" b = bbox['bottom']\n",
" except(KeyError):\n",
" continue\n",
"\n",
" vertex_lt = (int(l * base_dict['width']), int(t * base_dict['height']))\n",
" vertex_rb = (int(r * base_dict['width']), int(b * base_dict['height']))\n",
" \n",
" \n",
" draw.rectangle(xy=(vertex_lt, vertex_rb), outline='red')\n",
" draw.text(\n",
" xy=(vertex_lt[0], vertex_lt[1]-10),\n",
" text='{}'.format(entity) + ':' + str(format(confidence, '.3f')),\n",
" fill='red', font=fnt)\n",
" \n",
" frame['image_annotated'] = image\n",
" return\n",
"\n",
"def create_annotated_video(\n",
" path_to_input, object_annotations, path_to_output='output.avi',\n",
" width=640, height=480):\n",
" \n",
" base = get_frames_from_video(path_to_input, width=width, height=height)\n",
" merge_object_annotations(base, object_annotations)\n",
" create_annotated_images(base)\n",
" \n",
" \n",
" fourcc = cv2.VideoWriter_fourcc(*'XVID')\n",
" framerate = 1000 // base['time_resolution_msec']\n",
" video_dims = (base['width'], base['height'])\n",
" video = cv2.VideoWriter(path_to_output, fourcc, framerate, video_dims)\n",
"\n",
" for frame in base['frames']:\n",
" img_annotated = frame['image_annotated']\n",
" video.write(cv2.cvtColor(np.array(img_annotated), cv2.COLOR_RGB2BGR))\n",
"\n",
" video.release()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "No3VRxQqtd2r",
"colab_type": "code",
"colab": {}
},
"source": [
"key = 'objectAnnotations'\n",
"\n",
"for e in video_intelligence_res['response']['annotationResults']:\n",
" if key in list(e.keys()):\n",
" object_annotations = e['objectAnnotations']"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "crXuU5SYxwoI",
"colab_type": "code",
"colab": {}
},
"source": [
"create_annotated_video(\n",
" path_to_input=path_to_video,\n",
" object_annotations=object_annotations,\n",
" path_to_output='output.avi')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ZUFRQaJux7k1",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"# Re-encode to H.264 MP4 for inline playback. output.avi was written by\n",
"# cv2.VideoWriter and has no audio track, so audio options are omitted.\n",
"!ffmpeg -y -i output.avi -c:v libx264 -crf 19 -preset slow -an output.mp4"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "3QZBvBTnUCgm",
"colab_type": "text"
},
"source": [
"#### Check the object-tracking results"
]
},
{
"cell_type": "code",
"metadata": {
"id": "sTstSx6Wx-zY",
"colab_type": "code",
"colab": {}
},
"source": [
"resume_video('output.mp4')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "g52NOQInZoyA",
"colab_type": "text"
},
"source": [
"### Speech recognition"
]
},
{
"cell_type": "code",
"metadata": {
"id": "AR6mfeR8P4AD",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"# @title Define helper functions that print the speech-recognition results\n",
"import textwrap\n",
"\n",
"def print_speech_transcript(response, limit=10):\n",
" num_print = 0\n",
" for e in response['response']['annotationResults']:\n",
" if 'speechTranscriptions' in list(e.keys()):\n",
" for t in e['speechTranscriptions']:\n",
" for b in textwrap.wrap(t['alternatives'][0]['transcript'], 70):\n",
" print(b)\n",
" num_print += 1\n",
" if num_print > limit:\n",
" return \n",
"\n",
"def print_speech_word(response, limit=10):\n",
" num_print = 0\n",
" for e in response['response']['annotationResults']:\n",
" if 'speechTranscriptions' in list(e.keys()):\n",
" for t in e['speechTranscriptions']:\n",
" for o in t['alternatives'][0]['words']:\n",
" print('{:>7}-{:>7}: {}'.format(o['startTime'], o['endTime'], o['word']))\n",
" num_print += 1\n",
" if num_print > limit:\n",
" return\n",
"\n",
"def print_speech_sentence(response, limit=10):\n",
" num_print = 0\n",
" for e in response['response']['annotationResults']:\n",
" if 'speechTranscriptions' in list(e.keys()):\n",
" for t in e['speechTranscriptions']:\n",
" start_time = None\n",
" words = []\n",
" for o in t['alternatives'][0]['words']:\n",
" if words == []:\n",
" start_time = o['startTime']\n",
"\n",
" words.append(o['word'])\n",
"\n",
" if '.' == o['word'][-1]:\n",
" print('{:>7}-{:>7} {}'.format(\n",
" start_time, o['endTime'], ' '.join(words)))\n",
" words = []\n",
" start_time = None\n",
" num_print += 1\n",
"\n",
" if num_print > limit:\n",
" return"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "h83QxnCDQ9Zd",
"colab_type": "text"
},
"source": [
"#### Show word-level timestamps"
]
},
{
"cell_type": "code",
"metadata": {
"id": "VTuwdfFusLeh",
"colab_type": "code",
"colab": {}
},
"source": [
"print_speech_word(video_intelligence_res, limit=10)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "IrGRKq91RBcj",
"colab_type": "text"
},
"source": [
"#### Show sentence-level timestamps"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Iai34MrQOBIj",
"colab_type": "code",
"colab": {}
},
"source": [
"print_speech_sentence(video_intelligence_res, limit=10)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "yhlxKts3kKAJ",
"colab_type": "text"
},
"source": [
"### Label analysis"
]
},
{
"cell_type": "code",
"metadata": {
"id": "e55jDD8VkEyi",
"colab_type": "code",
"colab": {}
},
"source": [
"key = 'frameLabelAnnotations'\n",
"\n",
"frame_label_annotations = None\n",
"\n",
"for e in video_intelligence_res['response']['annotationResults']:\n",
" if key in list(e.keys()):\n",
" frame_label_annotations = e[key]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "8j6eVAG5kY6Z",
"colab_type": "code",
"colab": {}
},
"source": [
"for i, annotation in enumerate(frame_label_annotations):\n",
" entity = annotation['entity']['description']\n",
" print('{:>20} {}'.format(entity, annotation['frames']))\n",
" if i > 10:\n",
" break"
],
"execution_count": 0,
"outputs": []
},
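{
"cell_type": "markdown",
"metadata": {
"id": "label-summary-note",
"colab_type": "text"
},
"source": [
"The raw frames printed above are verbose. As a small sketch (assuming each frame entry carries a `confidence` field, per the `frameConfidenceThreshold` setting earlier), we can summarize each label by its best frame-level score."
]
},
{
"cell_type": "code",
"metadata": {
"id": "label-summary-code",
"colab_type": "code",
"colab": {}
},
"source": [
"# Sketch: show each label with the highest frame-level confidence observed.\n",
"for annotation in frame_label_annotations[:10]:\n",
"    entity = annotation['entity']['description']\n",
"    best = max(f['confidence'] for f in annotation['frames'])\n",
"    print('{:>20} {:.3f}'.format(entity, best))"
],
"execution_count": 0,
"outputs": []
},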
{
"cell_type": "markdown",
"metadata": {
"id": "W1pf0qzVfYKC",
"colab_type": "text"
},
"source": [
"### Text detection (OCR)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "p_cl0d0IfFvH",
"colab_type": "code",
"colab": {}
},
"source": [
"key = 'textAnnotations'\n",
"\n",
"text_annotations = None\n",
"\n",
"for e in video_intelligence_res['response']['annotationResults']:\n",
" if key in list(e.keys()):\n",
" text_annotations = e[key]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-5ca1ianhiHB",
"colab_type": "code",
"colab": {}
},
"source": [
"num_print = 0\n",
"\n",
"for t in text_annotations:\n",
" text = t['text']\n",
" s = t['segments'][0]['segment']\n",
" confidence = t['segments'][0]['confidence']\n",
" if confidence > 0.9:\n",
" print('{:>11} - {:>11}: {}'.format(s['startTimeOffset'], s['endTimeOffset'], text))\n",
" num_print += 1\n",
" if num_print > 10:\n",
" break"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ko1cS32OfIpL",
"colab_type": "text"
},
"source": [
"### Shot change analysis"
]
},
{
"cell_type": "code",
"metadata": {
"id": "pCQ6lVXPgrnZ",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"#@title Define a helper function that prints the shot-analysis results\n",
"\n",
"def print_shots(response, limit=10):\n",
" key = 'shotAnnotations'\n",
"\n",
" shot_annotations = None\n",
" for e in response['response']['annotationResults']:\n",
" if key in list(e.keys()):\n",
" shot_annotations = e[key]\n",
"\n",
" for i, e in enumerate(shot_annotations):\n",
" print('{:>11}-{:>11}: Scene-{:03}'.format(\n",
" e['startTimeOffset'], e['endTimeOffset'], i))\n",
" if i > limit:\n",
" return"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "X-sgdOCkg8Jn",
"colab_type": "code",
"colab": {}
},
"source": [
"print_shots(video_intelligence_res, limit=10)"
],
"execution_count": 0,
"outputs": []
}
]
}