machine_learning/cloud_ai_building_blocks/conversation_ja.ipynb (771 lines of code) (raw):

{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "conversation_ja", "version": "0.3.2", "provenance": [], "collapsed_sections": [ "z7mItnhNdgxa", "NU_4EPm-QW4C", "Qn-7PYaIZ-s2", "tXTdM1JOfo2j", "xyM-Oup63qHz", "rZbO9TlTEDG-", "GFZWR2SwRTzR", "skdOgoltb8Ok" ], "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "<a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/gcp-getting-started-lab-jp/blob/master/machine_learning/cloud_ai_building_blocks/conversation_ja.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" ] }, { "cell_type": "markdown", "metadata": { "id": "VANRk9XHQCdx", "colab_type": "text" }, "source": [ "\n", "--- \n", "```\n", "Copyright 2019 Google LLC\n", "\n", "Licensed under the Apache License, Version 2.0 (the \"License\");\n", "you may not use this file except in compliance with the License.\n", "You may obtain a copy of the License at\n", "\n", " https://www.apache.org/licenses/LICENSE-2.0\n", "\n", "Unless required by applicable law or agreed to in writing, software\n", "distributed under the License is distributed on an \"AS IS\" BASIS,\n", "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "See the License for the specific language governing permissions and\n", "limitations under the License.\n", "```\n", "---" ] }, { "cell_type": "markdown", "metadata": { "id": "WaiBJ4sitrjo", "colab_type": "text" }, "source": [ "## 事前準備\n", "\n", "\n", "\n", "1. [GCP プロジェクト](https://console.cloud.google.com/cloud-resource-manager) を作成します。\n", "2. [課金設定](https://support.google.com/cloud/answer/6293499#enable-billing) を有効にします。\n", "3. [API Key](https://console.cloud.google.com/apis/credentials) を作成します。\n", "4. [Cloud Speech-to-Text API と Cloud Text-to-Speech API](https://console.cloud.google.com/flows/enableapi?apiid=speech.googleapis.com,texttospeech.googleapis.com) を有効にします。\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "z7mItnhNdgxa", "colab_type": "text" }, "source": [ "## Google Cloud API の認証情報を入力\n", "\n", "Google Cloud API を REST インタフェースから利用するために、 [API Key](https://console.cloud.google.com/apis/credentials) を利用します。 Google Cloud Console から API Key をコピーしましょう。\n", "\n" ] }, { "cell_type": "code", "metadata": { "id": "kkxLB3dH99vx", "colab_type": "code", "colab": {} }, "source": [ "import getpass\n", "APIKEY = getpass.getpass()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "NU_4EPm-QW4C", "colab_type": "text" }, "source": [ "## Cloud Speech-to-Text API を使ってみよう !\n", "\n", "**Google Cloud Speech-to-Text** では、使いやすい API で高度なニューラル ネットワーク モデルを適用し、音声をテキストに変換できます。" ] }, { "cell_type": "markdown", "metadata": { "id": "jfDxa91TRzno", "colab_type": "text" }, "source": [ "API Discovery Service を利用して **Cloud Speech-to-Text API** を発見します。 Cloud Speech-to-Text の REST API 仕様は [こちら](https://cloud.google.com/speech-to-text/docs/reference/rest/) に解説されています。\n", "\n" ] }, { "cell_type": "code", "metadata": { "id": "75K5Nm4mfiqR", "colab_type": "code", "colab": {} }, "source": [ "from googleapiclient.discovery import build\n", "\n", "speech_service = build('speech', 'v1p1beta1', developerKey=APIKEY)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Qn-7PYaIZ-s2", "colab_type": "text" }, "source": [ "### 音声データの準備" ] }, { "cell_type": "markdown", "metadata": { "id": "updocvAtQIr4", "colab_type": "text" }, "source": [ "音声録音のための関数 ```record_audio``` を定義しましょう。" ] }, { "cell_type": "code", "metadata": { "id": "TvTA6GckUJ1m", "colab_type": "code", "cellView": "form", "colab": {} }, "source": [ "#@title このセルを実行して record_audio を定義\n", "\n", "# Install required libraries and packages\n", "!pip install -qq pydub\n", "!apt-get -qq update\n", "!apt-get -qq install -y ffmpeg\n", "\n", "# Define record_audio\n", "import base64\n", "import google.colab\n", "import pydub\n", "\n", "from io import BytesIO\n", "\n", "def record_audio(file_id, framerate=16000, channels=1, file_format='flac'):\n", " # Record webm file from Colaboratory.\n", " audio = google.colab._message.blocking_request(\n", " 'user_media',\n", " {\n", " 'audio': True,\n", " 'video': False,\n", " 'duration': -1\n", " },\n", " timeout_sec=600)\n", "\n", " # Convert web file into in_memory file.\n", " mfile = BytesIO(base64.b64decode(audio[audio.index(',')+1:]))\n", "\n", " # Store webm file locally.\n", " with open('{0}.webm'.format(file_id), 'wb') as f:\n", " mfile.seek(0)\n", " f.write(mfile.read())\n", "\n", " # Open stored web file and save it as wav with sample_rate=16000\n", " output_file = '{0}.{1}'.format(file_id, file_format)\n", " _ = pydub.AudioSegment.from_file('{0}.webm'.format(file_id), codec='opus')\n", " _ = _.set_channels(channels)\n", " _.set_frame_rate(framerate).export(output_file, format=file_format)\n", "\n", " return output_file" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "oE3ugmiw0p9S", "colab_type": "text" }, "source": [ "```record_audio``` を実行して音声を録音しましょう。" ] }, { "cell_type": "code", "metadata": { "id": "fdc8_6qXdLOX", "colab_type": "code", "colab": {} }, "source": [ "audio_filename = record_audio('ja-sample', framerate=16000, channels=1)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "F7JgOg9e1BFv", "colab_type": "text" }, "source": [ "録音結果を確認しましょう。" ] }, { "cell_type": "code", "metadata": { "id": "8aoZaLbqUJ8Z", "colab_type": "code", "colab": {} }, "source": [ "from IPython.display import Audio\n", "\n", "Audio(audio_filename, rate=16000)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "tXTdM1JOfo2j", "colab_type": "text" }, "source": [ "### 音声認識の実行" ] }, { "cell_type": "markdown", "metadata": { "id": "UTnMKxTDUUXC", "colab_type": "text" }, "source": [ "Cloud Speech-to-Text API に入力する情報を定義します.\n" ] }, { "cell_type": "code", "metadata": { "id": "5cY3j02njak-", "colab_type": "code", "cellView": "form", "colab": {} }, "source": [ "from base64 import b64encode\n", "from json import dumps\n", "\n", "languageCode = 'en-US' #@param [\"en-US\", \"ja-JP\", \"en-IN\"]\n", "model = 'default' #@param [\"command_and_search\", \"phone_call\", \"video\", \"default\"]" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "_VBjpvF-xUP2", "colab_type": "text" }, "source": [ "入力する音声データを定義します。" ] }, { "cell_type": "code", "metadata": { "id": "xcOzUij-xITA", "colab_type": "code", "colab": {} }, "source": [ "with open(audio_filename, 'rb') as audio_file:\n", " content = b64encode(audio_file.read()).decode('utf-8')\n", "\n", "my_audio = {\n", " 'content': content\n", "}" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "fNUjOZM2UYy9", "colab_type": "text" }, "source": [ "[RecognitionConfig](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig) を定義します。" ] }, { "cell_type": "code", "metadata": { "id": "daSoaTHLei1O", "colab_type": "code", "colab": {} }, "source": [ "my_recognition_config = {\n", " 'encoding': 'FLAC',\n", " 'sampleRateHertz': 16000,\n", " 'languageCode': languageCode,\n", " 'model': model\n", "}" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "9Q41Aw5zUG2u", "colab_type": "text" }, "source": [ "[recognize](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/speech/recognize) method のリクエストメッセージの body を定義します。" ] }, { "cell_type": "code", "metadata": { "id": "-A4v0-JYhNr5", "colab_type": "code", "colab": {} }, "source": [ "my_request_body={\n", " 'audio': my_audio,\n", " 'config': my_recognition_config,\n", "}" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "QWWCxnhyhPzJ", "colab_type": "text" }, "source": [ "[recognize](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/speech/recognize) method を実行します。" ] }, { "cell_type": "code", "metadata": { "id": "qWWesAO2fLmf", "colab_type": "code", "colab": {} }, "source": [ "response = speech_service.speech().recognize(body=my_request_body).execute()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "6CFwnpWIhVwa", "colab_type": "text" }, "source": [ "[recognize](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/speech/recognize) method のレスポンスを確認します。" ] }, { "cell_type": "code", "metadata": { "id": "IIghtSiw6V0i", "colab_type": "code", "colab": {} }, "source": [ "response" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "HYTQmX8nfxk3", "colab_type": "code", "colab": {} }, "source": [ "for r in response[\"results\"]:\n", " print('認識結果: ', r['alternatives'][0]['transcript'])\n", " print('信頼度: ', r['alternatives'][0]['confidence'])" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "zH_95Knijttl", "colab_type": "text" }, "source": [ "### 単語のタイムスタンプの取得" ] }, { "cell_type": "markdown", "metadata": { "id": "7Hok3NcZWgMh", "colab_type": "text" }, "source": [ "[RecognitionConfig](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig) に ```enableWordTimeOffsets``` の設定を追加します。" ] }, { "cell_type": "code", "metadata": { "id": "Fiot-T7W2pjg", "colab_type": "code", "colab": {} }, "source": [ "my_recognition_config = {\n", " 'encoding': 'FLAC',\n", " 'sampleRateHertz': 16000,\n", " 'languageCode': languageCode,\n", " 'model': model,\n", " 'enableWordTimeOffsets': True\n", "}\n", "\n", "my_request_body={\n", " 'audio': my_audio,\n", " 'config': my_recognition_config,\n", "}" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "JSwoH5ZaWtBi", "colab_type": "text" }, "source": [ "[recognize](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/speech/recognize) method を実行します。" ] }, { "cell_type": "code", "metadata": { "id": "LwfdrOQ8208B", "colab_type": "code", "colab": {} }, "source": [ "response = speech_service.speech().recognize(body=my_request_body).execute()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "FEvyB07K6L4s", "colab_type": "text" }, "source": [ "[recognize](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/speech/recognize) method のレスポンスを確認します。" ] }, { "cell_type": "code", "metadata": { "id": "TmGnghXk6l_A", "colab_type": "code", "colab": {} }, "source": [ "response" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "dByPKRS-3BUP", "colab_type": "code", "colab": {} }, "source": [ "for r in response[\"results\"]:\n", " print('認識結果: ', r['alternatives'][0]['transcript'])\n", " print('信頼度: ', r['alternatives'][0]['confidence'], \"\\n\")\n", "\n", "for r in response[\"results\"][0]['alternatives'][0][\"words\"]:\n", " print(\"word: \", r[\"word\"])\n", " print(\"startTime: \", r[\"startTime\"])\n", " print(\"endTime: \", r[\"endTime\"], \"\\n\")" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "HoMwModrTgq8", "colab_type": "text" }, "source": [ "### 演習問題" ] }, { "cell_type": "markdown", "metadata": { "id": "n0W73r_XX0Qa", "colab_type": "text" }, "source": [ "** 1. [こちら](https://cloud.google.com/speech-to-text/docs/word-confidence)を参考にして、単語レベルの信頼度を見てみましょう**\n" ] }, { "cell_type": "markdown", "metadata": { "id": "xyM-Oup63qHz", "colab_type": "text" }, "source": [ "## Cloud Text-to-Speech を使ってみよう !\n", "**Cloud Text-to-Speech** を使うと、自然な会話音声を合成できます。用意されている声は 30 種類。数多くの言語と方言に対応します。" ] }, { "cell_type": "markdown", "metadata": { "id": "KBPR_3dSYLL5", "colab_type": "text" }, "source": [ "API Discovery Service を利用して **Cloud Text-to-Speech API** を発見します。 Cloud Text-to-Speech の REST API 仕様は [こちら](https://cloud.google.com/text-to-speech/docs/reference/rest/) に解説されています。" ] }, { "cell_type": "code", "metadata": { "id": "TXiBcqA--cxr", "colab_type": "code", "colab": {} }, "source": [ "import textwrap\n", "from googleapiclient.discovery import build\n", "service = build('texttospeech', 'v1beta1', developerKey=APIKEY)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "rZbO9TlTEDG-", "colab_type": "text" }, "source": [ "### サポートされているすべての音声の一覧表示\n", "\n", "テキスト読み上げ合成のために Cloud Text-to-Speech API で使用できる音声を一覧表示します。なお、 ```languageCode``` は[こちら](https://cloud.google.com/text-to-speech/docs/voices)を参考にしてください。" ] }, { "cell_type": "code", "metadata": { "id": "ryVwlMHT_aQW", "colab_type": "code", "colab": {} }, "source": [ "response = service.voices().list(\n", " languageCode=\"ja_JP\",\n", ").execute()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ZnVUeDot_uiz", "colab_type": "code", "colab": {} }, "source": [ "for voice in response['voices']:\n", " print(voice)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "GFZWR2SwRTzR", "colab_type": "text" }, "source": [ "### テキストから音声を合成する\n", "\n", "[text.synthesize](https://cloud.google.com/text-to-speech/docs/reference/rest/v1beta1/text/synthesize) メソッドを使用すると、単語や文を自然な人間の音声の base64 でエンコードされた音声データに変換できます。このメソッドは、入力を生のテキストまたは[音声合成マークアップ言語(SSML)](https://cloud.google.com/text-to-speech/docs/ssml)として受け入れます。" ] }, { "cell_type": "code", "metadata": { "id": "DWdYpnM2-gRz", "colab_type": "code", "cellView": "both", "colab": {} }, "source": [ "source_language = \"ja_JP\" #@param {type: \"string\"}\n", "source_sentence = \"Google Cloud Text-to-Speech \\u3092\\u4F7F\\u3046\\u3068\\u3001\\u81EA\\u7136\\u306A\\u4F1A\\u8A71\\u97F3\\u58F0\\u3092\\u5408\\u6210\\u3067\\u304D\\u307E\\u3059\\u3002\" #@param {type:\"string\"}\n", "audio_encoding = 'OGG_OPUS' #@param ['OGG_OPUS', 'LINEAR16', 'MP3']\n", "voice_gender = 'FEMALE' #@param ['FEMALE', 'MALE', 'NEUTRAL', 'SSML_VOICE_GENDER_UNSPECIFIED']\n", "textwrap.wrap(source_sentence)\n", "voice_name = 'ja-JP-Wavenet-A' #@param {type: \"string\"}" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "OsTVXzMz_wHV", "colab_type": "code", "colab": {} }, "source": [ "response = service.text().synthesize(\n", " body={\n", " 'input': {\n", " 'text': source_sentence,\n", " },\n", " 'voice': {\n", " 'languageCode': source_language,\n", " 'ssmlGender': voice_gender,\n", " 'name': voice_name,\n", " },\n", " 'audioConfig': {\n", " 'audioEncoding': audio_encoding,\n", " },\n", " }\n", ").execute()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "OOEin3FXTP1S", "colab_type": "text" }, "source": [ "合成した音声を確認しましょう" ] }, { "cell_type": "code", "metadata": { "id": "DL7Z4RwH_4lO", "colab_type": "code", "colab": {} }, "source": [ "import base64\n", "from IPython.display import Audio\n", "\n", "Audio(base64.b64decode(response['audioContent']))" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "skdOgoltb8Ok", "colab_type": "text" }, "source": [ "### 演習問題" ] }, { "cell_type": "markdown", "metadata": { "id": "dcK-tQxwb_uJ", "colab_type": "text" }, "source": [ "** 1. 日本語のテキストを Standard モデルで音声合成してみましょう。**" ] }, { "cell_type": "markdown", "metadata": { "id": "WsdkH0Zkb_wN", "colab_type": "text" }, "source": [ "**2. 英語のテキストを様々なモデルで音声合成してみましょう。**" ] } ] }