{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ur8xi4C7S06n"
},
"outputs": [],
"source": [
"# Copyright 2024 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JAPoU8Sm5E6e"
},
"source": [
"# Getting Started with Translation\n",
"\n",
"<table align=\"left\">\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\">\n",
" <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Ftranslation%2Fintro_translation.ipynb\">\n",
" <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/translation/intro_translation.ipynb\">\n",
" <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\">\n",
" <img width=\"32px\" src=\"https://www.svgrepo.com/download/217753/github.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
" </a>\n",
" </td>\n",
"</table>\n",
"\n",
"<div style=\"clear: both;\"></div>\n",
"\n",
"<b>Share to:</b>\n",
"\n",
"<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
"</a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "adb516335e46"
},
"source": [
"| Author |\n",
"| --- |\n",
"| [Holt Skinner](https://github.com/holtskinner) |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tvgnzT1CKxrO"
},
"source": [
"## Overview\n",
"\n",
"This notebook demonstrates how to use the [Google Cloud Translation API](https://cloud.google.com/translate) to translate text in [130+ languages](https://cloud.google.com/translate/docs/languages)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d975e698c9a4"
},
"source": [
"### Objective\n",
"\n",
"This tutorial uses the following Google Cloud AI services and resources:\n",
"\n",
"- [Cloud Translation API](https://cloud.google.com/translate/docs/overview)\n",
"- Cloud Storage\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aed92deeb4a0"
},
"source": [
"### Costs\n",
"\n",
"This tutorial uses billable components of Google Cloud:\n",
"\n",
"* Cloud Translation\n",
"* Cloud Storage\n",
"\n",
"Learn about [Cloud Translation pricing](https://cloud.google.com/translate/pricing)\n",
"and [Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the\n",
"[Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a\n",
"cost estimate based on your projected usage."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i7EUnXsZhAGF"
},
"source": [
"## Getting Started\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NNSWiCNPjh_p"
},
"source": [
"### Install the Cloud Translation client library\n",
"\n",
"Install the package required to run this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2b4ef9b72d43"
},
"outputs": [],
"source": [
"# Install the packages\n",
"%pip install --user --upgrade -q google-cloud-translate"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YbMFqPZ3tnwz"
},
"source": [
"Set the project and region.\n",
"\n",
"* Check which **regions** are available for Cloud Translation in the [documentation](https://cloud.google.com/translate/docs/advanced/endpoints) before changing `LOCATION`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "GjSsu6cmUdEx"
},
"outputs": [],
"source": [
"# Use the environment variable if the user doesn't provide Project ID.\n",
"import os\n",
"\n",
"PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
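"if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
"    PROJECT_ID = os.environ.get(\"GOOGLE_CLOUD_PROJECT\", \"\")\n",
"    if not PROJECT_ID:\n",
"        raise ValueError(\"Set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable.\")\n",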
"\n",
"LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "opUxT_k5TdgP"
},
"source": [
"### Authenticating your notebook environment\n",
"\n",
"* If you are using **Colab** to run this notebook, run the cell below and continue.\n",
"* If you are using **Vertex AI Workbench**, follow the [setup instructions](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vbNgv4q1T2Mi"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"# Additional authentication is required for Google Colab\n",
"if \"google.colab\" in sys.modules:\n",
" # Authenticate user to Google Cloud\n",
" from google.colab import auth\n",
"\n",
" auth.authenticate_user()\n",
"\n",
" ! gcloud config set project {PROJECT_ID}\n",
" ! gcloud auth application-default login -q"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "960505627ddf"
},
"source": [
"### Import libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "PyQmSRbKA8r-"
},
"outputs": [],
"source": [
"from google.cloud import translate"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v4gUI8WqciKS"
},
"source": [
"### Create client"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "6Pl3un_YciKS"
},
"outputs": [],
"source": [
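"# Optional: to use a regional endpoint, uncomment the import and the\n",
"# client_options argument, and define TRANSLATE_LOCATION using a value\n",
"# listed at https://cloud.google.com/translate/docs/advanced/endpoints\n",
"# from google.api_core.client_options import ClientOptions\n",
"\n",
"client = translate.TranslationServiceClient(\n",
"    # client_options=ClientOptions(\n",
"    #     api_endpoint=f\"translate-{TRANSLATE_LOCATION}.googleapis.com\"\n",
"    # )\n",
")"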
]
},
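{
"cell_type": "markdown",
"metadata": {},
"source": [
"The overview mentions support for 130+ languages. As an optional check, the following sketch asks the API which language codes it currently supports, using `get_supported_languages` on the `client` created above. It assumes `PROJECT_ID` and `LOCATION` are set as in the earlier cell; the count it prints depends on the service at the time you run it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: list the languages the Translation API currently supports,\n",
"# with display names in English.\n",
"supported = client.get_supported_languages(\n",
"    parent=client.common_location_path(PROJECT_ID, LOCATION),\n",
"    display_language_code=\"en\",\n",
")\n",
"\n",
"print(f\"Supported languages: {len(supported.languages)}\")\n",
"# Show a few entries as a sample.\n",
"for language in list(supported.languages)[:5]:\n",
"    print(f\"{language.language_code}: {language.display_name}\")"
]
},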
{
"cell_type": "markdown",
"metadata": {
"id": "v5CEc4-Wrjk2"
},
"source": [
"### Create helper functions"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "kYx2wwhjrmD6"
},
"outputs": [],
"source": [
"def translate_text(\n",
" text: str,\n",
" project_id: str = PROJECT_ID,\n",
" location: str = LOCATION,\n",
" glossary: str | None = None,\n",
") -> translate.TranslateTextResponse:\n",
" \"\"\"Translate `text` from English to Spanish, optionally using a glossary.\"\"\"\n",
" # Details on supported input formats:\n",
" # https://cloud.google.com/translate/docs/supported-formats\n",
" response = client.translate_text(\n",
" request=translate.TranslateTextRequest(\n",
" parent=client.common_location_path(project_id, location),\n",
" contents=[text],\n",
" # Supported language codes: https://cloud.google.com/translate/docs/languages\n",
" source_language_code=\"en\",\n",
" target_language_code=\"es\",\n",
" glossary_config=(\n",
" translate.TranslateTextGlossaryConfig(glossary=glossary)\n",
" if glossary\n",
" else None\n",
" ),\n",
" )\n",
" )\n",
"\n",
" return response\n",
"\n",
"\n",
"def create_glossary(\n",
" input_uri: str,\n",
" glossary_id: str,\n",
" project_id: str = PROJECT_ID,\n",
" location: str = LOCATION,\n",
" timeout: int = 180,\n",
") -> translate.Glossary:\n",
" \"\"\"\n",
" Create a unidirectional (English to Spanish) glossary. Glossary entries can be\n",
" words or short phrases (usually fewer than five words).\n",
" https://cloud.google.com/translate/docs/advanced/glossary#format-glossary\n",
" \"\"\"\n",
" glossary = translate.Glossary(\n",
" name=client.glossary_path(project_id, location, glossary_id),\n",
" # Supported language codes: https://cloud.google.com/translate/docs/languages\n",
" language_pair=translate.Glossary.LanguageCodePair(\n",
" source_language_code=\"en\", target_language_code=\"es\"\n",
" ),\n",
" input_config=translate.GlossaryInputConfig(\n",
" gcs_source=translate.GcsSource(input_uri=input_uri)\n",
" ),\n",
" )\n",
"\n",
" # A glossary is a custom dictionary that the Translation API uses\n",
" # to translate domain-specific terminology consistently.\n",
" operation = client.create_glossary(\n",
" parent=client.common_location_path(project_id, location), glossary=glossary\n",
" )\n",
"\n",
" result = operation.result(timeout)\n",
" return result"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a93a71e75b78"
},
"source": [
"Now let's try to translate a simple phrase from English to Spanish."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "2e8e84e860db"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Translated text: ¡Hola!\n"
]
}
],
"source": [
"response = translate_text(\"Hi there!\")\n",
"\n",
"# Display the translation for each input text provided\n",
"for translation in response.translations:\n",
" print(f\"Translated text: {translation.translated_text}\")"
]
},
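{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `translate_text` helper sends one string per request, but the `contents` field of `TranslateTextRequest` accepts a list. The hypothetical `translate_batch` helper below is a sketch, not part of the original notebook, that translates several strings in a single call. It reuses `client`, `PROJECT_ID`, and `LOCATION` from the earlier cells and assumes the inputs are plain text rather than HTML."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def translate_batch(\n",
"    texts: list[str],\n",
"    target_language_code: str = \"es\",\n",
"    project_id: str = PROJECT_ID,\n",
"    location: str = LOCATION,\n",
") -> list[str]:\n",
"    \"\"\"Hypothetical helper: translate several English strings in one request.\"\"\"\n",
"    response = client.translate_text(\n",
"        request=translate.TranslateTextRequest(\n",
"            parent=client.common_location_path(project_id, location),\n",
"            contents=texts,\n",
"            mime_type=\"text/plain\",  # Treat the inputs as plain text, not HTML\n",
"            source_language_code=\"en\",\n",
"            target_language_code=target_language_code,\n",
"        )\n",
"    )\n",
"    # One translation is returned per input string, in input order.\n",
"    return [translation.translated_text for translation in response.translations]\n",
"\n",
"\n",
"texts = [\"Good morning\", \"Thank you\"]\n",
"for source, translated in zip(texts, translate_batch(texts)):\n",
"    print(f\"{source} -> {translated}\")"
]
},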
{
"cell_type": "markdown",
"metadata": {
"id": "6a84167808aa"
},
"source": [
"## Glossaries"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bd76360f8068"
},
"source": [
"That looks great! However, let's see what happens if we translate a technical term, such as the Google Cloud product [Compute Engine](https://cloud.google.com/compute?hl=en)."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"id": "894401e53921"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Translated text: Motor de Computación\n"
]
}
],
"source": [
"response = translate_text(\"Compute Engine\")\n",
"\n",
"# Display the translation for each input text provided\n",
"for translation in response.translations:\n",
" print(f\"Translated text: {translation.translated_text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6760e9c66d88"
},
"source": [
"### Create a Glossary"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "120088b42a11"
},
"source": [
"Notice that the Translation API translated the product name literally.\n",
"\n",
"If we want this name to stay the same in every language, we can create a [glossary](https://cloud.google.com/translate/docs/advanced/glossary) to translate domain-specific words and phrases consistently.\n",
"\n",
"Next, we'll create a glossary that lists Google Cloud product names and how each should be translated into Spanish.\n",
"\n",
"We've already created the input TSV file and uploaded it to a publicly accessible Cloud Storage bucket."
]
},
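{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to build your own glossary instead of using the prepared file, the following sketch writes a two-column TSV file (English term, then Spanish term, one entry per row, no header row, which is the layout assumed here for a unidirectional glossary) and uploads it to Cloud Storage. The terms, bucket name, and object path are hypothetical placeholders. The upload assumes the `google-cloud-storage` package is installed and that your account can write to the bucket. You could then pass the printed URI as `input_uri` to `create_glossary`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"\n",
"from google.cloud import storage\n",
"\n",
"# Hypothetical example terms: keep these product names unchanged in Spanish.\n",
"terms = [\n",
"    (\"Compute Engine\", \"Compute Engine\"),\n",
"    (\"Cloud Storage\", \"Cloud Storage\"),\n",
"]\n",
"\n",
"# Write one entry per row: English term, a tab, then the Spanish term.\n",
"local_path = \"my_glossary.tsv\"\n",
"with open(local_path, \"w\", newline=\"\", encoding=\"utf-8\") as f:\n",
"    csv.writer(f, delimiter=\"\\t\", lineterminator=\"\\n\").writerows(terms)\n",
"\n",
"# Hypothetical bucket and object path; replace with a bucket you can write to.\n",
"BUCKET_NAME = \"your-bucket-name\"\n",
"blob = storage.Client(project=PROJECT_ID).bucket(BUCKET_NAME).blob(\n",
"    \"translation/my_glossary.tsv\"\n",
")\n",
"blob.upload_from_filename(local_path)\n",
"\n",
"print(f\"gs://{BUCKET_NAME}/{blob.name}\")"
]
},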
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"id": "25714948fd80"
},
"outputs": [],
"source": [
"glossary = create_glossary(\n",
" input_uri=\"gs://github-repo/translation/GoogleCloudGlossary.tsv\",\n",
" glossary_id=\"google_cloud_english_to_spanish\",\n",
")\n",
"print(glossary)"
]
},
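{
"cell_type": "markdown",
"metadata": {},
"source": [
"Running the previous cell a second time fails, because a glossary with the same ID already exists. The hypothetical `get_or_create_glossary` helper below is a sketch that falls back to fetching the existing glossary with `client.get_glossary`. It assumes the API reports the conflict as an `AlreadyExists` error. Note that the fallback returns the existing glossary unchanged, so later edits to the input file are not picked up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.api_core.exceptions import AlreadyExists\n",
"\n",
"\n",
"def get_or_create_glossary(\n",
"    input_uri: str,\n",
"    glossary_id: str,\n",
"    project_id: str = PROJECT_ID,\n",
"    location: str = LOCATION,\n",
") -> translate.Glossary:\n",
"    \"\"\"Hypothetical helper: create the glossary, or reuse it if it already exists.\"\"\"\n",
"    try:\n",
"        return create_glossary(\n",
"            input_uri=input_uri,\n",
"            glossary_id=glossary_id,\n",
"            project_id=project_id,\n",
"            location=location,\n",
"        )\n",
"    except AlreadyExists:\n",
"        # Assumption: the conflict surfaces as ALREADY_EXISTS; fetch the existing resource.\n",
"        return client.get_glossary(\n",
"            name=client.glossary_path(project_id, location, glossary_id)\n",
"        )\n",
"\n",
"\n",
"glossary = get_or_create_glossary(\n",
"    input_uri=\"gs://github-repo/translation/GoogleCloudGlossary.tsv\",\n",
"    glossary_id=\"google_cloud_english_to_spanish\",\n",
")\n",
"print(glossary.name)"
]
},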
{
"cell_type": "markdown",
"metadata": {
"id": "059abb41d254"
},
"source": [
"Now, let's try translating the text again using the glossary."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "0c9e51afa18f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Default Translated text: Motor de Computación\n",
"Glossary Translated text: Compute Engine\n"
]
}
],
"source": [
"response = translate_text(\"Compute Engine\", glossary=glossary.name)\n",
"\n",
"# Display the translation for each input text provided\n",
"for translation in response.translations:\n",
" print(f\"Default Translated text: {translation.translated_text}\")\n",
"\n",
"for translation in response.glossary_translations:\n",
" print(f\"Glossary Translated text: {translation.translated_text}\")"
]
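},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Glossaries persist in your project after this notebook finishes. If you no longer need the one created above, the following optional sketch deletes it with `client.delete_glossary`, which returns a long-running operation. It assumes `glossary` still refers to the glossary created earlier in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional clean-up: delete the glossary created earlier in this notebook.\n",
"operation = client.delete_glossary(name=glossary.name)\n",
"result = operation.result(timeout=180)\n",
"print(f\"Deleted glossary: {result.name}\")"
]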
}
],
"metadata": {
"colab": {
"name": "intro_translation.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}