{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "ur8xi4C7S06n" }, "outputs": [], "source": [ "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "JAPoU8Sm5E6e" }, "source": [ "# Getting Started with Translation\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\">\n", " <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Ftranslation%2Fintro_translation.ipynb\">\n", " <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a 
href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/translation/intro_translation.ipynb\">\n", " <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\">\n", " <img width=\"32px\" src=\"https://www.svgrepo.com/download/217753/github.svg\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", "</table>\n", "\n", "<div style=\"clear: both;\"></div>\n", "\n", "<b>Share to:</b>\n", "\n", "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n", "</a>\n", "\n", "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n", "</a>\n", "\n", "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n", "</a>\n", "\n", "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" 
alt=\"Reddit logo\">\n", "</a>\n", "\n", "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/translation/intro_translation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n", "</a>" ] }, { "cell_type": "markdown", "metadata": { "id": "adb516335e46" }, "source": [ "| Author |\n", "| --- |\n", "| [Holt Skinner](https://github.com/holtskinner) |" ] }, { "cell_type": "markdown", "metadata": { "id": "tvgnzT1CKxrO" }, "source": [ "## Overview\n", "\n", "This notebook demonstrates how to use the [Google Cloud Translation API](https://cloud.google.com/translate) to translate text in [130+ languages](https://cloud.google.com/translate/docs/languages)." ] }, { "cell_type": "markdown", "metadata": { "id": "d975e698c9a4" }, "source": [ "### Objective\n", "\n", "This tutorial uses the following Google Cloud AI services and resources:\n", "\n", "- [Cloud Translation API](https://cloud.google.com/translate/docs/overview)\n", "- Cloud Storage\n" ] }, { "cell_type": "markdown", "metadata": { "id": "aed92deeb4a0" }, "source": [ "### Costs\n", "\n", "This tutorial uses billable components of Google Cloud:\n", "\n", "* Cloud Translation\n", "* Cloud Storage\n", "\n", "Learn about [Cloud Translation pricing](https://cloud.google.com/translate/pricing)\n", "and [Cloud Storage pricing](https://cloud.google.com/storage/pricing),\n", "and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n", "to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "id": "i7EUnXsZhAGF" }, "source": [ "## Getting Started\n" ] }, { "cell_type": "markdown", "metadata": { "id": "NNSWiCNPjh_p" }, "source": [ "### Install the Cloud Translation client library\n", "\n", "Install the `google-cloud-translate` package required to run this notebook."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2b4ef9b72d43" }, "outputs": [], "source": [ "# Install the Cloud Translation client library\n", "%pip install --user --upgrade -q google-cloud-translate" ] }, { "cell_type": "markdown", "metadata": { "id": "YbMFqPZ3tnwz" }, "source": [ "Set your Google Cloud project ID and region.\n", "\n", "* For the regions where Cloud Translation is available, see the [endpoints documentation](https://cloud.google.com/translate/docs/advanced/endpoints)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "GjSsu6cmUdEx" }, "outputs": [], "source": [ "# Use the environment variable if the user doesn't provide a project ID.\n", "import os\n", "\n", "PROJECT_ID = \"[your-project-id]\"  # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", "    # Fall back to an empty string (not the literal string None) if the variable is unset.\n", "    PROJECT_ID = os.environ.get(\"GOOGLE_CLOUD_PROJECT\", \"\")\n", "\n", "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")" ] }, { "cell_type": "markdown", "metadata": { "id": "opUxT_k5TdgP" }, "source": [ "### Authenticate your notebook environment\n", "\n", "* If you are using **Colab** to run this notebook, run the cell below and continue.\n", "* If you are using **Vertex AI Workbench**, see the [setup instructions](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "vbNgv4q1T2Mi" }, "outputs": [], "source": [ "import sys\n", "\n", "# Additional authentication is required for Google Colab\n", "if \"google.colab\" in sys.modules:\n", "    # Authenticate user to Google Cloud\n", "    from google.colab import auth\n", "\n", "    auth.authenticate_user()\n", "\n", "    ! gcloud config set project {PROJECT_ID}\n", "    ! 
gcloud auth application-default login -q" ] }, { "cell_type": "markdown", "metadata": { "id": "960505627ddf" }, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "PyQmSRbKA8r-" }, "outputs": [], "source": [ "from google.cloud import translate" ] }, { "cell_type": "markdown", "metadata": { "id": "v4gUI8WqciKS" }, "source": [ "### Create client" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "6Pl3un_YciKS" }, "outputs": [], "source": [ "# Optional: to use a regional endpoint, uncomment the import and client_options below\n", "# and set TRANSLATE_LOCATION to an endpoint location listed at\n", "# https://cloud.google.com/translate/docs/advanced/endpoints\n", "# from google.api_core.client_options import ClientOptions\n", "\n", "client = translate.TranslationServiceClient(\n", "    # client_options=ClientOptions(\n", "    #     api_endpoint=f\"translate-{TRANSLATE_LOCATION}.googleapis.com\"\n", "    # )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "v5CEc4-Wrjk2" }, "source": [ "### Create helper functions" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "kYx2wwhjrmD6" }, "outputs": [], "source": [ "def translate_text(\n", "    text: str,\n", "    project_id: str = PROJECT_ID,\n", "    location: str = LOCATION,\n", "    glossary: str | None = None,\n", ") -> translate.TranslateTextResponse:\n", "    \"\"\"Translate text from English to Spanish, optionally applying a glossary.\n", "\n", "    `glossary` is the full resource name of a glossary, such as `Glossary.name`.\n", "    \"\"\"\n", "    # Details on supported input formats:\n", "    # https://cloud.google.com/translate/docs/supported-formats\n", "    response = client.translate_text(\n", "        request=translate.TranslateTextRequest(\n", "            parent=client.common_location_path(project_id, location),\n", "            contents=[text],\n", "            # Supported language codes: https://cloud.google.com/translate/docs/languages\n", "            source_language_code=\"en\",\n", "            target_language_code=\"es\",\n", "            glossary_config=(\n", "                translate.TranslateTextGlossaryConfig(glossary=glossary)\n", "                if glossary\n", "                else None\n", "            ),\n", "        )\n", "    )\n", "\n", "    return response\n", "\n", "\n", "def create_glossary(\n", "    input_uri: str,\n", "    glossary_id: str,\n", "    project_id: str = 
PROJECT_ID,\n", "    location: str = LOCATION,\n", "    timeout: int = 180,\n", ") -> translate.Glossary:\n", "    \"\"\"Create a unidirectional English-to-Spanish glossary from a file in Cloud Storage.\n", "\n", "    Glossary entries can be words or short phrases (usually fewer than five words).\n", "    See https://cloud.google.com/translate/docs/advanced/glossary#format-glossary\n", "    \"\"\"\n", "    glossary = translate.Glossary(\n", "        name=client.glossary_path(project_id, location, glossary_id),\n", "        # Supported language codes: https://cloud.google.com/translate/docs/languages\n", "        language_pair=translate.Glossary.LanguageCodePair(\n", "            source_language_code=\"en\", target_language_code=\"es\"\n", "        ),\n", "        input_config=translate.GlossaryInputConfig(\n", "            gcs_source=translate.GcsSource(input_uri=input_uri)\n", "        ),\n", "    )\n", "\n", "    # A glossary is a custom dictionary that the Translation API uses\n", "    # to translate domain-specific terminology consistently.\n", "    operation = client.create_glossary(\n", "        parent=client.common_location_path(project_id, location), glossary=glossary\n", "    )\n", "\n", "    # create_glossary starts a long-running operation; wait up to `timeout` seconds.\n", "    result = operation.result(timeout)\n", "    return result" ] }, { "cell_type": "markdown", "metadata": { "id": "a93a71e75b78" }, "source": [ "Now let's translate a simple phrase from English to Spanish." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "2e8e84e860db" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Translated text: ¡Hola!\n" ] } ], "source": [ "response = translate_text(\"Hi there!\")\n", "\n", "# Display the translation for each input text provided\n", "for translation in response.translations:\n", "    print(f\"Translated text: {translation.translated_text}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "6a84167808aa" }, "source": [ "## Glossaries" ] }, { "cell_type": "markdown", "metadata": { "id": "bd76360f8068" }, "source": [ "That looks great! 
However, let's look at what happens if we try to translate a technical term, such as the name of the Google Cloud product [Compute Engine](https://cloud.google.com/compute?hl=en)." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "894401e53921" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Translated text: Motor de Computación\n" ] } ], "source": [ "response = translate_text(\"Compute Engine\")\n", "\n", "# Display the translation for each input text provided\n", "for translation in response.translations:\n", "    print(f\"Translated text: {translation.translated_text}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "6760e9c66d88" }, "source": [ "### Create a Glossary" ] }, { "cell_type": "markdown", "metadata": { "id": "120088b42a11" }, "source": [ "Notice that the Translation API translated the product name literally.\n", "\n", "If we want this name to stay the same in all languages, we can create a [glossary](https://cloud.google.com/translate/docs/advanced/glossary) so that domain-specific words and phrases are translated consistently.\n", "\n", "Next, we'll create a glossary of many Google Cloud product names that specifies how each one should be translated into Spanish.\n", "\n", "We've already created an input TSV file and uploaded it to a publicly accessible Cloud Storage bucket." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "id": "25714948fd80" }, "outputs": [], "source": [ "glossary = create_glossary(\n", "    input_uri=\"gs://github-repo/translation/GoogleCloudGlossary.tsv\",\n", "    glossary_id=\"google_cloud_english_to_spanish\",\n", ")\n", "print(glossary)" ] }, { "cell_type": "markdown", "metadata": { "id": "059abb41d254" }, "source": [ "Now, let's try translating the text again using the glossary."
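] }, { "cell_type": "markdown", "metadata": { "id": "c3b8f1a2d4e5" }, "source": [ "Before running that translation, note that the glossary above was created from a prebuilt TSV file. If you want to build your own unidirectional glossary, the next cell is a minimal sketch of writing such an input file locally.\n", "\n", "It assumes the two-column layout described in the [glossary format documentation](https://cloud.google.com/translate/docs/advanced/glossary#format-glossary): one term pair per row, with the English source term in the first column and the Spanish target term in the second. The helper `write_glossary_tsv`, the example terms, and the file name `my_glossary.tsv` are hypothetical.\n", "\n", "To use a file like this with `create_glossary`, copy it to a Cloud Storage bucket you own (for example, with `gcloud storage cp`) and pass its `gs://` URI as `input_uri`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e7d2a9c4b1f6" }, "outputs": [], "source": [ "import csv\n", "from pathlib import Path\n", "\n", "\n", "def write_glossary_tsv(term_pairs: dict[str, str], path: str) -> Path:\n", "    \"\"\"Write one tab-separated (source term, target term) pair per row.\n", "\n", "    Hypothetical helper: assumes the two-column unidirectional glossary layout.\n", "    \"\"\"\n", "    output = Path(path)\n", "    # newline=\"\" lets the csv module control line endings itself.\n", "    with output.open(\"w\", encoding=\"utf-8\", newline=\"\") as f:\n", "        writer = csv.writer(f, delimiter=\"\\t\", lineterminator=\"\\n\")\n", "        for source_term, target_term in term_pairs.items():\n", "            writer.writerow([source_term, target_term])\n", "    return output\n", "\n", "\n", "# Hypothetical example terms: keep these product names unchanged in Spanish.\n", "tsv_path = write_glossary_tsv(\n", "    {\"Compute Engine\": \"Compute Engine\", \"Cloud Storage\": \"Cloud Storage\"},\n", "    \"my_glossary.tsv\",\n", ")\n", "print(tsv_path.read_text(encoding=\"utf-8\"))"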
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "0c9e51afa18f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Default Translated text: Motor de Computación\n", "Glossary Translated text: Compute Engine\n" ] } ], "source": [ "response = translate_text(\"Compute Engine\", glossary=glossary.name)\n", "\n", "# `translations` holds the default translation;\n", "# `glossary_translations` holds the translation with the glossary applied.\n", "for translation in response.translations:\n", "    print(f\"Default Translated text: {translation.translated_text}\")\n", "\n", "for translation in response.glossary_translations:\n", "    print(f\"Glossary Translated text: {translation.translated_text}\")" ] } ], "metadata": { "colab": { "name": "intro_translation.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }