notebooks/batch_hello_world.ipynb (592 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "kGyoVNRZWPhP"
},
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/GoogleCloudPlatform/scientific-computing-examples/blob/main/notebooks/batch_hello_world.ipynb\"><img src=\"https://quantumai.google/site-assets/images/buttons/colab_logo_1x.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://raw.githubusercontent.com/GoogleCloudPlatform/scientific-computing-examples/main/notebooks/batch_hello_world.ipynb\"><img src=\"https://quantumai.google/site-assets/images/buttons/github_logo_1x.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eJwVuZUyVkQY"
},
"source": [
"# Overview\n",
"This tutorial will demonstrate how to run some simple jobs using [Google Cloud Batch](https://cloud.google.com/batch/docs/get-started). A few key points:\n",
"\n",
" 1. You have the choice of using a [project you have already created](https://cloud.google.com/resource-manager/docs/creating-managing-projects), or creating one here.\n",
" 1. You use a [existing GCS bucket](https://cloud.google.com/storage/docs/creating-buckets#create_a_new_bucket) from your existing project, or you can create a new one here.\n",
" 1. Most of the work communicating with Google Cloud is done with the [gcloud command](https://cloud.google.com/sdk/gcloud)\n",
" 1. There are no VMs to provision. All that is done automatically for you by Batch.\n",
"\n",
"The following series of steps will take you through some of the basics of using Cloud Batch.\n",
"\n",
" > **IMPORTANT:** \n",
" \n",
"It is assumed that the system hosting this notebook has `gcloud` installed. If not, [it can be installed according to these instructions](https://cloud.google.com/sdk/docs/install)\n",
" For this reason, we recommend the use of [Google Colab](https://research.google.com/colaboratory/)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YXSfZ92YuAKl"
},
"source": [
"# Getting Started\n",
"There are several steps to configure your settings with Google Cloud. Except for the Python installation below, all are completed with the `gcloud` command.\n",
"## Run Pip to install packages\n",
"Some Python packages are not preinstalled. We add them here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Q2Sqg-vFMS1G"
},
"outputs": [],
"source": [
"%%capture\n",
"#@title Pip Install Packages: Run Me! {display-mode: \"form\"}\n",
"!pip install Jinja2\n",
"!pip3 install grpcio"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u4PpEvsuVUK1"
},
"source": [
"## Build Input UI. \n",
"\n",
"The form below will is for you to enter the required (and optional) information to make the rest of the tutorial work. You will enter:\n",
"\n",
"1. [The project ID](https://cloud.google.com/resource-manager/docs/creating-managing-projects#before_you_begin)\n",
"1. You user ID, which is the login id you used to connect to the Google Cloud Console.\n",
"1. The [region where you want the jobs](https://cloud.google.com/compute/docs/regions-zones) to run.\n",
"1. The name of you [GCS Bucket](https://cloud.google.com/storage/docs/creating-buckets), either already created or the one to be created in your project.\n",
"1. Your [billing id](https://cloud.google.com/billing/docs/how-to/view-linked), you may have to ask your organization manager. This is required only if you plan to create a new project here.\n",
"\n",
"Run the next code cell to create the data entry form."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GlI68hqtjVN_"
},
"outputs": [],
"source": [
"#@title Form Setup: Run Me! {display-mode: \"form\"}\n",
"#@markdown Run this section to enter your information into the notebook\n",
"\n",
"import ipywidgets as widgets\n",
"import subprocess\n",
"account = subprocess.run(['gcloud', 'config', 'get-value', 'account'], stdout=subprocess.PIPE)\n",
"if not account.stdout: \n",
" account.stdout = 'enter-user-id' \n",
"\n",
"project = subprocess.run(['gcloud', 'config', 'get-value', 'project'], stdout=subprocess.PIPE)\n",
"if not project.stdout: \n",
" project.stdout = 'enter-project-id' \n",
"\n",
"region = subprocess.run(['gcloud', 'config', 'get-value', 'compute/region'], stdout=subprocess.PIPE)\n",
"if not region.stdout: \n",
" region.stdout = 'us-east1' \n",
"\n",
"bucket_name = \"enter-bucket-name\"\n",
"billing_id = \"enter-billing-id\"\n",
"\n",
"\n",
"project = widgets.Text( value=project.stdout, disabled=False )\n",
"username = widgets.Text( value=account.stdout, disabled=False )\n",
"region = widgets.Text( value=region.stdout, disabled=False )\n",
"bucket = widgets.Text( value=bucket_name, disabled=False )\n",
"billing = widgets.Text( value=billing_id, disabled=False )\n",
"\n",
"\n",
"\n",
"accordion = widgets.Accordion(children=[project, username, region, bucket, billing])\n",
"accordion.set_title(0, 'ProjectID')\n",
"accordion.set_title(1, 'UserID')\n",
"accordion.set_title(2, 'Region')\n",
"accordion.set_title(3, 'Bucket Name')\n",
"accordion.set_title(4, 'Billing ID [Optional]')\n",
"accordion.selected_index = None\n",
"display(accordion)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ps0NIbk_pBAn"
},
"source": [
"## Authenticate to Google Cloud\n",
"This step allows you to connect and run `gcloud` commands. If you get a <span style=\"color:yellow\">\"WARNING\"</span>, you may already be logged in.\n",
"\n",
"Follow the instructions to ensure the login process is completed.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"id": "VxEz-nbFWScI"
},
"outputs": [],
"source": [
"!gcloud auth login $username.value"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LoN8RDoR2wd0"
},
"source": [
"## Create the project\n",
"If the billing ID and the Project ID are set above, you can click here to create the project. If the project already exists, you will get a warning, but it will still work.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "s2yV-vcyqhsl"
},
"outputs": [],
"source": [
"def create_project():\n",
" proj = project.value.rstrip(\"\\n\\r \") ## remove \"\\n\", \"\\r\" char in the project value var\n",
"\n",
" # Don't do project creationg if the billing ID is not set\n",
" if billing.value == \"enter-billing-id\":\n",
" print(\"Billing ID not set\")\n",
" else:\n",
" !gcloud projects create $proj\n",
" !gcloud beta billing projects link $proj --billing-account=$billing.value\n",
" \n",
"create_project()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "phn_yO8O3I9W"
},
"outputs": [],
"source": [
"def create_bucket():\n",
" proj = project.value.rstrip(\"\\n\\r \") ## remove \"\\n\", \"\\r\" char in the project value var\n",
" buck = bucket.value.rstrip(\"\\r\\n \") ## remove \"\\n\" char in the bucket value var\n",
"\n",
" !echo \"Executing: \" gcloud alpha storage buckets create gs://$buck --project=$proj\n",
" !gcloud alpha storage buckets create gs://$buck --project=$proj\n",
" \n",
"create_bucket()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZFGqX9-lpPRu"
},
"source": [
"## Create gcloud configuration\n",
"\n",
"There are two things happening here:\n",
"1. You are setting the configurations for `gcloud` defaults\n",
"1. You are enabling Google Cloud APIs\n",
"1. Finally, the full configuration is listed out.\n",
"\n",
"It is important that these happen in the right order, for example, you have to enable the `compute` API before `gcloud` lets you set a variable for `compute`.\n",
"\n",
"This code cell takes about 30 seconds to execute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9u_xa75zp-n-"
},
"outputs": [],
"source": [
"# set project\n",
"!gcloud config set project $project.value \n",
"#enable APIs\n",
"!gcloud services enable compute.googleapis.com \n",
"!gcloud services enable batch.googleapis.com\n",
"!gcloud services enable monitoring.googleapis.com\n",
"# set region \n",
"!gcloud config set compute/region $region.value\n",
"\n",
"# list out the complete configuation\n",
"!gcloud config list"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aheY4T6GWPhT"
},
"source": [
"# Running jobs\n",
"Now that Google Cloud is configured, it is time to create the job files to guide execution, then run the jobs.\n",
"\n",
"## Create job file\n",
"The job files are JSON based and contain all the required information to run the tasks described. The jobs file you will run here are simple. \n",
"\n",
"The next code cell creates a (somewhat complicated) data structure to contain the raw text of the job files. Essentially, there are two things happening:\n",
"\n",
"1. The JSON is written as a template, so things like the GCS Bucket name can be inserted into the job file. \n",
"1. The JSON template is tied to a list that we use to make the nice green buttons below.\n",
"\n",
"If you want to see the details of the JSON< you can click the \"Show code\" link and you can see all everything.\n",
"\n",
"Otherwise, just execute the cell and move to the next section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "C89wryR1WPhT"
},
"outputs": [],
"source": [
"#@title Create Templates for JSON settings files {display-mode: \"form\"}\n",
"#@markdown [These jobs files are from this doc](https://cloud.google.com/batch/docs/get-started#create-basic-job)\n",
"#@markdown * Create a JSON file for writing GCS bucket\n",
"\n",
"def create_jobtypelist():\n",
" jobtypelist = {}\n",
" gcs_template = \"\"\"\n",
" {\n",
" \"taskGroups\":[\n",
" {\n",
" \"taskSpec\":{\n",
" \"runnables\": [\n",
" {\n",
" \"script\": {\n",
" \"text\": \"echo Hello world from task ${BATCH_TASK_INDEX}. >> /mnt/share/log_${JOB_ID}_${BATCH_TASK_INDEX}.txt\"\n",
" },\n",
" \"environment\": {\n",
" \"variables\": {\n",
" \"JOB_ID\": \"{{ job_id }} \"\n",
" }\n",
" }\n",
" } \n",
" ],\n",
" \"volumes\": [\n",
" {\n",
" \"gcs\": {\n",
" \"remotePath\": \"{{ bucket_name }}\"\n",
" },\n",
" \"mountPath\": \"/mnt/share\"\n",
" }\n",
" ]\n",
" },\n",
" \"taskCount\": 3\n",
" }\n",
" ]\n",
" }\n",
"\n",
" \"\"\"\n",
"\n",
" #@markdown * Create a JSON file for running Hello World using a container\n",
" hello_container_template = \"\"\"{\n",
" \"taskGroups\": [\n",
" {\n",
" \"taskSpec\": {\n",
" \"runnables\": [\n",
" {\n",
" \"container\": {\n",
" \"imageUri\": \"gcr.io/google-containers/busybox\",\n",
" \"entrypoint\": \"/bin/sh\",\n",
" \"commands\": [\n",
" \"-c\",\n",
" \"echo Hello world! This is task ${BATCH_TASK_INDEX} from JOB ={{ job_id }}. This job has a total of ${BATCH_TASK_COUNT} tasks.\"\n",
" ]\n",
" }\n",
" }\n",
" ],\n",
" \"computeResource\": {\n",
" \"cpuMilli\": 2000,\n",
" \"memoryMib\": 16\n",
" },\n",
" \"maxRetryCount\": 2,\n",
" \"maxRunDuration\": \"3600s\"\n",
" },\n",
" \"taskCount\": 4,\n",
" \"parallelism\": 2\n",
" }\n",
" ],\n",
" \"allocationPolicy\": {\n",
" \"instances\": [\n",
" {\n",
" \"policy\": {\"machineType\": \"e2-standard-4\"}\n",
" }\n",
" ]\n",
" },\n",
" \"labels\": {\n",
" \"department\": \"finance\",\n",
" \"env\": \"testing\"\n",
" },\n",
" \"logsPolicy\": {\n",
" \"destination\": \"CLOUD_LOGGING\"\n",
" }\n",
" }\n",
" \"\"\"\n",
" jobtypelist[\"hello\"] = {\"desc\": \"Create Hello World\", \"code\" : hello_container_template}\n",
" jobtypelist[\"gcs\"] = {\"desc\": \"Create GCS\", \"code\": gcs_template }\n",
"\n",
" return(jobtypelist)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D2iq0LiFWPhW"
},
"source": [
"# Submit a job\n",
"All the preamble was to get you to this point: running a job!\n",
"\n",
"There are a couple of jobs to choose from. \n",
"\n",
"1. Create files on GCS \"Create GCS\"\n",
"1. Use a container to say \"Hello World!\": \"Create Hello World\"\n",
"\n",
"Choosing the appropriate button will submit the job.\n",
"\n",
"* Since job names have to be unique in Batch, a random job id is generated and used to create output file.\n",
"* In sections below you will get experience with jobs, listing them and deleting them.\n",
"\n",
"Now, run the next cell and click on a green button."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Vqt3McR5WwB4"
},
"outputs": [],
"source": [
"#@title Generate buttons to submit jobs {display-mode: \"form\"}\n",
"#@markdown Create a job by selecting the button.\n",
"import jinja2\n",
"import sys\n",
"from os import environ\n",
"import ipywidgets as widgets\n",
"import uuid\n",
"\n",
"jobid = \"\"\n",
"environment = jinja2.Environment()\n",
"\n",
"jobtypelist = create_jobtypelist()\n",
"\n",
"class MyButton(widgets.Button):\n",
" code = \"\"\n",
"\n",
"def run_job(b):\n",
" jobid = \"job\" + uuid.uuid4().hex[:8]\n",
" template = environment.from_string(b.code)\n",
" render = template.render(bucket_name=bucket.value, job_id=jobid)\n",
" f = open(jobid + \".json\", \"w\")\n",
" f.write(render)\n",
" f.close()\n",
" out2.clear_output()\n",
" with out2:\n",
" print(\"\\n\\nCreating job\")\n",
" print(\"\\n\\nCreating job file: \", jobid + \"txt\")\n",
" jobstring = !echo gcloud beta batch jobs submit {jobid} --location us-east1 --config {jobid}.json\n",
" print(\"Executing command: \" + jobstring[0])\n",
" # print(render)\n",
" \n",
" ## This is the command line that is being executed.\n",
" !gcloud beta batch jobs submit {jobid} --location us-east1 --config {jobid}.json\n",
"\n",
"out = widgets.Output(layout={'border': '1px solid black'})\n",
"display(out)\n",
"out2 = widgets.Output(layout={'border': '1px solid black'})\n",
"display(out2)\n",
"with out:\n",
" for jobtype in jobtypelist:\n",
" button = MyButton(description=jobtypelist[jobtype][\"desc\"], button_style='success')\n",
" button.code = jobtypelist[jobtype] = jobtypelist[jobtype][\"code\"]\n",
" display(button)\n",
" button.on_click(run_job)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EqBFyr61WPhX"
},
"source": [
"## List jobs \n",
"\n",
"Once you have submitted jobs above (you can submit as many as you like), you can see the list of jobs \n",
"\n",
"You can also see the list of jobs in the console UI: \n",
"\n",
"* [Job list](https://console.cloud.google.com/batch/jobs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BOn0tGkOWPhY"
},
"outputs": [],
"source": [
"!gcloud beta batch jobs list --location $region.value"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ugCnqz-eWPhY"
},
"source": [
"## Delete jobs\n",
"Once a job has run, it will remain in the list of jobs for that project indefinitely. However, if you want to delete a job, you can use the `gcloud` command as:\n",
"```\n",
"gcloud beta batch jobs delete job_name --location Region\n",
"```\n",
"For you convenience, you can run the cell below and use the button to delete you jobs by entering the job name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1iHwPO9jQ-sh"
},
"outputs": [],
"source": [
"#@title Use this form to delete jobs file {display-mode: \"form\"}\n",
"#@markdown [From this doc](https://cloud.google.com/batch/docs/get-started#delete_a_job)\n",
"\n",
"import ipywidgets as widgets\n",
"\n",
"job = widgets.Text( value=\"f4b6c517\", disabled=False )\n",
"button = widgets.Button(\n",
" value=False,\n",
" description='Delete job',\n",
" button_style='success', # 'success', 'info', 'warning', 'danger' or ''\n",
" tooltip='Description',\n",
" icon='check'\n",
")\n",
"display(job, button)\n",
"def delete_me(b):\n",
"\n",
" ## this is the Gcloud command you will want to execute if deleting manually\n",
" !gcloud beta batch jobs delete $job.value --location $region.value\n",
"\n",
"button.on_click(delete_me)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xi8GyXZ2422v"
},
"source": [
"## See output of the \"Hello World\" job\n",
"If you ran the \"Hello World\" job, the output was sent to the Cloud Logging system. To view logs, run [`gcloud logging`](https://cloud.google.com/sdk/gcloud/reference/logging) commands.\n",
" \n",
" The next cell provides a sample of how to see the output of the last few jobs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0IGItc3rmShK"
},
"outputs": [],
"source": [
"!gcloud logging read \"Hello world\" --limit 100 | grep 'This is task [0-9]'"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u3Rg37jD_ywm"
},
"source": [
"## See output of the \"GCS\" job\n",
"If you ran the \"GCS\" job, the output was sent to the bucket you created above. \n",
" \n",
" The next cell provides a sample of how to see the output of all the outputs stored in the GCS bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WnBYcBww_IZE"
},
"outputs": [],
"source": [
"!gsutil ls gs://$bucket.value/*.txt\n",
"!gsutil cat gs://$bucket.value/*.txt"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3.10.5 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}