notebooks/batch_create_a2_vm.ipynb (473 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "kGyoVNRZWPhP"
},
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/GoogleCloudPlatform/scientific-computing-examples/blob/main/notebooks/batch_create_a2_vm.ipynb\"><img src=\"https://quantumai.google/site-assets/images/buttons/colab_logo_1x.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://raw.githubusercontent.com/GoogleCloudPlatform/scientific-computing-examples/main/notebooks/batch_create_a2_vm.ipynb\"><img src=\"https://quantumai.google/site-assets/images/buttons/github_logo_1x.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eJwVuZUyVkQY"
},
"source": [
"# Overview\n",
"This tutorial will demonstrate how to run some simple jobs using [Google Cloud Batch](https://cloud.google.com/batch/docs/get-started). A few key points:\n",
"\n",
" 1. You have the choice of using a [project you have already created](https://cloud.google.com/resource-manager/docs/creating-managing-projects), or creating one here.\n",
" 1. You use a [existing GCS bucket](https://cloud.google.com/storage/docs/creating-buckets#create_a_new_bucket) from your existing project, or you can create a new one here.\n",
" 1. Most of the work communicating with Google Cloud is done with the [gcloud command](https://cloud.google.com/sdk/gcloud)\n",
" 1. There are no VMs to provision. All that is done automatically for you by Batch.\n",
"\n",
"The following series of steps will take you through some of the basics of using Cloud Batch.\n",
"\n",
" > **IMPORTANT:** \n",
" \n",
"It is assumed that the system hosting this notebook has `gcloud` installed. If not, [it can be installed according to these instructions](https://cloud.google.com/sdk/docs/install)\n",
" For this reason, we recommend the use of [Google Colab](https://research.google.com/colaboratory/)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YXSfZ92YuAKl"
},
"source": [
"# Getting Started\n",
"There are several steps to configure your settings with Google Cloud. Except for the Python installation below, all are completed with the `gcloud` command.\n",
"## Run Pip to install packages\n",
"Some Python packages are not preinstalled. We add them here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Q2Sqg-vFMS1G"
},
"outputs": [],
"source": [
"%%capture\n",
"#@title Pip Install Packages: Run Me! {display-mode: \"form\"}\n",
"!pip install Jinja2\n",
"!pip3 install grpcio"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u4PpEvsuVUK1"
},
"source": [
"## Build Input UI. \n",
"\n",
"The form below will is for you to enter the required (and optional) information to make the rest of the tutorial work. You will enter:\n",
"\n",
"1. [The project ID](https://cloud.google.com/resource-manager/docs/creating-managing-projects#before_you_begin)\n",
"1. You user ID, which is the login id you used to connect to the Google Cloud Console.\n",
"1. The [region where you want the jobs](https://cloud.google.com/compute/docs/regions-zones) to run.\n",
"1. The name of you [GCS Bucket](https://cloud.google.com/storage/docs/creating-buckets), either already created or the one to be created in your project.\n",
"1. Your [billing id](https://cloud.google.com/billing/docs/how-to/view-linked), you may have to ask your organization manager. This is required only if you plan to create a new project here.\n",
"\n",
"Run the next code cell to create the data entry form."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GlI68hqtjVN_"
},
"outputs": [],
"source": [
"#@title Form Setup: Run Me! {display-mode: \"form\"}\n",
"#@markdown Run this section to enter your information into the notebook\n",
"\n",
"import ipywidgets as widgets\n",
"import subprocess\n",
"from IPython.core.display import display, Markdown\n",
"\n",
"account = subprocess.run(['gcloud', 'config', 'get-value', 'account'], stdout=subprocess.PIPE)\n",
"if not account.stdout: \n",
" account.stdout = 'enter-user-id' \n",
"\n",
"project = subprocess.run(['gcloud', 'config', 'get-value', 'project'], stdout=subprocess.PIPE)\n",
"if not project.stdout: \n",
" project.stdout = 'enter-project-id' \n",
"\n",
"region = subprocess.run(['gcloud', 'config', 'get-value', 'compute/region'], stdout=subprocess.PIPE)\n",
"if not region.stdout: \n",
" region.stdout = 'us-central1' \n",
"\n",
"bucket_name = \"enter-bucket-name\"\n",
"billing_id = \"enter-billing-id\"\n",
"\n",
"\n",
"project = widgets.Text( value=project.stdout, disabled=False )\n",
"username = widgets.Text( value=account.stdout, disabled=False )\n",
"region = widgets.Text( value=region.stdout, disabled=False )\n",
"bucket = widgets.Text( value=bucket_name, disabled=False )\n",
"billing = widgets.Text( value=billing_id, disabled=False )\n",
"\n",
"\n",
"\n",
"accordion = widgets.Accordion(children=[project, username, region, bucket, billing])\n",
"accordion.set_title(0, 'ProjectID')\n",
"accordion.set_title(1, 'UserID')\n",
"accordion.set_title(2, 'Region')\n",
"accordion.set_title(3, 'Bucket Name')\n",
"accordion.set_title(4, 'Billing ID [Optional]')\n",
"accordion.selected_index = None\n",
"\n",
"display(Markdown(\"## Enter values below: \"))\n",
"display(accordion)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ps0NIbk_pBAn"
},
"source": [
"## Authenticate to Google Cloud\n",
"This step allows you to connect and run `gcloud` commands. If you get a <span style=\"color:yellow\">\"WARNING\"</span>, you may already be logged in.\n",
"\n",
"Follow the instructions to ensure the login process is completed.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"id": "VxEz-nbFWScI"
},
"outputs": [],
"source": [
"!gcloud auth login $username.value"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LoN8RDoR2wd0"
},
"source": [
"## Create the project\n",
"If the billing ID and the Project ID are set above, you can click here to create the project. If the project already exists, you will get a warning, but it will still work.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "s2yV-vcyqhsl"
},
"outputs": [],
"source": [
"def create_project():\n",
" proj = project.value.rstrip(\"\\n\\r \") ## remove \"\\n\", \"\\r\" char in the project value var\n",
"\n",
" # Don't do project creationg if the billing ID is not set\n",
" if billing.value == \"enter-billing-id\":\n",
" print(\"Billing ID not set\")\n",
" else:\n",
" !gcloud projects create $proj\n",
" !gcloud beta billing projects link $proj --billing-account=$billing.value\n",
" \n",
"create_project()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "phn_yO8O3I9W"
},
"outputs": [],
"source": [
"def create_bucket():\n",
" proj = project.value.rstrip(\"\\n\\r \") ## remove \"\\n\", \"\\r\" char in the project value var\n",
" buck = bucket.value.rstrip(\"\\r\\n \") ## remove \"\\n\" char in the bucket value var\n",
"\n",
" !echo \"Executing: \" gcloud alpha storage buckets create gs://$buck --project=$proj\n",
" !gcloud alpha storage buckets create gs://$buck --project=$proj\n",
" \n",
"create_bucket()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZFGqX9-lpPRu"
},
"source": [
"## Create gcloud configuration\n",
"\n",
"There are two things happening here:\n",
"1. You are setting the configurations for `gcloud` defaults\n",
"1. You are enabling Google Cloud APIs\n",
"1. Finally, the full configuration is listed out.\n",
"\n",
"It is important that these happen in the right order, for example, you have to enable the `compute` API before `gcloud` lets you set a variable for `compute`.\n",
"\n",
"This code cell takes about 30 seconds to execute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9u_xa75zp-n-"
},
"outputs": [],
"source": [
"# set project\n",
"!gcloud config set project $project.value \n",
"#enable APIs\n",
"!gcloud services enable compute.googleapis.com \n",
"!gcloud services enable batch.googleapis.com\n",
"!gcloud services enable monitoring.googleapis.com\n",
"# set region \n",
"!gcloud config set compute/region $region.value\n",
"\n",
"# list out the complete configuation\n",
"!gcloud config list"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uzqk-xqkNEyx"
},
"source": [
"## Create _Instance Template_ for Batch to create a VM\n",
"The following command creates an instance template that is used by Batch to\n",
"create a VM."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6t5MqDQCG9gF"
},
"outputs": [],
"source": [
"!gcloud compute instance-templates create gpu-schedmd \\\n",
" --machine-type=a2-highgpu-2g \\\n",
" --maintenance-policy=TERMINATE \\\n",
" --image-family=schedmd-v5-slurm-22-05-3-hpc-centos-7 \\\n",
" --image-project=schedmd-slurm-public \\\n",
" --boot-disk-size=300"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Vqt3McR5WwB4"
},
"outputs": [],
"source": [
"#@title Generate buttons to submit jobs {display-mode: \"form\"}\n",
"#@markdown Create a job by selecting the button.\n",
"import jinja2\n",
"import sys\n",
"from os import environ\n",
"import ipywidgets as widgets\n",
"import uuid\n",
"\n",
"jobid = \"\"\n",
"environment = jinja2.Environment()\n",
"\n",
"\n",
"def create_jobtypelist():\n",
" jobtypelist = {}\n",
" \n",
" gpu_template = \"\"\"\n",
" {\n",
" \"taskGroups\":[\n",
" {\n",
" \"taskSpec\":{\n",
" \"runnables\": [\n",
" \n",
"\t\t {\n",
" \"script\": {\n",
" \"text\": \"nvidia-smi >> /mnt/share/log_${JOB_ID}_${BATCH_TASK_INDEX}.txt\"\n",
" },\n",
" \"environment\": {\n",
" \"variables\": {\n",
" \"JOB_ID\": \"{{ job_id }} \"\n",
" }\n",
" }\n",
" } \n",
" ],\n",
" \"volumes\": [\n",
" {\n",
" \"gcs\": {\n",
" \"remotePath\": \"{{ bucket_name }}\"\n",
" },\n",
" \"mountPath\": \"/mnt/share\"\n",
" }\n",
" ]\n",
" },\n",
" \"taskCount\": 3\n",
" }\n",
" ],\n",
" \"allocationPolicy\": {\n",
" \"location\": {\n",
" \"allowed_locations\": [ \"regions/{{ region_name }}\" ]\n",
" },\n",
" \"instances\": [\n",
"\t\t {\n",
"\t\t\t\"instanceTemplate\": \"gpu-schedmd\"\n",
"\t\t }\n",
" ]\n",
" },\n",
" \"labels\": {\n",
" \"department\": \"wdc\",\n",
" \"env\": \"testing\"\n",
" },\n",
" \"logsPolicy\": {\n",
" \"destination\": \"CLOUD_LOGGING\"\n",
" }\n",
" }\n",
"\n",
" \"\"\"\n",
"\n",
"\n",
" jobtypelist[\"gpu\"] = {\"desc\": \"Create a2_a100 VM \", \"code\" : gpu_template}\n",
" return(jobtypelist)\n",
"\n",
"jobtypelist = create_jobtypelist()\n",
"\n",
"class MyButton(widgets.Button):\n",
" code = \"\"\n",
"\n",
"def run_job(b):\n",
" jobid = \"job-\" + uuid.uuid4().hex[:8]\n",
" template = environment.from_string(b.code)\n",
" render = template.render(bucket_name=bucket.value, \n",
" job_id=jobid, \n",
" region_name=region.value)\n",
" f = open(jobid + \".json\", \"w\")\n",
" f.write(render)\n",
" f.close()\n",
" out2.clear_output()\n",
" with out2:\n",
" print(\"\\n\\nCreating job\")\n",
" print(\"\\n\\nCreating job file: \", jobid + \"txt\")\n",
" jobstring = !echo gcloud beta batch jobs submit {jobid} --location {region.value} --config {jobid}.json\n",
" print(\"Executing command: \" + jobstring[0])\n",
" # print(render)\n",
" \n",
" ## This is the command line that is being executed.\n",
" !gcloud beta batch jobs submit {jobid} --location {region.value} --config {jobid}.json\n",
"\n",
"\n",
"out = widgets.Output(layout={'border': '1px solid black'})\n",
"display(out)\n",
"out2 = widgets.Output(layout={'border': '1px solid black'})\n",
"display(out2)\n",
"with out:\n",
" for jobtype in jobtypelist:\n",
" button = MyButton(description=jobtypelist[jobtype][\"desc\"], button_style='success')\n",
" button.code = jobtypelist[jobtype] = jobtypelist[jobtype][\"code\"]\n",
" display(button)\n",
" button.on_click(run_job)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2Z2OaZBOKixa"
},
"outputs": [],
"source": [
"#@title You can see the list of jobs in the GCP Console {display-mode: \"form\"}\n",
"\n",
"from IPython.core.display import display, Markdown\n",
"display(Markdown(\"[View Job list in the Console](https://console.cloud.google.com/batch/jobs?project=\" + project.value + \")\"))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "upUu3BdvQaMx"
},
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "u3Rg37jD_ywm"
},
"source": [
"## See output of the \"GCS\" job\n",
"If you ran the \"GCS\" job, the output was sent to the bucket you created above. \n",
" \n",
" The next cell provides a sample of how to see the output of all the outputs stored in the GCS bucket.\n",
"\n",
"> The output should simply be the output of the \"nvdia-smi\" command."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WnBYcBww_IZE"
},
"outputs": [],
"source": [
"!gsutil ls gs://$bucket.value/*.txt\n",
"!gsutil cat gs://$bucket.value/*.txt"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"provenance": []
},
"kernelspec": {
"display_name": "Python 3.10.5 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}