{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intelligent Defaults and Logging Configuration in SageMakerCore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Introductions\n",
"\n",
"In this notebook, we will walkthrough the setup and usage of intelligent defaults in the SageMakerCore SDK. Additionally, this notebook contains a section with the steps required for configuring logging levels to assist in the debugging of issues that arise while using the SDK.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Intelligent Defaults\n",
"\n",
"Intelligent Defaults is a feature provided by the SageMakerCore SDK to assist users in defining default values to be auto populated into the AWS API Request parameters. For example, if a user/admin wants all of their AWS Resources to use a specific VPC Config during creation this can be defined in the Intelligent Defaults Configs. Intelligent Defaults supports:\n",
"1. GlobalDefaults - default values applied across SageMaker API calls\n",
"2. Resource Specific Defaults - defaults applied only when creating a specific resource\n",
"\n",
"An Example of the strucuture of the Intelligent Defaults Config is below:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```json\n",
"{\n",
" \"SchemaVesion\": \"1.0\",\n",
" \"SageMaker\": {\n",
" \"PythonSDK\": {\n",
" \"Resources\": {\n",
" \"GlobalDefaults\": {\n",
" \"vpc_config\": {\n",
" \"security_group_ids\": [\n",
" \"sg-xxxxxxxxxxxxxxxxx\" // Replace with security group id\n",
" ],\n",
" \"subnets\": [\n",
" \"subnet-xxxxxxxxxxxxxxxxx\", // Replace with subnet id\n",
" \"subnet-xxxxxxxxxxxxxxxxx\" // Replace with subnet id\n",
" ]\n",
" }\n",
" // ...\n",
" },\n",
" \"TrainingJob\": {\n",
" \"role_arn\": \"arn:aws:xxxxxxxxxxx:role/xxxxx\", // Replace with role arn\n",
" \"output_data_config\": {\n",
" \"s3_output_path\": \"s3://xxxxxxxxxxx\", // Replace with S3 URI\n",
" },\n",
" // ...\n",
" }\n",
" }\n",
" }\n",
" }\n",
"}\n",
"```"
]
},
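{
"cell_type": "markdown",
"metadata": {},
"source": [
"The precedence between the two levels of defaults can be sketched as a simple dictionary merge (this is an illustration of the expected behavior, not SageMakerCore's actual implementation - `resolve_defaults` is a hypothetical helper):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: resource-specific defaults take precedence over\n",
"# GlobalDefaults, and explicitly passed arguments override both.\n",
"def resolve_defaults(explicit_args, resource_defaults, global_defaults):\n",
"    resolved = dict(global_defaults)\n",
"    resolved.update(resource_defaults)\n",
"    resolved.update(explicit_args)\n",
"    return resolved\n",
"\n",
"resolve_defaults(\n",
"    explicit_args={\"training_job_name\": \"my-job\"},\n",
"    resource_defaults={\"role_arn\": \"arn:aws:iam::111122223333:role/TrainingRole\"},\n",
"    global_defaults={\"vpc_config\": {\"subnets\": [\"subnet-xxxxxxxxxxxxxxxxx\"]}}\n",
")"
]
},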
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logging Levels\n",
"\n",
"To assist in debugging issues originating within the SDK, SageMakerCore provides a simple utility method - `configure_logging()`\n",
"\n",
"To set the logging level users have 2 options:\n",
"1. Pass a string parameter to utility method with log level they desire - `configure_logging(\"DEBUG\")`\n",
"2. Set the `LOG_LEVEL=INFO` environment variable and call `configure_logging()` without a parameter\n",
"\n",
"\n",
"In a later section in this notebook, we will walk through an example of how these options would look like in practice for a user."
]
},
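{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fallback between the two options can be sketched as follows (an assumption about how `configure_logging()` resolves the level - `resolve_log_level` is a hypothetical helper, not part of the SDK):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Illustrative sketch: an explicit parameter wins; otherwise the\n",
"# LOG_LEVEL environment variable is consulted, with a final default.\n",
"def resolve_log_level(log_level=None, default=\"INFO\"):\n",
"    return log_level or os.environ.get(\"LOG_LEVEL\", default)\n",
"\n",
"resolve_log_level(\"DEBUG\")"
]
},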
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pre-Requisites"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install Latest SageMakerCore\n",
"All SageMakerCore beta distributions will be released to a private s3 bucket. After being allowlisted, run the cells below to install the latest version of SageMakerCore from `s3://sagemaker-core-beta-artifacts/sagemaker_core-latest.tar.gz`\n",
"\n",
"Ensure you are using a kernel with python version >=3.8"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uninstall previous version of sagemaker_core and restart kernel\n",
"!pip uninstall sagemaker-core -y"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install the latest version of sagemaker_core\n",
"\n",
"!pip install sagemaker-core --upgrade"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check the version of sagemaker_core\n",
"!pip show -v sagemaker-core"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install Additional Packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install additionall packages\n",
"\n",
"!pip install -U scikit-learn pandas boto3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup\n",
"\n",
"Let's start by specifying:\n",
"- AWS region.\n",
"- The IAM role arn used to give learning and hosting access to your data. Ensure your enviornment has AWS Credentials configured.\n",
"- The S3 bucket that you want to use for storing training and model data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sagemaker_core.helper.session_helper import Session, get_execution_role\n",
"from rich import print\n",
"\n",
"# Get region, role, bucket\n",
"\n",
"sagemaker_session = Session()\n",
"region = sagemaker_session.boto_region_name\n",
"role = get_execution_role()\n",
"bucket = sagemaker_session.default_bucket()\n",
"print(role)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load and Prepare Dataset\n",
"For this example, we will be using the IRIS data set from `sklearn.datasets` to train our XGBoost container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"import pandas as pd\n",
"\n",
"# Get IRIS Data\n",
"\n",
"iris = load_iris()\n",
"iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
"iris_df['target'] = iris.target"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Prepare Data\n",
"\n",
"os.makedirs('./data', exist_ok=True)\n",
"\n",
"iris_df = iris_df[['target'] + [col for col in iris_df.columns if col != 'target']]\n",
"\n",
"train_data, test_data = train_test_split(iris_df, test_size=0.2, random_state=42)\n",
"\n",
"train_data.to_csv('./data/train.csv', index=False, header=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upload Data to S3\n",
"In this step, we will upload the train and test data to the S3 bucket configured earlier using `sagemaker_session.default_bucket()`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Upload Data\n",
"\n",
"prefix = \"DEMO-scikit-iris\"\n",
"TRAIN_DATA = \"train.csv\"\n",
"DATA_DIRECTORY = \"data\"\n",
"\n",
"train_input = sagemaker_session.upload_data(\n",
" DATA_DIRECTORY, bucket=bucket, key_prefix=\"{}/{}\".format(prefix, DATA_DIRECTORY)\n",
")\n",
"\n",
"s3_input_path = \"s3://{}/{}/data/{}\".format(bucket, prefix, TRAIN_DATA)\n",
"s3_output_path = \"s3://{}/{}/output\".format(bucket, prefix)\n",
"\n",
"print(s3_input_path)\n",
"print(s3_output_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fetch the XGBoost Image URI\n",
"In this step, we will fetch the XGBoost Image URI we will use as an input parameter when creating an AWS TrainingJob"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Image name is hardcoded here\n",
"# Image name can be programatically got by using sagemaker package and calling image_uris.retrieve\n",
"# Since that is a high level abstraction that has multiple dependencies, the image URIs functionalities will live in sagemaker (V2)\n",
"\n",
"image = \"433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Intelligent Defaults"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Intelligent Defaults JSON\n",
"\n",
"In order for SageMakerCore to pick up the Intelligent Defaults Configs to populate API calls, we first must create the json config file and set the `SAGEMAKER_CORE_ADMIN_CONFIG_OVERRIDE` enviornment variable.\n",
"\n",
"Below we will create the config file at `data/defaults.json` and assign this path to the `SAGEMAKER_CORE_ADMIN_CONFIG_OVERRIDE` enviornment variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"\n",
"DEFAULTS_CONTENT = {\n",
" \"SchemaVesion\": \"1.0\",\n",
" \"SageMaker\": {\n",
" \"PythonSDK\": {\n",
" \"Resources\": {\n",
" \"GlobalDefaults\": {\n",
" \"vpc_config\": {\n",
" \"security_group_ids\": [\n",
" \"sg-xxxxxxxxxxxxxxxxx\" # Replace with security group id\n",
" ],\n",
" \"subnets\": [\n",
" \"subnet-xxxxxxxxxxxxxxxxx\", # Replace with subnet id\n",
" \"subnet-xxxxxxxxxxxxxxxxx\" # Replace with subnet id\n",
" ]\n",
" }\n",
" },\n",
" \"TrainingJob\": {\n",
" \"role_arn\": role,\n",
" \"output_data_config\": {\n",
" \"s3_output_path\": s3_output_path\n",
" },\n",
" \n",
" }\n",
" }\n",
" }\n",
" }\n",
"}\n",
"\n",
"path_to_defaults = os.path.join(DATA_DIRECTORY, \"defaults.json\")\n",
"with open(os.path.join(DATA_DIRECTORY, \"defaults.json\"), \"w\") as f:\n",
" json.dump(DEFAULTS_CONTENT, f, indent=4) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"# Setting path of Config file in environment variable \n",
"os.environ['SAGEMAKER_CORE_ADMIN_CONFIG_OVERRIDE'] = path_to_defaults"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using GlobalDefaults\n",
"In the below example, a `Cluster` resource will be created using the `vpc_config` defined under the `SageMaker.PythonSDK.Resources.GlobalDefaults`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"from sagemaker_core.resources import Cluster\n",
"from sagemaker_core.shapes import ClusterInstanceGroupSpecification, ClusterLifeCycleConfig\n",
" \n",
"cluster_name_v3 = 'xgboost-cluster-' + time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n",
"\n",
"# Use vpc_config from Intelligent Defaults JSON config file under the SageMaker.PythonSDK.Resources.GlobalDefaults key\n",
"cluster = Cluster.create(\n",
" cluster_name=cluster_name_v3,\n",
" instance_groups=[\n",
" ClusterInstanceGroupSpecification(\n",
" instance_count=1, \n",
" instance_group_name=\"instance-group-11\",\n",
" instance_type=\"ml.m5.4xlarge\",\n",
" life_cycle_config=ClusterLifeCycleConfig(source_s3_uri=s3_input_path, on_create=\"dothis\"),\n",
" execution_role=role\n",
" )\n",
" ]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cluster.wait_for_status(\"InService\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using Resource Defaults\n",
"In the below example, a `TrainingJob` resource will be created using the `role` and `output_data_config` defined under the `SageMaker.Python.Resources.TrainingJob` key. \n",
"\n",
"Note: Because `TrainingJob` also excepts a `vpc_config` parameter, the `vpc_config` parameter will be populated from the `GlobalDefaults`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"from sagemaker_core.resources import TrainingJob\n",
"from sagemaker_core.shapes import AlgorithmSpecification, Channel, DataSource, S3DataSource, ResourceConfig, StoppingCondition\n",
"\n",
"job_name_v3 = 'xgboost-iris-' + time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n",
"\n",
"# Use role and output_data_config from Intelligent Defaults JSON config file under the SageMaker.PythonSDK.Resources.TrainingJob key\n",
"# Use vpc_config from Intelligent Defaults JSON config file under the SageMaker.PythonSDK.Resources.GlobalDefaults key\n",
"\n",
"training_job = TrainingJob.create(\n",
" training_job_name=job_name_v3,\n",
" hyper_parameters={\n",
" 'objective': 'multi:softmax',\n",
" 'num_class': '3',\n",
" 'num_round': '10',\n",
" 'eval_metric': 'merror'\n",
" },\n",
" algorithm_specification=AlgorithmSpecification(\n",
" training_image=image,\n",
" training_input_mode='File'\n",
" ),\n",
" input_data_config=[\n",
" Channel(\n",
" channel_name='train',\n",
" content_type='csv',\n",
" compression_type='None',\n",
" record_wrapper_type='None',\n",
" data_source=DataSource(\n",
" s3_data_source=S3DataSource(\n",
" s3_data_type='S3Prefix',\n",
" s3_uri=s3_input_path,\n",
" s3_data_distribution_type='FullyReplicated'\n",
" )\n",
" )\n",
" )\n",
" ],\n",
" resource_config=ResourceConfig(\n",
" instance_type='ml.m4.xlarge',\n",
" instance_count=1,\n",
" volume_size_in_gb=30\n",
" ),\n",
" stopping_condition=StoppingCondition(\n",
" max_runtime_in_seconds=600\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure Logging Levels\n",
"\n",
"Below are 2 examples of how a SageMakerCore user could configure the logging level of the SDK to assist with debugging.\n",
"\n",
"To set the logging level users have 2 options:\n",
"1. Pass a string parameter to utility method with log level they desire - `configure_logging(\"DEBUG\")`\n",
"2. Set the `LOG_LEVEL=INFO` environment variable and call `configure_logging()` without a parameter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure Logging with Parameter"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Setting log_level to DEBUG using configure_logging with string parameter \n",
"from sagemaker_core.main.utils import configure_logging\n",
"\n",
"configure_logging('DEBUG')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get TrainingJob with DEBUG log_level\n",
"from sagemaker_core.resources import TrainingJob\n",
"\n",
"training_job = TrainingJob.get(job_name_v3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure Logging with Enviornment Variable"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Setting log_level to INFO using env variable\n",
"!export LOG_LEVEL=INFO\n",
"\n",
"configure_logging()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List TrainingJobs with INFO log_level\n",
"training_job = TrainingJob.get(job_name_v3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete All SageMaker Resources\n",
"The following code block will call the delete() method for any SageMaker Core Resources created during the execution of this notebook which were assigned to local or global variables. If you created any additional deleteable resources without assigning the returning object to a unique variable, you will need to delete the resource manually by doing something like:\n",
"\n",
"```python\n",
"resource = Resource.get(\"resource-name\")\n",
"resource.delete()\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete any sagemaker core resource objects created in this notebook\n",
"def delete_all_sagemaker_resources():\n",
" all_objects = list(locals().values()) + list(globals().values())\n",
" deletable_objects = [obj for obj in all_objects if hasattr(obj, 'delete') and obj.__class__.__module__ == 'sagemaker_core.main.resources']\n",
" \n",
" for obj in deletable_objects:\n",
" obj.delete()\n",
" \n",
"delete_all_sagemaker_resources()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "py3.10.14",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}