data-labelling/sagemaker_ground_truth

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Automate multi-modality, parallel data labeling workflows with Amazon SageMaker Ground Truth and AWS Step Functions\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Data labeling often requires a single data object to include multiple types of annotations, or multi-type, such as 2D boxes (bounding boxes), lines, and segmentation masks, all on a single image. Additionally, to create high-quality machine learning (ML) models using labeled data, you need a way to monitor the quality of the labels. You can do this by creating a workflow in which labeled data is audited and adjusted as needed. This notebook introduces a solution to address both of these labeling challenges using an automotive dataset, and you can extend this solution for use with any type of dataset.\n", "\n", "This notebook walks through an example situation where we generate multiple types of annotations for an automotive scene. Specifically we\u2019ll run a total of four labeling jobs per input video clip: (1) an initial labeling of vehicles, (2) initial labeling of lanes, and then (3) an adjustment job per initial job.\n", "\n", "This notebook only works if you have deployed the corresponding AWS CloudFormation template. You can deploy the solution in `us-east-1` by clicking [here](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/template?stackName=blog&templateURL=https://aws-ml-blog.s3.amazonaws.com/artifacts/automate-multi-modality-parallel-data-labeling-workflows-sagemaker-gt/main-packaged.yml). To deploy to other regions or to customize the deployment, see `README.md` provided with this notebook.\n", "\n", "Recommended Kernel: conda_python3\n", "\n", "While following along with this blog post, we recommend that you leave most of the cells unmodified. However, the notebook will indicate where you can modify variables to create the resources needed for a custom labeling job.\n", "\n", "Let's start by importing required libraries and initializing session and other variables used in this notebook. By default, the notebook uses the default Amazon S3 bucket in the same AWS Region you use to run this notebook. If you want to use a different S3 bucket, make sure it is in the same AWS Region you use to complete this tutorial, and specify the bucket name for `bucket`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!pip3 install requests_aws_sign" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pylab inline\n", "from enum import Enum\n", "from pprint import pprint\n", "import uuid\n", "import urllib\n", "import os\n", "import json\n", "from glob import glob\n", "\n", "import requests\n", "import boto3\n", "from PIL import Image\n", "import sagemaker as sm\n", "\n", "import requests_aws_sign\n", "from requests_aws_sign import AWSV4Sign" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "You will create some of the resources you need to launch workflows using SageMaker Ground Truth in this notebook. You must create and populate the following resources before executing this notebook:\n", "\n", "* API Gateway Endpoint. An API Gateway endpoint is setup as part of the workflows solution and obtain the API identifier `rest_api_id`,Region `region` in the [API Gateway Console](https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-call-api.html)\n", "and assign `AWS_REST_API_ID`, `AWS_ACCOUNT_ID` and `AWS_REGION` variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "AWS_REST_API_ID = \"<<ADD REST API Gateway ID HERE>>\"\n", "AWS_ACCOUNT_ID = \"<<ADD AWS Account ID HERE>>\"\n", "AWS_REGION = region = boto3.session.Session().region_name" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# Make sure api gateway id is populated\n", "assert AWS_REST_API_ID != \"<<ADD REST API Gateway ID HERE>>\"\n", "\n", "# Make sure AWS account number is populated\n", "assert AWS_ACCOUNT_ID != \"<<ADD AWS Account ID HERE>>\"\n", "\n", "# Make sure AWS Region is populated\n", "assert AWS_REGION != \"<<ADD AWS Region HERE>>\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "BASE_URL = f\"https://{AWS_REST_API_ID}.execute-api.{AWS_REGION}.amazonaws.com/prod\"\n", "API_URL_POST_BATCH_CREATE = f\"{BASE_URL}/batch/create\"\n", "API_URL_GET_BATCH_SHOW = f\"{BASE_URL}/batch/show\"\n", "API_URL_POST_BATCH_METADATA_POST = f\"{BASE_URL}/batch-metadata/post\"\n", "API_URL_GET_WORKFORCE_SHOW = f\"{BASE_URL}/workforce/show\"\n", "\n", "S3_INPUT_BUCKET = f\"smgt-workflows-{AWS_ACCOUNT_ID}-{AWS_REGION}-batch-input\"\n", "\n", "credentials = boto3.session.Session().get_credentials()\n", "\n", "sess = requests.Session()\n", "\n", "sess.auth = AWSV4Sign(credentials, AWS_REGION, \"execute-api\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The IAM execution role you used to create this notebook instance must have the following permissions:\n", " * If you do not require granular permissions for your use case, you can attach [AmazonSageMakerFullAccess](https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess) to your IAM user or role. If you are running this example in a SageMaker notebook instance, this is the IAM execution role used to create your notebook instance. If you need granular permissions, see [Assign IAM Permissions to Use Ground Truth](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-security-permission.html#sms-security-permissions-get-started) for granular policy to use Ground Truth.\n", " * AWS managed policy [AmazonSageMakerGroundTruthExecution](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonSageMakerGroundTruthExecution). Run the following code-block to see your IAM execution role name. This [GIF](images/add-policy-loop.gif) demonstrates how to attach this policy to an IAM role in the IAM console. You can also find instructions in the IAM User Guide: [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).\n", " * When you create your role, please make sure that you attach [AmazonAPIGatewayInvokeFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonAPIGatewayInvokeFullAccess) to your role.\n", " * When you create your role, you specify Amazon S3 permissions. Make sure that your IAM role has access to the S3 bucket that this solution uses in this example. If you do not require granular permissions, you can attach [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) to your role." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "role = sm.get_execution_role()\n", "role_name = role.split(\"/\")[-1]\n", "print(\n", " \"IMPORTANT: Make sure this execution role has the AWS Managed policy AmazonGroundTruthExecution attached.\"\n", ")\n", "print(\"********************************************************************************\")\n", "print(\"The IAM execution role name:\", role_name)\n", "print(\"The IAM execution role ARN:\", role)\n", "print(\"********************************************************************************\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download Data\n", "\n", "We are going to use a dataset from the Multi Object Tracking Challenge, a commonly used benchmark for multi object tracking. We are going to download the data. Depending on your connection speed, this can take between 5 and 10 minutes. Then, we will unzip it and upload it to `s3_input_bucket` in Amazon S3.\n", "\n", "Disclosure regarding the Multiple Object Tracking Benchmark:\n", "\n", "Multiple Object Tracking Benchmark is created by Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, and Laura Leal-Taixe. We have not modified the images or the accompanying annotations. You can obtain the images and the annotations [here](https://motchallenge.net/data/MOT20/). The images and annotations are licensed by the authors under [Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License](https://creativecommons.org/licenses/by-nc-sa/3.0/). The following paper describes Multiple Object Tracking Benchmark in depth: from the data collection and annotation to detailed statistics about the data and evaluation of models trained on it.\n", "\n", "MOT17: A benchmark for multi object tracking in crowded scenes.\n", "Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixe [arXiv:2003.09003](https://arxiv.org/abs/2003.09003)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download dataset\n", "!wget https://motchallenge.net/data/MOT17.zip -O /tmp/MOT17.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Unzip dataset\n", "!unzip -q /tmp/MOT17.zip -d MOT17\n", "!rm /tmp/MOT17.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls MOT17/MOT17/train" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Copy Dataset to S3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# send our data to s3 this will take a couple minutes\n", "!aws s3 sync MOT17/MOT17/train/MOT17-13-SDP s3://{S3_INPUT_BUCKET}/MOT17/train/MOT17-13-SDP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## View Images and Labels\n", "The scene is a street setting with a large number of cars and pedestrians. Grab image paths and plot the first image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img_paths = glob(\"MOT17/MOT17/train/MOT17-13-SDP/img1/*.jpg\")\n", "img_paths.sort()\n", "\n", "imgs = []\n", "for imgp in img_paths:\n", " img = Image.open(imgp)\n", " imgs.append(img)\n", "\n", "imgs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate manifests\n", "\n", "SageMaker Ground Truth operates using [manifests](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-input-data-input-manifest.html). When using a modality like image classification, a single image corresponds to a single entry in a manifest and a given manifest will directly contain paths for all of the images to be labeled. For videos, because we have multiple frames per video and we can have [multiple videos in a single manifest](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-manual-data-setup.html), it is organized instead by using a JSON sequence file for each video that contains the paths to our frames in S3. This allows a single manifest to contain multiple videos for a single job.\n", "\n", "In this case our image files are all split out, so we can just grab filepaths. If your data is in the form of video files, you can use the Ground Truth console to split videos into video frames. To learn more, see [Automated Video Frame Input Data Setup](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-automated-data-setup.html). Other tools like [ffmpeg](https://ffmpeg.org/) can also be used for splitting video files into individual image frames. The below block is simply storing our filepaths in a dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vid = \"MOT17/MOT17/train/MOT17-13-SDP\"\n", "\n", "# we assume we have folders with the same name as the mp4 file in the same root folder\n", "files = glob(f\"{vid}/img1/*jpg\")\n", "files.sort()\n", "files = files[:300:2] # skipping every other frame\n", "fileset = []\n", "for fil in files:\n", " fileset.append(\"/\".join(fil.split(\"/\")[5:]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With your image paths, you can iterate through frames and create a list of entries for each in your sequence file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate sequences\n", "os.makedirs(\"tracking_manifests\", exist_ok=True)\n", "\n", "frames = []\n", "for i, v in enumerate(fileset):\n", " frame = {\n", " \"frame-no\": i + 1,\n", " \"frame\": f\"{v.split('/')[-1]}\",\n", " }\n", " frames.append(frame)\n", "seq = {\n", " \"version\": \"2020-07-01\",\n", " \"seq-no\": 0,\n", " \"prefix\": f\"s3://{S3_INPUT_BUCKET}/{'/'.join(vid.split('/')[1:])}/img1/\",\n", " \"number-of-frames\": len(frames),\n", " \"frames\": frames,\n", "}\n", "\n", "print(seq)\n", "\n", "# save sequences\n", "with open(f\"tracking_manifests/{vid.split('/')[-1]}_seq.json\", \"w\") as f:\n", " json.dump(seq, f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With your sequence file, you can create your manifest file. To create a new job with no existing labels, you can simply pass in a path to your sequence file. Since you already have labels and instead want to launch an adjustment job, point to the location of those labels in Amazon S3 and provide metadata for those labels in your manifest." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create manifest\n", "source_ref = f\"s3://{S3_INPUT_BUCKET}/tracking_manifests/{vid.split('/')[-1]}_seq.json\"\n", "annot_labels = f\"s3://{S3_INPUT_BUCKET}/tracking_manifests/SeqLabel.json\"\n", "\n", "manifest = {\n", " \"source-ref\": source_ref,\n", "}\n", "\n", "# save videos as individual jobs\n", "manifest_file_name = f\"tracking_manifests/{vid.split('/')[-1]}.manifest\"\n", "with open(manifest_file_name, \"w\") as f:\n", " json.dump(manifest, f)\n", "\n", "manifest_s3_uri = f\"s3://{S3_INPUT_BUCKET}/{manifest_file_name}\"\n", "\n", "print(\"Example manifest: \", manifest)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate Label Category Configuration Files\n", "The following cell creates label category configuration files for the labeling jobs. These files are used to identify the label categories labelers use to annotate objects, and label category attributes which can be used to provide additional information about objects or scenes. For more information about these files, see [Create a Labeling Category Configuration File with Label Category and Frame Attributes](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-label-cat-config-attributes.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "moving_attribute = {\n", " \"name\": \"Moving\",\n", " \"type\": \"string\",\n", " \"enum\": [\"Stationary\", \"Dynamic\"],\n", "}\n", "vehicle_type_attribute = {\n", " \"name\": \"Vehicle_Type\",\n", " \"type\": \"string\",\n", " \"enum\": [\"Car\", \"Van\", \"Bus\", \"SUV\"],\n", "}\n", "audit_attribute = {\n", " \"name\": \"audit\",\n", " \"type\": \"string\",\n", " \"enum\": [\"Pass\", \"Fail\"],\n", "}\n", "\n", "label_vehicle = {\n", " \"document-version\": \"2020-08-15\",\n", " \"frameAttributes\": [\n", " {\n", " \"name\": \"Number_Of_Vehicles\",\n", " \"description\": \"How many vehicles to you see in the scene?\",\n", " \"type\": \"number\",\n", " }\n", " ],\n", " \"labels\": [\n", " {\n", " \"label\": \"Vehicle\",\n", " \"attributes\": [moving_attribute, vehicle_type_attribute],\n", " },\n", " ],\n", " \"instructions\": {\n", " \"shortInstruction\": \"Please label vehicles.\",\n", " \"fullInstruction\": \"Please label vehicles.\",\n", " },\n", " \"annotationType\": \"BoundingBox\",\n", "}\n", "filename = \"tracking_manifests/vehicle_label_category.json\"\n", "with open(filename, \"w\") as f:\n", " json.dump(label_vehicle, f)\n", "vehicle_label_category_s3_uri = f\"s3://{S3_INPUT_BUCKET}/{filename}\"\n", "\n", "label_vehicle_audit = {\n", " \"document-version\": \"2020-08-15\",\n", " \"frameAttributes\": [\n", " {\n", " \"name\": \"Frame_Quality\",\n", " \"description\": \"Describe the quality of frame\",\n", " \"type\": \"string\",\n", " \"enum\": [\"Pass\", \"Fail\"],\n", " },\n", " ],\n", " \"labels\": [\n", " {\n", " \"label\": \"Vehicle\",\n", " \"attributes\": [audit_attribute],\n", " },\n", " ],\n", " \"instructions\": {\n", " \"shortInstruction\": \"Please label vehicles.\",\n", " \"fullInstruction\": \"Please label vehicles.\",\n", " },\n", " \"annotationType\": \"BoundingBox\",\n", "}\n", "filename = \"tracking_manifests/vehicle_label_category_audit.json\"\n", "with open(filename, \"w\") as f:\n", " json.dump(label_vehicle_audit, f)\n", "vehicle_audit_label_category_s3_uri = f\"s3://{S3_INPUT_BUCKET}/{filename}\"\n", "\n", "label_lane = {\n", " \"document-version\": \"2020-08-15\",\n", " \"labels\": [\n", " {\n", " \"label\": \"Lane\",\n", " },\n", " ],\n", " \"instructions\": {\n", " \"shortInstruction\": \"Please label lanes.\",\n", " \"fullInstruction\": \"Please label lanes.\",\n", " },\n", " \"annotationType\": \"Polyline\",\n", "}\n", "filename = \"tracking_manifests/lane_label_category.json\"\n", "with open(filename, \"w\") as f:\n", " json.dump(label_lane, f)\n", "lane_label_category_s3_uri = f\"s3://{S3_INPUT_BUCKET}/{filename}\"\n", "\n", "label_lane_audit = {\n", " \"document-version\": \"2020-08-15\",\n", " \"labels\": [\n", " {\n", " \"label\": \"Lane\",\n", " \"attributes\": [audit_attribute],\n", " },\n", " ],\n", " \"instructions\": {\n", " \"shortInstruction\": \"Please label lanes.\",\n", " \"fullInstruction\": \"Please label lanes.\",\n", " },\n", " \"annotationType\": \"Polyline\",\n", "}\n", "filename = \"tracking_manifests/lane_label_category_audit.json\"\n", "with open(filename, \"w\") as f:\n", " json.dump(label_lane_audit, f)\n", "lane_audit_label_category_s3_uri = f\"s3://{S3_INPUT_BUCKET}/{filename}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Send data to S3\n", "!aws s3 cp --recursive tracking_manifests s3://{S3_INPUT_BUCKET}/tracking_manifests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batch Creation Demo\n", "\n", "Now we'll send the input manifest files that we just created through a custom Ground Truth pipeline, orchestrated by batch jobs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_id = f\"nb-track-{str(uuid.uuid4())[:8]}\"\n", "\n", "\n", "# Perform image classification, bounding box, and polylines annotations.\n", "batch_create_response = sess.post(\n", " API_URL_POST_BATCH_CREATE,\n", " json={\n", " \"batchId\": batch_id,\n", " \"labelingJobs\": [\n", " {\n", " \"jobName\": f\"{batch_id}-vehicle\",\n", " \"jobType\": \"BATCH\",\n", " \"jobModality\": \"VideoObjectTracking\",\n", " \"labelAttributeName\": \"vehicle-ref\",\n", " \"labelCategoryConfigS3Uri\": vehicle_label_category_s3_uri,\n", " \"inputConfig\": {\n", " \"inputManifestS3Uri\": manifest_s3_uri,\n", " },\n", " \"jobLevel\": 1,\n", " },\n", " {\n", " \"jobName\": f\"{batch_id}-lane\",\n", " \"jobType\": \"BATCH\",\n", " \"jobModality\": \"VideoObjectTracking\",\n", " \"labelAttributeName\": \"lane-ref\",\n", " \"labelCategoryConfigS3Uri\": lane_label_category_s3_uri,\n", " \"inputConfig\": {\n", " \"inputManifestS3Uri\": manifest_s3_uri,\n", " },\n", " \"jobLevel\": 1,\n", " },\n", " {\n", " \"jobName\": f\"{batch_id}-vehicle-audit\",\n", " \"jobType\": \"BATCH\",\n", " \"jobModality\": \"VideoObjectTrackingAudit\",\n", " \"labelAttributeName\": \"vehicle-audit-ref\",\n", " \"labelCategoryConfigS3Uri\": vehicle_audit_label_category_s3_uri,\n", " \"inputConfig\": {\n", " \"chainFromJobName\": f\"{batch_id}-vehicle\",\n", " },\n", " \"jobLevel\": 2,\n", " },\n", " {\n", " \"jobName\": f\"{batch_id}-lane-audit\",\n", " \"jobType\": \"BATCH\",\n", " \"jobModality\": \"VideoObjectTrackingAudit\",\n", " \"labelAttributeName\": \"lane-audit-ref\",\n", " \"labelCategoryConfigS3Uri\": lane_audit_label_category_s3_uri,\n", " \"inputConfig\": {\"chainFromJobName\": f\"{batch_id}-lane\"},\n", " \"jobLevel\": 2,\n", " },\n", " ],\n", " },\n", ")\n", "\n", "\n", "print(batch_create_response.text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sign-in To Worker Portal\n", "* If you have added yourself to the work teams created by the Cloud Formation Template, use the following URL to login to the worker portal.\n", "* If you have not added yourself to the work team, use the instructions in Add or Remove Workers to add yourself to the following work teams: smgt-workflows-first-level and smgt-workflows-second-level ." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker = boto3.client(\"sagemaker\")\n", "workteam = sagemaker.describe_workteam(WorkteamName=\"smgt-workflows-second-level\")\n", "worker_portal_url = workteam[\"Workteam\"][\"SubDomain\"]\n", "print(f\"Sign-in by going here: {worker_portal_url}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Complete Tasks In Worker Portal\n", "* Sign-in, wait for a task to appear in the worker portal\n", "* Complete the task by hitting submit in the bottom right corner\n", "* Ensure you wait long enough to see both frames appear in the worker console and submit both\n", "* Once you're done a notification should come over the job output SNS indicating that the batch has completed first level review and is waiting for batch downsampling metadata before continuing on" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batch Show Demo\n", "* Now our batch is being processed and will show up in the worker portal within a few minutes\n", "* We can see a list of all the batches being processed using the batch/show API without arguments" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_show_response = sess.get(API_URL_GET_BATCH_SHOW)\n", "print(f\"Your batch: {batch_id} should be present in the list below\")\n", "batch_show_response.json()[\"COMPLETE\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batch Detailed Show Demo" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Choose a batch of interest\n", "batch_id = \"add-batch-name\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now let's see details about the batch\n", "batch_show_response = sess.get(API_URL_GET_BATCH_SHOW, params={\"batchId\": batch_id})\n", "batch_show_response.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Workforce Show Demo\n", "* Make sure an email you have access to is in the Workforce" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sess.get(API_URL_GET_WORKFORCE_SHOW).json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batch Metadata Post\n", "* Here we provide the downsample rate for the batch that we want to send to the QA Manager step, let's specify a downsample rate of 50% to only send one frame to the manager queue." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_metadata_post = sess.post(\n", " API_URL_POST_BATCH_METADATA_POST,\n", " json={\n", " \"downSamplingRate\": 50,\n", " \"batchId\": batch_id,\n", " },\n", ")\n", "batch_metadata_post.text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Complete Second Level Reviewer Tasks In Worker Portal\n", "* After completing batch metadata post, the frames are downsampled and sent to the next level queue" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we can watch batch status until completion.\n", "batch_show_response = sess.get(API_URL_GET_BATCH_SHOW, params={\"batchId\": batch_id})\n", "batch_show_response.json()[\"status\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Once complete, let's grab relevant manifests.\n", "batch_show_response = sess.get(API_URL_GET_BATCH_SHOW, params={\"batchId\": batch_id})\n", "# first_level_output_manifest_url = batch_show_response.json()[\"firstLevel\"][\"outputS3PresignedUrl\"]\n", "second_level_output_manifest_url = batch_show_response.json()[\"secondLevel\"][\"jobLevels\"][0][\n", " \"jobOutputS3Url\"\n", "]\n", "# print(f\"First level output manifest url: {first_level_output_manifest_url}\")\n", "print(f\"Second level output manifest url: {second_level_output_manifest_url}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "* This notebook introduced how to take video frame data and trigger a workflow to run multiple Ground Truth labeling jobs, generating two different types of annotations (bounding boxes and polylines). You also learned how you can extend the pipeline to audit and verify the labeled dataset and how to retrieve the audited results. Lastly, you saw how to reference the current progress of batch jobs using the `BatchShow` API.\n", "\n", "Customize the notebook for your input datasets by adding additional jobs or audit steps, or by modifying the data modality of the jobs. Further customization could include, but is not limited, to:\n", "*\tAdding additional types of annotations such as semantic segmentation masks or keypoints\n", "*\tAdding automated quality assurance and filtering to the Step Functions workflow to only send low-quality annotations to the next level of review\n", "*\tAdding third or fourth levels of quality review for additional, more specialized types of reviews\n", "\n", "After completing the demo, you can clean up all the resources using this section\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up\n", "* After completing the demo, you can clean up all the resources using this section\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Enter the top level cloudformation stack name i.e smgt-workflow-v1\n", "cloud_formation_stack_name = \"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "response = cfn_client.list_stack_resources(StackName=cloud_formation_stack_name)\n", "\n", "resource_summaries = response[\"StackResourceSummaries\"]\n", "\n", "for resource_summary in resource_summaries:\n", " logical_resource_id = resource_summary[\"LogicalResourceId\"]\n", " stack_name = resource_summary[\"PhysicalResourceId\"].split(\"/\")[1]\n", " nested_stack_resources = cfn_client.list_stack_resources(StackName=stack_name)[\n", " \"StackResourceSummaries\"\n", " ]\n", "\n", " for nested_stack_resource in nested_stack_resources:\n", " if nested_stack_resource[\"ResourceType\"] == \"AWS::S3::Bucket\":\n", " s3_bucket_name = nested_stack_resource[\"PhysicalResourceId\"]\n", " bucket = s3.Bucket(s3_bucket_name)\n", " for obj in bucket.objects.filter():\n", " s3.Object(bucket.name, obj.key).delete()\n", "\n", "\n", "response = cfn_client.delete_stack(StackName=cloud_formation_stack_name)\n", "\n", "print(\n", " \"Cloudformation stack deletion triggered succesfully. You can monitor the deletion progress from the cloudformation console.\"\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/ground_truth_labeling_jobs|multi_modal_parallel_sagemaker_labeling_workflows_with_step_functions|sagemaker_ground_truth_workflows.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }

data-labelling/sagemaker_ground_truth_workflows.ipynb (861 lines of code) (raw):