# Labeling Adjustment Job Adaptation


---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

---


## Labeling Adjustment Jobs

This notebook focuses on creating labeling adjustment jobs in SageMaker Ground Truth.

For more details on label adjustment jobs and how to create them, see the official documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-verification-data.html

## Customer use case description

The example here uses a bounding box labeling job for multi-object detection on image data.

After a customer has labeled their dataset for object detection and trained their first models, business requirements and priorities may change. Some objects that the model was originally meant to detect may become irrelevant and should be removed, while new labels may need to be added to the dataset.

This requires re-labeling the original dataset with a labeling adjustment job that displays the existing labels we want to keep and omits the labels that are no longer in scope. The current SageMaker Ground Truth UI lets us remove unwanted labels from the worker UI before launching the labeling adjustment job, which also hides those labels on each image shown to the labeling team.

However, jobs launched this way will fail during the consolidation stage on every image whose labels the labeling team has not adjusted. To avoid this issue, we need to process the existing output manifest file and remove all unwanted labels from it directly before launching the labeling adjustment job.

The script in this notebook takes as input a set of labels to remove and the name of the labeling job whose output manifest file should be adjusted. It generates a cleaned output manifest file in which only the target labels have been removed from the latest labeling job's annotations. This cleaned file can be used to safely launch a label adjustment job.
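
To make the clean-up step concrete, the sketch below builds one hypothetical manifest line for a bounding box job and removes a single label from it. The attribute name `my-job`, the S3 path, the label names, and the helper `drop_label` are placeholders for illustration, not values or functions from a real job. The field layout follows the keys listed in the docstring of the main function in this notebook. It is a simplified illustration and does not replace the full script.

```python
import copy
import json

# Hypothetical manifest line; "my-job" stands in for the label attribute name.
record = {
    "source-ref": "s3://example-bucket/images/0001.jpg",
    "my-job": {
        "image_size": [{"width": 640, "height": 480, "depth": 3}],
        "annotations": [
            {"class_id": 0, "left": 10, "top": 20, "width": 50, "height": 60},
            {"class_id": 1, "left": 100, "top": 40, "width": 30, "height": 30},
        ],
    },
    "my-job-metadata": {
        "objects": [{"confidence": 0.9}, {"confidence": 0.8}],
        "class-map": {"0": "car", "1": "bicycle"},
        "type": "groundtruth/object-detection",
        "human-annotated": "yes",
    },
}


def drop_label(record: dict, attribute_name: str, label: str) -> dict:
    """Return a copy of record with every trace of label removed."""
    cleaned = copy.deepcopy(record)
    annotations = cleaned[attribute_name]
    metadata = cleaned[attribute_name + "-metadata"]

    # class-map keys are strings, while annotation class_id values are integers.
    ids_to_remove = {cid for cid, name in metadata["class-map"].items() if name == label}
    keep = [
        i
        for i, ann in enumerate(annotations["annotations"])
        if str(ann["class_id"]) not in ids_to_remove
    ]

    # "annotations" and "objects" are parallel lists, so filter both by index.
    annotations["annotations"] = [annotations["annotations"][i] for i in keep]
    metadata["objects"] = [metadata["objects"][i] for i in keep]
    metadata["class-map"] = {
        cid: name for cid, name in metadata["class-map"].items() if cid not in ids_to_remove
    }
    return cleaned


cleaned_record = drop_label(record, "my-job", "bicycle")
print(json.dumps(cleaned_record["my-job-metadata"]["class-map"]))
```

The point of the sketch is that three places must stay in sync for each image: the `annotations` list, the `objects` list in the metadata, and the `class-map`.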

### Function code

In [3]:
import boto3
import os
import botocore
import json

sagemaker_client = boto3.client("sagemaker")
s3_client = boto3.client("s3")

##### Helper function for communication with aws services (sagemaker and s3)
def get_labeling_job_output_manifest_file_location(
    labeling_job_name: str, sagemaker_client: botocore.client
) -> str:
    """
    Return the S3 path of the output manifest file of the given labeling job.

    ref: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.describe_labeling_job
    """
    s3_output_location = sagemaker_client.describe_labeling_job(LabelingJobName=labeling_job_name)[
        "OutputConfig"
    ]["S3OutputPath"]
    manifest_file_relative_path_from_output_location = "{}/manifests/output/output.manifest".format(
        labeling_job_name
    )
    # join with "/" explicitly: os.path.join would insert "\" on Windows, which is not valid in an S3 path
    output_manifest_absolute_path = "{}/{}".format(
        s3_output_location.rstrip("/"), manifest_file_relative_path_from_output_location
    )

    return output_manifest_absolute_path


def get_labeling_job_attribute_name(
    labeling_job_name: str, sagemaker_client: botocore.client
) -> str:
    """
    Return the label attribute name of the given labeling job.

    ref: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.describe_labeling_job
    """
    labeling_job_attribute_name = sagemaker_client.describe_labeling_job(
        LabelingJobName=labeling_job_name
    )["LabelAttributeName"]
    return labeling_job_attribute_name


def split_bucket_key_from_s3_path(s3_full_path: str) -> tuple:
    """
    Split a full S3 path in the format s3://BUCKET/KEY into (BUCKET, KEY).
    """
    split_location = s3_full_path[5:].find("/") + 5
    return s3_full_path[5:split_location], s3_full_path[split_location + 1 :]


def read_s3_file(file_path: str, s3_client: botocore.client):
    bucket_name, key = split_bucket_key_from_s3_path(file_path)
    response = s3_client.get_object(Bucket=bucket_name, Key=key)["Body"].read()
    return response


def save_file_to_s3(file_path: str, object_to_save, s3_client: botocore.client):
    bucket_name, key = split_bucket_key_from_s3_path(file_path)
    s3_client.put_object(Body=object_to_save, Bucket=bucket_name, Key=key)


#### Helper functions to process the output.manifest file and cleanup unnecessary labels
def get_class_ids_for_removable_labels(label_annotations_metadata, labels_to_remove):
    class_ids_for_removable_labels = []
    for label in labels_to_remove:
        for key, value in label_annotations_metadata["class-map"].items():
            if value == label:
                class_ids_for_removable_labels.append(key)
                del label_annotations_metadata["class-map"][key]
                break
    return label_annotations_metadata, class_ids_for_removable_labels


def clean_up_annotations(label_annotations, class_ids_to_remove):
    removed_annotation_positions = []
    new_annotations_list = []
    for i in range(len(label_annotations["annotations"])):
        if str(label_annotations["annotations"][i]["class_id"]) in class_ids_to_remove:
            removed_annotation_positions.append(i)
        else:
            new_annotations_list.append(label_annotations["annotations"][i])
    label_annotations["annotations"] = new_annotations_list
    return label_annotations, removed_annotation_positions


def clean_up_metadata(label_annotations_metadata, removed_marked_labels_positions):
    for i in range(len(removed_marked_labels_positions)):
        del label_annotations_metadata["objects"][removed_marked_labels_positions[i] - i]

    label_annotations_metadata["adjustment-status"] = "adjusted"
    return label_annotations_metadata


#### Main function to remove all the unnecessary labels from manifest file
def remove_labels_from_output_manifest_file(
    remove_labels: list, marked_labels: list, labeling_job_attribute_name: str
):
    """
    remove_labels (list[str]): list of labels we want to remove from output.manifest file
    marked_labels (list[marked_labels_per_document]): content of output.manifest file marked labels per document
                format of marked_labels_per_document:
                    'all_keys': ['source-ref', 'category', 'category-metadata','chain-job-name','chain-job-name-metadata']
                    'category' (chain-job-name): ['image_size', 'annotations']
                    'category-metadata' (chain-job-name-metadata): ['objects', 'class-map', 'type', 'human-annotated', 'creation-date', 'job-name', 'adjustment-status']
    labeling_job_attribute_name (str): label attribute name of the labeling job, used to find the annotations and annotation metadata to adjust
    """
    nmb_keys_previous = len(list(marked_labels[0].keys()))
    total_nmb_of_removed_marked_labels = 0

    for label in marked_labels:
        nmb_keys = len(list(label.keys()))
        # a bare `assert "message"` always passes, so check the condition explicitly
        assert (
            nmb_keys_previous == nmb_keys
        ), "Label does not have same amount of keys as others! This is unexpected behaviour since each should have same amount of jobs run..."

        latest_annotations_name = labeling_job_attribute_name
        latest_annotations_metadata_name = "{}-metadata".format(labeling_job_attribute_name)

        (
            label[latest_annotations_metadata_name],
            class_ids_to_remove,
        ) = get_class_ids_for_removable_labels(
            label[latest_annotations_metadata_name], remove_labels
        )

        # every labeling job class-map should have one label mentioned only once, but not every class needs to be present
        assert len(class_ids_to_remove) <= len(remove_labels)

        label[latest_annotations_name], removed_marked_labels_positions = clean_up_annotations(
            label[latest_annotations_name], class_ids_to_remove
        )
        label[latest_annotations_metadata_name] = clean_up_metadata(
            label[latest_annotations_metadata_name], removed_marked_labels_positions
        )
        total_nmb_of_removed_marked_labels += len(removed_marked_labels_positions)

    # log the total number of marked labels removed from the manifest file;
    # compare it with the number of target labels you expected to find
    # in the input manifest file
    print("In total we have removed {} marked labels.".format(total_nmb_of_removed_marked_labels))
    return marked_labels


def main_function(
    labeling_job_name, remove_labels, path_to_save_results_to, sagemaker_client, s3_client
):
    output_file_path = get_labeling_job_output_manifest_file_location(
        labeling_job_name, sagemaker_client
    )
    output_file_content = read_s3_file(output_file_path, s3_client)

    labels = []
    for line in output_file_content.splitlines():
        labels.append(json.loads(line))

    cleaned_labels = remove_labels_from_output_manifest_file(
        remove_labels, labels, get_labeling_job_attribute_name(labeling_job_name, sagemaker_client)
    )
    # you can uncomment this to generate a smaller output file for testing
    # cleaned_labels = cleaned_labels[:15]

    # serialize the cleaned records back to the manifest format, one JSON object per line
    output_manifest_cleaned_content = "".join(
        json.dumps(clean_label) + "\n" for clean_label in cleaned_labels
    )

    save_file_to_s3(path_to_save_results_to, output_manifest_cleaned_content, s3_client)
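
The helper `split_bucket_key_from_s3_path` above relies on fixed string offsets and does not check its input. As a possible alternative, the sketch below splits the path with the standard library's `urllib.parse.urlparse` and rejects anything that is not an `s3://` path. The function name `split_s3_uri` and the example path are hypothetical and not part of the original script.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str) -> tuple:
    """Split s3://BUCKET/KEY into (BUCKET, KEY), raising on malformed input."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Expected a path of the form s3://BUCKET/KEY, got: {}".format(s3_uri))
    # parsed.path starts with "/", which is not part of the object key
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://example-bucket/jobs/my-job/manifests/output/output.manifest")
print(bucket)
print(key)
```

One caveat of this approach: `urlparse` treats `?` and `#` as query and fragment separators, so keys containing those characters would need extra handling.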

### Parameter setup and script execution

In [4]:
#### program execution

# these are input parameters to adjust
labeling_job_name = "<name_of_the_labeling_job_you_want_to_run_label_adjustment_job_for>"
remove_labels = ["<label_1>", "<label_2>", "<label_3>", "<label_4>"]

path_to_save_results_to = (
    "s3://<bucket_dst>/<path_you_want_your_cleaned_output_manifest_file_saved_to>/output.manifest"
)

main_function(
    labeling_job_name, remove_labels, path_to_save_results_to, sagemaker_client, s3_client
)

In total we have removed 4 marked labels.
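
Before launching the adjustment job, it may be worth checking that the cleaned manifest is internally consistent. The sketch below is one way to do that, assuming the bounding box layout described in the docstring above. The function name `find_manifest_problems`, the attribute name `my-job`, and the sample lines are hypothetical, and the two checks reflect assumptions about what a well-formed line looks like rather than an official validation.

```python
import json


def find_manifest_problems(manifest_text: str, attribute_name: str) -> list:
    """Return human-readable descriptions of inconsistencies in a manifest."""
    problems = []
    metadata_name = attribute_name + "-metadata"
    for line_number, line in enumerate(manifest_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore empty lines
        record = json.loads(line)
        annotations = record[attribute_name]["annotations"]
        metadata = record[metadata_name]
        # every annotation should have a matching entry in "objects"
        if len(annotations) != len(metadata["objects"]):
            problems.append(
                "line {}: {} annotations but {} objects entries".format(
                    line_number, len(annotations), len(metadata["objects"])
                )
            )
        # every remaining class_id should still be present in the class-map
        for annotation in annotations:
            if str(annotation["class_id"]) not in metadata["class-map"]:
                problems.append(
                    "line {}: class_id {} missing from class-map".format(
                        line_number, annotation["class_id"]
                    )
                )
    return problems


box = {"left": 0, "top": 0, "width": 10, "height": 10}
good_line = json.dumps(
    {
        "source-ref": "s3://example-bucket/images/0001.jpg",
        "my-job": {"annotations": [dict(box, class_id=0)]},
        "my-job-metadata": {"objects": [{"confidence": 0.9}], "class-map": {"0": "car"}},
    }
)
bad_line = json.dumps(
    {
        "source-ref": "s3://example-bucket/images/0002.jpg",
        "my-job": {"annotations": [dict(box, class_id=0), dict(box, class_id=1)]},
        "my-job-metadata": {"objects": [{"confidence": 0.9}], "class-map": {"0": "car"}},
    }
)

problems = find_manifest_problems(good_line + "\n" + bad_line + "\n", "my-job")
for problem in problems:
    print(problem)
```

For a real run, the manifest text could be obtained with `read_s3_file(path_to_save_results_to, s3_client).decode("utf-8")` and the attribute name with `get_labeling_job_attribute_name(labeling_job_name, sagemaker_client)`.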


## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/ground_truth_labeling_jobs|labeling_adjustment_job_adaptation|labeling_adjustment_job_adaptation.ipynb)
