In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Starting on September 15, 2024, you can only customize classification, entity extraction, and sentiment analysis models by moving to Vertex AI Gemini prompts and tuning. Training or updating models for Vertex AI AutoML for Text classification, entity extraction, and sentiment analysis objectives will no longer be available. You can continue using existing Vertex AI AutoML Text objectives until June 15, 2025. For more information about how Gemini offers enhanced user experience through improved prompting capabilities, see [Introduction to tuning](https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-gemini-overview).

# Vertex AI Pipelines: AutoML text classification pipelines using google-cloud-pipeline-components

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_evaluation/automl_text_classification_model_evaluation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_evaluation/automl_text_classification_model_evaluation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/model_evaluation/automl_text_classification_model_evaluation.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
     </a>
  </td>
</table>
<br/><br/><br/>

**_NOTE_**: This notebook has been tested in the following environment:

* Python version = 3.9

## Overview

This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML text classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. 

Learn more about [Vertex AI Model Evaluation](https://cloud.google.com/vertex-ai/docs/evaluation/introduction) and [Classification on text data](https://cloud.google.com/vertex-ai/docs/training-overview#classification_for_text).

### Objective

In this tutorial, you learn how to use `Vertex AI Pipelines` and `Google Cloud Pipeline Components` to build and evaluate an `AutoML` text classification model.


This tutorial uses the following Google Cloud ML services and resources:

- Vertex AI `Datasets`
- Vertex AI `Training` (AutoML Text Classification)
- Vertex AI `Model Registry`
- Vertex AI `Pipelines`
- Vertex AI `Batch Predictions`

The steps performed include:

- Create a Vertex AI `Dataset`.
- Train an AutoML Text Classification model on the `Dataset` resource.
- Import the trained `AutoML model resource` into the pipeline.
- Run a `Batch Prediction` job.
- Evaluate the AutoML model using the `Classification Evaluation Component`.
- Import the evaluation metrics to the AutoML model resource.

### Dataset

The dataset used for this tutorial is the [Happy Moments dataset](https://www.kaggle.com/ritresearch/happydb) from [Kaggle Datasets](https://www.kaggle.com/ritresearch/happydb). The version of the dataset you use in this tutorial is stored in a public Cloud Storage bucket.

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the packages required for executing this notebook.

In [None]:
! pip3 install --upgrade google-cloud-aiplatform \
                         google-cloud-storage \
                         kfp google-cloud-pipeline-components==1.0.25 \
                         ndjson --quiet

### Colab only: Uncomment the following cell to restart the kernel

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the Vertex AI and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,dataflow.googleapis.com).
4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

#### Region

You can also change the `REGION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "us-central1"  # @param {type: "string"}

### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* Do nothing as you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**

In [None]:
# ! gcloud auth login

**3. Colab, uncomment and run:**

In [None]:
# from google.colab import auth
# auth.authenticate_user()

**4. Service account or other**
* See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [None]:
BUCKET_URI = f"gs://your-bucket-name-{PROJECT_ID}-unique"  # @param {type:"string"}

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

#### Service Account

**If you don't know your service account**, leave the placeholder in the first cell below unchanged and run the second cell, which looks up your service account with `gcloud`.

In [None]:
SERVICE_ACCOUNT = "[your-service-account]"  # @param {type:"string"}

In [None]:
import sys

IS_COLAB = "google.colab" in sys.modules

if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT is None
    or SERVICE_ACCOUNT == "[your-service-account]"
):
    # Get your service account from gcloud
    if not IS_COLAB:
        shell_output = !gcloud auth list 2>/dev/null
        SERVICE_ACCOUNT = shell_output[2].replace("*", "").strip()

    if IS_COLAB:
        shell_output = ! gcloud projects describe  $PROJECT_ID
        project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
        SERVICE_ACCOUNT = f"{project_number}-compute@developer.gserviceaccount.com"

    print("Service Account:", SERVICE_ACCOUNT)
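
The cell above reads the project number from the last line of the `gcloud projects describe` output, which depends on the order of the printed fields. As a minimal sketch, assuming the output contains a line of the form `projectNumber: '...'`, the value can instead be looked up by key. The helper names and the sample output below are hypothetical and used only for illustration.

```python
from typing import Iterable


def project_number_from_describe(lines: Iterable[str]) -> str:
    """Return the project number from `gcloud projects describe` output lines."""
    for line in lines:
        key, _, value = line.partition(":")
        if key.strip() == "projectNumber":
            # Drop surrounding whitespace and quotes around the number.
            return value.strip().strip("'\"")
    raise ValueError("projectNumber not found in gcloud output")


def default_compute_service_account(project_number: str) -> str:
    """Build the Compute Engine default service account email."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


# Hypothetical sample output; real values differ per project.
sample_output = [
    "createTime: '2022-01-01T00:00:00.000Z'",
    "lifecycleState: ACTIVE",
    "name: my-project",
    "projectId: my-project",
    "projectNumber: '123456789012'",
]
number = project_number_from_describe(sample_output)
print(default_compute_service_account(number))
```

Looking up the field by name keeps the lookup working if `gcloud` prints additional fields after `projectNumber`.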

#### Set service account access for Vertex AI Pipelines

Run the following commands to grant your service account the access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account.

In [None]:
! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI

! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI

### Import libraries

In [None]:
import json

import kfp
import matplotlib.pyplot as plt
import ndjson
from google.cloud import aiplatform, aiplatform_v1, storage
from kfp.v2 import compiler  # noqa: F811

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aiplatform.init(project=PROJECT_ID, staging_bucket=BUCKET_URI)

## Train and deploy an AutoML Text Classification model

In this notebook, you execute all the steps from dataset building to model deployment and evaluation using Vertex AI Pipelines.

As the first step, you build the training and deployment pipeline. The pipeline includes the following tasks:
1. Create a Vertex AI Text Dataset.
2. Train an AutoML Text Classification model.
3. Create a Vertex AI Endpoint.
4. Deploy the AutoML model to the Vertex AI Endpoint.

The pipeline uses pre-built components for each of the tasks from the `Google Cloud Pipeline Components` package.

Learn more about the [Google Cloud Pipeline Components](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction).

Set the parameters required for the training and deployment pipeline.

In [None]:
# Specify the GCS path for the text dataset
IMPORT_FILE = "gs://cloud-ml-data/NL-classification/happiness.csv"

# provide dataset display name
DATASET_DISPLAY_NAME = "happydb-dataset-unique"

# provide training job display name
TRAINING_JOB_DISPLAY_NAME = "happydb-automl-job-unique"

# provide model display name
MODEL_DISPLAY_NAME = "happydb-automl-model-unique"

# provide endpoint display name
ENDPOINT_DISPLAY_NAME = "happydb-classification-endpoint-unique"

# provide pipeline job display name
TRAINING_PIPELINE_DISPLAY_NAME = "happydb-training-pipeline-unique"

# provide Cloud Storage root folder path for saving the artifacts
PIPELINE_ROOT = f"{BUCKET_URI}/pipeline_root/happydb"

# provide path to store the compiled pipeline package
TRAINING_PIPELINE_PATH = "automl_text_classification_pipeline.json"

Define the Vertex AI pipeline. 

Learn more about building [Vertex AI pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline).

In [None]:
@kfp.dsl.pipeline(name=TRAINING_PIPELINE_DISPLAY_NAME)
def pipeline(
    import_file: str,
    dataset_display_name: str,
    training_job_display_name: str,
    model_display_name: str,
    endpoint_display_name: str,
    project: str = PROJECT_ID,
    region: str = REGION,
    training_split: float = 0.4,
    validation_split: float = 0.3,
    test_split: float = 0.3,
):
    from google_cloud_pipeline_components import aiplatform as gcc_aip
    from google_cloud_pipeline_components.v1.endpoint import (EndpointCreateOp,
                                                              ModelDeployOp)

    # component to create the dataset
    dataset_create_task = gcc_aip.TextDatasetCreateOp(
        display_name=dataset_display_name,
        gcs_source=import_file,
        import_schema_uri=aiplatform.schema.dataset.ioformat.text.multi_label_classification,
        project=project,
    )
    # component to run AutoML training job
    training_run_task = gcc_aip.AutoMLTextTrainingJobRunOp(
        dataset=dataset_create_task.outputs["dataset"],
        display_name=training_job_display_name,
        prediction_type="classification",
        multi_label=True,
        training_fraction_split=training_split,
        validation_fraction_split=validation_split,
        test_fraction_split=test_split,
        model_display_name=model_display_name,
        project=project,
    )
    # component to create an endpoint
    endpoint_op = EndpointCreateOp(
        project=project,
        location=region,
        display_name=endpoint_display_name,
    )
    # component to deploy the model to the endpoint
    _ = ModelDeployOp(
        model=training_run_task.outputs["model"],
        endpoint=endpoint_op.outputs["endpoint"],
        automatic_resources_min_replica_count=1,
        automatic_resources_max_replica_count=1,
    )
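
The pipeline above exposes `training_split`, `validation_split`, and `test_split` as parameters. If you override them, a quick local check can catch mistakes before a long-running job starts. The following is a minimal sketch, assuming the three fractions are each expected to lie in [0, 1] and to sum to 1; `validate_splits` is a hypothetical helper, not part of the Vertex AI SDK.

```python
import math


def validate_splits(training: float, validation: float, test: float) -> None:
    """Raise ValueError if the split fractions are out of range or do not sum to 1."""
    fractions = (training, validation, test)
    if any(fraction < 0 or fraction > 1 for fraction in fractions):
        raise ValueError(f"Each split must be between 0 and 1, got {fractions}")
    # Compare with a tolerance because of floating-point rounding.
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError(f"Splits must sum to 1, got {sum(fractions)}")


# The defaults used in the pipeline definition pass the check.
validate_splits(0.4, 0.3, 0.3)
```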

### Compile the pipeline

Next, compile the pipeline to a JSON package.

In [None]:
compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path=TRAINING_PIPELINE_PATH,
)

### Run the training and deployment pipeline

Now, create a Vertex AI pipeline job to run the pipeline. The pipeline definition sets the default training, validation, and test splits to 0.4, 0.3, and 0.3 respectively. Change them as needed.

For creating the pipeline job, you specify the following parameters:

- `display_name`: The name of the pipeline, this shows up in the Google Cloud console.
- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.
- `parameter_values`: The mapping from runtime parameter names to their values that control the pipeline run.
- `enable_caching`: Set as True to turn on caching for the run.

Learn more about [PipelineJob](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).
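
A mistyped key in `parameter_values` typically surfaces only when the job is submitted. As a rough sketch, the keys can be compared against a function signature with `inspect` beforehand. This assumes an undecorated Python function; `example_pipeline` and `check_parameter_values` are hypothetical names, and a function wrapped by the KFP pipeline decorator may not expose its signature in the same way.

```python
import inspect
from typing import Any, Callable, Dict, List, Tuple


def check_parameter_values(
    func: Callable[..., Any], parameter_values: Dict[str, Any]
) -> Tuple[List[str], List[str]]:
    """Return (missing required names, unexpected names) for a parameter mapping."""
    parameters = inspect.signature(func).parameters
    required = [
        name
        for name, parameter in parameters.items()
        if parameter.default is inspect.Parameter.empty
    ]
    missing = [name for name in required if name not in parameter_values]
    unexpected = [name for name in parameter_values if name not in parameters]
    return missing, unexpected


def example_pipeline(
    import_file: str, dataset_display_name: str, region: str = "us-central1"
) -> None:
    """Hypothetical stand-in for a pipeline function."""


missing, unexpected = check_parameter_values(
    example_pipeline,
    # "datset_display_name" is deliberately misspelled.
    {"import_file": "gs://bucket/data.csv", "datset_display_name": "demo"},
)
print("missing:", missing)
print("unexpected:", unexpected)
```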

In [None]:
# set the values to be passed as input parameters to the pipeline
training_parameters = {
    "import_file": IMPORT_FILE,
    "dataset_display_name": DATASET_DISPLAY_NAME,
    "training_job_display_name": TRAINING_JOB_DISPLAY_NAME,
    "model_display_name": MODEL_DISPLAY_NAME,
    "endpoint_display_name": ENDPOINT_DISPLAY_NAME,
}

# create a pipeline job
training_job = aiplatform.PipelineJob(
    display_name=TRAINING_PIPELINE_DISPLAY_NAME,
    template_path=TRAINING_PIPELINE_PATH,
    pipeline_root=PIPELINE_ROOT,
    parameter_values=training_parameters,
    enable_caching=False,
)

# run the job
training_job.run(sync=True)

Click on the generated link to see your run in the Cloud Console.

In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).

<a href="https://storage.googleapis.com/amy-jo/images/mp/automl_text_classif.png" target="_blank"><img src="https://storage.googleapis.com/amy-jo/images/mp/automl_text_classif.png" width="40%"/></a>

Fetch the created model by filtering the display name.

In [None]:
models = aiplatform.Model.list(
    filter=f"display_name={MODEL_DISPLAY_NAME}", order_by="create_time"
)
if not models:
    raise RuntimeError(f"No model found with display name {MODEL_DISPLAY_NAME!r}")
model = models[0]
print(model)

Fetch the available evaluation metrics for the model.

In [None]:
# Get evaluations
model_evaluations = model.list_model_evaluations()

model_evaluation = list(model_evaluations)[0]

# Print the evaluation metrics
for evaluation in model_evaluations:
    evaluation = evaluation.to_dict()
    print("Model's evaluation metrics from Training:\n")
    metrics = evaluation["metrics"]
    for metric in metrics.keys():
        print(f"metric: {metric}, value: {metrics[metric]}\n")

### Run batch predictions on the model

For evaluating the model, a batch of test data along with ground truth is required. Before evaluating the model, you generate a batch prediction job for the model to see if the model is able to generate the predictions in batches. In Vertex AI, you need not deploy a model in order to run batch prediction jobs on it. 

To create a batch prediction job, you must first format your input instances (in JSONL format) and store them in a Google Cloud Storage bucket. You also need to provide a Google Cloud Storage bucket to save the results.

#### Format input instances
In this step, the instances are formatted in JSONL. Each line in the JSONL document needs to be formatted as below.

```
{ "content": "gs://sourcebucket/datasets/texts/source_text.txt", "mimeType": "text/plain"}
```

The `content` field in the JSON structure must be a Google Cloud Storage URI to a document that contains the text input for the model.

Learn more about [batch predictions](https://cloud.google.com/ai-platform-unified/docs/predictions/batch-predictions#text).
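
JSONL means one valid JSON object per line. A minimal sketch of producing and checking such a file with the standard `json` module follows; `to_jsonl` and `is_valid_jsonl` are hypothetical helper names, and the second URI is a placeholder in the same style as the example above.

```python
import json
from typing import Dict, Iterable


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def is_valid_jsonl(text: str) -> bool:
    """Return True if every non-empty line parses as JSON."""
    try:
        for line in text.splitlines():
            if line.strip():
                json.loads(line)
    except json.JSONDecodeError:
        return False
    return True


records = [
    {
        "content": "gs://sourcebucket/datasets/texts/source_text.txt",
        "mimeType": "text/plain",
    },
    {
        "content": "gs://sourcebucket/datasets/texts/other_text.txt",
        "mimeType": "text/plain",
    },
]
print(is_valid_jsonl(to_jsonl(records)))
# A Python dict repr uses single quotes, which JSON does not allow.
print(is_valid_jsonl(str(records[0])))
```

Serializing with `json.dumps` rather than `str` is what keeps each line parseable as JSON.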

In [None]:
# define a set of test samples
instances = [
    {
        "Text": "I went on a successful date with someone I felt sympathy and connection with.",
        "Labels": "affection",
    },
    {
        "Text": "I was happy when my son got 90% marks in his examination",
        "Labels": "affection",
    },
    {"Text": "I went to the gym this morning and did yoga.", "Labels": "exercise"},
    {
        "Text": "We had a serious talk with some friends of ours who have been flaky lately. They understood and we had a good evening hanging out.",
        "Labels": "bonding",
    },
    {
        "Text": "I went with grandchildren to butterfly display at Crohn Conservatory",
        "Labels": "affection",
    },
    {"Text": "I meditated last night.", "Labels": "leisure"},
    {
        "Text": "I made a new recipe for peasant bread, and it came out spectacular!",
        "Labels": "achievement",
    },
    {
        "Text": "I got gift from my elder brother which was really surprising me",
        "Labels": "affection",
    },
    {"Text": "YESTERDAY MY MOMS BIRTHDAY SO I ENJOYED", "Labels": "enjoy_the_moment"},
    {
        "Text": "Watching cupcake wars with my three teen children",
        "Labels": "affection",
    },
    {"Text": "I came in 3rd place in my Call of Duty video game.", "Labels": "leisure"},
    {
        "Text": "I completed my 5 miles run without break. It makes me feel strong.",
        "Labels": "exercise",
    },
    {"Text": "went to movies with my friends it was fun", "Labels": "bonding"},
    {
        "Text": "I was shorting Gold and made $200 from the trade.",
        "Labels": "achievement",
    },
    {
        "Text": "Hearing Songs It can be nearly impossible to go from angry to happy, so you're just looking for the thought that eases you out of your angry feeling and moves you in the direction of happiness. It may take a while, but as long as you're headed in a more positive direction youall be doing yourself a world of good.",
        "Labels": "enjoy_the_moment",
    },
    {
        "Text": "My son performed very well for a test preparation.",
        "Labels": "affection",
    },
    {"Text": "I helped my neighbour to fix their car damages.", "Labels": "bonding"},
    {
        "Text": "Managed to get the final trophy in a game I was playing.",
        "Labels": "achievement",
    },
    {
        "Text": "A hot kiss with my girl friend last night made my day",
        "Labels": "bonding",
    },
    {
        "Text": "My new BCAAs came in the mail. Yay! Strawberry Lemonade flavored aminos make my heart happy.",
        "Labels": "affection",
    },
    {"Text": "Got A in class.", "Labels": "achievement"},
    {
        "Text": "My sister called me from abroad this morning after some long years. Such a happy occassion for all family members.",
        "Labels": "affection",
    },
    {
        "Text": "The cake I made today came out amazing. It tasted amazing as well.",
        "Labels": "achievement",
    },
    {
        "Text": "There are two types of people in the world: those who choose to be happy, and those who choose to be unhappy. Contrary to popular belief, happiness doesn't come from fame, fortune, other people, or material possessions",
        "Labels": "enjoy_the_moment",
    },
    {
        "Text": "My grandmother start to walk from the bed after a long time.",
        "Labels": "affection",
    },
    {"Text": "i was able to hit a top spin serve in tennis", "Labels": "achievement"},
    {
        "Text": "I napped with my husband on the bed this afternoon and it was sweet to cuddle so close to him.",
        "Labels": "affection",
    },
    {
        "Text": "My co-woker started playing a Carley Rae Jepsen song from her phone while ringing out customers.",
        "Labels": "leisure",
    },
    {
        "Text": "My son woke me up to a fantastic breakfast of eggs, his special hamburger patty and pancakes.",
        "Labels": "affection",
    },
    {
        "Text": "After a long time my brother gave a suprise visit to my house yesterday.",
        "Labels": "affection",
    },
]

# define the input file name
BATCH_JOB_INPUT_FILE = "happiness-batch-prediction-input.jsonl"

#### Save the data to Cloud Storage bucket

Create a new Cloud Storage blob, upload individual instances as text files to the bucket, and then create the JSONL file with URIs for the instances.
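
The cells in this notebook obtain the bucket name by slicing the `gs://` prefix off `BUCKET_URI`, which works only when the URI contains no object path. As a small sketch, a URI can be split into bucket name and object name explicitly; `split_gcs_uri` is a hypothetical helper and the URI below is a placeholder.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a Cloud Storage URI: {uri!r}")
    bucket_name, _, object_name = uri[len(prefix):].partition("/")
    if not bucket_name:
        raise ValueError(f"Missing bucket name in: {uri!r}")
    return bucket_name, object_name


bucket_name, object_name = split_gcs_uri(
    "gs://example-bucket/batch-prediction-input/input_0.txt"
)
print(bucket_name)
print(object_name)
```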

In [None]:
# Instantiate the Storage client and get a reference to the bucket
storage_client = storage.Client()
bucket = storage_client.bucket(BUCKET_URI[5:])
# Iterate over the prediction instances and create a new text file
input_file_data = []
for count, instance in enumerate(instances):
    instance_name = f"input_{count}.txt"
    instance_file_uri = f"{BUCKET_URI}/batch-prediction-input/{instance_name}"
    # Add the data to store in the JSONL input file.
    tmp_data = {"content": instance_file_uri, "mimeType": "text/plain"}
    input_file_data.append(tmp_data)

    # Create the new instance file
    blob = bucket.blob("batch-prediction-input/" + instance_name)
    blob.upload_from_string(instance["Text"])


# Serialize each record with json.dumps so that every line is valid JSON
input_str = "\n".join([json.dumps(d) for d in input_file_data])
file_blob = bucket.blob(f"{BATCH_JOB_INPUT_FILE}")
file_blob.upload_from_string(input_str)

#### Create and run the batch prediction job

In [None]:
# provide display name for the batch prediction job
BATCH_JOB_DISPLAY_NAME = "happydb-batch-prediction-job-unique"

# create the batch prediction job
batch_prediction_job = model.batch_predict(
    job_display_name=BATCH_JOB_DISPLAY_NAME,
    gcs_source=f"{BUCKET_URI}/{BATCH_JOB_INPUT_FILE}",
    gcs_destination_prefix=f"{BUCKET_URI}/output",
    sync=True,
)
batch_prediction_job_name = batch_prediction_job.resource_name

In [None]:
# fetch the job details
batch_job = aiplatform.jobs.BatchPredictionJob(batch_prediction_job_name)
print(f"Batch prediction job state: {str(batch_job.state)}")

#### Get predictions from the batch prediction job

Load the batch predictions that are saved to the specified output Cloud Storage path.

In [None]:
bp_iter_outputs = batch_job.iter_outputs()

prediction_results = list()
for blob in bp_iter_outputs:
    if blob.name.split("/")[-1].startswith("prediction"):
        prediction_results.append(blob.name)

for prediction_result in prediction_results:
    gfile_name = f"gs://{bp_iter_outputs.bucket.name}/{prediction_result}".replace(
        BUCKET_URI + "/", ""
    )
    data = bucket.get_blob(gfile_name).download_as_string()
    data = ndjson.loads(data)
    print(data)
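
Each record printed above is a dictionary. As a sketch of post-processing, the helper below picks the highest-confidence label from one record. It assumes the record has a `prediction` entry with parallel `displayNames` and `confidences` lists; verify this layout against your own output before relying on it. The sample record and its values are made up for illustration.

```python
from typing import Any, Dict, Tuple


def top_prediction(record: Dict[str, Any]) -> Tuple[str, float]:
    """Return the (label, confidence) pair with the highest confidence."""
    prediction = record["prediction"]
    pairs = zip(prediction["displayNames"], prediction["confidences"])
    return max(pairs, key=lambda pair: pair[1])


# Hypothetical record; the field layout is an assumption.
sample_record = {
    "instance": {
        "content": "gs://example-bucket/batch-prediction-input/input_0.txt",
        "mimeType": "text/plain",
    },
    "prediction": {
        "displayNames": ["bonding", "affection", "leisure"],
        "confidences": [0.05, 0.91, 0.04],
    },
}
label, confidence = top_prediction(sample_record)
print(label, confidence)
```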

## Create input file with ground truth for evaluation 

The evaluation component needs the ground truth in the input file so that the predicted results can be compared against it.

In [None]:
# set the file name for saving the input with ground truth data
BATCH_JOB_INPUT_EVAL_FILE = "happydb-input-with-groundtruth.jsonl"

In [None]:
# Instantiate the Storage client and get a reference to the bucket
storage_client = storage.Client()
bucket = storage_client.bucket(BUCKET_URI[5:])
# Iterate over the prediction instances, creating a new TXT file
# for each.
input_file_data = []
for count, instance in enumerate(instances):
    instance_name = f"input_{count}.txt"
    instance_file_uri = (
        f"{BUCKET_URI}/evaluation-batch-prediction-input/{instance_name}"
    )
    # Add the data to store in the JSONL input file.
    # ground_truth variable in each json instance is needed to act as ground_truth for the evaluation task
    tmp_data = {
        "content": instance_file_uri,
        "mimeType": "text/plain",
        "ground_truth": instance["Labels"],
    }
    input_file_data.append(tmp_data)

    # Create the new instance file
    blob = bucket.blob("evaluation-batch-prediction-input/" + instance_name)
    blob.upload_from_string(instance["Text"])

input_str = "\n".join(json.dumps(d) for d in input_file_data)
file_blob = bucket.blob(f"{BATCH_JOB_INPUT_EVAL_FILE}")
file_blob.upload_from_string(input_str)

## Create a pipeline for model evaluation

In this section, you run a batch prediction job and evaluate the results in a Vertex AI pipeline by calling the `evaluate` function. Learn more about the [evaluate function](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/models.py#L5127).

Set the parameters for the evaluation pipeline.

### Define parameters to run the evaluate function

Specify the required parameters to run the `evaluate` function.

The `evaluate` function takes the following parameters:

- `prediction_type`: The problem type being addressed by this evaluation run. 'classification' and 'regression' are the currently supported problem types.
- `target_field_name`: Name of the column to be used as the target for classification.
- `gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.
- `class_labels`: The list of all class names for the target field in the dataset.
- `generate_feature_attributions`: Optional. Whether the model evaluation job should generate feature attributions. Defaults to False if not specified.
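
Because `class_labels` has to cover every value that appears in the target field, it can help to check the ground-truth labels locally first. The following is a minimal sketch; `unknown_labels` is a hypothetical helper, and the sample instances mirror the `Text`/`Labels` structure used earlier in this notebook.

```python
from typing import Dict, Iterable, Set


def unknown_labels(
    instances: Iterable[Dict[str, str]], class_labels: Iterable[str]
) -> Set[str]:
    """Return ground-truth labels that are not listed in class_labels."""
    return {instance["Labels"] for instance in instances} - set(class_labels)


sample_instances = [
    {"Text": "I meditated last night.", "Labels": "leisure"},
    {"Text": "Got A in class.", "Labels": "achievement"},
    # "achievment" is deliberately misspelled to show a detected mismatch.
    {"Text": "I finished my project.", "Labels": "achievment"},
]
sample_class_labels = ["leisure", "achievement", "affection"]
print(unknown_labels(sample_instances, sample_class_labels))
```

An empty result means every ground-truth label is covered by the class list.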

In [None]:
DATA_SOURCE = f"{BUCKET_URI}/{BATCH_JOB_INPUT_EVAL_FILE}"
CLASS_LABELS = [
    "affection",
    "exercise",
    "bonding",
    "leisure",
    "achievement",
    "enjoy_the_moment",
    "nature",
]

evaluation_job = model.evaluate(
    prediction_type="classification",
    target_field_name="ground_truth",
    gcs_source_uris=[DATA_SOURCE],
    class_labels=CLASS_LABELS,
    generate_feature_attributions=False,
)

print("Waiting for the model evaluation to finish")
evaluation_job.wait()

## Check the evaluation result

To see if the pipeline ran successfully, click on the generated link above to see the pipeline graph in the Cloud Console.

In the displayed pipeline, the nodes expand or collapse when you click on them. An example of a partially-expanded view of the pipeline can be seen below (click image to see larger version).

<img src="images/automl-text-classification-evaluation-image.PNG">

### Get the model evaluation result

After the evaluation pipeline finishes, run the following cell to print the evaluation metrics.

In [None]:
model_evaluation = evaluation_job.get_model_evaluation()

In [None]:
# Iterate over the pipeline tasks
for (
    task
) in model_evaluation._backing_pipeline_job._gca_resource.job_detail.task_details:
    # Obtain the artifacts from the evaluation task
    if (
        ("model-evaluation" in task.task_name)
        and ("model-evaluation-import" not in task.task_name)
        and (
            task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED
            or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED
        )
    ):
        evaluation_metrics = task.outputs.get("evaluation_metrics").artifacts[0]
        evaluation_metrics_gcs_uri = evaluation_metrics.uri

print(evaluation_metrics)
print(evaluation_metrics_gcs_uri)

### Visualize the metrics

Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart.
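
A bar chart can only show scalar values, so it may be worth filtering the metadata before plotting. The sketch below keeps only numeric entries, on the assumption that the metadata can also contain nested or non-numeric values. `numeric_metrics` is a hypothetical helper and the sample values are invented.

```python
from numbers import Real
from typing import Any, Dict, Mapping


def numeric_metrics(metadata: Mapping[str, Any]) -> Dict[str, float]:
    """Keep only entries whose value is a real number (booleans excluded)."""
    return {
        name: float(value)
        for name, value in metadata.items()
        if isinstance(value, Real) and not isinstance(value, bool)
    }


# Hypothetical metadata; real metric names and values depend on the run.
sample_metadata = {
    "auRoc": 0.97,
    "logLoss": 0.21,
    "confusionMatrix": {"rows": [[3, 1], [0, 4]]},
    "complete": True,
}
plottable = numeric_metrics(sample_metadata)
print(plottable)
```

The keys and values of the filtered dictionary can then be passed to `plt.bar` in place of the raw metadata items.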

In [None]:
metrics = []
values = []
for i in evaluation_metrics.metadata.items():
    metrics.append(i[0])
    values.append(i[1])
plt.figure(figsize=(15, 5))
plt.bar(x=metrics, height=values)
plt.title("Evaluation Metrics")
plt.ylabel("Value")
plt.show()

### Check model evaluations in model registry

To ensure that the model evaluations are successfully imported into the model resource, list the evaluations and print them.

In [None]:
# get the model evaluation configuration from the pipeline job
for (
    task
) in model_evaluation._backing_pipeline_job._gca_resource.job_detail.task_details:
    if "model-evaluation-import" in task.task_name:
        val = json.loads(task.execution.metadata.get("output:gcp_resources"))
        model_evaluation = val["resources"][0]

In [None]:
# Print the evaluation metrics
model_evaluation_id = model_evaluation["resourceUri"].split("/")[-1]
print(model_evaluation_id)

# get the evaluation imported by the pipeline from the model
evaluation = model.get_model_evaluation(evaluation_id=model_evaluation_id)
evaluation = evaluation.to_dict()
print("Model's evaluation metrics:\n")
metrics = evaluation["metrics"]
for metric in metrics.keys():
    print(f"metric: {metric}, value: {metrics[metric]}\n")
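
The cell above takes the evaluation ID from the last path segment of `resourceUri`. As a sketch, the other identifiers can be extracted the same way by locating each collection name in the path. `parse_evaluation_uri` is a hypothetical helper, and the sample URI below is made up and assumes the `projects/.../locations/.../models/.../evaluations/...` layout.

```python
from typing import Dict


def parse_evaluation_uri(resource_uri: str) -> Dict[str, str]:
    """Extract project, location, model, and evaluation IDs from a resource URI."""
    parts = resource_uri.rstrip("/").split("/")
    fields: Dict[str, str] = {}
    for collection in ("projects", "locations", "models", "evaluations"):
        if collection not in parts:
            raise ValueError(f"'{collection}' not found in {resource_uri!r}")
        index = parts.index(collection)
        if index + 1 >= len(parts):
            raise ValueError(f"No ID after '{collection}' in {resource_uri!r}")
        # The segment after the collection name is the ID.
        fields[collection] = parts[index + 1]
    return fields


# Hypothetical URI for illustration only.
sample_uri = (
    "https://us-central1-aiplatform.googleapis.com/v1/"
    "projects/123/locations/us-central1/models/456/evaluations/789"
)
print(parse_evaluation_uri(sample_uri))
```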

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

- Evaluation job
- Batch prediction job
- Training and deployment job
- Endpoint
- Model
- Dataset
- Cloud Storage bucket (Set `delete_bucket` to True for deletion)

In [None]:
delete_bucket = False

# delete the evaluation job
evaluation_job.delete()

# delete the batch prediction job
batch_prediction_job.delete()

# delete the training job
training_job.delete()

# list the endpoints filtering the display name
endpoints = aiplatform.Endpoint.list(
    filter=f"display_name={ENDPOINT_DISPLAY_NAME}", order_by="create_time"
)

# delete the endpoint
if endpoints:
    endpoint = endpoints[0]
    endpoint.undeploy_all()
    endpoint.delete()
    print("Deleted endpoint:", endpoint)

# list the models filtering the display name
models = aiplatform.Model.list(
    filter=f"display_name={MODEL_DISPLAY_NAME}", order_by="create_time"
)
# delete the model
if models:
    model = models[0]
    model.delete()
    print("Deleted model:", model)

# list the datasets filtering the display name
datasets = aiplatform.TextDataset.list(
    filter=f"display_name={DATASET_DISPLAY_NAME}", order_by="create_time"
)
# delete the dataset
if datasets:
    dataset = datasets[0]
    dataset.delete()
    print("Deleted dataset:", dataset)

# delete the Cloud Storage bucket
if delete_bucket:
    ! gsutil rm -r $BUCKET_URI