# Text Classification with Amazon SageMaker BlazingText and Hyperparameter Tuning


---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

---


Automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric from among the metrics that the algorithm computes. Automatic model tuning then searches the chosen ranges to find the combination of values that produces the model that optimizes the objective metric.
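
To make the idea concrete, here is a minimal sketch of a tuning loop using plain random search. It is not how SageMaker selects candidates; random search is only the simplest strategy to illustrate. The `toy_objective` function is a hypothetical stand-in for a real validation metric, and the ranges are chosen only for illustration.

```python
import random


def toy_objective(learning_rate, vector_dim):
    """Hypothetical stand-in for a validation metric; peaks at lr=0.1, dim=128."""
    return 1.0 - abs(learning_rate - 0.1) - abs(vector_dim - 128) / 1000.0


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best-scoring combination."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        candidate = {
            "learning_rate": rng.uniform(0.05, 0.15),  # continuous range
            "vector_dim": rng.randint(32, 300),  # integer range, inclusive
        }
        score = toy_objective(**candidate)
        if best is None or score > best[0]:
            best = (score, candidate)
    return best


best_score, best_params = random_search(n_trials=20)
print(best_score, best_params)
```

Each loop iteration plays the role of one training job: a candidate is drawn from the ranges, scored, and compared with the best result so far.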


## Introduction

Text classification can be used to solve various use cases such as sentiment analysis, spam detection, and hashtag prediction. This notebook demonstrates the use of SageMaker BlazingText to perform supervised binary or multi-class text classification with single or multiple labels. BlazingText can train the model on more than a billion words in a couple of minutes using a multi-core CPU or a GPU, while achieving performance on par with state-of-the-art deep learning text classification algorithms. BlazingText extends the `fastText` text classifier to leverage GPU acceleration using custom `CUDA` kernels.

## Install Python packages

In [None]:
import sys

!{sys.executable} -m pip install "scikit_learn==0.20.0"

## Setup

Let's start by specifying:

- The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the Notebook Instance, training, and hosting. If you don't specify a bucket, SageMaker SDK will create a default bucket following a pre-defined naming convention in the same region. 
- The IAM role ARN used to give SageMaker access to your data. It can be fetched using the **get_execution_role** method from the SageMaker Python SDK.

In [None]:
import sagemaker
from sagemaker import get_execution_role
import json
import boto3
import pandas as pd
import re
import string
from sklearn.model_selection import train_test_split

sess = sagemaker.Session()

role = get_execution_role()
print(
    role
)  # This is the role that SageMaker would use to leverage AWS resources (S3, CloudWatch) on your behalf

bucket = sess.default_bucket()  # Replace with your own bucket name if needed
print(bucket)
prefix = "blazingtext/supervised/20_newsgroups"  # Replace with the prefix under which you want to store the data if needed

### Data Preparation

Now we'll download the dataset on which we want to train the text classification model. BlazingText expects a single preprocessed text file with space-separated tokens, where each line contains a single sentence and the corresponding label(s) prefixed by "\__label\__".
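
As a small illustration of this line format, the sketch below builds training lines from a list of labels and a list of tokens. The helper name `to_blazingtext_line` and the sample tokens are hypothetical and are not used elsewhere in this notebook.

```python
def to_blazingtext_line(labels, tokens):
    """Join prefixed labels and tokens into one space-separated line."""
    prefixed = ["__label__" + label for label in labels]
    return " ".join(prefixed + tokens)


# Single-label example
print(to_blazingtext_line(["science"], ["voyager", "stable", "trajectory"]))
# Multi-label example: every label gets its own prefix
print(to_blazingtext_line(["science", "space"], ["solar", "system"]))
```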

In this example, let us train the text classification model on the [`20 newsgroups dataset`](http://qwone.com/~jason/20Newsgroups/). The `20 newsgroups dataset` consists of approximately 20,000 messages taken from 20 Usenet newsgroups.

In [None]:
import os
import shutil

data_dir = "20_newsgroups_bulk"
if os.path.exists(data_dir):  # cleanup existing data folder
    shutil.rmtree(data_dir)

In [None]:
!aws s3 cp s3://sagemaker-sample-files/datasets/text/20_newsgroups/20_newsgroups_bulk.tar.gz .

In [None]:
!tar xzf 20_newsgroups_bulk.tar.gz
!ls 20_newsgroups_bulk

In [None]:
file_list = [os.path.join(data_dir, f) for f in os.listdir(data_dir)]
print("Number of files:", len(file_list))

In [None]:
documents_count = 0
for file in file_list:
    df = pd.read_csv(file, header=None, names=["text"])
    documents_count = documents_count + df.shape[0]
print("Number of documents:", documents_count)

In [None]:
categories_list = [os.path.basename(f) for f in file_list]  # file name is the category

In [None]:
categories_list

Let us inspect the dataset to understand how the data and labels are provided.

In [None]:
df = pd.read_csv("./20_newsgroups_bulk/rec.motorcycles", header=None, names=["text"])
df

In [None]:
df["text"][0]

In [None]:
df = pd.read_csv("./20_newsgroups_bulk/comp.sys.mac.hardware", header=None, names=["text"])
df

In [None]:
df["text"][0]

As we can see from the above, there is a single file for each class in the dataset. Each record is a plain-text message with a header, body, footer, and quotes. We will need to process the records into a suitable data format.

## Data Preprocessing
We need to preprocess the training data into a **space-separated tokenized text** format that can be consumed by the `BlazingText` algorithm. Also, as mentioned previously, the class label(s) should be prefixed with `__label__` and should appear on the same line as the original sentence. We'll use the `nltk` library to tokenize the input sentences from the `20 newsgroups dataset`.

Download the `nltk` tokenizer and other libraries.

In [None]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

In [None]:
from sklearn.datasets.twenty_newsgroups import (
    strip_newsgroup_header,
    strip_newsgroup_quoting,
    strip_newsgroup_footer,
)

The following function removes the header, footer, and quotes (of earlier messages) from each text.

In [None]:
def strip_newsgroup_item(item):
    item = strip_newsgroup_header(item)
    item = strip_newsgroup_quoting(item)
    item = strip_newsgroup_footer(item)
    return item

The following function lowercases the text and removes stop words, leading/trailing whitespace, bracketed text, URLs, HTML tags, punctuation, and words containing digits.

In [None]:
# Let's get a list of stop words from the NLTK library
stop_words = stopwords.words("english")


def process_text(texts):
    final_text_list = []
    for text in texts:

        # Treat missing (non-string) values as empty text
        if not isinstance(text, str):
            text = ""

        filtered_sentence = []

        # Lowercase
        text = text.lower()

        # Remove leading/trailing whitespace, bracketed text, URLs, HTML tags,
        # punctuation, newlines, and words containing digits.
        # Raw strings avoid invalid-escape warnings for patterns such as "\[".
        text = text.strip()
        text = re.sub(r"\[.*?\]", "", text)
        text = re.sub(r"https?://\S+|www\.\S+", "", text)
        text = re.sub(r"<.*?>+", "", text)
        text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
        text = re.sub(r"\n", "", text)
        text = re.sub(r"\w*\d\w*", "", text)

        for w in word_tokenize(text):
            # We are applying some custom filtering here, feel free to try different things
            # Check if it is not numeric and its length>2 and not in stop words
            if (not w.isnumeric()) and (len(w) > 2) and (w not in stop_words):
                filtered_sentence.append(w)
        final_string = " ".join(filtered_sentence)  # final string of cleaned words

        final_text_list.append(final_string)

    return final_text_list
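
To see what these cleaning steps do to a single string, here is a simplified, standalone sketch. It assumes a tiny hypothetical stop word set (`TOY_STOP_WORDS`) and uses whitespace splitting instead of the NLTK tokenizer, so its results can differ from `process_text` on real data.

```python
import re
import string

# Tiny hypothetical stand-in for the NLTK English stop word list
TOY_STOP_WORDS = {"the", "and", "this", "for"}


def clean(text):
    """Apply the same regex steps as above, with a simplified tokenizer."""
    text = text.lower().strip()
    text = re.sub(r"\[.*?\]", "", text)  # bracketed text
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"<.*?>+", "", text)  # HTML tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # punctuation
    text = re.sub(r"\n", "", text)  # newlines
    text = re.sub(r"\w*\d\w*", "", text)  # words containing digits
    words = [w for w in text.split() if len(w) > 2 and w not in TOY_STOP_WORDS]
    return " ".join(words)


sample = "Visit <b>the</b> page at https://example.com for 3D graphics [edited]!"
print(clean(sample))  # visit page graphics
```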

Now we will read each of the `20_newsgroups` dataset files, call `strip_newsgroup_item` and `process_text` functions we defined earlier, and then aggregate all data into one dataframe.

In [None]:
category_frames = []

for file in file_list:
    print(f"Processing {file}")
    label = os.path.basename(file)  # each file is named after its newsgroup
    df = pd.read_csv(file, header=None, names=["text"])
    df["text"] = df["text"].apply(strip_newsgroup_item)
    df["text"] = process_text(df["text"].tolist())
    df["label"] = label
    category_frames.append(df)

# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
all_categories_df = pd.concat(category_frames, ignore_index=True)

Let's inspect how many categories there are in our dataset.

In [None]:
all_categories_df["label"].value_counts()

Our dataset has 20 categories, which is too many for this example, so we will combine related sub-categories.

In [None]:
# replace to politics
all_categories_df["label"].replace(
    {
        "talk.politics.misc": "politics",
        "talk.politics.guns": "politics",
        "talk.politics.mideast": "politics",
    },
    inplace=True,
)

# replace to recreational
all_categories_df["label"].replace(
    {
        "rec.sport.hockey": "recreational",
        "rec.sport.baseball": "recreational",
        "rec.autos": "recreational",
        "rec.motorcycles": "recreational",
    },
    inplace=True,
)

# replace to religion
all_categories_df["label"].replace(
    {
        "soc.religion.christian": "religion",
        "talk.religion.misc": "religion",
        "alt.atheism": "religion",
    },
    inplace=True,
)

# replace to computer
all_categories_df["label"].replace(
    {
        "comp.windows.x": "computer",
        "comp.sys.ibm.pc.hardware": "computer",
        "comp.os.ms-windows.misc": "computer",
        "comp.graphics": "computer",
        "comp.sys.mac.hardware": "computer",
    },
    inplace=True,
)
# replace to sales
all_categories_df["label"].replace({"misc.forsale": "sales"}, inplace=True)

# replace to science
all_categories_df["label"].replace(
    {
        "sci.crypt": "science",
        "sci.electronics": "science",
        "sci.med": "science",
        "sci.space": "science",
    },
    inplace=True,
)

Now we are left with 6 categories, which is much better.

In [None]:
all_categories_df["label"].value_counts()

Let's calculate the number of words in each row.

In [None]:
all_categories_df["word_count"] = all_categories_df["text"].apply(lambda x: len(str(x).split()))
all_categories_df.head()

Let's get basic statistics about the dataset.

In [None]:
all_categories_df["word_count"].describe()

We can see that the mean value is around 86 words. However, there are outliers, such as a text with 6179 words. Such outliers can make it harder for the model to perform well, so we will drop those rows.

Let's drop empty rows first.

In [None]:
no_text = all_categories_df[all_categories_df["word_count"] == 0]
print(len(no_text))

# drop these rows
all_categories_df.drop(no_text.index, inplace=True)

Let's drop the rows that are longer than 128 words. This is done to make it easy for the model to train without outliers.

In [None]:
long_text = all_categories_df[all_categories_df["word_count"] > 128]
print(len(long_text))

# drop these rows
all_categories_df.drop(long_text.index, inplace=True)
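
The two filtering steps above (dropping empty rows and dropping rows longer than 128 words) can be sketched without pandas as follows. The sample documents are made up for illustration.

```python
MAX_WORDS = 128  # same threshold as used above

# Made-up sample documents: one normal, one empty, one too long
docs = [
    {"label": "science", "text": "voyager stable trajectory solar system"},
    {"label": "sales", "text": ""},
    {"label": "computer", "text": " ".join(["modem"] * 200)},
]

# Keep only documents with at least one word and at most MAX_WORDS words
kept = [d for d in docs if 0 < len(d["text"].split()) <= MAX_WORDS]
print([d["label"] for d in kept])  # ['science']
```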

In [None]:
all_categories_df["label"].value_counts()

Let's get basic statistics about the dataset after removing the outliers.

In [None]:
all_categories_df["word_count"].describe()

This looks much more balanced.

Now we drop the `word_count` column, as we will not need it anymore.

In [None]:
all_categories_df.drop(columns="word_count", inplace=True)

In [None]:
all_categories_df

We partition the dataset into 80% training and 20% validation set.

In [None]:
train, validation = train_test_split(all_categories_df, test_size=0.2)
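
The split above is random, so the rows that land in each set change from run to run unless a seed is fixed (`train_test_split` accepts a `random_state` argument for this purpose). As a rough sketch of what an 80/20 shuffle-and-cut split does, here is a standalone version; the helper name `split_80_20` and the seed value are hypothetical.

```python
import random


def split_80_20(rows, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then cut at 80%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


train_rows, validation_rows = split_80_20(range(10))
print(len(train_rows), len(validation_rows))  # 8 2
```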

In [None]:
def save_to_csv_with_prefix(df, file_name):
    # Build "__label__<label> <text>" lines without modifying the caller's DataFrame
    records = ("__label__" + df["label"] + " " + df["text"]).tolist()
    print(len(records))
    # The context manager closes the file even if a write fails
    with open(file_name, "w") as f:
        for element in records:
            f.write(element + "\n")

In [None]:
save_to_csv_with_prefix(train, "20_newsgroups.train")
save_to_csv_with_prefix(validation, "20_newsgroups.validation")

Let us inspect the training and validation datasets after preprocessing to see how the data and labels now look.

In [None]:
!head -n 3 20_newsgroups.train

In [None]:
!head -n 3 20_newsgroups.validation

We need to upload these files to S3 so that SageMaker can use them in training jobs. We'll use the Python SDK to upload the two files to the bucket and prefix location that we set above.

In [None]:
train_channel = prefix + "/train"
validation_channel = prefix + "/validation"

sess.upload_data(path="20_newsgroups.train", bucket=bucket, key_prefix=train_channel)
sess.upload_data(path="20_newsgroups.validation", bucket=bucket, key_prefix=validation_channel)

s3_train_data = "s3://{}/{}".format(bucket, train_channel)
s3_validation_data = "s3://{}/{}".format(bucket, validation_channel)

Next we need to set up an output location in S3, where the model artifacts will be stored. These artifacts are the output of the algorithm's training job.

In [None]:
s3_output_location = "s3://{}/{}/output".format(bucket, prefix)
print(s3_output_location)

## Set up hyperparameter tuning job
Now that we are done with all the setup that is needed, we are ready to train our BlazingText model. To begin, let us retrieve the BlazingText container image for our region. We will pass it to an `Estimator` object, which launches the training jobs.

In [None]:
region_name = boto3.Session().region_name

In [None]:
container = sagemaker.image_uris.retrieve("blazingtext", region_name, "1")
print("Using SageMaker BlazingText container: {} ({})".format(container, region_name))

## Training the BlazingText model for supervised text classification

BlazingText supports a *supervised* mode for text classification. It extends the `FastText` text classifier to leverage GPU acceleration using custom `CUDA` kernels.
The model can be trained on more than a billion words in a couple of minutes using a multi-core CPU or a GPU, while achieving performance on par with the state-of-the-art deep learning text classification algorithms.
For more information, please refer to the [algorithm documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html).

Now, let's define the SageMaker `Estimator` with resource configurations and hyperparameters to train text classification on the `20 newsgroups` dataset, using "supervised" mode on a `c4.4xlarge` instance.

In [None]:
estimator = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.c4.4xlarge",
    volume_size=30,
    max_run=360000,
    input_mode="File",
    output_path=s3_output_location,
    hyperparameters={
        "mode": "supervised",
        "epochs": 25,
        "min_count": 2,
        "early_stopping": True,
        "patience": 4,
        "min_epochs": 5,
        "word_ngrams": 1,
    },
)

Once we've defined our estimator we can specify the hyperparameters we'd like to tune and their possible values.  We have three different types of hyperparameters.
- Categorical parameters need to take one value from a discrete set. We define this by passing the list of possible values to `CategoricalParameter(list)`.
- Continuous parameters can take any real number value between the minimum and maximum value, defined by `ContinuousParameter(min, max)`.
- Integer parameters can take any integer value between the minimum and maximum value, defined by `IntegerParameter(min, max)`.

*Note, if possible, it's almost always best to specify a value as the least restrictive type.  For example, tuning learning rate as a continuous value between 0.01 and 0.2 is likely to yield a better result than tuning as a categorical parameter with values 0.01, 0.1, 0.15, or 0.2.*
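
To illustrate the note above, the following sketch compares how many distinct learning rates are reachable when sampling from a categorical list versus a continuous range. It uses Python's `random` module purely for illustration and does not reflect how SageMaker samples values.

```python
import random

rng = random.Random(0)

# Categorical: only the listed values can ever be tried
categorical_choices = [0.01, 0.1, 0.15, 0.2]
categorical_samples = [rng.choice(categorical_choices) for _ in range(1000)]

# Continuous: any value in the interval can be tried
continuous_samples = [rng.uniform(0.01, 0.2) for _ in range(1000)]

print("distinct categorical values:", len(set(categorical_samples)))
print("distinct continuous values:", len(set(continuous_samples)))
```

The categorical search can never try a value such as 0.12, even if that value would give the best model, whereas the continuous range leaves it reachable.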

Refer to [BlazingText Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext_hyperparameters.html) in the Amazon SageMaker documentation for the complete list of hyperparameters.

In [None]:
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)

hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(0.05, 0.15),
    "vector_dim": IntegerParameter(32, 300),
}

Next we'll specify the objective metric that we'd like to tune and its definition, which includes the regular expression (regex) needed to extract that metric from the CloudWatch logs of the training job. Since we are using the built-in `BlazingText` algorithm here, it emits two predefined metrics, `train:mean_rho` and `validation:accuracy`, and we elected to monitor `validation:accuracy`, as you can see below. In this case, we only need to specify the metric name and do not need to provide a regex. If you bring your own algorithm, it emits its own metrics, so you'll need to add a `MetricDefinition` object here to define the format of those metrics through regex, so that SageMaker knows how to extract them from your CloudWatch logs.

In [None]:
objective_metric_name = "validation:accuracy"
objective_type = "Maximize"
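
For the bring-your-own-algorithm case, here is a rough sketch of how a regex pulls metric values out of log lines. The log line format and the regex are hypothetical; a real definition must match whatever your own training code prints. The `Name`/`Regex` dictionary mirrors the shape of a metric definition.

```python
import re

# Hypothetical metric definition for a custom algorithm
metric_definition = {
    "Name": "validation:accuracy",
    "Regex": r"validation accuracy: ([0-9\.]+)",
}

# Made-up log lines, standing in for CloudWatch log output
log_lines = [
    "epoch 5 done",
    "validation accuracy: 0.7312",
    "epoch 6 done",
    "validation accuracy: 0.7450",
]

pattern = re.compile(metric_definition["Regex"])
values = []
for line in log_lines:
    match = pattern.search(line)
    if match:
        # The first capture group holds the metric value
        values.append(float(match.group(1)))

print(values)  # [0.7312, 0.745]
```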

Now, we'll create a `HyperparameterTuner` object, to which we pass:
- The `BlazingText` estimator we created above
- Our hyperparameter ranges
- Objective metric name and type
- Tuning resource configurations, such as the total number of training jobs to run and how many training jobs can run in parallel

In [None]:
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    max_jobs=6,
    max_parallel_jobs=2,
    objective_type=objective_type,
)
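
As a back-of-the-envelope check on these settings, the sketch below computes how many sequential rounds of training jobs the tuner needs. It assumes, for simplicity, that all jobs in a round take about the same time.

```python
import math

max_jobs = 6  # total training jobs, as configured above
max_parallel_jobs = 2  # jobs running at the same time

# Each round runs up to max_parallel_jobs jobs side by side
rounds = math.ceil(max_jobs / max_parallel_jobs)
print(rounds)  # 3
```

Raising `max_parallel_jobs` shortens the wall-clock time, but it also leaves fewer completed results available when the next candidates are chosen.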

Now that the hyper-parameters are set up, let us prepare the handshake between our data channels and the algorithm. To do this, we need to create the `sagemaker.inputs.TrainingInput` objects from our data channels. These objects are then put in a simple dictionary, which the algorithm consumes.

In [None]:
train_data = sagemaker.inputs.TrainingInput(
    s3_train_data,
    distribution="FullyReplicated",
    content_type="text/plain",
    s3_data_type="S3Prefix",
)
validation_data = sagemaker.inputs.TrainingInput(
    s3_validation_data,
    distribution="FullyReplicated",
    content_type="text/plain",
    s3_data_type="S3Prefix",
)
data_channels = {"train": train_data, "validation": validation_data}

We now have our `Estimator` object, the hyperparameter ranges to tune, and our data channels linked with the algorithm. The only remaining step is to launch the tuning job, which runs a series of training jobs.

Each training job involves a few steps. First, the instance that we requested while creating the `Estimator` is provisioned and set up with the appropriate libraries. Then, the data from our channels is downloaded onto the instance. Once this is done, training begins. Provisioning and data downloading take some time, depending on the size of the data, so it might be a few minutes before we start getting training logs. The logs also print the accuracy on the validation data for every epoch after the training job has executed `min_epochs`. This metric is a proxy for the quality of the model.

A "Job complete" message will be printed once the job has finished. The trained model can be found in the S3 bucket that was set up as `output_path` in the estimator.

## Launch hyperparameter tuning job
Now we can launch a hyperparameter tuning job by calling the `fit()` function. After the hyperparameter tuning job is created, we can go to the SageMaker console to track its progress until it is completed.

This should take around 12 minutes to complete.

In [None]:
%%time

tuner.fit(inputs=data_channels, logs=True)

## Analyze Results of a Hyperparameter Tuning job

Once you have completed a tuning job (or even while the job is still running), you can use the code below to analyze the results and understand how each hyperparameter affects the quality of the model.

In [None]:
sm_client = boto3.Session().client("sagemaker")

tuning_job_name = tuner.latest_tuning_job.name
tuning_job_name

## Track hyperparameter tuning job progress
After you launch a tuning job, you can see its progress by calling the `describe_hyper_parameter_tuning_job` API. The response is a JSON object that contains information about the current state of the tuning job. You can call `list_training_jobs_for_hyper_parameter_tuning_job` to see a detailed list of the training jobs that the tuning job launched.

In [None]:
tuning_job_result = sm_client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name
)

status = tuning_job_result["HyperParameterTuningJobStatus"]
if status != "Completed":
    print("Reminder: the tuning job has not been completed.")

job_count = tuning_job_result["TrainingJobStatusCounters"]["Completed"]
print("%d training jobs have completed" % job_count)

is_minimize = (
    tuning_job_result["HyperParameterTuningJobConfig"]["HyperParameterTuningJobObjective"]["Type"]
    != "Maximize"
)
objective_name = tuning_job_result["HyperParameterTuningJobConfig"][
    "HyperParameterTuningJobObjective"
]["MetricName"]

In [None]:
from pprint import pprint

if tuning_job_result.get("BestTrainingJob", None):
    print("Best model found so far:")
    pprint(tuning_job_result["BestTrainingJob"])
else:
    print("No training jobs have reported results yet.")

## Fetch all results as `DataFrame`
We can list the hyperparameters and objective metrics of all training jobs and pick the training job with the best objective metric.

In [None]:
import pandas as pd

tuner_analytics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)

full_df = tuner_analytics.dataframe()

if len(full_df) > 0:
    df = full_df[full_df["FinalObjectiveValue"] > -float("inf")]
    if len(df) > 0:
        df = df.sort_values("FinalObjectiveValue", ascending=is_minimize)
        print("Number of training jobs with valid objective: %d" % len(df))
        print({"lowest": min(df["FinalObjectiveValue"]), "highest": max(df["FinalObjectiveValue"])})
        pd.set_option("display.max_colwidth", None)  # Don't truncate TrainingJobName (None requires pandas >= 1.0)
    else:
        print("No training jobs have reported valid results yet.")

df
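
The same selection can be done directly on training job summaries, without a `DataFrame`. The sketch below assumes summaries shaped like the entries returned by `list_training_jobs_for_hyper_parameter_tuning_job`; the job names and metric values are made up.

```python
METRIC_KEY = "FinalHyperParameterTuningJobObjectiveMetric"


def pick_best(summaries, maximize=True):
    """Return the summary with the best objective value, or None if none reported."""
    scored = [s for s in summaries if METRIC_KEY in s]
    if not scored:
        return None

    def objective_value(summary):
        return summary[METRIC_KEY]["Value"]

    return max(scored, key=objective_value) if maximize else min(scored, key=objective_value)


# Made-up summaries; a job that is still running has no final metric yet
summaries = [
    {
        "TrainingJobName": "job-a",
        "TunedHyperParameters": {"learning_rate": "0.06", "vector_dim": "64"},
        METRIC_KEY: {"MetricName": "validation:accuracy", "Value": 0.71},
    },
    {
        "TrainingJobName": "job-b",
        "TunedHyperParameters": {"learning_rate": "0.11", "vector_dim": "210"},
        METRIC_KEY: {"MetricName": "validation:accuracy", "Value": 0.78},
    },
    {"TrainingJobName": "job-c"},
]

print(pick_best(summaries)["TrainingJobName"])  # job-b
```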

## Deploy the best trained model
Once the training is done, we can deploy the best trained model as an Amazon SageMaker real-time hosted endpoint. This will allow us to make predictions (or inference) from the model. Note that we don't have to host on the same type of instance that we used to train. Inference usually needs less compute power than training, and endpoints stay up and running for a long time, so it's advisable to choose a cheaper instance for inference.

- `ml.c4.4xlarge` - Compute Optimized instances are ideal for compute bound applications that benefit from high performance processors.
- `ml.m4.xlarge` - General purpose instances provide a balance of compute, memory and networking resources, and can be used for a variety of diverse workloads.

In [None]:
from sagemaker.serializers import JSONSerializer

text_classifier = tuner.deploy(
    initial_instance_count=1, instance_type="ml.m4.xlarge", serializer=JSONSerializer()
)

#### Use JSON format for inference
BlazingText supports `application/json` as the content-type for inference. The payload should contain a list of sentences with the key as "**instances**" while being passed to the endpoint.

In [None]:
sentences = [
    "The modem is an internal AT/(E)ISA 8-bit card (just a little longer than a half-card).",
    "In the cage I usually wave to bikers.  They usually don't wave back.  My wife thinks it's strange but I don't care.",
    "Voyager has the unusual luck to be on a stable trajectory out of the solar system.",
]

# using the same processing logic that we used during data preparation for training
processed_sentences = process_text(sentences)

print(processed_sentences)

payload = {"instances": processed_sentences}

response = text_classifier.predict(payload)

predictions = json.loads(response)
print(json.dumps(predictions, indent=2))

By default, the model will return only one prediction, the one with the highest probability. For retrieving the top k predictions, you can set `k` in the configuration as shown below:

In [None]:
payload = {"instances": processed_sentences, "configuration": {"k": 2}}

response = text_classifier.predict(payload)

predictions = json.loads(response)
print(json.dumps(predictions, indent=2))
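
To make the predictions easier to read, we can strip the `__label__` prefix and pair each label with its probability. The sketch below assumes the response is a JSON list with one object per input sentence, each holding parallel `label` and `prob` lists; the sample response is made up rather than taken from a real endpoint.

```python
import json

# Made-up response body for two sentences with k=2
raw_response = json.dumps(
    [
        {"label": ["__label__computer", "__label__science"], "prob": [0.81, 0.12]},
        {"label": ["__label__recreational", "__label__politics"], "prob": [0.64, 0.2]},
    ]
)


def readable_predictions(response_body):
    """Return, per sentence, a list of (label, probability) pairs without the prefix."""
    results = []
    for item in json.loads(response_body):
        pairs = [
            (label.replace("__label__", "", 1), prob)
            for label, prob in zip(item["label"], item["prob"])
        ]
        results.append(pairs)
    return results


for pairs in readable_predictions(raw_response):
    print(pairs)
```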

### Clean up
Endpoints should be deleted when no longer in use, since (per the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/)) they're billed by time deployed.


In [None]:
text_classifier.delete_endpoint()

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/hyperparameter_tuning|blazingtext_text_classification_20_newsgroups|hpo_blazingtext_text_classification_20_newsgroups.ipynb)
