
# BigQuery ML (BQML) - Generate Text Embedding Using Pre-trained TensorFlow Models

This notebook shows how to generate NNLM, SWIVEL, and BERT text embedding models from pre-trained TensorFlow models using [TextEmbeddingModelGenerator](https://github.com/GoogleCloudPlatform/bigquery-ml-utils/blob/master/model_generator/text_embedding_generator.py) from the `bigquery-ml-utils` library. TextEmbeddingModelGenerator loads one of the three text embedding models ([NNLM](https://tfhub.dev/google/nnlm-en-dim50-with-normalization/2), [SWIVEL](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1), [BERT](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/4)) from TensorFlow Hub and adds a default signature so that the resulting model can be used with BQML immediately.

This notebook will cover:
- Installing the required libraries.
- Generating the desired text embedding model with TextEmbeddingModelGenerator.
- Exporting the generated model to a GCS bucket.

**This content will accompany the blog post - TBD**

---

**Prerequisites:**

None

**Services Used:**
- BigQuery
- TensorFlow Hub: Workbench (this notebook)
- GCS

**Resources:**
- [BigQuery ML (BQML) Overview](https://cloud.google.com/bigquery/docs/bqml-introduction)
- [Overview of BQML methods and workflows](https://cloud.google.com/bigquery/docs/e2e-journey)
- [BigQuery](https://cloud.google.com/bigquery)
    - [Documentation](https://cloud.google.com/bigquery/docs/query-overview)
    - [API](https://cloud.google.com/bigquery/docs/reference/libraries-overview)



---
## Colab Setup

To run this notebook in Colab click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GoogleCloudPlatform/bigquery-ml-utils/blob/master/notebooks/bqml-generate-text-embedding-model.ipynb) and run the cells in this section.  Otherwise, skip this section.



In [None]:
!pip install bigquery_ml_utils

**RESTART RUNTIME**

The next cell restarts the runtime: it stops the kernel, and Colab then restarts it automatically. You may need to dismiss a popup warning about the unexpected restart. The restart makes the packages installed above available to the current session.

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

---
## Environment Setup
Import necessary packages:

In [None]:
from bigquery_ml_utils import model_generator
import tensorflow as tf
import tensorflow_text  # imported for its side effect of registering the TF Text ops

---

## Generate a Text Embedding Model
`bigquery-ml-utils` currently offers three text embedding models: NNLM, SWIVEL, and BERT.

Set the desired model name and the local output directory:

In [None]:
MODEL_NAME = "swivel" # options: {"nnlm", "swivel", "bert"}
LOCAL_OUTPUT_DIR = "./swivel" # replace with desired local output directory

Create an instance of TextEmbeddingModelGenerator and generate the desired text embedding model:

In [None]:
text_embedding_model_generator = model_generator.TextEmbeddingModelGenerator()
text_embedding_model_generator.generate_text_embedding_model(MODEL_NAME, LOCAL_OUTPUT_DIR)

Reload the generated model and print its serving signature to confirm that it was generated correctly:

In [None]:
reload_embedding_model = tf.saved_model.load(LOCAL_OUTPUT_DIR)
print(reload_embedding_model.signatures["serving_default"])
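
Printing the signature shows the raw concrete function. As a minimal sketch, the input and output specs can also be listed by name, which may help later when naming query columns in BQML. The helper `describe_signature` is hypothetical (it is not part of `bigquery-ml-utils`); it relies only on the `structured_input_signature` and `structured_outputs` attributes that TensorFlow concrete functions expose, and the names it reports depend on the generated model.

```python
def describe_signature(signature):
    """Map tensor names to (dtype, shape) for a signature's inputs and outputs."""
    # structured_input_signature is an (args, kwargs) pair; serving signatures
    # take keyword arguments, so the named inputs live in the second element.
    _, keyword_specs = signature.structured_input_signature
    inputs = {name: (spec.dtype, spec.shape) for name, spec in keyword_specs.items()}
    outputs = {
        name: (spec.dtype, spec.shape)
        for name, spec in signature.structured_outputs.items()
    }
    return {"inputs": inputs, "outputs": outputs}


# Usage in this notebook (assumes the cells above have been run):
# summary = describe_signature(reload_embedding_model.signatures["serving_default"])
# for kind, specs in summary.items():
#     for name, (dtype, shape) in specs.items():
#         print(kind, name, dtype, shape)
```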

---
## Export Model to GCS Bucket


Authenticate with Google sign-in and set the active gcloud project:

In [None]:
import googleapiclient
from google.colab import auth as google_auth

PROJECT_ID="sample-project-id" # replace with project ID

google_auth.authenticate_user()
!gcloud config set project {PROJECT_ID}

Copy the model's contents to the specified GCS bucket:

In [None]:
GCS_BUCKET="bashtest" # replace with GCS bucket name

# create the bucket; if it already exists, gsutil reports an error
# and the commands below use the existing bucket
!gsutil mb gs://{GCS_BUCKET}/

# copy the model directory to the bucket and list its contents
# (the listing path assumes the last component of LOCAL_OUTPUT_DIR equals MODEL_NAME)
!gsutil cp -r {LOCAL_OUTPUT_DIR} gs://{GCS_BUCKET}/
!gsutil ls gs://{GCS_BUCKET}/{MODEL_NAME}
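
`gsutil cp -r` places the copied directory under the bucket using the last component of the local path, so the listing works here because `./swivel` ends in the model name. A minimal sketch of deriving that destination from `LOCAL_OUTPUT_DIR` instead; `gcs_model_uri` is a hypothetical helper, not part of `gsutil` or `bigquery-ml-utils`:

```python
import os


def gcs_model_uri(local_output_dir, bucket):
    """Return the gs:// URI where `gsutil cp -r local_output_dir gs://bucket/` puts the model."""
    # normpath strips "./" and trailing slashes so basename yields the directory name.
    directory_name = os.path.basename(os.path.normpath(local_output_dir))
    return f"gs://{bucket}/{directory_name}"


print(gcs_model_uri("./swivel", "bashtest"))  # gs://bashtest/swivel
```

In the notebook, `!gsutil ls {gcs_model_uri(LOCAL_OUTPUT_DIR, GCS_BUCKET)}` should then list the model even when the local directory is not named after the model.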

---

## Make predictions

Once the generated text embedding model is copied into a GCS bucket, make predictions by following one of the two paths listed below:
- NNLM, SWIVEL: [Make predictions with imported TensorFlow models](https://cloud.google.com/bigquery/docs/making-predictions-with-imported-tensorflow-models)
- BERT: [Make predictions with remote models on Vertex AI](https://cloud.google.com/bigquery/docs/bigquery-ml-remote-model-tutorial)
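
For the NNLM and SWIVEL path, importing the exported SavedModel comes down to a `CREATE MODEL` statement with `MODEL_TYPE = 'TENSORFLOW'` and a `MODEL_PATH` that points at the model directory in GCS. A minimal sketch that builds such a statement; the helper name `import_model_sql`, the dataset `my_dataset`, and the model name `swivel_embedding` are placeholders, and the linked guide remains the reference for the available options and for querying the model with `ML.PREDICT`:

```python
def import_model_sql(dataset, model_name, gcs_model_dir):
    """Build a BigQuery ML statement that imports a TensorFlow SavedModel from GCS."""
    # MODEL_PATH takes a wildcard that matches the files inside the SavedModel directory.
    model_path = gcs_model_dir.rstrip("/") + "/*"
    return (
        f"CREATE OR REPLACE MODEL `{dataset}.{model_name}`\n"
        f"OPTIONS (MODEL_TYPE = 'TENSORFLOW', MODEL_PATH = '{model_path}')"
    )


# Placeholder dataset and model names; the bucket matches the example above.
print(import_model_sql("my_dataset", "swivel_embedding", "gs://bashtest/swivel"))
```

The resulting statement can be run in the BigQuery console or through a BigQuery client.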