# BigQuery ML (BQML) Multimodal Embedding

This tutorial accompanies the blog post [Introducing multimodal and structured data embedding support in BigQuery](https://cloud.google.com/blog/products/data-analytics/bigquery-multimodal-embeddings-generation), which introduces how to generate multimodal embeddings, including text and image embeddings. For more information, see the [BQML embedding overview](https://cloud.google.com/bigquery/docs/ai-application-overview#text_embedding) and the [ML.GENERATE_EMBEDDING documentation](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-embedding) to learn how to generate and use other embeddings in BigQuery.

---
## Find & Replace Project ID, Connection ID and DataSet ID

This workflow uses IPython magics for readability and to make the SQL queries easy to copy and paste into other editors, such as the BigQuery console. Before running anything, do a find & replace of the project ID, connection ID, and dataset ID used in this notebook with the values from your environment:
- Replace all occurrences of `bigquerytestdefault` with your project ID.
- Replace all occurrences of `github_connection` with your remote connection ID.
- Replace all occurrences of `bqml_tutorial` with your dataset ID.
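If you prefer to script the find & replace instead of doing it in an editor, the sketch below rewrites the cell sources of a parsed `.ipynb` file. `replace_ids` and the replacement values on the right-hand side of `REPLACEMENTS` are hypothetical; substitute your own IDs.

```python
import json

# Placeholder IDs used in this notebook -> your values.
# The right-hand values are hypothetical examples; replace them with your own.
REPLACEMENTS = {
    "bigquerytestdefault": "my-project",
    "github_connection": "my_connection",
    "bqml_tutorial": "my_dataset",
}


def replace_ids(notebook: dict, replacements: dict) -> dict:
    """Return a copy of a parsed .ipynb notebook with every placeholder replaced in cell sources."""
    result = json.loads(json.dumps(notebook))  # deep copy via a JSON round-trip
    for cell in result.get("cells", []):
        source = cell.get("source", [])
        # nbformat stores a cell's source either as one string or as a list of lines.
        lines = [source] if isinstance(source, str) else source
        new_lines = []
        for line in lines:
            for old, new in replacements.items():
                line = line.replace(old, new)
            new_lines.append(line)
        cell["source"] = new_lines[0] if isinstance(source, str) else new_lines
    return result
```

For example, load the notebook file with `json.load`, pass the result to `replace_ids(nb, REPLACEMENTS)`, and write it back with `json.dump`. The three placeholders don't overlap, so the order of replacement doesn't matter.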

---
## Colab Setup

To run this notebook in Colab, click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GoogleCloudPlatform/bigquery-ml-utils/notebooks/bqml-generate-embedding-with-multimodalembedding-model.ipynb) and run the cells in this section. Otherwise, skip this section.

This cell authenticates to Google Cloud (follow the prompts in the popup) and creates a BigQuery client.

In [None]:
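#@title Setup credentials

from google.colab import auth
from google.cloud import bigquery

# Authenticate this Colab session against Google Cloud.
auth.authenticate_user()
print('Authenticated')

# BigQuery client used by the Python cells below.
PROJECT_ID = 'bigquerytestdefault'
client = bigquery.Client(project=PROJECT_ID)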

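The notebook assumes the `bqml_tutorial` dataset already exists. If it doesn't, a minimal sketch like the following builds a `CREATE SCHEMA IF NOT EXISTS` statement for it. `create_dataset_ddl` is a hypothetical helper, and the `US` location is an assumption chosen to match the `us` connection used below (the dataset and the connection generally need to be in the same location).

```python
def create_dataset_ddl(project_id: str, dataset_id: str, location: str = "US") -> str:
    """Build a BigQuery DDL statement that creates the dataset only if it is missing."""
    # Assumption: US matches the location of the `us` connection used in this notebook.
    return (
        f"CREATE SCHEMA IF NOT EXISTS `{project_id}.{dataset_id}` "
        f"OPTIONS (location = '{location}')"
    )


ddl = create_dataset_ddl("bigquerytestdefault", "bqml_tutorial")
print(ddl)
# In the notebook, run it with the client created above:
# client.query(ddl).result()
```

Run it once before creating the object table; because of `IF NOT EXISTS`, re-running it is harmless.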
In [None]:
#@title Enable data table display

%load_ext google.colab.data_table

In [None]:
#@title Function that displays images from GCS bucket.
import io

import matplotlib.pyplot as plt
import tensorflow as tf
from PIL import Image


def printImages(results):
  """Plot each row's image (column 0: GCS URI) next to its column-1 value."""
  image_results_list = list(results)
  amt_of_images = len(image_results_list)
  if amt_of_images == 0:
    print('No images to display.')
    return

  # squeeze=False keeps `axes` 2-D even when there is only one row.
  fig, axes = plt.subplots(nrows=amt_of_images, ncols=2, figsize=(20, 20),
                           squeeze=False)
  fig.tight_layout()
  fig.subplots_adjust(hspace=0.5)
  for i, row in enumerate(image_results_list):
    gcs_uri, text = row[0], row[1]
    # tf.io.gfile reads gs:// paths directly; the handle is closed after reading.
    with tf.io.gfile.GFile(gcs_uri, 'rb') as f:
      img = Image.open(io.BytesIO(f.read()))
    axes[i, 0].axis('off')
    axes[i, 0].imshow(img)
    axes[i, 1].axis('off')
    axes[i, 1].text(0, 0, str(text), fontsize=10)
  plt.show()

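`printImages` pulls in TensorFlow only to read `gs://` paths. If you'd rather avoid that dependency, objects in a publicly readable bucket (which we assume `gcs-public-data--met` is) can also be fetched over HTTPS from `storage.googleapis.com`. The sketch below converts a `gs://` URI into that URL; `gcs_uri_to_public_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import quote


def gcs_uri_to_public_url(gcs_uri: str) -> str:
    """Convert gs://bucket/path/to/object into its https://storage.googleapis.com URL."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError(f"not a GCS URI: {gcs_uri!r}")
    bucket, _, object_name = gcs_uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"GCS URI must name a bucket and an object: {gcs_uri!r}")
    # Percent-encode the object name but keep "/" separators as-is.
    # Assumption: the object is publicly readable, so no credentials are needed.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"
```

Inside `printImages`, the `GFile` read could then be replaced by `urllib.request.urlopen(gcs_uri_to_public_url(gcs_uri)).read()`. This only works for public objects; private buckets still need authenticated access such as `tf.io.gfile` or the Cloud Storage client.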
In [None]:
#@title Create Image Object table from public dataset.
%%bigquery --project bigquerytestdefault

CREATE OR REPLACE EXTERNAL TABLE
`bqml_tutorial.image_object_table`
WITH CONNECTION `bigquerytestdefault.us.github_connection`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://gcs-public-data--met/*']);

In [None]:
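#@title Display first 10 images in the image object table

# printImages() expects the GCS URI in column 0 and a caption in column 1;
# here each image is captioned with its own GCS path.
display_first_10_images_query = """
  SELECT uri, uri AS caption
  FROM `bqml_tutorial.image_object_table`
  WHERE content_type = 'image/jpeg'
  LIMIT 10;
"""
printImages(client.query(display_first_10_images_query))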

In [None]:
#@title Create a multimodalembedding model
%%bigquery --project bigquerytestdefault

CREATE OR REPLACE MODEL `bqml_tutorial.multimodalembedding_model`
REMOTE WITH CONNECTION `bigquerytestdefault.us.github_connection`
OPTIONS (endpoint="multimodalembedding@001");

In [None]:
#@title Generate image embeddings from the image object table.
%%bigquery --project bigquerytestdefault

CREATE OR REPLACE TABLE `bqml_tutorial.image_embed_table`
AS
SELECT *
FROM
  ML.GENERATE_EMBEDDING(
    MODEL `bqml_tutorial.multimodalembedding_model`,
    (
      SELECT * FROM `bqml_tutorial.image_object_table` WHERE content_type = 'image/jpeg' LIMIT 10000
    ));

In [None]:
#@title Show image embedding results.
%%bigquery --project bigquerytestdefault

SELECT * FROM `bqml_tutorial.image_embed_table` LIMIT 10;

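`ML.GENERATE_EMBEDDING` reports per-row failures in `ml_generate_embedding_status` (empty on success) instead of failing the whole query, and the vector search below compares vectors that should all have the same length. A minimal sketch for checking both on dict-like rows (BigQuery `Row` objects support `.get()` and `[...]`); `summarize_embeddings` is a hypothetical helper.

```python
from typing import Iterable, Mapping


def summarize_embeddings(rows: Iterable[Mapping]) -> dict:
    """Count successful and failed embedding rows and collect the embedding lengths seen."""
    ok, failed, dims = 0, 0, set()
    for row in rows:
        # An empty (or NULL) status means the embedding was generated successfully.
        status = row.get("ml_generate_embedding_status") or ""
        if status:
            failed += 1
        else:
            ok += 1
            dims.add(len(row["ml_generate_embedding_result"]))
    return {"ok": ok, "failed": failed, "dimensions": sorted(dims)}
```

For example, pass it `client.query("SELECT ml_generate_embedding_result, ml_generate_embedding_status FROM `bqml_tutorial.image_embed_table`")`. If `failed` is non-zero or more than one dimension shows up, consider restricting the searched table to rows with `ml_generate_embedding_status = ''`.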
In [None]:
#@title Input text prompt: "give me a best match picture of a wedding dress". Use vector search to find the best matches.
%%bigquery --project bigquerytestdefault

CREATE OR REPLACE TABLE `bqml_tutorial.vector_search_table` AS
SELECT base.uri AS gcs_uri, distance
FROM
  VECTOR_SEARCH(
    TABLE `bqml_tutorial.image_embed_table`,
    "ml_generate_embedding_result",
    (
      SELECT * FROM ML.GENERATE_EMBEDDING(
        MODEL `bqml_tutorial.multimodalembedding_model`,
        (
          SELECT "give me a best match picture of a wedding dress" AS content
        )
      )
    ),
    "ml_generate_embedding_result",
    top_k => 5);

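Because no `distance_type` is given, `VECTOR_SEARCH` uses Euclidean distance, and without a vector index on the base table it compares the query embedding against every stored embedding. The sketch below reproduces that exhaustive top-k search in plain Python to show what the query computes; it is an illustration only, and `brute_force_top_k` is a hypothetical name.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def brute_force_top_k(
    base: Dict[str, Sequence[float]], query: Sequence[float], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Return the top_k (uri, distance) pairs nearest to `query`, closest first."""
    # math.dist is the Euclidean distance and raises ValueError on mismatched dimensions.
    scored = ((math.dist(vec, query), uri) for uri, vec in base.items())
    return [(uri, dist) for dist, uri in heapq.nsmallest(top_k, scored)]
```

Here `base` maps each image URI to its embedding and `query` is the text embedding, mirroring `base.uri` and `distance` in the SQL above. With a vector index, BigQuery may instead return approximate neighbors.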
In [None]:
#@title Find the best match image.
best_match_query = """
  SELECT * FROM `bqml_tutorial.vector_search_table`
  ORDER BY distance LIMIT 5;
"""

printImages(client.query(best_match_query))
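To try other prompts without editing the SQL each time, one option is to pass the prompt as a query parameter. The sketch below builds a `SELECT` version of the search with an `@prompt` parameter and an optional `distance_type` (`EUCLIDEAN`, `COSINE`, or `DOT_PRODUCT`). `text_search_sql` is hypothetical, and we assume, without having confirmed it for every setup, that BigQuery accepts a query parameter inside the `ML.GENERATE_EMBEDDING` subquery.

```python
def text_search_sql(
    dataset: str = "bqml_tutorial", top_k: int = 5, distance_type: str = "EUCLIDEAN"
) -> str:
    """Build a VECTOR_SEARCH query whose text prompt comes from the @prompt query parameter."""
    if distance_type not in {"EUCLIDEAN", "COSINE", "DOT_PRODUCT"}:
        raise ValueError(f"unsupported distance_type: {distance_type}")
    # Assumption: @prompt is allowed inside the ML.GENERATE_EMBEDDING subquery.
    return f"""
SELECT base.uri AS gcs_uri, distance
FROM
  VECTOR_SEARCH(
    TABLE `{dataset}.image_embed_table`,
    'ml_generate_embedding_result',
    (
      SELECT * FROM ML.GENERATE_EMBEDDING(
        MODEL `{dataset}.multimodalembedding_model`,
        (SELECT @prompt AS content)
      )
    ),
    'ml_generate_embedding_result',
    top_k => {int(top_k)},
    distance_type => '{distance_type}')
ORDER BY distance
""".strip()


# Hypothetical usage in the notebook:
# job_config = bigquery.QueryJobConfig(query_parameters=[
#     bigquery.ScalarQueryParameter("prompt", "STRING", "a painting of a ship")])
# printImages(client.query(text_search_sql(), job_config=job_config))
```

Passing the prompt as a parameter also avoids quoting problems when the prompt itself contains quotes.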