tools/vertexai-featurestore-migration-kit/notebook/fs-migration.ipynb

{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "0W6WytqaUmLf", "metadata": { "id": "0W6WytqaUmLf" }, "outputs": [], "source": [ "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "id": "mhoHpwGcPxB_", "metadata": { "id": "mhoHpwGcPxB_" }, "source": [ "# **Legacy Feature Store to Feature Store 2.0 Migration**" ] }, { "cell_type": "markdown", "id": "lApdAb_FQpI5", "metadata": { "id": "lApdAb_FQpI5", "tags": [] }, "source": [ "---\n", "## **Introduction**\n" ] }, { "cell_type": "markdown", "id": "Et28pWkfcGDc", "metadata": { "id": "Et28pWkfcGDc" }, "source": [ "Vertex AI Feature Store is a fully managed feature management solution provided by Google Cloud Platform. It allows organizations to efficiently serve, share, and reuse machine learning (ML) features across projects and teams. As part of Google Cloud's commitment to enhancing its services, a new version of Feature Store (2.0) has been introduced with improved capabilities and performance.\n", "\n", "\n", "This notebook guides you through the migration process from Vertex AI Legacy Feature Store to the new Feature Store 2.0. We'll cover configuration setup, offline store migration, and online store migration.\n", "\n", "Let's get started!\n" ] }, { "cell_type": "markdown", "id": "HC09KXr5QkPp", "metadata": { "id": "HC09KXr5QkPp", "tags": [] }, "source": [ "---\n", "\n", "## **Configuration Setup**" ] }, { "cell_type": "markdown", "id": "1x2uMDD8axuu", "metadata": { "id": "1x2uMDD8axuu" }, "source": [ "The migration process is driven by a configuration file (config.yaml) that defines the feature stores and entity types to be migrated.\n", "\n", "1. 
**Create `config.yaml`:**\n", "   - Create a YAML file named `config.yaml` in your project directory.\n", "   - Fill in the following details:\n", "\n", "     ```yaml\n", "     project_id: \"your-gcp-project-id\"  # Replace with your project ID\n", "     region: \"your-region\"  # Replace with your region (e.g., \"us-central1\")\n", "\n", "     bq_dataset_prefix: exp_db_\n", "     bq_table_prefix: exp_db_tbl_\n", "\n", "     legacy_feature_store:\n", "       feature_store_mode: all  # Or \"list\" to specify individual stores\n", "       # feature_stores:  # Only if feature_store_mode is \"list\"\n", "     ```\n", "\n", "   - **Explanation:**\n", "     - `project_id`: The Google Cloud project ID where your feature stores are located.\n", "     - `region`: The region where your feature stores reside.\n", "     - `bq_dataset_prefix`: Prefix for the BigQuery datasets created during the export.\n", "     - `bq_table_prefix`: Prefix for the BigQuery tables created during the export.\n", "     - `legacy_feature_store`:\n", "       - `feature_store_mode`:\n", "         - `all`: Migrates all feature stores in your project.\n", "         - `list`: Migrates only the feature stores listed under `feature_stores`.\n", "\n", "   - **Feature Store Mode \"list\" Example:**\n", "\n", "     ```yaml\n", "     project_id: \"your-project-id\"  # Replace with your project ID\n", "     region: \"your-region\"  # Replace with your region (e.g., \"us-central1\")\n", "\n", "     legacy_feature_store:\n", "       feature_store_mode: list  # Or \"all\"\n", "       feature_stores:  # Only used when feature_store_mode is \"list\"\n", "         - name: legacy_feature_store_name\n", "           entity_type_mode: all  # \"all\" or \"list\"\n", "           entity_types:\n", "             - name: entity_type_1\n", "               entity_id_column: entity_id_new\n", "               mapping_file: path/to/mapping_file_1.csv\n", "             - name: entity_type_2\n", "               entity_id_column: entity_id_new\n", "               mapping_file: path/to/mapping_file_2.csv\n", "     ```\n", "\n", "   - **Explanation of Feature Store Mode \"list\":**\n", "     - **`project_id`**: Your Google Cloud project ID.\n", "     - **`region`**: The Google Cloud region where your feature stores are located.\n", "     - **`legacy_feature_store`**: Configuration for the legacy Vertex AI Feature Store.\n", "       - **`feature_store_mode`**:\n", "         - **`list`**: Process only the feature stores specified in the `feature_stores` section.\n", "       - **`feature_stores`**: A list of feature store configurations.\n", "         - **`name`**: The name of the legacy feature store to migrate.\n", "         - **`entity_type_mode`**:\n", "           - **`all`**: Process all entity types in the feature store.\n", "           - **`list`**: Process only the entity types specified in the `entity_types` section.\n", "         - **`entity_types`**: A list of entity type configurations.\n", "           - **`name`**: The name of the entity type within the feature store.\n", "           - **`entity_id_column`**: The new name to give the entity ID column in the migrated data.\n", "           - **`mapping_file`**: The path to the CSV file containing the feature mapping for this entity type.\n",
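"\n", "   - To sanity-check the file before running the migration, you can load it with PyYAML, as in the short sketch below. This snippet is illustrative only (it assumes `config.yaml` is in the working directory and that PyYAML is installed); it is not part of the migration kit:\n", "\n", "     ```python\n", "     import yaml  # PyYAML\n", "\n", "     # Load the migration configuration and list the stores to be migrated.\n", "     with open(\"config.yaml\", \"r\", encoding=\"utf-8\") as f:\n", "         config = yaml.safe_load(f)\n", "\n", "     print(config[\"project_id\"], config[\"region\"])\n", "     legacy = config[\"legacy_feature_store\"]\n", "     if legacy[\"feature_store_mode\"] == \"list\":\n", "         for fs in legacy[\"feature_stores\"]:\n", "             print(\"Will migrate:\", fs[\"name\"])\n", "     else:\n", "         print(\"Will migrate: all feature stores in the project\")\n", "     ```\n", "\n", "2. 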
**Create Feature Mapping Files (Optional):**\n", " - Create CSV files for each entity type you want to rename features for.\n", " - Each row should have `original_feature_name` and `destination_feature_name`.\n", " - Example:\n", "\n", " ```csv\n", " original_feature_name,destination_feature_name\n", " id,id\n", " name,name_new\n", " style,style_new\n", " ```\n" ] }, { "cell_type": "markdown", "id": "ea7a553e-b755-4bdd-a1e6-3ae5c35bc412", "metadata": { "tags": [] }, "source": [ "---\n", "### **Set Root Dir**" ] }, { "cell_type": "code", "execution_count": null, "id": "7448a6f6-aed7-4cf9-b198-dda500874c86", "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "import sys\n", "\n", "# Add the project root directory to sys.path\n", "project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))\n", "sys.path.insert(0, project_root)" ] }, { "cell_type": "markdown", "id": "diRY0AG1cGC8", "metadata": { "id": "diRY0AG1cGC8" }, "source": [ "---\n", "## **Offline Store Migration**" ] }, { "cell_type": "markdown", "id": "_UQzF-xbeCQC", "metadata": { "id": "_UQzF-xbeCQC" }, "source": [ "### **Setting Up Logging**" ] }, { "cell_type": "code", "execution_count": null, "id": "0o4nJNDUo3qw", "metadata": { "id": "0o4nJNDUo3qw", "tags": [] }, "outputs": [], "source": [ "from logging_config import configure_logging\n", "\n", "configure_logging()" ] }, { "cell_type": "markdown", "id": "d-1AFwsro5IH", "metadata": { "id": "d-1AFwsro5IH" }, "source": [ "These lines are responsible for preparing your project to automatically log important information as it runs. Here's what they do:\n", "\n", "1. **Import the Configuration:**\n", " - The line `from logging_config import configure_logging` brings in a special function named `configure_logging` from a separate file called `logging_config.py`. This function is where all the details about how you want your logs to be handled are defined.\n", "\n", "2. **Activate the Logging Setup:**\n", " - `configure_logging()` is then executed. This function sets up Loguru, a powerful logging library, to do several things:\n", " - **Create a \"logs\" folder:** It makes sure a special folder called \"logs\" exists within your project directory. This is where all the log files will be stored.\n", " - **Set up log files:**\n", " - It creates a main log file named `app.log` where a complete history of your program's events will be kept.\n", " - It also creates a separate log file for each time you run your program. This filename includes a timestamp so you can easily track which log corresponds to which run.\n", " - **Define log levels:** It determines what kind of information gets logged. It sets the default level to `DEBUG`, which means you'll see all kinds of messages, including detailed technical ones. It also sets up a special log level called `REQUEST_TIME` that you can use to track how long specific tasks take.\n", "\n", "In essence, these two lines prepare your project to keep a record of its activities. 
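\n", "\n", "For orientation, a `logging_config.py` along the following lines would produce the behavior described above. It is a hedged sketch rather than the kit's actual file; details such as the log file names and the `REQUEST_TIME` severity number are assumptions:\n", "\n", "```python\n", "import os\n", "from datetime import datetime\n", "\n", "from loguru import logger\n", "\n", "\n", "def configure_logging():\n", "    \"\"\"Configure Loguru sinks for the migration run (illustrative sketch).\"\"\"\n", "    os.makedirs(\"logs\", exist_ok=True)  # keep all log files under logs/\n", "\n", "    # Custom severity for timing messages; the level number 15 (between\n", "    # DEBUG and INFO) is an assumption.\n", "    logger.level(\"REQUEST_TIME\", no=15)\n", "\n", "    # Complete history across every run.\n", "    logger.add(\"logs/app.log\", level=\"DEBUG\")\n", "\n", "    # One timestamped file per run, so runs are easy to tell apart.\n", "    run_stamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n", "    logger.add(f\"logs/run_{run_stamp}.log\", level=\"DEBUG\")\n", "```\n", "\n", "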
By calling `configure_logging()`, you ensure that every time you run your program, a new log file is created to store important information about what happened during that run, while the main `app.log` file keeps a running record for you to review later.\n" ] }, { "cell_type": "markdown", "id": "CNYfUVHAq3P7", "metadata": { "id": "CNYfUVHAq3P7" }, "source": [ "### **Export Legacy Stores**" ] }, { "cell_type": "code", "execution_count": null, "id": "IINCKXnEd7Ln", "metadata": { "id": "IINCKXnEd7Ln", "tags": [] }, "outputs": [], "source": [ "from legacy_exporter import LegacyExporter\n", "\n", "legacy_exporter = LegacyExporter()\n", "export_response = legacy_exporter.export_feature_store()" ] }, { "cell_type": "markdown", "id": "F85q0OAbrFtc", "metadata": { "id": "F85q0OAbrFtc" }, "source": [ "**The `LegacyExporter` Class: The Conductor of the Migration**\n", "\n", "The `LegacyExporter` class is the heart of the offline store migration process. Its primary responsibility is to manage the extraction of data from the Legacy Feature Store and its subsequent preparation for import into Feature Store 2.0.\n", "\n", "* **Initialization:**\n", " - When you create a `LegacyExporter` object, it reads the `config.yaml` file to understand your specific migration settings.\n", " - It establishes a connection to the Feature Store service in your Google Cloud project using the project ID and region you provided in the configuration.\n", "\n", "* **`export_feature_store()` Method:**\n", " - This is the core method that does the heavy lifting. Here's what it does, step by step:\n", " 1. **Gather Feature Store Information:** It identifies which feature stores and entity types to migrate based on the configuration file.\n", " 2. **Create BigQuery Destinations:** For each feature store and entity type, it creates dedicated datasets and tables in BigQuery to temporarily hold the exported data.\n", " 3. **Extract Feature Values:** It fetches the feature values (the actual data points) from the Legacy Feature Store.\n", " 4. **Apply Feature Mapping:** If you've provided a feature mapping file, it renames the features according to the mapping before exporting.\n", " 5. **Export to BigQuery:** It writes the extracted and potentially renamed feature values to the appropriate BigQuery tables.\n", " 6. **Organize Results:** It prepares a dictionary (`export_response`) that summarizes what was exported (which feature stores, entity types, features, and the location of the data in BigQuery).\n" ] }, { "cell_type": "markdown", "id": "EkXaXKDYrYns", "metadata": { "id": "EkXaXKDYrYns" }, "source": [ "### **Import into Feature Store 2.0**" ] }, { "cell_type": "code", "execution_count": null, "id": "3tWjLXqarJWW", "metadata": { "id": "3tWjLXqarJWW", "tags": [] }, "outputs": [], "source": [ "from feature_store_importer import FeatureStoreImporter\n", "\n", "feature_store_importer = FeatureStoreImporter()\n", "feature_store_importer.import_features(data=export_response)" ] }, { "cell_type": "markdown", "id": "r_jgkvMtrGTU", "metadata": { "id": "r_jgkvMtrGTU" }, "source": [ "**The `FeatureStoreImporter` Class: The Architect of Feature Store 2.0**\n", "\n", "The `FeatureStoreImporter` class acts as the primary tool for constructing your new Feature Store 2.0 structure. 
It takes the organized data from BigQuery (prepared by the `LegacyExporter`) and transforms it into Feature Groups and Features, the building blocks of Feature Store 2.0.\n", "\n", "* **Initialization:**\n", " - When a `FeatureStoreImporter` object is created, it, like its counterpart `LegacyExporter`, reads the `config.yaml` file to understand the project settings.\n", " - It uses two essential clients:\n", " - **FeaturestoreServiceClient:** For general interactions with the Feature Store.\n", " - **FeatureRegistryServiceClient:** Specifically for creating and managing Feature Groups and Features.\n", "\n", "* **`import_features(data)` Method:**\n", " - This method orchestrates the entire import process. Here's a simplified breakdown:\n", "\n", " 1. **Check for Data:** If the input `data` (the `export_response` from `LegacyExporter`) is empty, it logs a message and exits gracefully.\n", "\n", " 2. **Iterate Over Exported Data:** It iterates over the data for each feature store and entity type.\n", "\n", " 3. **Create Feature Groups:** For each entity type, it creates a new Feature Group in Feature Store 2.0 using the entity type's name. This Feature Group will act as a container for the related features.\n", "\n", " 4. **Create Features:** Within each Feature Group, it iterates over the list of features and creates individual Feature entries. These Features represent the specific data points you're storing.\n", "\n", " 5. **Associate with BigQuery:** It associates each Feature Group with the corresponding BigQuery table where the data is stored, allowing seamless access for future use.\n" ] }, { "cell_type": "markdown", "id": "nTi6qQklPpT0", "metadata": { "id": "nTi6qQklPpT0" }, "source": [ "---\n", "## **Online Store Migration**" ] }, { "cell_type": "markdown", "id": "516c2a11", "metadata": { "id": "516c2a11" }, "source": [ "### **Generate Intermediate Online FS serving config file**" ] }, { "cell_type": "code", "execution_count": null, "id": "6odkZkX_QZFI", "metadata": { "id": "6odkZkX_QZFI", "tags": [] }, "outputs": [], "source": [ "import json\n", "from utils import transform_json\n", "\n", "ONLINE_STORE_CONFIG_FILE = os.path.join(project_root, 'config', 'online_store_config.json')\n", "\n", "# Generate Intermediate Online FS serving config file\n", "transformed_config = transform_json(export_response)\n", "with open(ONLINE_STORE_CONFIG_FILE, \"w\", encoding=\"utf-8\") as f:\n", " json.dump(transformed_config, f)" ] }, { "cell_type": "markdown", "id": "ElnNxhjVOzBG", "metadata": { "id": "ElnNxhjVOzBG" }, "source": [ "The code snippet above acts as a bridge between the offline store migration (where features and entities are exported) and the online store creation in Feature Store 2.0.\n", "\n", "- **Configuration Transformation:**\n", "It takes the raw output (export_response) from the offline feature store export process and transforms it into a format that's compatible with the creation of a FeatureOnlineStore in Feature Store 2.0. This transformation involves mapping legacy feature store configurations to the new Feature Store 2.0 schema.\n", "\n", "- **Intermediate Configuration File (ONLINE_STORE_CONFIG_FILE):** The transformed configuration is written to a JSON file named online_store_config.json (specified by ONLINE_STORE_CONFIG_FILE). 
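For illustration, the generated file might look roughly like the sketch below. The field names are the ones discussed in this notebook, but treat the exact shape (especially the `feature_views` entries) as an assumption; review the file the tool actually generates rather than copying this example.\n", "\n", "  ```json\n", "  {\n", "    \"project_id\": \"your-gcp-project-id\",\n", "    \"region\": \"us-central1\",\n", "    \"online_stores\": [\n", "      {\n", "        \"feature_online_store_name\": \"legacy_feature_store_name\",\n", "        \"online_store_type\": \"bigtable\",\n", "        \"bigtable_min_node_count\": 1,\n", "        \"bigtable_max_node_count\": 3,\n", "        \"cpu_utilization_target\": 50,\n", "        \"feature_views\": [\n", "          {\n", "            \"feature_view_name\": \"entity_type_1\",\n", "            \"feature_groups\": [\n", "              {\n", "                \"feature_group_id\": \"entity_type_1\",\n", "                \"feature_ids\": [\"name_new\", \"style_new\"]\n", "              }\n", "            ],\n", "            \"cron_schedule\": null\n", "          }\n", "        ]\n", "      }\n", "    ]\n", "  }\n", "  ```\n", "\n", "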
This file is a crucial intermediate step in the migration process.\n", "\n", "- **Customization Point:** Since Feature Store 2.0 has evolved from the legacy system, there may be new settings or differences in how existing ones are applied. This file gives you an opportunity to fine-tune the configuration to match your specific needs and preferences.\n" ] }, { "cell_type": "markdown", "id": "b1caa681", "metadata": { "id": "b1caa681" }, "source": [ "### **Review the Online FS serving config file**" ] }, { "cell_type": "markdown", "id": "66777786", "metadata": { "id": "66777786" }, "source": [ "**Points to note while reviewing the online store config**\n", "\n", "1. The `online_store_type` can be set to `bigtable` or `optimized`, depending on the use case and the volume of feature data.\n", "2. Currently, the script supports Manual or Cron sync modes for Feature Views. The field `cron_schedule` defaults to `null`, which marks the Feature View as Manual sync mode.\n", "3. The fields `bigtable_min_node_count`, `bigtable_max_node_count`, and `cpu_utilization_target` are skipped if `online_store_type` is set to `optimized`.\n", "4. Review the mapping of Feature Groups to Feature Views." ] }, { "cell_type": "markdown", "id": "56h2bAGgS0us", "metadata": { "id": "56h2bAGgS0us" }, "source": [ "### **Read back the Online FS serving config file**" ] }, { "cell_type": "code", "execution_count": null, "id": "KL7FvePAS1RE", "metadata": { "id": "KL7FvePAS1RE" }, "outputs": [], "source": [ "def read_json_config(config_file):\n", "    \"\"\"Reads a JSON file and returns its contents as a dictionary.\"\"\"\n", "    with open(config_file, 'r', encoding=\"utf-8\") as f:\n", "        data = json.load(f)\n", "    return data\n", "\n", "\n", "# Read the online store config file\n", "online_store_config = read_json_config(ONLINE_STORE_CONFIG_FILE)" ] }, { "cell_type": "markdown", "id": "29a069d0", "metadata": { "id": "29a069d0" }, "source": [ "### **Create Online Stores & Feature Views**" ] }, { "cell_type": "code", "execution_count": null, "id": "xgsMLZiQR4SH", "metadata": { "id": "xgsMLZiQR4SH" }, "outputs": [], "source": [ "from logging_config import logger\n", "from online_store_creator import FeatureOnlineStore\n", "\n", "\n", "for online_store_config_obj in online_store_config[\"online_stores\"]:\n", "    online_store_obj = FeatureOnlineStore(online_store_config_obj=online_store_config_obj,\n", "                                          project_id=online_store_config[\"project_id\"],\n", "                                          region=online_store_config[\"region\"])\n", "    try:\n", "        online_store_obj.create_feature_online_store()\n", "    except ValueError as e:\n", "        logger.error(f\"Error creating online store: {e}\")\n", "        continue\n", "    online_store_obj.create_feature_views_from_feature_groups()" ] }, { "cell_type": "markdown", "id": "D05U0O-YaRL6", "metadata": { "id": "D05U0O-YaRL6" }, "source": [ "**Online Store Creation and Feature View Population:**\n", "\n", "This code automates the setup of online feature stores and their associated feature views in Vertex AI Feature Store 2.0, using the configurations specified in the `online_store_config.json` file. Here's a detailed breakdown of what happens:\n", "\n", "1. **Iterate Over Online Store Configurations:**\n", "   - The code loops through each entry in the `online_stores` list within the `online_store_config.json` file. Each entry represents the desired configuration for a single online feature store.\n", "\n", "2. 
**Instantiate FeatureOnlineStore Object:**\n", " - For each configuration, a `FeatureOnlineStore` object is created. This object encapsulates the settings and actions needed to create and manage the online store.\n", " - The object is initialized with:\n", " - `online_store_config_obj`: The specific configuration dictionary for the current online store, including `online_store_type`, `bigtable_min_node_count`, `bigtable_max_node_count`, and `cpu_utilization_target`.\n", " - `project_id`: The Google Cloud project ID where the online store will be created.\n", " - `region`: The region where the online store will be hosted.\n", "\n", "3. **Create Online Store:**\n", " - The `create_feature_online_store()` method is called on the `FeatureOnlineStore` object. This method performs the following:\n", " - **Check for Existence:** It first checks if an online store with the same name (`feature_online_store_name` from the configuration) already exists. If so, it logs an error and skips to the next configuration.\n", " - **Create Online Store Request:** If the store doesn't exist, it constructs a request to create a new online store, using the following details:\n", " - **Feature Online Store ID:** The unique name specified in the configuration (`feature_online_store_name`).\n", " - **Configuration:**\n", " - If `online_store_type` is \"bigtable\", it includes settings like `bigtable_min_node_count`, `bigtable_max_node_count`, and `cpu_utilization_target`.\n", " - If `online_store_type` is \"optimized\", it sets up an optimized store.\n", "\n", " - **Send Request and Poll:** It sends the request and waits (polls) for the operation to complete. If successful, a new online store is created with the specified name and configuration. If there's an error during creation, it's logged, and the process moves on.\n", "\n", "4. **Create Feature Views:**\n", " - After successfully creating the online store, the `create_feature_views_from_feature_groups()` method is called. This method:\n", " - **Iterate Over Feature Views:** It iterates over the `feature_views` list within the online store configuration. Each entry represents a feature view to be created.\n", " - **Feature Group Mapping:** For each feature view, it identifies the associated feature groups and feature IDs from the configuration.\n", " - **Feature View Creation Request:** It constructs a request to create a new feature view in the online store, associating it with the specified feature groups and IDs. It also sets the refresh schedule based on the `cron_schedule` field in the configuration (or defaults to a Manual sync if not specified).\n", " - **Send Request and Poll:** It sends the request and polls the operation until it completes. 
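\n", "\n", "Under the hood, these two steps correspond to two calls on the `aiplatform_v1` admin client. The sketch below is a hedged illustration of those calls with hypothetical values, not the kit's actual code:\n", "\n", "```python\n", "from google.cloud import aiplatform_v1\n", "\n", "# Hypothetical values; in the kit these come from online_store_config.json.\n", "PROJECT_ID, REGION = \"your-gcp-project-id\", \"us-central1\"\n", "STORE_ID, VIEW_ID = \"legacy_feature_store_name\", \"entity_type_1\"\n", "\n", "client = aiplatform_v1.FeatureOnlineStoreAdminServiceClient(\n", "    client_options={\"api_endpoint\": f\"{REGION}-aiplatform.googleapis.com\"}\n", ")\n", "parent = f\"projects/{PROJECT_ID}/locations/{REGION}\"\n", "\n", "# Create the online store (Bigtable flavor shown; an \"optimized\" store would\n", "# use FeatureOnlineStore(optimized=aiplatform_v1.FeatureOnlineStore.Optimized())).\n", "store = aiplatform_v1.FeatureOnlineStore(\n", "    bigtable=aiplatform_v1.FeatureOnlineStore.Bigtable(\n", "        auto_scaling=aiplatform_v1.FeatureOnlineStore.Bigtable.AutoScaling(\n", "            min_node_count=1, max_node_count=3, cpu_utilization_target=50\n", "        )\n", "    )\n", ")\n", "client.create_feature_online_store(\n", "    parent=parent,\n", "    feature_online_store=store,\n", "    feature_online_store_id=STORE_ID,\n", ").result()  # poll the long-running operation until it finishes\n", "\n", "# Create a feature view that serves features from a registered feature group.\n", "view = aiplatform_v1.FeatureView(\n", "    feature_registry_source=aiplatform_v1.FeatureView.FeatureRegistrySource(\n", "        feature_groups=[\n", "            aiplatform_v1.FeatureView.FeatureRegistrySource.FeatureGroup(\n", "                feature_group_id=\"entity_type_1\", feature_ids=[\"name_new\"]\n", "            )\n", "        ]\n", "    ),\n", "    # Leave sync_config unset for a manually triggered sync.\n", "    sync_config=aiplatform_v1.FeatureView.SyncConfig(cron=\"0 0 * * *\"),\n", ")\n", "client.create_feature_view(\n", "    parent=f\"{parent}/featureOnlineStores/{STORE_ID}\",\n", "    feature_view=view,\n", "    feature_view_id=VIEW_ID,\n", ").result()\n", "```\n", "\n", "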
If successful, a new feature view is created in the online store, providing access to the features from the associated feature groups.\n" ] }, { "cell_type": "markdown", "id": "S4A5phsdspNu", "metadata": { "id": "S4A5phsdspNu" }, "source": [ "---\n", "## **Validation**" ] }, { "cell_type": "code", "execution_count": null, "id": "a6PQB2smsn07", "metadata": { "id": "a6PQB2smsn07" }, "outputs": [], "source": [ "!python3 validation.py --project_id [PROJECT_ID] --region [REGION_ID] --spreadsheet_file_path [PATH_FOR_OUTPUT_SPREADSHEET]" ] }, { "cell_type": "markdown", "id": "9oaqueFisluv", "metadata": { "id": "9oaqueFisluv" }, "source": [ "- **Explanation:**\n", "  - Runs the `validation.py` script to compare the migrated resources in the Legacy Feature Store and Feature Store 2.0.\n", "  - Make sure to replace the placeholders with your actual project ID, region, and spreadsheet path.\n", "  - The output is stored in the specified spreadsheet, listing the feature stores, entities, and features from both versions." ] } ], "metadata": { "colab": { "collapsed_sections": [ "HC09KXr5QkPp", "diRY0AG1cGC8" ], "name": "fs-migration.ipynb", "provenance": [] }, "environment": { "kernel": "python3", "name": "tf2-cpu.2-11.m123", "type": "gcloud", "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-cpu.2-11:m123" }, "kernelspec": { "display_name": "Python 3 (Local)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }