feature-store/feature_store_kms_key_encryption.ipynb (606 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Amazon SageMaker Feature Store: Encrypt Data in your Online or Offline Feature Store using KMS key"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
"\n",
"\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates how to enable encyption for your data in your online or offline Feature Store using KMS key. We start by showing how to programmatically create a KMS key, and how to apply it to the feature store creation process for data encryption. The last portion of this notebook demonstrates how to verify that your KMS key is being used to encerypt your data in your feature store.\n",
"\n",
"### Important\n",
"If you **do not** specify a KMS encryption key, by default we encrypt all data at rest using an AWS KMS key. By defining your [bucket-level key](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-key.html) for SSE, you can reduce AWS KMS requests costs by up to 99 percent. \n",
"\n",
"\n",
"### Overview\n",
"1. Create a KMS key.\n",
" - How to create a KMS key programmatically using the KMS client from boto3?\n",
"2. Attach role to your KMS key.\n",
" - Attach the required entries to your policy for data encryption in your feature store.\n",
"3. Create an online or offline feature store and apply it to your feature store creation process.\n",
" - How to enable encryption for your online store?\n",
" - How to enable encryption for your offline store?\n",
"4. How to verify that your data is encrypted in your online or offline store?\n",
"\n",
"### Prerequisites\n",
"This notebook uses both `boto3` and Python SDK libraries, and the `Python 3 (Data Science)` kernel. This notebook also works with Studio, Jupyter, and JupyterLab. \n",
"\n",
"### Library Dependencies:\n",
"* sagemaker>=2.0.0\n",
"* numpy\n",
"* pandas"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sagemaker\n",
"import sys\n",
"import boto3\n",
"import pandas as pd\n",
"import numpy as np\n",
"import json\n",
"\n",
"original_version = sagemaker.__version__\n",
"%pip install 'sagemaker>=2.0.0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sagemaker_session = sagemaker.Session()\n",
"s3_bucket_name = sagemaker_session.default_bucket()\n",
"prefix = \"sagemaker-featurestore-kms-demo\"\n",
"role = sagemaker.get_execution_role()\n",
"region = sagemaker_session.boto_region_name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a KMS client using boto3. Note that you can access your boto session through your sagemaker session, e.g.,`sagemaker_session`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kms = sagemaker_session.boto_session.client(\"kms\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### KMS Policy Template\n",
"\n",
"Below is the policy template you will use for creating a KMS key. You will specify your role to grant it access to various KMS operations that will be used in the back-end for encrypting your data in your Online or Offline Feature Store. \n",
"\n",
"**Note**: You will need to substitute your Account number in for `123456789012` in the policy below for these lines: `arn:aws:cloudtrail:*:123456789012:trail/*`. \n",
"\n",
"It is important to understand that the policy below will grant admin privileges for Customer Managed Keys (CMK) around viewing and revoking grants, decrypt and encrypt permissions on CloudTrail and full access permissions through Feature Store. Also, note that the the Feature Store Service creates additonal grants that are used for encryption purposes for your online store. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"policy = {\n",
" \"Version\": \"2012-10-17\",\n",
" \"Id\": \"key-policy-feature-store\",\n",
" \"Statement\": [\n",
" {\n",
" \"Sid\": \"Allow access through Amazon SageMaker Feature Store for all principals in the account that are authorized to use Amazon SageMaker Feature Store\",\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\"AWS\": role},\n",
" \"Action\": [\n",
" \"kms:Encrypt\",\n",
" \"kms:Decrypt\",\n",
" \"kms:DescribeKey\",\n",
" \"kms:CreateGrant\",\n",
" \"kms:RetireGrant\",\n",
" \"kms:ReEncryptFrom\",\n",
" \"kms:ReEncryptTo\",\n",
" \"kms:GenerateDataKey\",\n",
" \"kms:ListAliases\",\n",
" \"kms:ListGrants\",\n",
" ],\n",
" \"Resource\": [\"*\"],\n",
" \"Condition\": {\"StringLike\": {\"kms:ViaService\": \"sagemaker.*.amazonaws.com\"}},\n",
" },\n",
" {\n",
" \"Sid\": \"Allow administrators to view the CMK and revoke grants\",\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\"AWS\": [role]},\n",
" \"Action\": [\"kms:Describe*\", \"kms:Get*\", \"kms:List*\", \"kms:RevokeGrant\"],\n",
" \"Resource\": [\"*\"],\n",
" },\n",
" {\n",
" \"Sid\": \"Enable CloudTrail Encrypt Permissions\",\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\"Service\": \"cloudtrail.amazonaws.com\", \"AWS\": [role]},\n",
" \"Action\": \"kms:GenerateDataKey*\",\n",
" \"Resource\": \"*\",\n",
" \"Condition\": {\n",
" \"StringLike\": {\n",
" \"kms:EncryptionContext:aws:cloudtrail:arn\": [\n",
" \"arn:aws:cloudtrail:*:123456789012:trail/*\",\n",
" \"arn:aws:cloudtrail:*:123456789012:trail/*\",\n",
" ]\n",
" }\n",
" },\n",
" },\n",
" {\n",
" \"Sid\": \"Enable CloudTrail log decrypt permissions\",\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\"AWS\": [role]},\n",
" \"Action\": \"kms:Decrypt\",\n",
" \"Resource\": [\"*\"],\n",
" \"Condition\": {\"Null\": {\"kms:EncryptionContext:aws:cloudtrail:arn\": \"false\"}},\n",
" },\n",
" ],\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create your new KMS key using the policy above and your KMS client. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" new_kms_key = kms.create_key(\n",
" Policy=json.dumps(policy),\n",
" Description=\"string\",\n",
" KeyUsage=\"ENCRYPT_DECRYPT\",\n",
" CustomerMasterKeySpec=\"SYMMETRIC_DEFAULT\",\n",
" Origin=\"AWS_KMS\",\n",
" )\n",
" AliasName = \"my-new-kms-key\" ## provide a unique alias name\n",
" kms.create_alias(\n",
" AliasName=\"alias/\" + AliasName, TargetKeyId=new_kms_key[\"KeyMetadata\"][\"KeyId\"]\n",
" )\n",
" print(new_kms_key)\n",
"except Exception as e:\n",
" print(\"Error {}\".format(e))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have our KMS key created and the necessary operations added to our role, we now load in our data. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customer_data = pd.read_csv(\"data/feature_store_introduction_customer.csv\")\n",
"orders_data = pd.read_csv(\"data/feature_store_introduction_orders.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customer_data.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"orders_data.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customer_data.dtypes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"orders_data.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating Feature Groups\n",
"\n",
"We first start by creating feature group names for customer_data and orders_data. Following this, we create two Feature Groups, one for customer_dat and another for orders_data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from time import gmtime, strftime, sleep\n",
"\n",
"customers_feature_group_name = \"customers-feature-group-\" + strftime(\"%d-%H-%M-%S\", gmtime())\n",
"orders_feature_group_name = \"orders-feature-group-\" + strftime(\"%d-%H-%M-%S\", gmtime())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instantiate a FeatureGroup object for customers_data and orders_data. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sagemaker.feature_store.feature_group import FeatureGroup\n",
"\n",
"customers_feature_group = FeatureGroup(\n",
" name=customers_feature_group_name, sagemaker_session=sagemaker_session\n",
")\n",
"orders_feature_group = FeatureGroup(\n",
" name=orders_feature_group_name, sagemaker_session=sagemaker_session\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"current_time_sec = int(round(time.time()))\n",
"\n",
"record_identifier_feature_name = \"customer_id\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Append EventTime feature to your data frame. This parameter is required, and time stamps each data point."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customer_data[\"EventTime\"] = pd.Series([current_time_sec] * len(customer_data), dtype=\"float64\")\n",
"orders_data[\"EventTime\"] = pd.Series([current_time_sec] * len(orders_data), dtype=\"float64\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customer_data.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"orders_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load feature definitions to your feature group. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customers_feature_group.load_feature_definitions(data_frame=customer_data)\n",
"orders_feature_group.load_feature_definitions(data_frame=orders_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How to create an Online or Offline Feature Store that uses your KMS key for encryption?\n",
"\n",
"Below we create two feature groups, `customers_feature_group` and `orders_feature_group` respectively, and explain how use your KMS key to securely encrypt your data in your online or offline feature store. \n",
"\n",
"### How to create an Online Feature store with your KMS key? \n",
"To encrypt data in your online feature store, set `enable_online_store` to be `True` and specify your KMS key as parameter `online_store_kms_key_id`. You will need to substitute your Account number in `arn:aws:kms:us-east-1:123456789012:key/` replacing `123456789012` with your Account number.\n",
"\n",
"```\n",
"customers_feature_group.create(\n",
" s3_uri=f\"s3://{s3_bucket_name}/{prefix}\",\n",
" record_identifier_name=record_identifier_feature_name,\n",
" event_time_feature_name=\"EventTime\",\n",
" role_arn=role,\n",
" enable_online_store=True, \n",
" online_store_kms_key_id = 'arn:aws:kms:us-east-1:123456789012:key/'+ new_kms_key['KeyMetadata']['KeyId']\n",
")\n",
"\n",
"orders_feature_group.create(\n",
" s3_uri=f\"s3://{s3_bucket_name}/{prefix}\",\n",
" record_identifier_name=record_identifier_feature_name,\n",
" event_time_feature_name=\"EventTime\",\n",
" role_arn=role,\n",
" enable_online_store=True,\n",
" online_store_kms_key_id = 'arn:aws:kms:us-east-1:123456789012:key/'+new_kms_key['KeyMetadata']['KeyId']\n",
")\n",
"```\n",
"\n",
"### How to create an Offline Feature store with your KMS key? \n",
"Similar to the above, set `enable_online_store` to be `False` and then specify your KMS key as parameter `offline_store_kms_key_id`. You will need to substitute your Account number in `arn:aws:kms:us-east-1:123456789012:key/` replacing `123456789012` with your Account number.\n",
"\n",
"```\n",
"customers_feature_group.create(\n",
" s3_uri=f\"s3://{s3_bucket_name}/{prefix}\",\n",
" record_identifier_name=record_identifier_feature_name,\n",
" event_time_feature_name=\"EventTime\",\n",
" role_arn=role,\n",
" enable_online_store=False, \n",
" offline_store_kms_key_id = 'arn:aws:kms:us-east-1:123456789012:key/'+ new_kms_key['KeyMetadata']['KeyId']\n",
")\n",
"\n",
"orders_feature_group.create(\n",
" s3_uri=f\"s3://{s3_bucket_name}/{prefix}\",\n",
" record_identifier_name=record_identifier_feature_name,\n",
" event_time_feature_name=\"EventTime\",\n",
" role_arn=role,\n",
" enable_online_store=False,\n",
" offline_store_kms_key_id = 'arn:aws:kms:us-east-1:123456789012:key/'+new_kms_key['KeyMetadata']['KeyId']\n",
")\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this example we create an online feature store that encrypts your data using your KMS key.\n",
"\n",
"**Note**: You will need to substitute your Account number in `arn:aws:kms:us-east-1:123456789012:key/` replacing `123456789012` with your Account number."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customers_feature_group.create(\n",
" s3_uri=f\"s3://{s3_bucket_name}/{prefix}\",\n",
" record_identifier_name=record_identifier_feature_name,\n",
" event_time_feature_name=\"EventTime\",\n",
" role_arn=role,\n",
" enable_online_store=False,\n",
" offline_store_kms_key_id=\"arn:aws:kms:us-east-1:123456789012:key/\"\n",
" + new_kms_key[\"KeyMetadata\"][\"KeyId\"],\n",
")\n",
"\n",
"orders_feature_group.create(\n",
" s3_uri=f\"s3://{s3_bucket_name}/{prefix}\",\n",
" record_identifier_name=record_identifier_feature_name,\n",
" event_time_feature_name=\"EventTime\",\n",
" role_arn=role,\n",
" enable_online_store=False,\n",
" offline_store_kms_key_id=\"arn:aws:kms:us-east-1:123456789012:key/\"\n",
" + new_kms_key[\"KeyMetadata\"][\"KeyId\"],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How to verify that your KMS key is being used to encrypt your data in your Online or Offline Feature Store? \n",
"\n",
"### Online Store Verification\n",
"To demonstrate that your data is being encrypted in your Online store, use your `kms` client from `boto3` to list the grants under your KMS key. It should show 'SageMakerFeatureStore-' and the name of your feature group you created and should list these operations under Operations:`['Decrypt','Encrypt','GenerateDataKey','ReEncryptFrom','ReEncryptTo','CreateGrant','RetireGrant','DescribeKey']`\n",
"\n",
"An alternative way for you to check that your data is encrypted in your Online store is to check [Cloud Trails](https://console.aws.amazon.com/cloudtrail/) and navigate to your account name. Once here, under General details you should see that SSE-KMS encryption is enabled and with your AWS KMS key shown below it. Below is a screenshot showing this: \n",
"\n",
"\n",
"\n",
"### Offline Store Verification\n",
"To verify that your data in being encrypted in your Offline store, you must navigate to your S3 bucket through the [Console](https://console.aws.amazon.com/s3/home?region=us-east-1) and then navigate to your prefix, offline store, feature group name and into the /data/ folder. Once here, select a parquet file which is the file containing your feature group data. For this example, the directory path in S3 was this: \n",
"\n",
"`Amazon S3/MYBUCKET/PREFIX/123456789012/sagemaker/region/offline-store/customers-feature-group-23-22-44-47/data/year=2021/month=03/day=23/hour=22/20210323T224448Z_IdfObJjhpqLQ5rmG.parquet.` \n",
"\n",
"After selecting the parquet file, navigate to Server-side encryption settings. It should mention that Default encryption is enabled and reference (SSE-KMS) under server-side encryption. If this show, then your data is being encrypted in the offline store. Below is a screenshot of how this should look like in the console: \n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this example since we created a secure Online store using our KMS key, below we use `list_grants` to check that our feature group and required grants are present under operations. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kms.list_grants(\n",
" KeyId=\"arn:aws:kms:us-east-1:123456789012:key/\" + new_kms_key[\"KeyMetadata\"][\"KeyId\"]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Clean Up Resources\n",
"Remove the Feature Groups we created. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customers_feature_group.delete()\n",
"orders_feature_group.delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# preserve original sagemaker version\n",
"%pip install 'sagemaker=={}'.format(original_version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Next Steps\n",
"\n",
"For more information on how to use KMS to encrypt your data in your Feature Store, see [Feature Store Security](https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store-security.html). For general information on KMS keys and CMK, see [Customer Managed Keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys). "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notebook CI Test Results\n",
"\n",
"This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (Data Science)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}