ManagedkdbInsights/boto/welcome.ipynb (1,551 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"id": "28bea13b-67bd-4a0e-8eab-3b8ffd37259e",
"metadata": {},
"source": [
"# Welcome Notebook\n",
"This notebook walks through the process of creating and populating your first database with FinSpace Managed KX.\n",
"\n",
"## Before you start\n",
"Before you start this notebook, it is assumed you have the following:\n",
"- FinSpace Managed KX environment created in AWS account\n",
"- S3 staging bucket for data and code\n",
" - This notebook boto's profile and the Managed KX environment can access the bucket\n",
"- Setup in ~/.aws directory\n",
" - config is set (json and region)\n",
" - default credentials are set (aws_access_key_id, aws_secret_access_key, aws_session_token)\n",
"\n",
"## Steps\n",
"1. Untar hdb.tar.gz for the hdb data\n",
"2. Upload hdb to staging S3 bucket\n",
"3. Create database\n",
"4. Add HDB data to database\n",
"5. Create a Cluster\n",
"6. Get the connectionString\n",
"7. Query Cluster using PyKX\n",
"\n",
"## Managed kdb Insights Archtecture\n",
"<img src=\"images/Managed kdb Insights-HDB Migration.png\" width=\"60%\">\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d9d543f3-1cd5-4a0e-8be7-a9eb0ac35878",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"import boto3\n",
"import json\n",
"import datetime\n",
"\n",
"from managed_kx import *\n",
"from env import *"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d5265616-6aa4-4b7b-b038-8e26e71d19e7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Source data directory\n",
"SOURCE_DATA_DIR=\"hdb\"\n",
"\n",
"# S3 bucket for external data and code\n",
"S3_DEST=f\"s3://{S3_BUCKET}/data/{SOURCE_DATA_DIR}/\"\n",
"CODEBASE=\"code\"\n",
"CODE_PATH=f\"code/{CODEBASE}.zip\"\n",
"\n",
"NODE_COUNT=1\n",
"CACHE_SIZE=1200\n",
"\n",
"# Managed KX Database and Cluster names to create\n",
"DB_NAME=\"welcomedb\"\n",
"#DELETE_CLUSTER=True\n",
"#DELETE_DATABASE=True\n",
"\n",
"create_delete=False\n",
"\n",
"if create_delete:\n",
" TODAY=datetime.datetime.now().strftime(\"%Y%m%d_%H%M\") \n",
" DB_NAME=f\"create_delete_db_{TODAY}\"\n",
" DELETE_CLUSTER=True\n",
" DELETE_DATABASE=True\n",
"\n",
"CLUSTER_NAME=f\"cluster_{DB_NAME}\"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7e054ddd-3313-4ac3-b0b3-3c93b55e977e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Using credentials and create service client\n",
"session = boto3.Session()\n",
"\n",
"# create finspace client\n",
"client = session.client(service_name='finspace')"
]
},
{
"cell_type": "markdown",
"id": "849f954c-7cfa-4b29-be4f-0854aa7cbd06",
"metadata": {},
"source": [
"# 0. Environment Check\n",
"Be sure the infrastructure ID has been entitled to the bucket you will be staging the HDB to. The environment will also need access to the KMX key used when creating the environment.\n",
"\n",
"## Permission Templates\n",
"\n",
"### S3 Permission\n",
"Example of code and data access to the same S3 bucket.\n",
"\n",
"```\n",
"{\n",
" \"Version\": \"2012-10-17\",\n",
" \"Statement\": [\n",
" {\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\n",
" \"Service\": \"finspace.amazonaws.com\"\n",
" },\n",
" \"Action\": [\n",
" \"s3:GetObject\",\n",
" \"s3:GetObjectTagging\",\n",
" \"s3:ListBucket\"\n",
" ],\n",
" \"Resource\": [\n",
" \"arn:aws:s3:::S3_BUCKET/*\",\n",
" \"arn:aws:s3:::S3_BUCKET\"\n",
" ],\n",
" \"Condition\": {\n",
" \"StringEquals\": {\n",
" \"aws:SourceAccount\": \"ACCOUNT_ID\"\n",
" },\n",
" \"ArnEquals\": {\n",
" \"aws:SourceArn\": \"arn:aws:finspace:us-east-1:ACCOUNT_ID:kxEnvironment/ENV_ID/*\"\n",
" }\n",
" }\n",
" }\n",
" ]\n",
"}\n",
"\n",
"```\n",
"\n",
"### KMS Key\n",
"Be sure the environment has access to use the KMS key given in environment creation.\n",
"\n",
"```\n",
"\"Statement\": [\n",
" {\n",
" \"Sid\": \"Enable Managed kdb Insights Access\",\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\n",
" \"Service\": \"finspace.amazonaws.com\"\n",
" },\n",
" \"Action\": [\n",
" \"kms:Encrypt\",\n",
" \"kms:Decrypt\",\n",
" \"kms:GenerateDataKey\"\n",
" ],\n",
" \"Resource\": \"arn:aws:kms:us-east-1:ACCOUNT_ID:key/KEY_ID\",\n",
" \"Condition\": {\n",
" \"StringEquals\": {\n",
" \"aws:SourceAccount\": \"ACCOUNT_ID\"\n",
" },\n",
" \"ArnLike\": {\n",
" \"aws:SourceArn\": \"arn:aws:finspace:us-east-1:ACCOUNT_ID:kxEnvironment/ENV_ID/*\"\n",
" }\n",
" }\n",
" }\n",
" ]\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9d218119-1aa3-4485-a940-5dcde32b3fc7",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Environment Information\n",
"{\n",
" \"availabilityZoneIds\": [\n",
" \"use1-az1\",\n",
" \"use1-az6\",\n",
" \"use1-az2\"\n",
" ],\n",
" \"awsAccountId\": \"829845998889\",\n",
" \"certificateAuthorityArn\": \"arn:aws:acm-pca:us-east-1:603187226814:certificate-authority/f9935247-b7f3-456c-b8e0-48b29b70c076\",\n",
" \"creationTimestamp\": \"2023-06-05 20:44:00+00:00\",\n",
" \"dedicatedServiceAccountId\": \"603187226814\",\n",
" \"description\": \"Managed kdb Insights environment\",\n",
" \"dnsStatus\": \"NONE\",\n",
" \"environmentArn\": \"arn:aws:finspace:us-east-1:829845998889:kxEnvironment/jlcenjvtkgzrdek2qqv7ic\",\n",
" \"environmentId\": \"jlcenjvtkgzrdek2qqv7ic\",\n",
" \"kmsKeyId\": \"arn:aws:kms:us-east-1:829845998889:key/6643484a-28c3-46f4-8fba-87e67129e027\",\n",
" \"name\": \"Managed_kdb_20230605\",\n",
" \"status\": \"CREATED\",\n",
" \"tgwStatus\": \"SUCCESSFULLY_UPDATED\",\n",
" \"transitGatewayConfiguration\": {\n",
" \"routableCIDRSpace\": \"100.64.0.0/26\",\n",
" \"transitGatewayID\": \"tgw-0307e3bd926e58138\"\n",
" },\n",
" \"updateTimestamp\": \"2024-11-15 18:23:23.230000+00:00\"\n",
"}\n"
]
}
],
"source": [
"resp=get_kx_environment(client)\n",
"\n",
"print(\"Environment Information\")\n",
"print(json.dumps(resp,sort_keys=True,indent=4,default=str))\n"
]
},
{
"cell_type": "markdown",
"id": "bc29d8fc-c234-4c65-a633-bb9e16d6a772",
"metadata": {},
"source": [
"## 1. Untar hdb.tar.gz\n",
"hdb database will be found in hdb directory"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "93169539-bb7e-4c56-840c-9bb5e9c3de82",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!rm -rf hdb"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "157b75f5-b582-490e-ae17-eb14eaafa21e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!tar -xf hdb.tar.gz"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3fec4ecf-cba3-440f-a56e-4ec726c9f8a8",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 68\n",
"drwxrwxr-x 12 ec2-user ec2-user 4096 Nov 26 14:44 .\n",
"drwxrwxr-x 7 ec2-user ec2-user 4096 Nov 26 15:02 ..\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.16\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.17\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.18\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.19\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.20\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.21\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.22\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.23\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.24\n",
"drwxrwxr-x 3 ec2-user ec2-user 4096 Nov 26 14:44 2024.11.25\n",
"-rw-rw-r-- 1 ec2-user ec2-user 16392 Nov 26 14:44 sym\n"
]
}
],
"source": [
"!ls -la hdb"
]
},
{
"cell_type": "markdown",
"id": "b3c8cbbe-654e-4385-92bc-5c7b80b5f0f3",
"metadata": {},
"source": [
"# 2. Upload hdb data\n",
"using aws cli, copy hdb to staging bucket"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "af169292-13fc-4b1b-863d-789d5a042d52",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"upload: hdb/2024.11.16/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.16/example/.d\n",
"upload: hdb/2024.11.17/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.17/example/.d\n",
"upload: hdb/2024.11.16/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.16/example/sym\n",
"upload: hdb/2024.11.16/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.16/example/time\n",
"upload: hdb/2024.11.18/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.18/example/.d\n",
"upload: hdb/2024.11.19/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.19/example/.d\n",
"upload: hdb/2024.11.17/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.17/example/sym\n",
"upload: hdb/2024.11.18/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.18/example/sym\n",
"upload: hdb/2024.11.19/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.19/example/time\n",
"upload: hdb/2024.11.20/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.20/example/.d\n",
"upload: hdb/2024.11.19/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.19/example/sym\n",
"upload: hdb/2024.11.20/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.20/example/sym\n",
"upload: hdb/2024.11.20/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.20/example/time\n",
"upload: hdb/2024.11.17/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.17/example/number\n",
"upload: hdb/2024.11.16/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.16/example/number\n",
"upload: hdb/2024.11.21/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.21/example/.d\n",
"upload: hdb/2024.11.22/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.22/example/.d\n",
"upload: hdb/2024.11.17/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.17/example/time\n",
"upload: hdb/2024.11.18/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.18/example/time\n",
"upload: hdb/2024.11.22/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.22/example/time\n",
"upload: hdb/2024.11.18/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.18/example/number\n",
"upload: hdb/2024.11.23/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.23/example/.d\n",
"upload: hdb/2024.11.21/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.21/example/time\n",
"upload: hdb/2024.11.22/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.22/example/sym\n",
"upload: hdb/2024.11.23/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.23/example/time\n",
"upload: hdb/2024.11.23/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.23/example/sym\n",
"upload: hdb/2024.11.21/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.21/example/sym\n",
"upload: hdb/2024.11.20/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.20/example/number\n",
"upload: hdb/2024.11.24/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.24/example/.d\n",
"upload: hdb/2024.11.24/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.24/example/sym\n",
"upload: hdb/2024.11.25/example/.d to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.25/example/.d\n",
"upload: hdb/2024.11.22/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.22/example/number\n",
"upload: hdb/2024.11.21/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.21/example/number\n",
"upload: hdb/2024.11.24/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.24/example/time\n",
"upload: hdb/2024.11.19/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.19/example/number\n",
"upload: hdb/2024.11.24/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.24/example/number\n",
"upload: hdb/2024.11.25/example/time to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.25/example/time\n",
"upload: hdb/2024.11.25/example/sym to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.25/example/sym\n",
"upload: hdb/sym to s3://kdb-demo-829845998889-kms/data/hdb/sym \n",
"upload: hdb/2024.11.25/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.25/example/number\n",
"upload: hdb/2024.11.23/example/number to s3://kdb-demo-829845998889-kms/data/hdb/2024.11.23/example/number\n",
" PRE 2023.04.14/\n",
" PRE 2023.04.15/\n",
" PRE 2023.04.16/\n",
" PRE 2023.04.17/\n",
" PRE 2023.04.18/\n",
" PRE 2023.04.19/\n",
" PRE 2023.04.20/\n",
" PRE 2023.04.21/\n",
" PRE 2023.04.22/\n",
" PRE 2023.04.23/\n",
" PRE 2024.11.04/\n",
" PRE 2024.11.05/\n",
" PRE 2024.11.06/\n",
" PRE 2024.11.07/\n",
" PRE 2024.11.08/\n",
" PRE 2024.11.11/\n",
" PRE 2024.11.12/\n",
" PRE 2024.11.13/\n",
" PRE 2024.11.16/\n",
" PRE 2024.11.17/\n",
" PRE 2024.11.18/\n",
" PRE 2024.11.19/\n",
" PRE 2024.11.20/\n",
" PRE 2024.11.21/\n",
" PRE 2024.11.22/\n",
" PRE 2024.11.23/\n",
" PRE 2024.11.24/\n",
" PRE 2024.11.25/\n",
"2024-11-26 15:02:36 16392 sym\n"
]
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cp = \"\"\n",
"\n",
"if AWS_ACCESS_KEY_ID is not None:\n",
" cp = f\"\"\"\n",
"export AWS_ACCESS_KEY_ID={AWS_ACCESS_KEY_ID}\n",
"export AWS_SECRET_ACCESS_KEY={AWS_SECRET_ACCESS_KEY}\n",
"export AWS_SESSION_TOKEN={AWS_SESSION_TOKEN}\n",
"\"\"\"\n",
"\n",
"cp += f\"\"\"\n",
"aws s3 sync --exclude .DS_Store {SOURCE_DATA_DIR} {S3_DEST}\n",
"aws s3 ls {S3_DEST}\n",
"\"\"\"\n",
" \n",
"# execute the S3 copy\n",
"os.system(cp)"
]
},
{
"cell_type": "markdown",
"id": "67476efe-d308-4158-9e24-8fbe71509f76",
"metadata": {},
"source": [
"## 3. Create database"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "83d00c39-876a-4bba-ab66-a3aa4fb9b65a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CREATING Database: welcomedb\n",
"CREATED Database: welcomedb\n",
"{\n",
" \"createdTimestamp\": \"2024-11-26 15:02:36.831000+00:00\",\n",
" \"databaseArn\": \"arn:aws:finspace:us-east-1:829845998889:kxEnvironment/jlcenjvtkgzrdek2qqv7ic/kxDatabase/welcomedb\",\n",
" \"databaseName\": \"welcomedb\",\n",
" \"description\": \"Welcome kdb database\",\n",
" \"environmentId\": \"jlcenjvtkgzrdek2qqv7ic\",\n",
" \"lastModifiedTimestamp\": \"2024-11-26 15:02:36.831000+00:00\"\n",
"}\n"
]
}
],
"source": [
"# assume it exists\n",
"create_db=False\n",
"\n",
"try:\n",
" resp = client.get_kx_database(environmentId=ENV_ID, databaseName=DB_NAME)\n",
" resp.pop('ResponseMetadata', None)\n",
"except:\n",
" # does not exist, will create\n",
" create_db=True\n",
"\n",
"if create_db:\n",
" print(f\"CREATING Database: {DB_NAME}\")\n",
" resp = client.create_kx_database(environmentId=ENV_ID, databaseName=DB_NAME, description=\"Welcome kdb database\")\n",
" resp.pop('ResponseMetadata', None)\n",
"\n",
" print(f\"CREATED Database: {DB_NAME}\")\n",
"\n",
"print(json.dumps(resp,sort_keys=True,indent=4,default=str))"
]
},
{
"cell_type": "markdown",
"id": "a41c84b3-2243-4abb-9032-8ae77a5e31f7",
"metadata": {},
"source": [
"## 4. Add HDB data to database"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "60b3a8df-c7ed-4837-99e3-07d95e7fbac0",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Changeset...\n",
"{\n",
" \"changeRequests\": [\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.19/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.19/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.17/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.17/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.25/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.25/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.22/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.22/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.23/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.23/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.18/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.18/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.16/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.16/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.24/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.24/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.21/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.21/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/2024.11.20/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/2024.11.20/\"\n",
" },\n",
" {\n",
" \"changeType\": \"PUT\",\n",
" \"dbPath\": \"/\",\n",
" \"s3Path\": \"s3://kdb-demo-829845998889-kms/data/hdb/sym\"\n",
" }\n",
" ],\n",
" \"changesetId\": \"lsm0f3Z2MgmYnY89ZK2Mjw\",\n",
" \"createdTimestamp\": \"2024-11-26 15:02:37.550000+00:00\",\n",
" \"databaseName\": \"welcomedb\",\n",
" \"environmentId\": \"jlcenjvtkgzrdek2qqv7ic\",\n",
" \"lastModifiedTimestamp\": \"2024-11-26 15:02:37.550000+00:00\",\n",
" \"status\": \"PENDING\"\n",
"}\n"
]
}
],
"source": [
"changes=[]\n",
"\n",
"for f in os.listdir(\"hdb\"):\n",
" if os.path.isdir(f\"hdb/{f}\"):\n",
" changes.append( { 'changeType': 'PUT', 's3Path': f\"{S3_DEST}{f}/\", 'dbPath': f\"/{f}/\" } )\n",
" else:\n",
" changes.append( { 'changeType': 'PUT', 's3Path': f\"{S3_DEST}{f}\", 'dbPath': f\"/\" } )\n",
"\n",
"resp = client.create_kx_changeset(environmentId=ENV_ID, databaseName=DB_NAME, changeRequests=changes)\n",
"\n",
"resp.pop('ResponseMetadata', None)\n",
"changeset_id = resp['changesetId']\n",
"\n",
"print(\"Changeset...\")\n",
"print(json.dumps(resp,sort_keys=True,indent=4,default=str))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "4b344419-1261-43a3-89aa-f682ec54b0b2",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Status is IN_PROGRESS, total wait 0:00:00, waiting 10 sec ...\n",
"Status is IN_PROGRESS, total wait 0:00:10, waiting 10 sec ...\n",
"Status is IN_PROGRESS, total wait 0:00:20, waiting 10 sec ...\n"
]
},
{
"data": {
"text/plain": [
"{'changesetId': 'lsm0f3Z2MgmYnY89ZK2Mjw',\n",
" 'databaseName': 'welcomedb',\n",
" 'environmentId': 'jlcenjvtkgzrdek2qqv7ic',\n",
" 'changeRequests': [{'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.19/',\n",
" 'dbPath': '/2024.11.19/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.17/',\n",
" 'dbPath': '/2024.11.17/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.25/',\n",
" 'dbPath': '/2024.11.25/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.22/',\n",
" 'dbPath': '/2024.11.22/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.23/',\n",
" 'dbPath': '/2024.11.23/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.18/',\n",
" 'dbPath': '/2024.11.18/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.16/',\n",
" 'dbPath': '/2024.11.16/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.24/',\n",
" 'dbPath': '/2024.11.24/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.21/',\n",
" 'dbPath': '/2024.11.21/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/2024.11.20/',\n",
" 'dbPath': '/2024.11.20/'},\n",
" {'changeType': 'PUT',\n",
" 's3Path': 's3://kdb-demo-829845998889-kms/data/hdb/sym',\n",
" 'dbPath': '/'}],\n",
" 'createdTimestamp': datetime.datetime(2024, 11, 26, 15, 2, 37, 550000, tzinfo=tzlocal()),\n",
" 'activeFromTimestamp': datetime.datetime(2024, 11, 26, 15, 2, 58, 53000, tzinfo=tzlocal()),\n",
" 'lastModifiedTimestamp': datetime.datetime(2024, 11, 26, 15, 2, 58, 53000, tzinfo=tzlocal()),\n",
" 'status': 'COMPLETED'}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wait_for_changeset_status(client, environmentId=ENV_ID, databaseName=DB_NAME, changesetId=changeset_id, show_wait=True)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5c1d2691-fe9a-47a7-8b1c-55ee1283e702",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"====================================================================================================\n",
"Database: welcomedb, Changesets: 1 \n",
"====================================================================================================\n",
" Changeset: lsm0f3Z2MgmYnY89ZK2Mjw: Created: 2024-11-26 15:02:37.550000+00:00 (COMPLETED)\n"
]
},
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"</style>\n",
"<table id=\"T_14bb3\">\n",
" <thead>\n",
" <tr>\n",
" <th id=\"T_14bb3_level0_col0\" class=\"col_heading level0 col0\" >changeType</th>\n",
" <th id=\"T_14bb3_level0_col1\" class=\"col_heading level0 col1\" >s3Path</th>\n",
" <th id=\"T_14bb3_level0_col2\" class=\"col_heading level0 col2\" >dbPath</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td id=\"T_14bb3_row0_col0\" class=\"data row0 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row0_col1\" class=\"data row0 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.19/</td>\n",
" <td id=\"T_14bb3_row0_col2\" class=\"data row0 col2\" >/2024.11.19/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row1_col0\" class=\"data row1 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row1_col1\" class=\"data row1 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.17/</td>\n",
" <td id=\"T_14bb3_row1_col2\" class=\"data row1 col2\" >/2024.11.17/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row2_col0\" class=\"data row2 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row2_col1\" class=\"data row2 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.25/</td>\n",
" <td id=\"T_14bb3_row2_col2\" class=\"data row2 col2\" >/2024.11.25/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row3_col0\" class=\"data row3 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row3_col1\" class=\"data row3 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.22/</td>\n",
" <td id=\"T_14bb3_row3_col2\" class=\"data row3 col2\" >/2024.11.22/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row4_col0\" class=\"data row4 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row4_col1\" class=\"data row4 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.23/</td>\n",
" <td id=\"T_14bb3_row4_col2\" class=\"data row4 col2\" >/2024.11.23/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row5_col0\" class=\"data row5 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row5_col1\" class=\"data row5 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.18/</td>\n",
" <td id=\"T_14bb3_row5_col2\" class=\"data row5 col2\" >/2024.11.18/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row6_col0\" class=\"data row6 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row6_col1\" class=\"data row6 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.16/</td>\n",
" <td id=\"T_14bb3_row6_col2\" class=\"data row6 col2\" >/2024.11.16/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row7_col0\" class=\"data row7 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row7_col1\" class=\"data row7 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.24/</td>\n",
" <td id=\"T_14bb3_row7_col2\" class=\"data row7 col2\" >/2024.11.24/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row8_col0\" class=\"data row8 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row8_col1\" class=\"data row8 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.21/</td>\n",
" <td id=\"T_14bb3_row8_col2\" class=\"data row8 col2\" >/2024.11.21/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row9_col0\" class=\"data row9 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row9_col1\" class=\"data row9 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/2024.11.20/</td>\n",
" <td id=\"T_14bb3_row9_col2\" class=\"data row9 col2\" >/2024.11.20/</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"T_14bb3_row10_col0\" class=\"data row10 col0\" >PUT</td>\n",
" <td id=\"T_14bb3_row10_col1\" class=\"data row10 col1\" >s3://kdb-demo-829845998889-kms/data/hdb/sym</td>\n",
" <td id=\"T_14bb3_row10_col2\" class=\"data row10 col2\" >/</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x7f1d4f52b490>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"note_str = \"\"\n",
"\n",
"c_set_list = list_kx_changesets(client, environmentId=ENV_ID, databaseName=DB_NAME)\n",
"\n",
"if len(c_set_list) == 0:\n",
" note_str = \"<<Could not get changesets>>\"\n",
"\n",
"print(100*\"=\")\n",
"print(f\"Database: {DB_NAME}, Changesets: {len(c_set_list)} {note_str}\")\n",
"print(100*\"=\")\n",
"\n",
"# sort by create time\n",
"c_set_list = sorted(c_set_list, key=lambda d: d['createdTimestamp']) \n",
"\n",
"for c in c_set_list:\n",
" c_set_id = c['changesetId']\n",
" print(f\" Changeset: {c_set_id}: Created: {c['createdTimestamp']} ({c['status']})\")\n",
" c_rqs = client.get_kx_changeset(environmentId=ENV_ID, databaseName=DB_NAME, changesetId=c_set_id)['changeRequests']\n",
"\n",
" chs_pdf = pd.DataFrame.from_dict(c_rqs).style.hide(axis='index')\n",
" display(chs_pdf)"
]
},
{
"cell_type": "markdown",
"id": "9e42568f-5255-49d5-9bd9-0fd55298200d",
"metadata": {},
"source": [
"## 5. Create a Cluster for the database"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a89c57ce-896b-47d4-b71e-5984c5e5a3fe",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"updating: init.q (deflated 35%)\n",
"updating: lib.q (stored 0%)\n",
"upload: ./code.zip to s3://kdb-demo-829845998889-kms/code/code.zip\n",
"2023-06-05 21:25:21 0 \n",
"2024-11-14 17:42:32 16585 basictick.zip\n",
"2024-11-15 18:24:29 1184 bmll.zip\n",
"2024-11-26 15:03:11 455 code.zip\n",
"2023-12-21 19:47:37 574 codebundle.zip\n",
"2024-02-02 21:34:56 582 codebundle1.zip\n",
"2023-12-21 21:26:00 582 codebundle2.zip\n",
"2024-11-14 17:30:52 2607 dbmaint.zip\n",
"2024-09-04 17:42:17 556 foo.q.zip\n",
"2023-11-22 14:58:53 1530 jpmc_code.zip\n",
"2024-01-01 19:57:08 33781 kdb-tick-flat-largetable.zip\n",
"2023-12-30 22:56:33 38867 kdb-tick-flat.zip\n",
"2024-01-08 13:05:33 28741 kdb-tick.zip\n",
"2023-08-22 16:58:18 765 qcode.zip\n",
"2024-10-16 22:31:45 465 taqcode.zip\n",
"2024-04-26 16:38:46 487423 torq_app.zip\n",
"2024-03-06 19:01:11 5807282 torq_app_20240306_1901.zip\n",
"2024-03-06 19:13:22 5807290 torq_app_20240306_1913.zip\n",
"2024-03-13 15:57:24 5807307 torq_app_20240313_1557.zip\n",
"2024-03-13 18:16:01 5807310 torq_app_20240313_1815.zip\n",
"2024-03-14 16:03:45 5807310 torq_app_20240314_1603.zip\n",
"2024-03-15 16:59:48 5807310 torq_app_20240315_1659.zip\n",
"2024-03-18 20:09:13 8925181 torq_app_20240318_2009.zip\n",
"2024-01-30 16:52:19 3583 tradeplus.zip\n"
]
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# zip the code\n",
"#os.system(f\"zip -r -X {CODEBASE}.zip {CODEBASE} -x '*.ipynb_checkpoints*'\")\n",
"os.system(f\"cd {CODEBASE}; zip -r -X ../{CODEBASE}.zip . -x '*.ipynb_checkpoints*' '.DS_Store';\")\n",
"\n",
"# copy code to S3\n",
"cp = \"\"\n",
"\n",
"if AWS_ACCESS_KEY_ID is not None:\n",
" cp = f\"\"\"\n",
"export AWS_ACCESS_KEY_ID={AWS_ACCESS_KEY_ID}\n",
"export AWS_SECRET_ACCESS_KEY={AWS_SECRET_ACCESS_KEY}\n",
"export AWS_SESSION_TOKEN={AWS_SESSION_TOKEN}\n",
"\"\"\"\n",
"\n",
"cp += f\"\"\"\n",
"aws s3 cp --exclude .DS_Store {CODEBASE}.zip s3://{S3_BUCKET}/code/{CODEBASE}.zip\n",
"aws s3 ls s3://{S3_BUCKET}/code/\n",
"\"\"\"\n",
"\n",
"# execute the S3 copy\n",
"os.system(cp)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "36142f08-5a68-4c22-993e-e169628133e2",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating: cluster_welcomedb\n"
]
}
],
"source": [
"print(f\"Creating: {CLUSTER_NAME}\")\n",
"\n",
"resp = client.create_kx_cluster(\n",
" environmentId=ENV_ID, \n",
" clusterName=CLUSTER_NAME,\n",
" clusterDescription=f\"Demo Cluster for database {DB_NAME}\",\n",
" clusterType='HDB',\n",
" releaseLabel = '1.0',\n",
" capacityConfiguration={ \"nodeType\": \"kx.s.large\", \"nodeCount\": NODE_COUNT },\n",
" databases=[{ \n",
" 'databaseName': DB_NAME, \n",
" 'cacheConfigurations': [\n",
" {'dbPaths':['/'], 'cacheType': 'CACHE_250' }\n",
" ] \n",
" }],\n",
" cacheStorageConfigurations=[{ 'type': 'CACHE_250', 'size': CACHE_SIZE }],\n",
" azMode=AZ_MODE,\n",
" availabilityZoneId=AZ_ID,\n",
" vpcConfiguration={ \n",
" 'vpcId': VPC_ID,\n",
" 'securityGroupIds': SECURITY_GROUPS,\n",
" 'subnetIds': SUBNET_IDS,\n",
" 'ipAddressType': 'IP_V4' },\n",
" code={ 's3Bucket': S3_BUCKET, 's3Key': CODE_PATH },\n",
" initializationScript=f\"init.q\",\n",
" commandLineArguments=[\n",
" {'key': 's', 'value': '2'}, \n",
" {'key': 'dbname', 'value': DB_NAME}, \n",
" ]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4c21dbb8-9878-441f-997b-f27164e1e6bf",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cluster: cluster_welcomedb status is PENDING, total wait 0:00:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:00:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:01:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:01:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:02:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:02:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:03:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:03:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:04:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:04:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:05:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:05:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:06:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:06:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:07:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:07:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:08:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:08:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:09:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:09:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:10:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:10:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:11:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:11:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:12:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:12:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:13:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:13:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:14:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:14:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:15:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:15:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:16:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:16:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:17:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:17:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:18:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:18:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:19:00, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is CREATING, total wait 0:19:30, waiting 30 sec ...\n",
"Cluster: cluster_welcomedb status is now RUNNING, total wait 0:20:00\n",
"\n",
"** DONE **\n"
]
}
],
"source": [
"wait_for_cluster_status(client, environmentId=ENV_ID, clusterName=CLUSTER_NAME, show_wait=True)\n",
"print()\n",
"print(\"** DONE **\")"
]
},
{
"cell_type": "markdown",
"id": "fb1a5cac-c0f6-4478-8f67-0c1060fe986a",
"metadata": {
"tags": []
},
"source": [
"## 6. Get the Connection String\n",
"This assumes that the IAM role exists and the user (KDB_USERNAME) have beed already added as well."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "477af774-ee69-4cd8-bb79-0bb983b6fb91",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cluster: --------------------------------------------------------------------------------\n",
"{\n",
" \"availabilityZoneId\": \"use1-az6\",\n",
" \"azMode\": \"SINGLE\",\n",
" \"cacheStorageConfigurations\": [\n",
" {\n",
" \"size\": 1200,\n",
" \"type\": \"CACHE_250\"\n",
" }\n",
" ],\n",
" \"capacityConfiguration\": {\n",
" \"nodeCount\": 1,\n",
" \"nodeType\": \"kx.s.large\"\n",
" },\n",
" \"clusterDescription\": \"Demo Cluster for database welcomedb\",\n",
" \"clusterName\": \"cluster_welcomedb\",\n",
" \"clusterType\": \"HDB\",\n",
" \"code\": {\n",
" \"s3Bucket\": \"kdb-demo-829845998889-kms\",\n",
" \"s3Key\": \"code/code.zip\"\n",
" },\n",
" \"commandLineArguments\": [\n",
" {\n",
" \"key\": \"s\",\n",
" \"value\": \"2\"\n",
" },\n",
" {\n",
" \"key\": \"dbname\",\n",
" \"value\": \"welcomedb\"\n",
" }\n",
" ],\n",
" \"createdTimestamp\": \"2024-11-26 16:28:37.307000+00:00\",\n",
" \"databases\": [\n",
" {\n",
" \"cacheConfigurations\": [\n",
" {\n",
" \"cacheType\": \"CACHE_250\",\n",
" \"dbPaths\": [\n",
" \"/\"\n",
" ]\n",
" }\n",
" ],\n",
" \"changesetId\": \"lsm0f3Z2MgmYnY89ZK2Mjw\",\n",
" \"databaseName\": \"welcomedb\"\n",
" }\n",
" ],\n",
" \"initializationScript\": \"init.q\",\n",
" \"lastModifiedTimestamp\": \"2024-11-26 16:48:29.956000+00:00\",\n",
" \"releaseLabel\": \"1.0\",\n",
" \"status\": \"RUNNING\",\n",
" \"volumes\": [],\n",
" \"vpcConfiguration\": {\n",
" \"ipAddressType\": \"IP_V4\",\n",
" \"securityGroupIds\": [\n",
" \"sg-0c99f1cfb9c3c7fd9\"\n",
" ],\n",
" \"subnetIds\": [\n",
" \"subnet-04052219ec25b062b\"\n",
" ],\n",
" \"vpcId\": \"vpc-0fe2b9c50f3ad382f\"\n",
" }\n",
"}\n"
]
}
],
"source": [
"try:\n",
" resp = client.get_kx_cluster(environmentId=ENV_ID, clusterName=CLUSTER_NAME)\n",
"except client.exceptions.ResourceNotFoundException:\n",
" print(F\"Cluster: {CLUSTER_NAME} did not create\")\n",
"\n",
"if resp['ResponseMetadata']['HTTPStatusCode'] != 200:\n",
" sys.stderr.write(\"Error:\\n {resp}\")\n",
"else:\n",
" resp.pop('ResponseMetadata', None)\n",
"\n",
"kx_cluster = resp\n",
"\n",
"print(\"Cluster: \"+(\"-\"*80))\n",
"print(json.dumps(kx_cluster, sort_keys=True, indent=4, default=str))\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7747eeb0-e0c4-43b0-8208-336fb3ea6286",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Give permissions time to propogate after cluster creation....\n",
"time.sleep(60)\n"
]
},
{
"cell_type": "markdown",
"id": "3f67a2fe-d592-43b8-868a-8219e603d77f",
"metadata": {},
"source": [
"## 7. Query Cluster using PyKX"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b68d3ef3-6a39-4390-aeb4-723dcb5166d2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Query the HDB\n",
"hdb = get_pykx_connection(client, \n",
" environmentId=ENV_ID, clusterName=CLUSTER_NAME, \n",
" userName=KDB_USERNAME, boto_session=session)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "06a4bdb0-0e77-4c2a-aade-3d4a112013cd",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tables (1): ['example']\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>t</th>\n",
" <th>f</th>\n",
" <th>a</th>\n",
" </tr>\n",
" <tr>\n",
" <th>c</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>date</th>\n",
" <td>b'd'</td>\n",
" <td></td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>sym</th>\n",
" <td>b's'</td>\n",
" <td></td>\n",
" <td>p</td>\n",
" </tr>\n",
" <tr>\n",
" <th>time</th>\n",
" <td>b'p'</td>\n",
" <td></td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>number</th>\n",
" <td>b'j'</td>\n",
" <td></td>\n",
" <td></td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" t f a\n",
"c \n",
"date b'd' \n",
"sym b's' p\n",
"time b'p' \n",
"number b'j' "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Tables\n",
"tables = hdb(\"tables[]\").py()\n",
"print(f\"Tables ({len(tables)}): {tables}\")\n",
"\n",
"# Schema\n",
"schema_pdf = hdb(\"meta `example\").pd()\n",
"display(schema_pdf)\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "15469493-9105-480f-9a16-8a18eda7bc4c",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>counts</th>\n",
" <th>avg_num</th>\n",
" <th>avg_sq_num</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2024-11-16</th>\n",
" <td>1000000</td>\n",
" <td>499800.464405</td>\n",
" <td>3.331963e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-17</th>\n",
" <td>1000000</td>\n",
" <td>499865.008159</td>\n",
" <td>3.332692e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-18</th>\n",
" <td>1000000</td>\n",
" <td>499912.379127</td>\n",
" <td>3.332060e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-19</th>\n",
" <td>1000000</td>\n",
" <td>500078.393386</td>\n",
" <td>3.334004e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-20</th>\n",
" <td>1000000</td>\n",
" <td>500264.684412</td>\n",
" <td>3.336606e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-21</th>\n",
" <td>1000000</td>\n",
" <td>499849.962912</td>\n",
" <td>3.333436e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-22</th>\n",
" <td>1000000</td>\n",
" <td>500082.257829</td>\n",
" <td>3.334472e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-23</th>\n",
" <td>1000000</td>\n",
" <td>500169.492354</td>\n",
" <td>3.336027e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-24</th>\n",
" <td>1000000</td>\n",
" <td>499903.047577</td>\n",
" <td>3.331833e+11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024-11-25</th>\n",
" <td>1000000</td>\n",
" <td>500277.492790</td>\n",
" <td>3.335924e+11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" counts avg_num avg_sq_num\n",
"date \n",
"2024-11-16 1000000 499800.464405 3.331963e+11\n",
"2024-11-17 1000000 499865.008159 3.332692e+11\n",
"2024-11-18 1000000 499912.379127 3.332060e+11\n",
"2024-11-19 1000000 500078.393386 3.334004e+11\n",
"2024-11-20 1000000 500264.684412 3.336606e+11\n",
"2024-11-21 1000000 499849.962912 3.333436e+11\n",
"2024-11-22 1000000 500082.257829 3.334472e+11\n",
"2024-11-23 1000000 500169.492354 3.336027e+11\n",
"2024-11-24 1000000 499903.047577 3.331833e+11\n",
"2024-11-25 1000000 500277.492790 3.335924e+11"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Rows: 10,000,000\n"
]
}
],
"source": [
"# Simple Query, uses function from lib\n",
"res_table = hdb(\"select counts:count i, avg_num: avg number, avg_sq_num: avg sq number by date from example\").pd()\n",
"display(res_table)\n",
"\n",
"# Number of Rows in Table\n",
"rows = hdb(\"count example\").py()\n",
"print(f\"Rows: {rows:,}\")"
]
},
{
"cell_type": "markdown",
"id": "238fdb54-2db6-4e6d-bad9-ee3e864b37ba",
"metadata": {},
"source": [
"## q Queries\n",
"You can also use the q magic to run q code directly on the remote cluster by connecting to the. cluster over IPC and sending code over to execute."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c7021717-01bb-41c5-bf7e-f16b919a4d47",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# get the connection string to the cluster\n",
"conn_str = get_kx_connection_string(client, \n",
" environmentId=ENV_ID, clusterName=CLUSTER_NAME, \n",
" userName=KDB_USERNAME, boto_session=session)\n",
"\n",
"# parse the connection string to components\n",
"host, port, username, password = parse_connection_string(conn_str)\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "31a956b8-b730-4f2e-88cb-34f864fb8a45",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"date | counts avg_num avg_sq_num \n",
"----------| -----------------------------\n",
"2024.11.16| 1000000 499800.5 3.331963e+11\n",
"2024.11.17| 1000000 499865 3.332692e+11\n",
"2024.11.18| 1000000 499912.4 3.33206e+11 \n",
"2024.11.19| 1000000 500078.4 3.334004e+11\n",
"2024.11.20| 1000000 500264.7 3.336606e+11\n",
"2024.11.21| 1000000 499850 3.333436e+11\n",
"2024.11.22| 1000000 500082.3 3.334472e+11\n",
"2024.11.23| 1000000 500169.5 3.336027e+11\n",
"2024.11.24| 1000000 499903 3.331833e+11\n",
"2024.11.25| 1000000 500277.5 3.335924e+11\n"
]
}
],
"source": [
"%%q --host $host --port $port --user $username --pass $password\n",
"\n",
"select counts:count i, avg_num: avg number, avg_sq_num: avg sq number by date from example"
]
},
{
"cell_type": "markdown",
"id": "aaec5839-e519-42d2-ab62-a524cee0bdfc",
"metadata": {
"tags": []
},
"source": [
"# Clean Up"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "941c540a-f4ed-46a1-a7ba-7e125b72501c",
"metadata": {
"tags": []
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'DELETE_CLUSTER' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[16], line 8\u001b[0m\n\u001b[1;32m 4\u001b[0m db_list\n\u001b[1;32m 6\u001b[0m cluster_deleted\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[0;32m----> 8\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[43mDELETE_CLUSTER\u001b[49m: \n\u001b[1;32m 9\u001b[0m \u001b[38;5;66;03m# list all clusters\u001b[39;00m\n\u001b[1;32m 10\u001b[0m resp\u001b[38;5;241m=\u001b[39mclient\u001b[38;5;241m.\u001b[39mget_kx_cluster(environmentId\u001b[38;5;241m=\u001b[39mENV_ID, clusterName\u001b[38;5;241m=\u001b[39mCLUSTER_NAME)\n\u001b[1;32m 12\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m resp[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mResponseMetadata\u001b[39m\u001b[38;5;124m'\u001b[39m][\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mHTTPStatusCode\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m!=\u001b[39m \u001b[38;5;241m200\u001b[39m:\n",
"\u001b[0;31mNameError\u001b[0m: name 'DELETE_CLUSTER' is not defined"
]
}
],
"source": [
"# Cluster Deletion\n",
"# ------------------------------------------------------------\n",
"db_list = list_kx_databases(client, environmentId=ENV_ID)\n",
"db_list\n",
"\n",
"cluster_deleted=False\n",
"\n",
"if DELETE_CLUSTER: \n",
" # list all clusters\n",
" resp=client.get_kx_cluster(environmentId=ENV_ID, clusterName=CLUSTER_NAME)\n",
"\n",
" if resp['ResponseMetadata']['HTTPStatusCode'] != 200:\n",
" sys.stderr.write(\"Error:\\n {resp}\")\n",
" else:\n",
" resp.pop('ResponseMetadata', None)\n",
"\n",
" if resp['status'] != 'DELETING':\n",
" try:\n",
" resp = client.delete_kx_cluster(environmentId=ENV_ID, clusterName=CLUSTER_NAME)\n",
" if resp['ResponseMetadata']['HTTPStatusCode'] != 200:\n",
" sys.stderr.write(\"Error:\\n {resp}\")\n",
" else:\n",
" resp.pop('ResponseMetadata', None)\n",
" except Exception as e: \n",
" sys.stderr.write(f\"Error deleting cluster: {CLUSTER_NAME}\\n{e}\")\n",
" cluster_deleted = False\n",
"\n",
" try:\n",
" wait_for_cluster_status(client, environmentId=ENV_ID, clusterName=CLUSTER_NAME, status='DELETED', show_wait=False)\n",
" print()\n",
" print(\"** DONE **\")\n",
"\n",
" cluster_deleted = True\n",
" except client.exceptions.ResourceNotFoundException:\n",
" cluster_deleted = True\n",
"else:\n",
" print(f\"DELETE_CLUSTER: {DELETE_CLUSTER}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66da3a6c-c7ba-46f1-8908-198f36b3349d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Database Deletion\n",
"# Requires cluster to have been deleted\n",
"if DELETE_DATABASE:\n",
" if cluster_deleted:\n",
" # if the database exists, delete it\n",
" if has_database(client, environmentId=ENV_ID, databaseName=DB_NAME):\n",
" try:\n",
" resp = client.delete_kx_database(environmentId=ENV_ID, databaseName=DB_NAME)\n",
" if resp['ResponseMetadata']['HTTPStatusCode'] != 200:\n",
" sys.stderr.write(\"Error:\\n {resp}\")\n",
" else:\n",
" resp.pop('ResponseMetadata', None)\n",
"\n",
" resp\n",
" except Exception as e: \n",
" sys.stderr.write(f\"Error: \\n{e}\")\n",
" else:\n",
" print(f\"Database already deleted: {DB_NAME} \")\n",
" else:\n",
" print(f\"Cluster deleted? {cluster_deleted}, will not delete database if cluster not deleted\")\n",
"else:\n",
" print(f\"DELETE_DATABASE: {DELETE_DATABASE}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7fe3a21-1bf9-47ba-b162-70ab5445b44f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"print( f\"Last Run: {datetime.datetime.now()}\" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ef82e62-de9c-47b7-9869-40f52af80fac",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p310",
"language": "python",
"name": "conda_pytorch_p310"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 5
}