{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# 使用 PAI Python SDK 训练和部署 XGBoost 模型\n",
"\n",
"\n",
"[XGBoost](https://xgboost.readthedocs.io/) 是基于决策树的梯度提升算法([Gradient Boosting](https://en.wikipedia.org/wiki/Gradient_boosting))的高效工程实现,是一个流行的机器学习库,它能够处理大的数据集合,并且做了许多训练性能优化工作。\n",
"\n",
"在这个教程示例中,我们将使用PAI Python SDK,在PAI上完成XGBoost模型的训练,然后将输出的模型部署为在线推理服务,并进行调用测试。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 费用说明\n",
"\n",
"本示例将会使用以下云产品,并产生相应的费用账单:\n",
"\n",
"- PAI-DLC:运行训练任务,详细计费说明请参考[PAI-DLC计费说明](https://help.aliyun.com/zh/pai/product-overview/billing-of-dlc)\n",
"- PAI-EAS:部署推理服务,详细计费说明请参考[PAI-EAS计费说明](https://help.aliyun.com/zh/pai/product-overview/billing-of-eas)\n",
"- OSS:存储训练任务输出的模型、TensorBoard日志等,详细计费说明请参考[OSS计费概述](https://help.aliyun.com/zh/oss/product-overview/billing-overview)\n",
"\n",
"\n",
"> 通过参与云产品免费试用,使用**指定资源机型**提交训练作业或是部署推理服务,可以免费试用PAI产品,具体请参考[PAI免费试用](https://help.aliyun.com/zh/pai/product-overview/free-quota-for-new-users)。\n",
"\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Step1: 准备工作\n",
"\n",
"我们需要首先安装 PAI Python SDK 以运行本示例。\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!python -m pip install --upgrade pai"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"SDK 需要配置访问阿里云服务需要的 AccessKey,以及当前使用的工作空间和OSS Bucket。在 PAI SDK 安装之后,通过在 **命令行终端** 中执行以下命令,按照引导配置密钥,工作空间等信息。\n",
"\n",
"\n",
"```shell\n",
"\n",
"# 以下命令,请在 命令行终端 中执行.\n",
"\n",
"python -m pai.toolkit.config\n",
"\n",
"```\n",
"\n",
"我们可以通过以下代码验证当前的配置。\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# 验证安装\n",
"import pai\n",
"from pai.session import get_default_session\n",
"\n",
"print(pai.__version__)\n",
"\n",
"sess = get_default_session()\n",
"\n",
"assert sess.workspace_name is not None\n",
"print(sess.workspace_name)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step2: 准备数据集\n",
"\n",
"我们将使用[Breast Cancer数据集](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)),训练和测试XGBoost模型。准备数据集的步骤如下:\n",
"\n",
"1. 通过 `scikit-learn` 下载和拆分 Breast Cancer 数据集,使用 `csv` 格式保存到本地。\n",
"\n",
"2. 将本地数据集上传到OSS Bucket上,获得数据集的OSS URI,供云上执行的训练作业使用。"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"使用SKLearn下载和拆分数据集。"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"train data local path: ./train_data/train.csv\n",
"test data local path: ./test_data/train.csv\n"
]
}
],
"source": [
"import sys\n",
"\n",
"# 安装 sklearn, 用于数据集下载和切分\n",
"!{sys.executable} -m pip install --quiet scikit-learn\n",
"\n",
"# 创建数据集目录\n",
"!mkdir -p ./train_data\n",
"!mkdir -p ./test_data\n",
"\n",
"from sklearn import datasets\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"df = datasets.load_breast_cancer(as_frame=True)\n",
"\n",
"train, test = train_test_split(df.frame, test_size=0.3)\n",
"\n",
"train_data_local = \"./train_data/train.csv\"\n",
"test_data_local = \"./test_data/train.csv\"\n",
"\n",
"train.to_csv(train_data_local, index=False)\n",
"test.to_csv(test_data_local, index=False)\n",
"\n",
"print(f\"train data local path: {train_data_local}\")\n",
"print(f\"test data local path: {test_data_local}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"上传数据集到OSS Bucket。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 上传数据集到OSS Bucket\n",
"from pai.common.oss_utils import upload\n",
"\n",
"\n",
"# 上传训练数据到OSS\n",
"train_data = upload(\n",
" train_data_local,\n",
" \"pai/xgboost-example/train_data/\",\n",
" sess.oss_bucket,\n",
")\n",
"\n",
"\n",
"test_data = upload(\n",
" test_data_local,\n",
" \"pai/xgboost-example/test_data/\",\n",
" sess.oss_bucket,\n",
")\n",
"\n",
"print(f\"train data: {train_data}\")\n",
"print(f\"test data: {test_data}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step3: 提交训练作业\n",
"\n",
"通过PAI Python SDK提供`Estimator`,用户可以将训练脚本,提交到PAI创建一个训练作业,获得输出模型,主要流程包括:\n",
"\n",
"1. 用户编写训练作业脚本\n",
"\n",
"训练脚本负责模型代码的编写,它需要遵循PAI训练作业的规则获取作业超参,读取输入数据,并且将需要保存模型到指定的输出目录。\n",
"\n",
"2. 构建`Estimator`对象\n",
"\n",
"通过`Estimator` API,用户配置训练作业使用的脚本,镜像,超参,以及机器实例类型等信息。\n",
"本地的脚本会有Estimator上传到OSS Bucket,然后加载到训练作业内。\n",
"\n",
"3. 调用`Estimator.fit`API提交作业\n",
"\n",
"通过`.fit`提交一个训练作业,默认`.fit`方法会等到作业停止之后,才会退出,作业结束后,用户可以通过`estimator.model_data()`获得输出模型OSS URI路径。\n",
"\n",
"更加完整的介绍请参考 [文档: 提交训练作业](https://pai-sdk.oss-cn-shanghai.aliyuncs.com/pai/doc/latest/user-guide/estimator.html)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"我们通过XGboost提供的SKlearn API,构建了一个XGBoost的训练脚本:\n",
"\n",
"- 训练作业默认接收两个输入Channel: train 和 test,训练脚本会从 `/ml/input/data/{channel_name}` 中读取训练数据。\n",
"\n",
"- 训练结束之后,训练脚本需要将模型写出到到 `/ml/output/model` 目录下。"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"!mkdir -p xgb_src/"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting xgb_src/train.py\n"
]
}
],
"source": [
"%%writefile xgb_src/train.py\n",
"\n",
"\n",
"import argparse\n",
"import logging\n",
"import os\n",
"\n",
"import pandas as pd\n",
"from xgboost import XGBClassifier\n",
"\n",
"logging.basicConfig(format=\"%(levelname)s:%(message)s\", level=logging.INFO)\n",
"\n",
"TRAINING_BASE_DIR = \"/ml/\"\n",
"TRAINING_OUTPUT_MODEL_DIR = os.path.join(TRAINING_BASE_DIR, \"output/model/\")\n",
"\n",
"\n",
"def load_dataset(channel_name):\n",
" path = os.path.join(TRAINING_BASE_DIR, \"input/data\", channel_name)\n",
" if not os.path.exists(path):\n",
" return None, None\n",
"\n",
" # use first file in the channel dir.\n",
" file_name = next(\n",
" iter([f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]),\n",
" None,\n",
" )\n",
" if not file_name:\n",
" logging.warning(f\"Not found input file in channel path: {path}\")\n",
" return None, None\n",
"\n",
" file_path = os.path.join(path, file_name)\n",
" df = pd.read_csv(\n",
" filepath_or_buffer=file_path,\n",
" sep=\",\",\n",
" )\n",
"\n",
" train_y = df[\"target\"]\n",
" train_x = df.drop([\"target\"], axis=1)\n",
" return train_x, train_y\n",
"\n",
"\n",
"def main():\n",
" parser = argparse.ArgumentParser(description=\"XGBoost train arguments\")\n",
" # 用户指定的任务参数\n",
" parser.add_argument(\n",
" \"--n_estimators\", type=int, default=500, help=\"The number of base model.\"\n",
" )\n",
" parser.add_argument(\n",
" \"--objective\", type=str, help=\"Objective function used by XGBoost\"\n",
" )\n",
"\n",
" parser.add_argument(\n",
" \"--max_depth\", type=int, default=3, help=\"The maximum depth of the tree.\"\n",
" )\n",
"\n",
" parser.add_argument(\n",
" \"--eta\",\n",
" type=float,\n",
" default=0.2,\n",
" help=\"Step size shrinkage used in update to prevents overfitting.\",\n",
" )\n",
" parser.add_argument(\n",
" \"--eval_metric\",\n",
" type=str,\n",
" default=None,\n",
" help=\"Evaluation metrics for validation data\"\n",
" )\n",
"\n",
" args, _ = parser.parse_known_args()\n",
"\n",
" # 加载数据集\n",
" train_x, train_y = load_dataset(\"train\")\n",
" print(\"Train dataset: train_shape={}\".format(train_x.shape))\n",
" test_x, test_y = load_dataset(\"test\")\n",
" if test_x is None or test_y is None:\n",
" print(\"Test dataset not found\")\n",
" eval_set = [(train_x, train_y)]\n",
" else:\n",
" eval_set = [(train_x, train_y), (test_x, test_y)]\n",
"\n",
" clf = XGBClassifier(\n",
" max_depth=args.max_depth,\n",
" eta=args.eta,\n",
" n_estimators=args.n_estimators,\n",
" objective=args.objective,\n",
" )\n",
" clf.fit(train_x, train_y, eval_set=eval_set, eval_metric=args.eval_metric)\n",
"\n",
" model_path = os.environ.get(\"PAI_OUTPUT_MODEL\")\n",
" os.makedirs(model_path, exist_ok=True)\n",
" clf.save_model(os.path.join(model_path, \"model.json\"))\n",
" print(f\"Save model succeed: model_path={model_path}/model.json\")\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
" main()\n"
]
},
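{
"cell_type": "markdown",
"metadata": {},
"source": [
"As the job log further below shows, PAI also exposes each input channel to the training container through environment variables such as `PAI_INPUT_TRAIN`. A minimal sketch of locating the training data that way (an alternative to the directory-scanning `load_dataset` above; it assumes the `train` channel is mounted as a single CSV file):\n",
"\n",
"```python\n",
"import os\n",
"\n",
"import pandas as pd\n",
"\n",
"# PAI injects PAI_INPUT_{CHANNEL_NAME} with the mounted path of each input channel;\n",
"# fall back to the conventional mount point if the variable is absent.\n",
"train_path = os.environ.get(\"PAI_INPUT_TRAIN\", \"/ml/input/data/train/train.csv\")\n",
"df = pd.read_csv(train_path)\n",
"```\n"
]
},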
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 使用Estimator提交训练作业\n",
"\n",
"通过 Estimator, 我们将以上构建的训练脚本 (xgb_src/train.py) 上传到 OSS上,通过`fit` 提交一个在云端执行XGBoost训练作业。 fit API接收的inputs分别是之前上传的训练和测试的数据,会被挂载到作业容器中(分别挂载到 `/ml/input/data/{channel_name}/`),供训练脚本读取输入数据。\n",
"\n",
"提交之后,SDK 会打印作业的详情URL,并且打印作业日志,直到作业退出(成功,失败,或是停止)。用户可以点击作业URL查看任务详情,执行日志,模型的Metric,机器资源使用率等信息。"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"registry.cn-beijing.aliyuncs.com/pai-dlc/xgboost-training:1.6.0-cpu-py36-ubuntu18.04\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Uploading file: /var/folders/jn/9tcbd4h56z5g3wbbd5sms38m0000gp/T/tmpwwk7er5t/source.tar.gz: 100%|██████████| 1.39k/1.39k [00:00<00:00, 8.84kB/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"View the job detail by accessing the console URI: https://pai.console.aliyun.com/?regionId=cn-beijing&workspaceId=90914#/training/jobs/train10qiryeueit\n",
"TrainingJob launch starting\n",
"KUBERNETES_PORT=tcp://10.192.0.1:443\n",
"KUBERNETES_SERVICE_PORT=6443\n",
"PIP_TRUSTED_HOST=mirrors.cloud.aliyuncs.com\n",
"CMAKE_VERSION=3.14\n",
"SCRAPE_PROMETHEUS_METRICS=yes\n",
"MASTER_ADDR=train10qiryeueit-master-0\n",
"HOSTNAME=train10qiryeueit-master-0\n",
"MASTER_PORT=23456\n",
"HOME=/root\n",
"PAI_USER_ARGS=--max_depth 5 --eval_metric auc --n_estimator 100 --criterion gini\n",
"PYTHONUNBUFFERED=0\n",
"PAI_HPS_MAX_DEPTH=5\n",
"NPROC_PER_NODE=0\n",
"PAI_CONFIG_DIR=/ml/input/config/\n",
"PAI_OUTPUT_CHECKPOINTS=/ml/output/checkpoints/\n",
"WORLD_SIZE=1\n",
"REGION_ID=cn-beijing\n",
"RANK=0\n",
"PAI_INPUT_TRAIN=/ml/input/data/train/train.csv\n",
"TENANT_API_SERVER_URL=https://10.224.139.70:6443\n",
"PAI_TRAINING_JOB_ID=train10qiryeueit\n",
"PAI_OUTPUT_TENSORBOARD=/ml/output/tensorboard/\n",
"KUBERNETES_PORT_443_TCP_ADDR=10.192.0.1\n",
"PAI_OUTPUT_MODEL=/ml/output/model/\n",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n",
"PIP_INDEX_URL=https://mirrors.cloud.aliyuncs.com/pypi/simple\n",
"KUBERNETES_PORT_443_TCP_PORT=443\n",
"KUBERNETES_PORT_443_TCP_PROTO=tcp\n",
"PAI_HPS_N_ESTIMATOR=100\n",
"PAI_TRAINING_USE_ECI=true\n",
"KUBERNETES_CONTAINER_RESOURCE_GPU=0\n",
"PAI_INPUT_TEST=/ml/input/data/test/train.csv\n",
"KUBERNETES_PORT_443_TCP=tcp://10.192.0.1:443\n",
"KUBERNETES_SERVICE_PORT_HTTPS=443\n",
"KUBERNETES_SERVICE_HOST=10.224.139.70\n",
"PWD=/root\n",
"PAI_HPS={\"criterion\":\"gini\",\"eval_metric\":\"auc\",\"max_depth\":\"5\",\"n_estimator\":\"100\"}\n",
"PAI_HPS_CRITERION=gini\n",
"PAI_HPS_EVAL_METRIC=auc\n",
"PAI_WORKING_DIR=/ml/usercode/\n",
"PAI_ODPS_CREDENTIAL=/ml/input/credential/odps.json\n",
"Change to Working Directory, /ml/usercode/\n",
"Installing dependencies from /ml/input/config//requirements.txt\n",
"/usr/lib/python3/dist-packages/secretstorage/dhcrypto.py:15: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\n",
" from cryptography.utils import int_from_bytes\n",
"/usr/lib/python3/dist-packages/secretstorage/util.py:19: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead\n",
" from cryptography.utils import int_from_bytes\n",
"Looking in indexes: https://mirrors.cloud.aliyuncs.com/pypi/simple\n",
"WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\n",
"User program launching\n",
"-----------------------------------------------------------------\n",
"Train dataset: train_shape=(398, 30)\n",
"/usr/local/lib/python3.6/dist-packages/xgboost/sklearn.py:817: UserWarning: `eval_metric` in `fit` method is deprecated for better compatibility with scikit-learn, use `eval_metric` in constructor or`set_params` instead.\n",
" UserWarning,\n",
"[0]\tvalidation_0-auc:0.99620\tvalidation_1-auc:0.94604\n",
"[1]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.95106\n",
"[2]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.95235\n",
"[3]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.95393\n",
"[4]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.95307\n",
"[5]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97445\n",
"[6]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97402\n",
"[7]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97388\n",
"[8]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97912\n",
"[9]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97898\n",
"[10]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97876\n",
"[11]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97869\n",
"[12]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97912\n",
"[13]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97869\n",
"[14]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97898\n",
"[15]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97898\n",
"[16]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97854\n",
"[17]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97883\n",
"[18]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97883\n",
"[19]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97854\n",
"[20]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97826\n",
"[21]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97826\n",
"[22]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97826\n",
"[23]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97819\n",
"[24]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97826\n",
"[25]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97840\n",
"[26]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97840\n",
"[27]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97833\n",
"[28]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97833\n",
"[29]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97840\n",
"[30]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97905\n",
"[31]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97984\n",
"[32]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.97984\n",
"[33]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98213\n",
"[34]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98213\n",
"[35]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[36]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[37]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98192\n",
"[38]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98278\n",
"[39]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98263\n",
"[40]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98278\n",
"[41]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[42]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[43]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98292\n",
"[44]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98292\n",
"[45]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98292\n",
"[46]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98292\n",
"[47]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[48]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[49]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[50]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[51]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[52]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[53]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[54]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[55]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[56]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[57]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[58]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[59]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[60]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[61]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[62]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98220\n",
"[63]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[64]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[65]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98220\n",
"[66]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98220\n",
"[67]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[68]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[69]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[70]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98263\n",
"[71]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98263\n",
"[72]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98263\n",
"[73]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98263\n",
"[74]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98263\n",
"[75]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98220\n",
"[76]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[77]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98263\n",
"[78]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[79]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[80]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98249\n",
"[81]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[82]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[83]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[84]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[85]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[86]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[87]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[88]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[89]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[90]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[91]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[92]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[93]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[94]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[95]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[96]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[97]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[98]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"[99]\tvalidation_0-auc:1.00000\tvalidation_1-auc:0.98235\n",
"Save model succeed: model_path=/ml/output/model//model.json\n",
"\n",
"Training job (train10qiryeueit) succeeded, you can check the logs/metrics/output in the console:\n",
"https://pai.console.aliyun.com/?regionId=cn-beijing&workspaceId=90914#/training/jobs/train10qiryeueit\n",
"oss://mlops-poc-beijing/pai/training_job/estimator_20240712_101220_265_j03jyg/model/\n"
]
}
],
"source": [
"from pai.estimator import Estimator\n",
"from pai.image import retrieve\n",
"\n",
"\n",
"# 获取PAI提供的XGBoost训练镜像\n",
"image_uri = retrieve(\"xgboost\", framework_version=\"latest\").image_uri\n",
"print(image_uri)\n",
"\n",
"# 构建一个Estimator实例\n",
"est = Estimator(\n",
" # 作业启动脚本\n",
" command=\"python train.py $PAI_USER_ARGS\",\n",
" # 作业脚本的本地文件夹路径,会被打包上传到OSS\n",
" source_dir=\"./xgb_src/\",\n",
" image_uri=image_uri,\n",
" # 作业超参: 会通过Command arguments的方式传递给到作业脚本\n",
" hyperparameters={\n",
" \"n_estimator\": 100,\n",
" \"criterion\": \"gini\",\n",
" \"max_depth\": 5,\n",
" \"eval_metric\": \"auc\",\n",
" },\n",
" # 作业使用的机器实例\n",
" instance_type=\"ecs.c6.large\",\n",
")\n",
"\n",
"# 使用上传到OSS的训练数据作为作业的数据\n",
"est.fit(\n",
" inputs={\n",
" \"train\": train_data, # train_data 将被挂载到`/ml/input/data/train`目录\n",
" \"test\": test_data, # test_data 将被挂载到`/ml/input/data/test`目录\n",
" },\n",
")\n",
"print(est.model_data())"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step4: 部署模型\n",
"\n",
"以上训练获得模型,我们将使用[预置XGBoost Processor](https://help.aliyun.com/document_detail/470490.html)部署为一个在线服务。主要流程包括:\n",
"\n",
"1. 通过构建一个InferenceSpec\n",
"\n",
"InferenceSpec负责描述模型如何部署为一个在线服务,例如模型使用镜像部署,还是使用processor部署等。\n",
"\n",
"2. 构建Model对象\n",
"\n",
"Model对象可以直接部署服务,也可以通过`.register`注册到PAI的模型仓库。\n",
"\n",
"3. 使用`Model.deploy`部署在线服务。\n",
"\n",
"通过指定服务名称,机器实例类型,部署一个新的在线推理服务。\n"
]
},
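{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a minimal sketch of the processor-based alternative (not executed in this example; it assumes the built-in `xgboost` processor can serve the saved `model.json`):\n",
"\n",
"```python\n",
"from pai.model import InferenceSpec, Model\n",
"\n",
"# Serve the model with the built-in XGBoost processor instead of a custom container.\n",
"m = Model(\n",
"    inference_spec=InferenceSpec(processor=\"xgboost\"),\n",
"    model_data=est.model_data(),\n",
")\n",
"# m.deploy(service_name=..., instance_type=...)\n",
"```\n",
"\n",
"In this tutorial we use a custom serving script instead, so that the request/response format is fully under our control.\n"
]
},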
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"!mkdir xgb_infer/"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing xgb_infer/serving.py\n"
]
}
],
"source": [
"%%writefile xgb_infer/serving.py\n",
"\n",
"\n",
"import logging\n",
"import os\n",
"from typing import List\n",
"\n",
"import numpy as np\n",
"import uvicorn\n",
"import xgboost\n",
"from fastapi import FastAPI\n",
"from xgboost import XGBClassifier\n",
"\n",
"logging.basicConfig(\n",
" format=\"%(asctime)s: %(message)s\",\n",
" datefmt=\"%m/%d/%Y %I:%M:%S %p\",\n",
" level=logging.INFO,\n",
")\n",
"logger = logging.getLogger(__name__)\n",
"\n",
"app = FastAPI()\n",
"\n",
"\n",
"def load_model():\n",
" model_dir = os.environ.get(\"MODEL_MOUNT_PATH\", \"/eas/workspace/model\")\n",
" logger.info(\"model_dir: %s\", model_dir)\n",
" name = next((name for name in os.listdir(model_dir) if name.endswith(\"json\")), None)\n",
" logger.info(\"model dir files: %s\", os.listdir(model_dir))\n",
" if not name:\n",
" raise RuntimeError(\"Not found sklearn learn model under the model directory.\")\n",
"\n",
" xgb_model = XGBClassifier()\n",
" xgb_model.load_model(os.path.join(model_dir, name))\n",
" return xgb_model\n",
"\n",
"\n",
"model = load_model()\n",
"\n",
"\n",
"@app.post(\"/\")\n",
"def predict_v1(data: List):\n",
" global model\n",
" logger.info(\"API PredictV1 Invocation.\")\n",
" x = np.asarray(data)\n",
" y = model.predict(x)\n",
" return y.tolist()\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
" logger.info(\"FastAPI server launching\")\n",
" logger.info(\"Environment Variables: %s\", os.environ)\n",
" logger.info(\"XGBoost Version: %s\", xgboost.__version__)\n",
" port = int(os.environ.get(\"LISTENING_PORT\", 8000))\n",
" uvicorn.run(app, host=\"0.0.0.0\", port=port)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pai.model import InferenceSpec, Model, container_serving_spec\n",
"from pai.common.utils import random_str\n",
"\n",
"model_data = est.model_data()\n",
"\n",
"\n",
"image_uri = retrieve(\"xgboost\", framework_version=\"latest\").image_uri\n",
"\n",
"inference_spec = container_serving_spec(\n",
" # 推理代码目录\n",
" source_dir=\"./xgb_infer\",\n",
" # 启动命令\n",
" command=\"python serving.py\",\n",
" # 推理镜像\n",
" image_uri=image_uri,\n",
" port=5000,\n",
" requirements=[\"uvicorn[standard]\", \"fastapi\"],\n",
")\n",
"model = Model(\n",
" inference_spec=inference_spec,\n",
" model_data=model_data,\n",
")\n",
"\n",
"predictor = model.deploy(\n",
" service_name=\"example_xgb_{}\".format(random_str(6)),\n",
" instance_type=\"ecs.c6.xlarge\",\n",
")"
]
},
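{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the notebook kernel restarts after deployment, the service keeps running and there is no need to redeploy; a `Predictor` can be re-attached to it by name. A sketch (the service name below is a placeholder):\n",
"\n",
"```python\n",
"from pai.predictor import Predictor\n",
"\n",
"# Re-attach to an existing PAI-EAS service by its name.\n",
"predictor = Predictor(service_name=\"<your-service-name>\")\n",
"```\n"
]
},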
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step5: 测试在线服务\n",
"\n",
"`Model.deploy`方法返回一个 `Predictor` 对象,`Predictor.predict`方法支持向创建的推理服务发送推理请求,拿到预测结果。"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]\n"
]
}
],
"source": [
"# print(p.service_name)\n",
"\n",
"test_x = test.drop([\"target\"], axis=1)\n",
"\n",
"res = predictor.predict(test_x.to_numpy().tolist())\n",
"\n",
"print(res)"
]
},
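{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the `test` dataframe still carries the ground-truth `target` column, we can sanity-check the predictions returned by the service (a quick extra check, using the scikit-learn package installed earlier):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import accuracy_score\n",
"\n",
"# Compare the service predictions against the held-out labels.\n",
"print(\"Accuracy on the test split: {:.4f}\".format(accuracy_score(test[\"target\"], res)))"
]
},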
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"在测试结束后,删除服务。"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"predictor.delete_service()"
]
}
],
"metadata": {
"execution": {
"timeout": 1800
},
"kernelspec": {
"display_name": "pai-dev-py36",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.19"
},
"vscode": {
"interpreter": {
"hash": "63703143536f433679c5464335316251eaa13807b3fcc3854dae32f2699871d6"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}