documents/solution/ai/use-gpu-ecs-to-deploy-chatGLM.yaml (371 lines of code) (raw):
Outputs:
WebUIUrl:
Description:
zh-cn: WebUI访问域名。
en: URL of WebUI.
Value:
Fn::Sub:
- http://${PublicIp}:7860
- PublicIp:
Fn::GetAtt:
- EcsInstance
- PublicIp
ROSTemplateFormatVersion: '2015-09-01'
Description:
zh-cn: 创建ECS实例与GPDB数据库,配置安全组与网络环境,安装ChatGLM模型及依赖,通过WebUI提供服务,自动检查与启动服务,对外暴露7860端口。
en: Create ECS instances and GPDB databases, configure security groups and network
environments, install the ChatGLM model and its dependencies, provide services
via WebUI, automatically monitor and start the service, and expose port 7860 externally.
Parameters:
SystemDiskCategory:
AssociationProperty: ALIYUN::ECS::Disk::SystemDiskCategory
AssociationPropertyMetadata:
InstanceType: ${InstanceType}
ZoneId: ${ZoneId}
Type: String
Description:
zh-cn: '<font color=''blue''><b>可选值:</b></font><br>[cloud_efficiency: <font color=''green''>高效云盘</font>]<br>[cloud_ssd: <font color=''green''>SSD云盘</font>]<br>[cloud_essd: <font color=''green''>ESSD云盘</font>]<br>[cloud: <font color=''green''>普通云盘</font>]<br>[ephemeral_ssd: <font color=''green''>本地SSD盘</font>]'
en: '<font color=''blue''><b>Optional values:</b></font><br>[cloud_efficiency: <font color=''green''>Efficient Cloud Disk</font>]<br>[cloud_ssd: <font color=''green''>SSD Cloud Disk</font>]<br>[cloud_essd: <font color=''green''>ESSD Cloud Disk</font>]<br>[cloud: <font color=''green''>Cloud Disk</font>]<br>[ephemeral_ssd: <font color=''green''>Local SSD Cloud Disk</font>]'
Label:
zh-cn: 系统磁盘类型
en: System Disk Category
InstancePassword:
ConstraintDescription:
zh-cn: 长度8-30,必须包含大写字母、小写字母、数字、特殊符号三种;特殊字符包括:()`~!@#$%^&*_-+=|{}[]:;' <>,.?/
en: 'Length 8-30, must contain upper case letters, lower case letters, Numbers, special symbols three; special characters include: ()`~!@#$%^&*_-+=|{}[]:;''<>,.?/'
Description:
zh-cn: 长度8-30,必须包含大写字母、小写字母、数字、特殊符号三个;<br>特殊字符包括:()`~!@#$%^&*_-+=|{}[]:;'<>,.?/
en: The 8-30 long login password of instance, consists of the uppercase, lowercase letter and number. <br> special characters include()`~!@#$%^&*_-+=|{}[]:;'<>,.?/
MinLength: '8'
Label:
zh-cn: 实例密码
en: Instance Password
AllowedPattern: '[0-9A-Za-z\_\-&:;''<>,=%`~!@#\(\)\$\^\*\+\|\{\}\[\]\.\?\/]+$'
NoEcho: true
MaxLength: '30'
Type: String
AccountName:
Default: mytest
Type: String
Label:
zh-cn: 数据库账号名称
en: DB Account
AccountPassword:
NoEcho: true
Type: String
Label:
zh-cn: 数据库账号密码
en: DB AccountPassword
AssociationProperty: ALIYUN::RDS::Instance::AccountPassword
InstanceType:
AssociationProperty: ALIYUN::ECS::Instance::InstanceType
AssociationPropertyMetadata:
ZoneId: ${ZoneId}
Type: String
Label:
zh-cn: 实例类型
en: Instance Type
ZoneId:
AssociationProperty: ALIYUN::ECS::Instance::ZoneId
Type: String
Description:
zh-cn: 可用区ID。<br><b>注: <font color='blue'>选择可用区前请确认该可用区是否支持创建ECS资源的规格</font></b>
en: Availability Zone ID,<br><b>note: <font color='blue'>Before selecting, please confirm that the Availability Zone supports the specification of creating ECS resources</font></b>
Label:
zh-cn: 可用区ID
en: Available Zone ID
ADBPGInstanceSpec:
Type: String
Label:
en: DBInstanceSpec
zh-cn: 实例规格
ADBPGSegmentStorage:
Type: Number
Label:
en: SegmentStorageSize
zh-cn: 节点存储容量(G)
Default: 200
Resources:
Account:
Type: ALIYUN::GPDB::Account
Properties:
DBInstanceId:
Ref: DBInstance
AccountPassword:
Ref: AccountPassword
AccountName:
Ref: AccountName
EcsSecurityGroup:
Type: ALIYUN::ECS::SecurityGroup
Properties:
SecurityGroupIngress:
- Priority: 100
PortRange: 80/80
NicType: intranet
SourceCidrIp: 0.0.0.0/0
IpProtocol: tcp
- Priority: 100
PortRange: 7860/7860
NicType: intranet
SourceCidrIp: 0.0.0.0/0
IpProtocol: tcp
- Priority: 100
PortRange: 443/443
NicType: intranet
SourceCidrIp: 0.0.0.0/0
IpProtocol: tcp
- Priority: 100
PortRange: 3389/3389
NicType: intranet
SourceCidrIp: 0.0.0.0/0
IpProtocol: tcp
VpcId:
Ref: EcsVpc
WaitConditionHandle:
Type: ALIYUN::ROS::WaitConditionHandle
EcsVSwitch:
Type: ALIYUN::ECS::VSwitch
Properties:
VpcId:
Ref: EcsVpc
CidrBlock: 192.168.1.0/24
ZoneId:
Ref: ZoneId
DBInstance:
Type: ALIYUN::GPDB::ElasticDBInstance
Properties:
SegNodeNum: 4
InstanceSpec:
Ref: ADBPGInstanceSpec
DBInstanceCategory: Basic
EngineVersion: '6.0'
ZoneId:
Ref: ZoneId
VPCId:
Ref: EcsVpc
VSwitchId:
Ref: EcsVSwitch
SegStorageType: cloud_essd
StorageSize:
Ref: ADBPGSegmentStorage
DBInstanceMode: StorageElastic
SecurityIPList:
Fn::GetAtt:
- EcsInstance
- PrivateIp
WaitCondition:
Type: ALIYUN::ROS::WaitCondition
Properties:
Count: 1
Handle:
Ref: WaitConditionHandle
Timeout: 1800
DependsOn: EcsInstance
EcsInstance:
Type: ALIYUN::ECS::Instance
Properties:
SystemDiskCategory:
Ref: SystemDiskCategory
VpcId:
Fn::GetAtt:
- EcsVpc
- VpcId
SecurityGroupId:
Ref: EcsSecurityGroup
ImageId: ubuntu_22
InternetMaxBandwidthOut: 80
IoOptimized: optimized
VSwitchId:
Ref: EcsVSwitch
Password:
Ref: InstancePassword
InstanceType:
Ref: InstanceType
EcsVpc:
Type: ALIYUN::ECS::VPC
Properties:
CidrBlock: 192.168.0.0/16
InstallChatGLM:
Type: ALIYUN::ECS::RunCommand
Properties:
InstanceIds:
- Ref: EcsInstance
Type: RunShellScript
Sync: true
Timeout: 3600
CommandContent:
Fn::Sub: |-
#!/bin/sh
cd /root
echo "---------- Download Data Center Driver For Ubuntu 22.04 ---------- \n" | tee /root/runinit.log
echo "---------- Begin to download ... @ `date` ---------- \n" | tee -a /root/runinit.log
wget -O nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb "https://cn.download.nvidia.com/tesla/525.105.17/nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb" | tee -a /root/runinit.log
echo "---------- Begin to install nvidia & pgdriver ... @ `date` ---------- \n" | tee -a /root/runinit.log
sudo dpkg -i nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb | tee -a /root/runinit.log
sudo cp /var/nvidia-driver-local-repo-ubuntu2204-525.105.17/nvidia-driver-local-321ACFBA-keyring.gpg /usr/share/keyrings/
sudo apt-get update | tee -a /root/runinit.log
sudo DEBIAN_FRONTEND=noninteractive apt-get install nvidia-driver-525 -y | tee -a /root/runinit.log
sudo DEBIAN_FRONTEND=noninteractive apt-get install postgresql-server-dev-all -y | tee -a /root/runinit.log
echo "---------- Check driver ... @ `date` ---------- \n" | tee -a /root/runinit.log
nvidia-smi | tee -a /root/runinit.log
echo "---------- pip3.10 upgrade ... @ `date` ---------- \n" | tee -a /root/runinit.log
pip3.10 install --upgrade pip
pip3.10 cache purge
echo "---------- Prepare requirements.txt ... @ `date` ---------- \n" | tee -a /root/runinit.log
cat > /root/requirements.txt << EOF
langchain==0.0.146
transformers==4.27.1
unstructured[local-inference]
layoutparser[layoutmodels,tesseract]
nltk
sentence-transformers
beautifulsoup4
icetk
cpm_kernels
faiss-cpu
accelerate
gradio==3.28.3
fastapi
uvicorn
peft
EOF
echo "---------- pip install ... @ `date` ---------- \n" | tee -a /root/runinit.log
pip3.10 install -r requirements.txt | tee -a /root/runinit.log
pip3.10 install psycopg2 | tee -a /root/runinit.log
pip3.10 install psycopg2cffi | tee -a /root/runinit.log
pip3.10 install tabulate | tee -a /root/runinit.log
echo -e "\n PreRun Completely @ `date '+%Y-%m-%d %H:%M:%S'` ... " | tee -a /root/runinit.log
cat > /root/chatbot.py <<EOF
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os, time
from subprocess import Popen, PIPE
import argparse
import logging
import warnings
warnings.filterwarnings("ignore")
logging.basicConfig(level=logging.DEBUG,
format='%(asctime)s %(levelname)s %(funcName)s %(message)s',
datefmt='%a, %d %b %Y %H:%M:%S',
filename='chatbot.log',
filemode='w')
console = logging.StreamHandler()
console.setLevel(logging.WARN)
formatter = logging.Formatter('%(asctime)s %(levelname)s %(funcName)s %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)
parser = argparse.ArgumentParser(description='deploy chatGLM.')
parser.add_argument('-db_connection', '--db_connection', action="store", dest='db_connection',
help='input alicloud GPDB connection info.')
parser.add_argument('-db_name', '--db_name', action="store", dest='db_name',
help='input alicloud GPDB name.')
parser.add_argument('-db_port', '--db_port', action="store", dest='db_port',
help='input alicloud GPDB port.')
parser.add_argument('-db_username', '--db_username', action="store", dest='db_username',
help='input alicloud GPDB account username.')
parser.add_argument('-db_password', '--db_password', action="store", dest='db_password',
help='input alicloud GPDB account password.')
parser.add_argument('-ecs_public_ip', '--ecs_public_ip', action="store", dest='ecs_public_ip',
help='input alicloud ECS instance public ip.')
args = parser.parse_args()
def LocalShellCmd(cmd, env=None, shell=True):
p = Popen(
cmd,
stdin = PIPE,
stdout = PIPE,
stderr = PIPE,
env = env,
shell = shell
)
stdout, stderr = p.communicate()
rc = p.wait()
logging.debug("LocalShellCmd => cmd = [%s] \n stdout => [%s] \n" % (cmd, stdout))
assert (rc == 0)
return stdout.strip()
def envCheck():
cmd = "tail -n 1 /root/runinit.log | grep 'PreRun Completely' > /dev/null 2>&1"
LocalShellCmd(cmd)
cmd = "nvidia-smi > /dev/null 2>&1"
LocalShellCmd(cmd)
cmd = "dpkg -l | grep nvidia-driver-525"
LocalShellCmd(cmd)
cmd = "dpkg -l | grep postgresql-server-dev-all"
LocalShellCmd(cmd)
if __name__ == '__main__':
print("\n" + "*"*30 + """ 提示:\n
1)如果脚本执行过程中报错, 可以通过查看 /root/chatbot.log 文件进行自助排错(很简单的)!
2)如果需要重启 WEBUI 等服务或者查看数据库信息等, 可以参考 /root/env.txt 文件!\n"""+ "*"*30 + "\n")
print("*"*30 + "Step0: 正在进行环境检查, 比如驱动和安装依赖包等" + "*"*30)
envCheck()
print("*"*30 + "Step4: 设置操作系统环境变量,准备下载模型并且启动WEB程序,耗时很长!" + "*"*30)
# setting os system variables
os.chdir("/root")
ecsPubIpAddr = args.ecs_public_ip if args.ecs_public_ip else ""
os.environ["PG_HOST"] = args.db_connection if args.db_connection else ""
os.environ["PG_PORT"] = args.db_port if args.db_port else "5432"
os.environ["PG_USER"] = args.db_username if args.db_username else ""
os.environ["PG_PASSWORD"] = args.db_password if args.db_password else ""
os.environ["PG_DATABASE"] = args.db_name if args.db_name else ""
logging.debug("""ADBPG SYSTEM VARIABLE =>
export PG_HOST=%s
export PG_PORT=%s
export PG_USER=%s
export PG_PASSWORD=%s
export PG_DATABASE=%s
""" % (os.environ["PG_HOST"], os.environ["PG_PORT"], os.environ["PG_USER"], os.environ["PG_PASSWORD"], os.environ["PG_DATABASE"]))
with open("env.txt", "w") as fw:
fw.write("export PG_HOST=%s\n" % os.environ["PG_HOST"])
fw.write("export PG_PORT=%s\n" % os.environ["PG_PORT"])
fw.write("export PG_USER=%s\n" % os.environ["PG_USER"])
fw.write("export PG_PASSWORD=%s\n" % os.environ["PG_PASSWORD"])
fw.write("export PG_DATABASE=%s\n" % os.environ["PG_DATABASE"])
fw.write("#webui url=> %s:7860\n" % ecsPubIpAddr)
cmd1 = "cd /root; git clone https://github.com/wangxuqi/langchain-ChatGLM.git ; cd langchain-ChatGLM ; git checkout analyticdb_store"
cmd2 = "nohup python3.10 /root/langchain-ChatGLM/webui.py > webui.log 2>&1 &"
print("*"*35 + "Step4.1: 下载langchain代码!" + "*"*30)
LocalShellCmd(cmd1)
print("*"*35 + """Step4.2: 开始运行chatGLM模型, 由于模型比较大(17GB左右),下载需要较长的时间, 预计需要耗时15分钟左右,请耐心等待,
具体进度可以通过 \033[1;5;32;4m tail -f webui.log \033[0m 来查看 ...""" + "*"*30)
LocalShellCmd(cmd2)
print("*="*30)
print("""
【阿里云不对您在镜像上使用的第三方模型的合法性、安全性、准确性进行任何保证,并不对由此引发的任何损害承担责任;您应自觉遵守在镜像上安装的第三方模型的用户协议、使用规范和相关法律法规,并就使用第三方模型的合法性、合规性自行承担相关责任。】
环境一切准备就绪,您可以通过浏览器打开\n\t\t\t=>=>=> %s:7860 <=<=<=\n\t 来访问和体验有记忆能力的Chatbot了!!!
""" % ecsPubIpAddr)
print("*="*30)
EOF
python3.10 /root/chatbot.py --ecs_public_ip=${EcsInstance.PublicIp} --db_connection=${DBInstance.ConnectionString} --db_port=${DBInstance.Port} --db_username=${Account.AccountName} --db_password=${AccountPassword} --db_name=${Account.AccountName}
sleep 30
i=1
while [ $i -le 10 ]
do
netstat -ntlp | grep 7860
if [ $? -eq 0 ];then
echo 'web service start success.' >> /root/web_service.log
${WaitConditionHandle.CurlCli} --data-binary '{"status": "SUCCESS"}'
break
else
echo 'web service start failed.' >> /root/web_service.log
python3.10 /root/chatbot.py --ecs_public_ip=${EcsInstance.PublicIp} --db_connection=${DBInstance.ConnectionString} --db_port=${DBInstance.Port} --db_username=${Account.AccountName} --db_password=${AccountPassword} --db_name=${Account.AccountName}
sleep 30
let "i++"
fi
done
DependsOn:
- Account
- EcsInstance
Metadata:
ALIYUN::ROS::Interface:
ParameterGroups:
- Parameters:
- ZoneId
- InstanceType
- SystemDiskCategory
- InstancePassword
Label:
default: ECS
- Parameters:
- ADBPGInstanceSpec
- ADBPGSegmentStorage
- AccountName
- AccountPassword
Label:
default: Database
TemplateTags:
- acs:technical-solution:AI:向量数据库构建企业智能知识库-tech_solu_20