documents/solution/ai/use-gpu-ecs-to-deploy-chatGLM.yaml (371 lines of code) (raw):

Outputs:
  WebUIUrl:
    Description:
      zh-cn: WebUI访问域名。
      en: URL of WebUI.
    Value:
      Fn::Sub:
        - http://${PublicIp}:7860
        - PublicIp:
            Fn::GetAtt:
              - EcsInstance
              - PublicIp
ROSTemplateFormatVersion: '2015-09-01'
Description:
  zh-cn: 创建ECS实例与GPDB数据库,配置安全组与网络环境,安装ChatGLM模型及依赖,通过WebUI提供服务,自动检查与启动服务,对外暴露7860端口。
  en: Create an ECS instance and a GPDB database, configure the security group and network environment, install the ChatGLM model and its dependencies, provide services via WebUI, automatically check and start the service, and expose port 7860 externally.
Parameters:
  SystemDiskCategory:
    AssociationProperty: ALIYUN::ECS::Disk::SystemDiskCategory
    AssociationPropertyMetadata:
      InstanceType: ${InstanceType}
      ZoneId: ${ZoneId}
    Type: String
    Description:
      zh-cn: '<font color=''blue''><b>可选值:</b></font><br>[cloud_efficiency: <font color=''green''>高效云盘</font>]<br>[cloud_ssd: <font color=''green''>SSD云盘</font>]<br>[cloud_essd: <font color=''green''>ESSD云盘</font>]<br>[cloud: <font color=''green''>普通云盘</font>]<br>[ephemeral_ssd: <font color=''green''>本地SSD盘</font>]'
      en: '<font color=''blue''><b>Optional values:</b></font><br>[cloud_efficiency: <font color=''green''>Efficient Cloud Disk</font>]<br>[cloud_ssd: <font color=''green''>SSD Cloud Disk</font>]<br>[cloud_essd: <font color=''green''>ESSD Cloud Disk</font>]<br>[cloud: <font color=''green''>Cloud Disk</font>]<br>[ephemeral_ssd: <font color=''green''>Local SSD</font>]'
    Label:
      zh-cn: 系统磁盘类型
      en: System Disk Category
  InstancePassword:
    ConstraintDescription:
      zh-cn: '长度8-30,必须包含大写字母、小写字母、数字、特殊符号三种;特殊字符包括:()`~!@#$%^&*_-+=|{}[]:;''<>,.?/'
      en: 'Length 8-30; must contain at least three of: uppercase letters, lowercase letters, digits, and special characters. Special characters include: ()`~!@#$%^&*_-+=|{}[]:;''<>,.?/'
    Description:
      zh-cn: '长度8-30,必须包含大写字母、小写字母、数字、特殊符号三个;<br>特殊字符包括:()`~!@#$%^&*_-+=|{}[]:;''<>,.?/'
      en: 'Login password of the instance, 8-30 characters long; must contain at least three of: uppercase letters, lowercase letters, digits, and special characters.<br>Special characters include: ()`~!@#$%^&*_-+=|{}[]:;''<>,.?/'
    MinLength: '8'
    Label:
      zh-cn: 实例密码
      en: Instance Password
    AllowedPattern: '[0-9A-Za-z\_\-&:;''<>,=%`~!@#\(\)\$\^\*\+\|\{\}\[\]\.\?\/]+$'
    NoEcho: true
    MaxLength: '30'
    Type: String
  AccountName:
    Default: mytest
    Type: String
    Label:
      zh-cn: 数据库账号名称
      en: DB Account
  AccountPassword:
    NoEcho: true
    Type: String
    Label:
      zh-cn: 数据库账号密码
      en: DB Account Password
    AssociationProperty: ALIYUN::RDS::Instance::AccountPassword
  InstanceType:
    AssociationProperty: ALIYUN::ECS::Instance::InstanceType
    AssociationPropertyMetadata:
      ZoneId: ${ZoneId}
    Type: String
    Label:
      zh-cn: 实例类型
      en: Instance Type
  ZoneId:
    AssociationProperty: ALIYUN::ECS::Instance::ZoneId
    Type: String
    Description:
      zh-cn: '可用区ID。<br><b>注: <font color=''blue''>选择可用区前请确认该可用区是否支持创建ECS资源的规格</font></b>'
      en: 'Availability Zone ID.<br><b>Note: <font color=''blue''>Before selecting a zone, confirm that it supports the ECS instance type you want to create.</font></b>'
    Label:
      zh-cn: 可用区ID
      en: Availability Zone ID
  ADBPGInstanceSpec:
    Type: String
    Label:
      en: DBInstanceSpec
      zh-cn: 实例规格
  ADBPGSegmentStorage:
    Type: Number
    Label:
      en: SegmentStorageSize
      zh-cn: 节点存储容量(G)
    Default: 200
Resources:
  Account:
    Type: ALIYUN::GPDB::Account
    Properties:
      DBInstanceId:
        Ref: DBInstance
      AccountPassword:
        Ref: AccountPassword
      AccountName:
        Ref: AccountName
  EcsSecurityGroup:
    Type: ALIYUN::ECS::SecurityGroup
    Properties:
      SecurityGroupIngress:
        - Priority: 100
          PortRange: 80/80
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
        - Priority: 100
          PortRange: 7860/7860
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
        - Priority: 100
          PortRange: 443/443
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
        - Priority: 100
          PortRange: 3389/3389
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
      VpcId:
        Ref: EcsVpc
  WaitConditionHandle:
    Type: ALIYUN::ROS::WaitConditionHandle
  EcsVSwitch:
    Type: ALIYUN::ECS::VSwitch
    Properties:
      VpcId:
        Ref: EcsVpc
      CidrBlock: 192.168.1.0/24
      ZoneId:
        Ref: ZoneId
  DBInstance:
    Type: ALIYUN::GPDB::ElasticDBInstance
    Properties:
      SegNodeNum: 4
      InstanceSpec:
        Ref: ADBPGInstanceSpec
      DBInstanceCategory: Basic
      EngineVersion: '6.0'
      ZoneId:
        Ref: ZoneId
      VPCId:
        Ref: EcsVpc
      VSwitchId:
        Ref: EcsVSwitch
      SegStorageType: cloud_essd
      StorageSize:
        Ref: ADBPGSegmentStorage
      DBInstanceMode: StorageElastic
      SecurityIPList:
        Fn::GetAtt:
          - EcsInstance
          - PrivateIp
  WaitCondition:
    Type: ALIYUN::ROS::WaitCondition
    Properties:
      Count: 1
      Handle:
        Ref: WaitConditionHandle
      Timeout: 1800
    DependsOn: EcsInstance
  EcsInstance:
    Type: ALIYUN::ECS::Instance
    Properties:
      SystemDiskCategory:
        Ref: SystemDiskCategory
      VpcId:
        Fn::GetAtt:
          - EcsVpc
          - VpcId
      SecurityGroupId:
        Ref: EcsSecurityGroup
      ImageId: ubuntu_22
      InternetMaxBandwidthOut: 80
      IoOptimized: optimized
      VSwitchId:
        Ref: EcsVSwitch
      Password:
        Ref: InstancePassword
      InstanceType:
        Ref: InstanceType
  EcsVpc:
    Type: ALIYUN::ECS::VPC
    Properties:
      CidrBlock: 192.168.0.0/16
  InstallChatGLM:
    Type: ALIYUN::ECS::RunCommand
    Properties:
      InstanceIds:
        - Ref: EcsInstance
      Type: RunShellScript
      Sync: true
      Timeout: 3600
      CommandContent:
        Fn::Sub: |-
          #!/bin/sh
          cd /root

          # Install the NVIDIA data center driver and the PostgreSQL client headers.
          echo "---------- Download Data Center Driver For Ubuntu 22.04 ---------- \n" | tee /root/runinit.log
          echo "---------- Begin to download ... @ `date` ---------- \n" | tee -a /root/runinit.log
          wget -O nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb "https://cn.download.nvidia.com/tesla/525.105.17/nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb" | tee -a /root/runinit.log
          echo "---------- Begin to install nvidia & pgdriver ... @ `date` ---------- \n" | tee -a /root/runinit.log
          sudo dpkg -i nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb | tee -a /root/runinit.log
          sudo cp /var/nvidia-driver-local-repo-ubuntu2204-525.105.17/nvidia-driver-local-321ACFBA-keyring.gpg /usr/share/keyrings/
          sudo apt-get update | tee -a /root/runinit.log
          sudo DEBIAN_FRONTEND=noninteractive apt-get install nvidia-driver-525 -y | tee -a /root/runinit.log
          sudo DEBIAN_FRONTEND=noninteractive apt-get install postgresql-server-dev-all -y | tee -a /root/runinit.log
          echo "---------- Check driver ... @ `date` ---------- \n" | tee -a /root/runinit.log
          nvidia-smi | tee -a /root/runinit.log

          # Install the Python dependencies.
          echo "---------- pip3.10 upgrade ... @ `date` ---------- \n" | tee -a /root/runinit.log
          pip3.10 install --upgrade pip
          pip3.10 cache purge
          echo "---------- Prepare requirements.txt ... @ `date` ---------- \n" | tee -a /root/runinit.log
          cat > /root/requirements.txt << EOF
          langchain==0.0.146
          transformers==4.27.1
          unstructured[local-inference]
          layoutparser[layoutmodels,tesseract]
          nltk
          sentence-transformers
          beautifulsoup4
          icetk
          cpm_kernels
          faiss-cpu
          accelerate
          gradio==3.28.3
          fastapi
          uvicorn
          peft
          EOF
          echo "---------- pip install ... @ `date` ---------- \n" | tee -a /root/runinit.log
          pip3.10 install -r requirements.txt | tee -a /root/runinit.log
          pip3.10 install psycopg2 | tee -a /root/runinit.log
          pip3.10 install psycopg2cffi | tee -a /root/runinit.log
          pip3.10 install tabulate | tee -a /root/runinit.log
          # chatbot.py greps the last line of runinit.log for this marker, so keep it last.
          echo -e "\n PreRun Completely @ `date '+%Y-%m-%d %H:%M:%S'` ... " | tee -a /root/runinit.log

          cat > /root/chatbot.py <<EOF
          #!/usr/bin/env python
          # -*- coding: utf-8 -*-
          import os, time
          from subprocess import Popen, PIPE
          import argparse
          import logging
          import warnings

          warnings.filterwarnings("ignore")
          # Full debug output goes to chatbot.log; only warnings and above reach the console.
          logging.basicConfig(level=logging.DEBUG,
                              format='%(asctime)s %(levelname)s %(funcName)s %(message)s',
                              datefmt='%a, %d %b %Y %H:%M:%S',
                              filename='chatbot.log',
                              filemode='w')
          console = logging.StreamHandler()
          console.setLevel(logging.WARN)
          formatter = logging.Formatter('%(asctime)s %(levelname)s %(funcName)s %(message)s')
          console.setFormatter(formatter)
          logging.getLogger('').addHandler(console)

          parser = argparse.ArgumentParser(description='deploy chatGLM.')
          parser.add_argument('-db_connection', '--db_connection', action="store", dest='db_connection', help='input alicloud GPDB connection info.')
          parser.add_argument('-db_name', '--db_name', action="store", dest='db_name', help='input alicloud GPDB name.')
          parser.add_argument('-db_port', '--db_port', action="store", dest='db_port', help='input alicloud GPDB port.')
          parser.add_argument('-db_username', '--db_username', action="store", dest='db_username', help='input alicloud GPDB account username.')
          parser.add_argument('-db_password', '--db_password', action="store", dest='db_password', help='input alicloud GPDB account password.')
          parser.add_argument('-ecs_public_ip', '--ecs_public_ip', action="store", dest='ecs_public_ip', help='input alicloud ECS instance public ip.')
          args = parser.parse_args()

          def LocalShellCmd(cmd, env=None, shell=True):
              # Run a shell command, log its stdout, and fail on a non-zero exit code.
              p = Popen(
                  cmd,
                  stdin = PIPE,
                  stdout = PIPE,
                  stderr = PIPE,
                  env = env,
                  shell = shell
              )
              stdout, stderr = p.communicate()
              rc = p.wait()
              logging.debug("LocalShellCmd => cmd = [%s] \n stdout => [%s] \n" % (cmd, stdout))
              assert (rc == 0)
              return stdout.strip()

          def envCheck():
              # Each command must exit 0: init marker present, GPU driver working, packages installed.
              cmd = "tail -n 1 /root/runinit.log | grep 'PreRun Completely' > /dev/null 2>&1"
              LocalShellCmd(cmd)
              cmd = "nvidia-smi > /dev/null 2>&1"
              LocalShellCmd(cmd)
              cmd = "dpkg -l | grep nvidia-driver-525"
              LocalShellCmd(cmd)
              cmd = "dpkg -l | grep postgresql-server-dev-all"
              LocalShellCmd(cmd)

          if __name__ == '__main__':
              print("\n" + "*"*30 + """ Notes:\n
              1) If the script reports an error while running, check /root/chatbot.log to troubleshoot it yourself (it is straightforward).
              2) To restart services such as the WebUI or to look up database information, see /root/env.txt.\n""" + "*"*30 + "\n")
              print("*"*30 + "Step0: Checking the environment (drivers, installed dependencies, etc.)" + "*"*30)
              envCheck()
              print("*"*30 + "Step4: Setting OS environment variables, then downloading the model and starting the web app; this takes a long time!" + "*"*30)
              # setting os system variables
              os.chdir("/root")
              ecsPubIpAddr = args.ecs_public_ip if args.ecs_public_ip else ""
              os.environ["PG_HOST"] = args.db_connection if args.db_connection else ""
              os.environ["PG_PORT"] = args.db_port if args.db_port else "5432"
              os.environ["PG_USER"] = args.db_username if args.db_username else ""
              os.environ["PG_PASSWORD"] = args.db_password if args.db_password else ""
              os.environ["PG_DATABASE"] = args.db_name if args.db_name else ""
              logging.debug("""ADBPG SYSTEM VARIABLE =>
              export PG_HOST=%s
              export PG_PORT=%s
              export PG_USER=%s
              export PG_PASSWORD=%s
              export PG_DATABASE=%s
              """ % (os.environ["PG_HOST"], os.environ["PG_PORT"], os.environ["PG_USER"], os.environ["PG_PASSWORD"], os.environ["PG_DATABASE"]))
              # Save the connection settings so the services can be restarted by hand later.
              with open("env.txt", "w") as fw:
                  fw.write("export PG_HOST=%s\n" % os.environ["PG_HOST"])
                  fw.write("export PG_PORT=%s\n" % os.environ["PG_PORT"])
                  fw.write("export PG_USER=%s\n" % os.environ["PG_USER"])
                  fw.write("export PG_PASSWORD=%s\n" % os.environ["PG_PASSWORD"])
                  fw.write("export PG_DATABASE=%s\n" % os.environ["PG_DATABASE"])
                  fw.write("#webui url=> %s:7860\n" % ecsPubIpAddr)
              cmd1 = "cd /root; git clone https://github.com/wangxuqi/langchain-ChatGLM.git ; cd langchain-ChatGLM ; git checkout analyticdb_store"
              cmd2 = "nohup python3.10 /root/langchain-ChatGLM/webui.py > webui.log 2>&1 &"
              print("*"*35 + "Step4.1: Downloading the langchain code" + "*"*30)
              LocalShellCmd(cmd1)
              print("*"*35 + """Step4.2: Starting the chatGLM model. The model is large (about 17 GB), so the download takes a while,
              roughly 15 minutes; please be patient.
              To follow the progress, run \033[1;5;32;4m tail -f webui.log \033[0m ...""" + "*"*30)
              LocalShellCmd(cmd2)
              print("*="*30)
              print("""
              [Alibaba Cloud makes no guarantee as to the legality, security, or accuracy of the third-party models you use on this image, and accepts no liability for any damage arising from their use. You must comply with the user agreements, usage rules, and applicable laws and regulations of the third-party models installed on the image, and you bear sole responsibility for the legality and compliance of your use of them.]
              The environment is ready. Open\n\t\t\t=>=>=> %s:7860 <=<=<=\n\t in your browser to try the Chatbot with memory!
              """ % ecsPubIpAddr)
              print("*="*30)
          EOF

          python3.10 /root/chatbot.py --ecs_public_ip=${EcsInstance.PublicIp} --db_connection=${DBInstance.ConnectionString} --db_port=${DBInstance.Port} --db_username=${Account.AccountName} --db_password=${AccountPassword} --db_name=${Account.AccountName}
          sleep 30

          # Poll port 7860 up to 10 times; signal the wait condition once the WebUI is listening.
          i=1
          while [ $i -le 10 ]
          do
            netstat -ntlp | grep 7860
            if [ $? -eq 0 ];then
              echo 'web service start success.' >> /root/web_service.log
              ${WaitConditionHandle.CurlCli} --data-binary '{"status": "SUCCESS"}'
              break
            else
              echo 'web service start failed.' >> /root/web_service.log
              python3.10 /root/chatbot.py --ecs_public_ip=${EcsInstance.PublicIp} --db_connection=${DBInstance.ConnectionString} --db_port=${DBInstance.Port} --db_username=${Account.AccountName} --db_password=${AccountPassword} --db_name=${Account.AccountName}
              sleep 30
              # POSIX arithmetic: "let" is a bash builtin and is missing from dash, the default /bin/sh on Ubuntu.
              i=$((i + 1))
            fi
          done
    DependsOn:
      - Account
      - EcsInstance
Metadata:
  ALIYUN::ROS::Interface:
    ParameterGroups:
      - Parameters:
          - ZoneId
          - InstanceType
          - SystemDiskCategory
          - InstancePassword
        Label:
          default: ECS
      - Parameters:
          - ADBPGInstanceSpec
          - ADBPGSegmentStorage
          - AccountName
          - AccountPassword
        Label:
          default: Database
    TemplateTags:
      - acs:technical-solution:AI:向量数据库构建企业智能知识库-tech_solu_20
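
The readiness loop at the end of the InstallChatGLM script checks whether anything is listening on port 7860 and retries up to 10 times, 30 seconds apart. The same idea can be sketched in Python as a minimal illustration. The function name `wait_for_port` and its parameters are hypothetical and not part of the template, and unlike the template's loop this sketch only polls the port; it does not rerun chatbot.py between attempts.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 30.0) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds.

    Hypothetical helper: tries up to `attempts` times and sleeps `delay`
    seconds between failed tries, mirroring the 10 x 30 s shell loop.
    """
    for attempt in range(1, attempts + 1):
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            # Refused or timed out: wait and retry unless this was the last try.
            if attempt < attempts:
                time.sleep(delay)
    return False
```

On the ECS instance, `wait_for_port("127.0.0.1", 7860)` would play the role of the `netstat -ntlp | grep 7860` check. It tests that the port accepts connections rather than parsing the socket table.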
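
The InstancePassword parameter declares MinLength, MaxLength and AllowedPattern as constraints, while the rule about mixing character types appears only in its description text. The sketch below shows how those rules could be checked together before submitting the stack. It assumes the rule means at least three of the four character types; the function name `check_instance_password` is hypothetical, and the character class is copied from the template's AllowedPattern.

```python
import re

# Character class copied from the template's AllowedPattern; fullmatch() below
# anchors it at both ends, so no explicit ^ or $ is needed here.
ALLOWED = re.compile(r"""[0-9A-Za-z\_\-&:;'<>,=%`~!@#\(\)\$\^\*\+\|\{\}\[\]\.\?\/]+""")


def check_instance_password(pw: str) -> bool:
    """Hypothetical pre-check: length 8-30, allowed characters only,
    and (assumed) at least three of the four character types."""
    if not 8 <= len(pw) <= 30:
        return False
    if ALLOWED.fullmatch(pw) is None:
        return False
    kinds = [
        any(c.isupper() for c in pw),      # uppercase letter
        any(c.islower() for c in pw),      # lowercase letter
        any(c.isdigit() for c in pw),      # digit
        any(not c.isalnum() for c in pw),  # special character
    ]
    return sum(kinds) >= 3
```

For example, `check_instance_password("Abcdef12")` passes with three character types, while `"abcdefgh"` is rejected because it contains only lowercase letters.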