integrations/torchserve/documentation.yaml

exporter_type: included
app_name_short: TorchServe
app_name: TorchServe
app_site_name: {{app_name_short}}
app_site_url: https://pytorch.org/serve/index.html
exporter_name: the {{app_name_short}} exporter
exporter_repo_url: https://pytorch.org/serve/metrics.html
gke_setup_url: /kubernetes-engine/docs/tutorials/scalable-ml-models-torchserve
additional_prereq_info: |
  {{app_name_short}} exposes Prometheus-format metrics automatically when the
  `metrics_mode` flag is specified, either in the `config.properties` file or
  as an environment variable. If you are following
  [this guide]({{gke_setup_url}}), we recommend making the following edits to
  your `config.properties` file:

  ```
  inference_address=http://0.0.0.0:8080
  management_address=http://0.0.0.0:8081
  + metrics_address=http://0.0.0.0:8082
  + metrics_mode=prometheus
  number_of_netty_threads=32
  job_queue_size=1000
  install_py_dep_per_model=true
  model_store=/home/model-server/model-store
  load_models=all
  ```

  Also, when deploying this image to GKE, modify your Deployment and Service
  YAML to expose the added metrics port:

  ```
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: t5-inference
    labels:
      model: t5
      version: v1.0
      machine: gpu
  spec:
    replicas: 1
    selector:
      matchLabels:
        model: t5
        version: v1.0
        machine: gpu
    template:
      metadata:
        labels:
          model: t5
          version: v1.0
          machine: gpu
      spec:
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4
        containers:
        - name: inference
          ...
          args: ["torchserve", "--start", "--foreground"]
          resources:
            ...
          ports:
          - containerPort: 8080
            name: http
          - containerPort: 8081
            name: management
  +       - containerPort: 8082
  +         name: metrics
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: t5-inference
    labels:
      model: t5
      version: v1.0
      machine: gpu
  spec:
    ...
    ports:
    - port: 8080
      name: http
      targetPort: http
    - port: 8081
      name: management
      targetPort: management
  + - port: 8082
  +   name: metrics
  +   targetPort: metrics
  ```

  To verify that {{exporter_name}} is emitting metrics on the expected
  endpoints, do the following:

  1. Set up port forwarding by using the following command:

     <pre class="devsite-click-to-copy">
     kubectl -n {{namespace_name}} port-forward {{service_name}} 8082
     </pre>

  2. Access the endpoint `localhost:8082/metrics` by using a browser or the
     `curl` utility in another terminal session.
dashboard_available: true
multiple_dashboards: false
dashboard_display_name: {{app_name_short}} Prometheus Overview
podmonitoring_config: |
  apiVersion: monitoring.googleapis.com/v1
  kind: PodMonitoring
  metadata:
    name: torchserve
    labels:
      app.kubernetes.io/name: torchserve
      app.kubernetes.io/part-of: google-cloud-managed-prometheus
  spec:
    endpoints:
    - port: 8082
      scheme: http
      interval: 30s
      path: /metrics
    selector:
      matchLabels:
        model: t5
        version: v1.0
        machine: gpu
additional_podmonitoring_info: |
  Ensure that the values of the `port` and `matchLabels` fields match those of
  the {{app_name_short}} pods that you want to monitor.
sample_promql_query: up{job="torchserve", cluster="{{cluster_name}}", namespace="{{namespace_name}}"}
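
# `additional_prereq_info` above notes that `metrics_mode` can also be set as
# an environment variable instead of in `config.properties`. A minimal sketch,
# assuming the standard TorchServe TS_<PROPERTY_NAME> environment-variable
# mapping applies to these two properties:
#
#   export TS_METRICS_MODE=prometheus
#   export TS_METRICS_ADDRESS=http://0.0.0.0:8082
#   torchserve --start --foreground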
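#
# For verification step 2 above, while the port-forward from step 1 is
# running in another terminal, a `curl` call such as the following should
# print Prometheus-format metrics; `ts_inference_requests_total` is one of
# TorchServe's default frontend metrics and is a reasonable string to look for:
#
#   curl -s http://localhost:8082/metrics | grep ts_inference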
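#
# The PodMonitoring resource in `podmonitoring_config` can be saved to a file
# and applied with kubectl; the file name here is illustrative:
#
#   kubectl -n {{namespace_name}} apply -f torchserve-podmonitoring.yaml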
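#
# One way to run `sample_promql_query` outside the console is the
# Prometheus-compatible HTTP API of Managed Service for Prometheus; this is a
# sketch, with PROJECT_ID as a placeholder and the query abbreviated to its
# `job` matcher:
#
#   curl -s \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     'https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus/api/v1/query' \
#     --data-urlencode 'query=up{job="torchserve"}'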