integrations/tgi/documentation.yaml (41 lines of code) (raw):
exporter_type: included
app_name_short: TGI
app_name: Text Generation Inference
app_site_name: {{app_name_short}}
app_site_url: https://huggingface.co/docs/text-generation-inference/en/index
exporter_name: the {{app_name_short}} exporter
exporter_repo_url: https://huggingface.co/docs/text-generation-inference/en/reference/metrics
gke_setup_url: /kubernetes-engine/docs/tutorials/serve-gemma-gpu-tgi
additional_prereq_info: |
  {{app_site_name}} exposes Prometheus-format metrics automatically; you do not
  have to install a separate exporter. To verify that {{exporter_name}} is emitting
  metrics on the expected endpoint, do the following:
  1. Set up port forwarding by using the following command:
     <pre class="devsite-click-to-copy">
     kubectl -n {{namespace_name}} port-forward {{pod_name}} 8080:8080
     </pre>
  2. Access the endpoint `localhost:8080/metrics` by using the browser
     or the `curl` utility in another terminal session.
dashboard_available: true
multiple_dashboards: false
dashboard_display_name: {{app_name_short}} Prometheus Overview
podmonitoring_config: |
  apiVersion: monitoring.googleapis.com/v1
  kind: PodMonitoring
  metadata:
    name: tgi
    labels:
      app.kubernetes.io/name: tgi
      app.kubernetes.io/part-of: google-cloud-managed-prometheus
  spec:
    endpoints:
    - port: 8080
      scheme: http
      interval: 30s
      path: /metrics
    selector:
      matchLabels:
        app: tgi-gemma-server
additional_podmonitoring_info: |
  Ensure that the values of the `port` and `matchLabels` fields match those of the {{app_name_short}} pods you want to monitor.
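# Illustrative sketch only (an assumption, not part of this integration): the
# workload fields that the `port` and `matchLabels` values in
# `podmonitoring_config` are expected to line up with. The Deployment name and
# container name are hypothetical, and only the fields relevant to scraping are
# shown (image, resources, and args are omitted).
#
#   apiVersion: apps/v1
#   kind: Deployment
#   metadata:
#     name: tgi-gemma-deployment       # hypothetical name
#   spec:
#     selector:
#       matchLabels:
#         app: tgi-gemma-server
#     template:
#       metadata:
#         labels:
#           app: tgi-gemma-server      # compare with spec.selector.matchLabels in the PodMonitoring
#       spec:
#         containers:
#         - name: inference-server     # hypothetical name
#           ports:
#           - containerPort: 8080      # port serving /metrics; compare with spec.endpoints[].port
#
# If your pods carry a different label or serve metrics on a different port,
# change the PodMonitoring values to match rather than the workload.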
sample_promql_query: up{job="tgi", cluster="{{cluster_name}}", namespace="{{namespace_name}}"}