glean_parser/schemas/metrics.2-0-0.schema.yaml (663 lines of code) (raw):

# This Source Code Form is subject to the terms of the Mozilla Public # License, v. 2.0. If a copy of the MPL was not distributed with this # file, You can obtain one at http://mozilla.org/MPL/2.0/. --- $schema: http://json-schema.org/draft-07/schema# title: Metrics description: | Schema for the metrics.yaml files for Mozilla's Glean telemetry SDK. The top-level of the `metrics.yaml` file has a key defining each category of metrics. Categories must be snake_case, and they may also have dots `.` to define subcategories. $id: moz://mozilla.org/schemas/glean/metrics/2-0-0 definitions: token: type: string pattern: "^[A-Za-z_][A-Za-z0-9_\\.]*$" snake_case: type: string pattern: "^[a-z_][a-z0-9_]*$" dotted_snake_case: type: string pattern: "^[a-z_][a-z0-9_]{0,29}(\\.[a-z_][a-z0-9_]{0,29})*$" maxLength: 40 event_extra_key: type: string pattern: "^[ -~]+$" maxLength: 40 # Prior to version 2.0.0 of the schema, special ping names with underscores # were also supported. kebab_case: type: string pattern: "^[a-z][a-z0-9-]{0,29}$" long_id: allOf: - $ref: "#/definitions/snake_case" - maxLength: 40 short_id: allOf: - $ref: "#/definitions/snake_case" - maxLength: 70 labeled_metric_id: type: string pattern: "^[ -~]+$" maxLength: 111 # Note: this should be category + metric + 1 metric: description: | Describes a single metric. See https://mozilla.github.io/glean_parser/metrics-yaml.html type: object additionalProperties: false properties: type: title: Metric type description: | **Required.** Specifies the type of a metric, like "counter" or "event". This defines which operations are valid for the metric, how it is stored and how data analysis tooling displays it. The supported types are: - `event`: Record a specific event (with optional metadata). Additional properties: `extra_keys`. - `boolean`: A metric storing values of true or false. - `string`: A metric storing Unicode string values. - `string_list`: a list of Unicode strings. - `counter`: A numeric value that can only be incremented. - `quantity`: A numeric value that is set directly. - `timespan`: Represents a time interval. Additional properties: `time_unit`. - `timing_distribution`: Record the distribution of multiple timings. Additional properties: `time_unit`. - `datetime`: A date/time value. Represented as an ISO datetime in UTC. Additional properties: `time_unit`. - `uuid`: Record a UUID v4. - `url`: Record a valid URL string. - `memory_distribution`: A histogram for recording memory usage values. Additional properties: `memory_unit`. - `custom_distribution`: A histogram with a custom range and number of buckets. This metric type is for legacy support only and is only allowed for metrics coming from GeckoView. Additional properties: `range_min`, `range_max`, `bucket_count`, `histogram_type`. - `rate`: Used to record the rate something happens relative to some other thing. For example, the number of HTTP connections that experience an error relative to the number of total HTTP connections made. - Additionally, labeled versions of many metric types are supported. These support the `labels`_ parameter, allowing multiple instances of the metric to be stored at a given set of labels. The labeled metric types include: `labeled_boolean`, `labeled_string`, `labeled_counter`, `labeled_custom_distribution`, `labeled_memory_distribution`, `labeled_timing_distribution`, `labeled_quantity`. - `text`: Record long text data. - `object`: Record structured data based on a pre-defined schema Additional properties: `structure`. type: string enum: - event - boolean - string - string_list - counter - quantity - timespan - timing_distribution - custom_distribution - memory_distribution - datetime - uuid - url - jwe - labeled_boolean - labeled_string - labeled_counter - labeled_custom_distribution - labeled_memory_distribution - labeled_timing_distribution - labeled_quantity - rate - text - object description: title: Description description: | **Required.** A textual description of what this metric does, what it means, and its edge cases or any other helpful information. Descriptions may contain [markdown syntax](https://www.markdownguide.org/basic-syntax/). type: string metadata: title: Metadata description: | Additional metadata about this metric. Currently limited to a list of tags. type: object properties: tags: title: Tags description: Which tags are specified for this metric. type: array items: type: string maxLength: 80 default: {} lifetime: title: Lifetime description: | Defines the lifetime of the metric. It must be one of the following values: - `ping` (default): The metric is reset each time it is sent in a ping. - `user`: The metric contains a property that is part of the user's profile and is never reset. - `application`: The metric contains a property that is related to the application, and is reset only at application restarts. enum: - ping - user - application default: ping send_in_pings: title: Send in pings description: | Which pings the metric should be sent on. If not specified, the metric is sent on the "default ping", which is the `events` ping for events, and the `metrics` ping for everything else. Most metrics don't need to specify this. (There is an additional special value of `all-pings` for internal Glean metrics only that is used to indicate that a metric may appear in any ping.) type: array items: anyOf: - $ref: "#/definitions/kebab_case" # Allow "special" ping names that start with "glean_" used # internally by the Glean SDK - type: string pattern: "^glean_.*$" default: - default notification_emails: title: Notification emails description: | **Required.** A list of email addresses to notify for important events with the metric or when people with context or ownership for the metric need to be contacted. type: array minItems: 1 items: type: string format: email bugs: title: Related bugs description: | **Required.** A list of bug URLs (e.g. Bugzilla and Github) that are relevant to this metric, e.g., tracking its original implementation or later changes to it. Prior to version 2.0.0 of the schema, bugs could also be integers. type: array minItems: 1 items: type: string format: uri data_reviews: title: Review references description: | **Required.** A list of URIs to any data collection reviews relevant to the metric. type: array items: type: string format: uri disabled: title: Disabled description: | If `true`, the metric is disabled, and any metric collection on it will be silently ignored at runtime. type: boolean default: false expires: title: Expires description: | **Required.** By default it may be one of the following values: - `<build date>`: An ISO date `yyyy-mm-dd` in UTC on which the metric expires. For example, `2019-03-13`. This date is checked at build time. Except in special cases, this form should be used so that the metric automatically "sunsets" after a period of time. - `<major version>`: An integer greater than 0 representing the major version the metric expires in. For example, `11`. The version is checked at build time against the major provided to the glean_parser and is only valid if a major version is provided at built time. If no major version is provided at build time and expiration by major version is used for a metric, an error is raised. Note that mixing expiration by date and version is not allowed within a product. - `never`: This metric never expires. - `expired`: This metric is manually expired. The default may be overriden in certain applications by the `custom_validate_expires` and `custom_is_expired` configs. oneOf: - type: string - type: integer minimum: 1 version: title: Metric version description: | The version of the metric. A monotonically increasing value. If not provided, defaults to 0. time_unit: title: Time unit description: | For timespans and datetimes, specifies the unit that the metric will be stored and displayed in. If not provided, it defaults to "millisecond". Time values are sent to the backend as integers, so `time_unit`_ determines the maximum resolution at which timespans are recorded. Times are always truncated, not rounded, to the nearest time unit. For example, a measurement of 25 ns will be returned as 0 ms if `time_unit` is `"millisecond"`. For timing distributions, times are always recorded and sent in nanoseconds, but `time_unit` controls the minimum and maximum values. If not provided, it defaults to "nanosecond". - nanosecond: 1ns <= x <= 10 minutes - microsecond: 1μs <= x <= ~6.94 days - millisecond: 1ms <= x <= ~19 years Valid when `type`_ is `timespan`, `timing_distribution` or `datetime`. enum: - nanosecond - microsecond - millisecond - second - minute - hour - day memory_unit: title: Memory unit description: | The unit that the incoming memory size values are recorded in. The units are the power-of-2 units, so "kilobyte" is correctly a "kibibyte". - kilobyte == 2^10 == 1,024 bytes - megabyte == 2^20 == 1,048,576 bytes - gigabyte == 2^30 == 1,073,741,824 bytes Values are automatically converted to and transmitted as bytes. Valid when `type`_ is `memory_distribution`. enum: - byte - kilobyte - megabyte - gigabyte labels: title: Labels description: | A list of labels for a labeled metric. If provided, the labels are enforced at run time, and recording to an unknown label is recorded to the special label `__other__`. If not provided, the labels may be anything, but using too many unique labels will put some labels in the special label `__other__`. Valid with any of the labeled metric types. anyOf: - type: array uniqueItems: true items: $ref: "#/definitions/labeled_metric_id" maxItems: 4096 - type: "null" extra_keys: title: Extra keys description: | The acceptable keys on the "extra" object sent with events. This is an object mapping the key to an object containing metadata about the key. A maximum of 50 extra keys is allowed. This metadata object has the following keys: - `description`: **Required.** A description of the key. Valid when `type`_ is `event`. type: object propertyNames: $ref: "#/definitions/event_extra_key" additionalProperties: type: object properties: description: type: string type: type: string enum: - string - boolean - quantity required: - description maxProperties: 50 default: {} gecko_datapoint: title: Gecko Datapoint description: | This is a Gecko-specific property. It is the name of the Gecko metric to accumulate the data from, when using the Glean SDK in a product using GeckoView. See bug 1566356 for more context. type: string range_min: title: Range minimum description: | The minimum value of a custom distribution. Valid when `type`_ is `custom_distribution`. type: number default: 1 range_max: title: Range maximum description: | The maximum value of a custom distribution. Required when `type`_ is `custom_distribution`. type: number bucket_count: title: Bucket count description: | The number of buckets to include in a custom distribution. Required when `type`_ is `custom_distribution`. type: number minimum: 1 histogram_type: title: Histogram type description: | The type of histogram bucketing to use: - `linear`: The buckets are linearly spaced within the range. - `exponential`: The buckets use the natural logarithmic so the smaller-valued buckets are smaller in size than the higher-valued buckets. Required when `type`_ is `custom_distribution`. enum: - linear - exponential unit: title: Unit description: | The unit of the metric. This is only required for metrics that don't already require a meaningful unit, e.g. `quantity` This is provided for informational purposes only and doesn't have any effect on data collection. Metric types like `timespan`, `datetime` and `timing_distribution` take a `time_unit` instead. type: string no_lint: title: Lint checks to skip description: | This parameter lists any lint checks to skip for this metric only. type: array items: type: string data_sensitivity: title: The level of data sensitivity description: | There are four data collection categories related to data sensitivity [defined here](https://wiki.mozilla.org/Firefox/Data_Collection): - **Category 1: Technical Data:** (`technical`) Information about the machine or Firefox itself. Examples include OS, available memory, crashes and errors, outcome of automated processes like updates, safebrowsing, activation, version \#s, and buildid. This also includes compatibility information about features and APIs used by websites, addons, and other 3rd-party software that interact with Firefox during usage. - **Category 2: Interaction Data:** (`interaction`) Information about the user’s direct engagement with Firefox. Examples include how many tabs, addons, or windows a user has open; uses of specific Firefox features; session length, scrolls and clicks; and the status of discrete user preferences. It also includes information about the user's in-product journeys and product choices helpful to understand engagement (attitudes). For example, selections of add-ons or tiles to determine potential interest categories etc. - **Category 3: Stored Content & Communications:** (`stored_content`, formerly Web activity data, `web_activity`) Information about what people store, sync, communicate or connect to where the information is generally considered to be more sensitive and personal in nature. Examples include users' saved URLs or URL history, specific web browsing history, general information about their web browsing history (such as TLDs or categories of webpages visited over time) and potentially certain types of interaction data about specific web pages or stories visited (such as highlighted portions of a story). It also includes information such as content saved by users to an individual account like saved URLs, tags, notes, passwords and files as well as communications that users have with one another through a Mozilla service. - **Category 4: Highly sensitive data or clearly identifiable personal data:** (`highly_sensitive`) Information that directly identifies a person, or if combined with other data could identify a person. This data may be embedded within specific website content, such as memory contents, dumps, captures of screen data, or DOM data. Examples include account registration data like name, password, and email address associated with an account, payment data in connection with subscriptions or donations, contact information such as phone numbers or mailing addresses, email addresses associated with surveys, promotions and customer support contacts. It also includes any data from different categories that, when combined, can identify a person, device, household or account. For example Category 1 log data combined with Category 3 saved URLs. Additional examples are: voice audio commands (including a voice audio file), speech-to-text or text-to-speech (including transcripts), biometric data, demographic information, and precise location data associated with a persistent identifier, individual or small population cohorts. This is location inferred or determined from mechanisms other than IP such as wi-fi access points, Bluetooth beacons, cell phone towers or provided directly to us, such as in a survey or a profile. type: array items: enum: - technical - interaction - stored_content - web_activity - highly_sensitive type: string minLength: 1 uniqueItems: true telemetry_mirror: title: Which probe in Telemetry to mirror this metric's value to. description: | The C++ enum form of the Scalar, Event, or Histogram to which we should mirror values. Use is limited to Firefox Desktop only. Has no effect when used with non-FOG outputters. See FOG's documentation on mirroring for details - https://firefox-source-docs.mozilla.org/toolkit/components/glean/user/gifft.html type: string minLength: 6 denominator_metric: title: The name of the denominator for this `rate` metric. description: | Denominators for `rate` metrics may be private and internal or shared and external. External denominators are `counter` metrics. This field names the `counter` metric that serves as this `rate` metric's external denominator. The named denominator must be defined in this component so glean_parser can find it. type: string structure: title: A subset of a JSON schema definition description: | The expected structure of data, defined in a strict subset of YAML-dialect JSON Schema (Draft 7) supporting keys "type" (only values "object", "array", "number", "string", and "boolean"), "properties", and "items". type: object required: - type - bugs - description - notification_emails - data_reviews - expires type: object propertyNames: anyOf: - allOf: - $ref: "#/definitions/dotted_snake_case" - not: description: "'pings' is reserved as a category name." const: pings - not: description: "'tags' is reserved as a category name." const: tags - enum: ['$schema', '$tags'] properties: $schema: type: string format: url no_lint: title: Lint checks to skip globally description: | This parameter lists any lint checks to skip for this whole file. type: array items: type: string $tags: title: Tags that apply to the whole file description: | This denotes the list of tags that apply to all metrics in this file. type: array items: type: string additionalProperties: type: object propertyNames: anyOf: - $ref: "#/definitions/short_id" additionalProperties: allOf: - $ref: "#/definitions/metric" - if: properties: type: const: event then: properties: lifetime: description: | Event metrics must have ping lifetime. const: ping - if: not: properties: type: enum: - timing_distribution - custom_distribution - memory_distribution - quantity - boolean - string - labeled_counter then: properties: gecko_datapoint: description: | `gecko_datapoint` is only allowed for `timing_distribution`, `custom_distribution`, `memory_distribution`, `quantity`, `boolean`, `string` and `labeled_counter`. maxLength: 0 - if: properties: type: const: custom_distribution then: required: - range_max - bucket_count - histogram_type description: | `custom_distribution` is missing required parameters `range_max`, `bucket_count` and `histogram_type`. - if: properties: type: const: memory_distribution then: required: - memory_unit description: | `memory_distribution` is missing required parameter `memory_unit`. - if: properties: type: const: quantity then: required: - unit description: | `quantity` is missing required parameter `unit`. - if: properties: type: const: jwe then: required: - jwe_support_was_removed description: | JWE support was removed. If you require this send an email to glean-team@mozilla.com. - if: not: properties: type: const: rate then: properties: denominator_metric: description: | `denominator_metric` is only allowed for `rate`. maxLength: 0 - if: properties: type: const: text then: properties: lifetime: description: > Text metrics must have ping or application lifetime. enum: - ping - application data_sensitivity: description: > Text metrics require Category 3 (`stored_content` / `web_activity`) or Category 4 (`highly_sensitive`). type: array items: enum: - stored_content - web_activity - highly_sensitive send_in_pings: description: | Text metrics can only be sent in custom pings. Built-in pings are not allowed. type: array items: allOf: - $ref: "#/definitions/kebab_case" - not: description: > Text metrics can only be sent in custom pings. Built-in pings are not allowed." pattern: "^(metrics|baseline|events|deletion-request|default|glean_.*)$" - if: # This is a schema check: # This is true when the checked YAML passes the schema validation. # # If it has a datetime/timing_distribution/timespan type # AND has a `unit` property, then... properties: type: enum: - datetime - timing_distribution - timespan required: - unit # ... then `time_unit` is required, # because that's the only way we can force this to fail. then: required: - time_unit description: | This metric type uses the (optional) `time_unit` parameter, not `unit`.