<!-- scenarios/workload-genai/policies/fragments/rate-limiting/rate-limiting-by-tokens.xml -->
<fragment>
<!-- Rate-limit configuration at the subscription level. Works with the Azure OpenAI endpoint,
for both streaming and non-streaming requests.
Allows 500 tokens per minute; prompt tokens are estimated before the request is forwarded. -->
<azure-openai-token-limit counter-key="@(String.Concat(context.Subscription.Id,"-max-token"))"
tokens-per-minute="500"
estimate-prompt-tokens="true"
remaining-tokens-header-name="x-apim-max-remaining-tokens"
tokens-consumed-header-name="x-apim-max-consumed-tokens"/>
</fragment>
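<!-- Example usage (illustrative; the fragment-id below is an assumption and must match
     the id the fragment was registered under in your APIM instance). Policy fragments
     are referenced from an API policy with the include-fragment policy, e.g. in the
     inbound section:

       <inbound>
         <base />
         <include-fragment fragment-id="rate-limiting-by-tokens" />
       </inbound>

     Because the counter-key is built from context.Subscription.Id, each APIM
     subscription gets its own independent 500 tokens-per-minute budget. -->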