This repository contains the Bicep templates for deploying the following components:
- Azure API Management (APIM), Developer SKU
- Azure Application Insights
- Three Azure OpenAI (Cognitive Services) instances
APIM GenAI capabilities demonstrated in this repository include the following (illustrated in the policy sketch after the list):
- Load balancing and circuit breaker policies
- Token rate-limiting policies
- Emitting token metrics to Application Insights
- User-assigned managed identity authentication for Cognitive Services
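These capabilities are wired up by the Bicep templates; the fragment below is only a rough sketch of how the corresponding APIM policies could fit together. The backend pool name `openai-backend-pool`, the identity client ID placeholder, and the `tokens-per-minute` value are illustrative assumptions, not necessarily what this repository deploys.

```xml
<policies>
    <inbound>
        <base />
        <!-- User-assigned managed identity auth: acquire a token for Cognitive Services
             (the client-id is a placeholder for the identity the templates deploy) -->
        <authentication-managed-identity resource="https://cognitiveservices.azure.com"
            client-id="{client-id-of-user-assigned-identity}"
            output-token-variable-name="msi-access-token" ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
        </set-header>
        <!-- Token rate limiting per subscription (illustrative limit) -->
        <azure-openai-token-limit counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500" estimate-prompt-tokens="false" />
        <!-- Emit token usage metrics to Application Insights -->
        <azure-openai-emit-token-metric namespace="genaitest">
            <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
        </azure-openai-emit-token-metric>
        <!-- Load balancing: route to a backend pool spanning the Azure OpenAI instances -->
        <set-backend-service backend-id="openai-backend-pool" />
    </inbound>
    <backend>
        <!-- Fail over to another pool member on throttling or server errors -->
        <retry condition="@(context.Response.StatusCode == 429 || context.Response.StatusCode >= 500)"
               count="2" interval="1" first-fast-retry="true">
            <forward-request buffer-request-body="true" />
        </retry>
    </backend>
    <outbound>
        <base />
    </outbound>
</policies>
```

Note that circuit breaker rules live on the APIM backend resources themselves (configured in Bicep), not in the policy XML; the `retry` policy above is what lets a request fail over to another pool member when a breaker trips or a backend throttles.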
Deploy the template to the Azure subscription using the following command:

```sh
az deployment sub create --name apim-genai --template-file main.bicep --parameters parameters.json --location francecentral
```
- Open the AzureOpenAI API in the Azure APIM instance and navigate to the Test tab.
- Select the POST method for "Creates a completion for the chat message".
- Fill in the template parameters as follows:
  - deployment-id: `chat`
  - api-version: `2024-02-01`
- Overwrite the request body with the following:
{"temperature":1,"top_p":1,"stream":false,"stop":null,"max_tokens":2000,"presence_penalty":0,"frequency_penalty":0,"logit_bias":{},"user":"user-1234","messages": [{"role":"system","content":"You are an AI assistant that helps people find information"},{"role":"user","content":"Negate the following sentence.The price for bubblegum increased on thursday."}],"n":1}
- Click Send and observe the response:
  - The `x-ms-region` header changes to the region of the Azure OpenAI service that served the request, showing the load balancing across the instances.
- Metrics can be observed in the Application Insights instance created in the same resource group as the APIM instance:
  - Metrics blade -> select `genaitest` in the Metric Namespace dropdown.
  - Logs blade -> Tables -> the `customMetrics` table contains the custom metrics emitted by the APIM policies; the `customDimensions` field holds the dimensions that can be used to aggregate the metrics (the emitting policy is sketched below).
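The custom metrics are produced by the `azure-openai-emit-token-metric` policy. A minimal sketch with illustrative dimensions (the dimensions actually configured in this repository may differ); each `dimension` name becomes a key in `customDimensions`:

```xml
<!-- Emits token-count custom metrics to the genaitest namespace in Application Insights -->
<azure-openai-emit-token-metric namespace="genaitest">
    <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
    <dimension name="API ID" value="@(context.Api.Id)" />
</azure-openai-emit-token-metric>
```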
- Update the `azure-openai-token-limit` policy for the API to use `100` as the `tokens-per-minute`; the second request within the same minute should then return a 429 response issued by APIM (see the sketch below).
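A minimal sketch of that configuration, assuming the subscription ID is used as the counter key:

```xml
<!-- With tokens-per-minute="100", the sample request above consumes more than the
     per-minute budget, so the next request in the same minute gets a 429 from APIM
     without ever reaching Azure OpenAI -->
<azure-openai-token-limit counter-key="@(context.Subscription.Id)"
    tokens-per-minute="100"
    estimate-prompt-tokens="false"
    remaining-tokens-header-name="x-remaining-tokens" />
```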