{
"cells": [
{
"cell_type": "markdown",
"id": "9bb37f16",
"metadata": {},
"source": [
"# Create and Run Content Safety using Python SDK\n",
"\n",
"### Overview\n",
"\n",
"Azure AI Content Safety detects harmful user-generated and AI-generated content in applications and services. Content Safety includes text and image APIs that allow you to detect material that is harmful.\n",
"\n",
"In this hands-on, you will be able to:\n",
"Manage text blocklist: The default AI classifiers are sufficient for most content safety needs.\n",
"Analyze text contents: Scans text for sexual content, violence, hate, and self-harm with multi-severity levels.\n",
"Analyze images: Scans images for sexual content, violence, hate, and self-harm with multi-severity levels.\n",
"Integrate with Azure Open AI Service: Use the Azure Open AI Service to rewrite the content for harmful content.\n",
"\n",
"#### 1. Azure Content Safety Set up\n",
"\n",
"#### 2. Azure Content Safety with BlockList\n",
"\n",
"#### 3. Azure Content Safety for Text\n",
"\n",
"#### 4. Azure Content Safety for Iamges\n",
"\n",
"#### 5. Rewrite Content with Azure Open AI Service\n",
"\n",
"[Note] Please use `Python 3.10 - SDK v2 (azureml_py310_sdkv2)` conda environment.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "913c3614",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"import os, sys\n",
"lab_prep_dir = os.getcwd().split(\"slm-innovator-lab\")[0] + \"slm-innovator-lab/0_lab_preparation\"\n",
"sys.path.append(os.path.abspath(lab_prep_dir))\n",
"\n",
"from common import check_kernel\n",
"check_kernel()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17e40847",
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"import os\n",
"import sys\n",
"from openai import AzureOpenAI\n",
"from azure.ai.contentsafety import ContentSafetyClient\n",
"from azure.ai.contentsafety.models import (\n",
" AnalyzeImageOptions,\n",
" ImageData,\n",
" AnalyzeTextOptions,\n",
" ImageCategory, \n",
" TextCategory\n",
")\n",
"from azure.core.credentials import AzureKeyCredential\n",
"from azure.core.exceptions import HttpResponseError\n",
"from IPython.display import Image\n",
"from dotenv import load_dotenv\n",
"\n",
"from azure.ai.contentsafety import BlocklistClient\n",
"from azure.ai.contentsafety.models import TextBlocklist, TextBlocklistItem\n",
"from azure.ai.contentsafety.models import AddOrUpdateTextBlocklistItemsOptions, TextBlocklistItem\n",
"\n",
"load_dotenv()"
]
},
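{
"cell_type": "markdown",
"id": "1a2b3c4d",
"metadata": {},
"source": [
"The cells below read credentials and endpoints from environment variables loaded by `load_dotenv()`. The following sanity check is a convenience sketch (not part of the original lab) that reports which of the expected variables are set before you run the later cells.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e6f7a8b",
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: report which environment variables used in this notebook are set.\n",
"# These names match the os.environ / os.getenv lookups in the cells below.\n",
"required_vars = [\n",
"    \"CONTENT_SAFETY_KEY\",\n",
"    \"CONTENT_SAFETY_ENDPOINT\",\n",
"    \"CONTENT_SAFETY_API_VERSION\",\n",
"    \"AZURE_OPENAI_ENDPOINT\",\n",
"    \"AZURE_OPENAI_API_KEY\",\n",
"    \"AZURE_OPENAI_API_VERSION\",\n",
"    \"AZURE_OPENAI_DEPLOYMENT_NAME\",\n",
"]\n",
"for var in required_vars:\n",
"    status = \"set\" if os.getenv(var) else \"MISSING\"\n",
"    print(f\"{var}: {status}\")"
]
},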
{
"cell_type": "markdown",
"id": "4648ea3d",
"metadata": {},
"source": [
"## 2. Azure Content Safety with BlockList\n",
"\n",
"- The default AI classifiers are sufficient for most content safety needs however, you might need to screen for terms that are specific to your use case. You can create blocklists of terms to use with the Text API.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93ebf83e",
"metadata": {},
"outputs": [],
"source": [
"def create_or_update_text_blocklist(blocklist_name, blocklist_description):\n",
" \n",
" \n",
" key = os.environ[\"CONTENT_SAFETY_KEY\"]\n",
" endpoint = os.environ[\"CONTENT_SAFETY_ENDPOINT\"]\n",
"\n",
" # Create a Blocklist client\n",
" client = BlocklistClient(endpoint, AzureKeyCredential(key))\n",
"\n",
" try:\n",
" blocklist = client.create_or_update_text_blocklist(\n",
" blocklist_name=blocklist_name,\n",
" options=TextBlocklist(blocklist_name=blocklist_name, description=blocklist_description),\n",
" )\n",
" if blocklist:\n",
" print(\"\\nBlocklist created or updated: \")\n",
" print(f\"Name: {blocklist.blocklist_name}, Description: {blocklist.description}\")\n",
" except HttpResponseError as e:\n",
" print(\"\\nCreate or update text blocklist failed: \")\n",
" if e.error:\n",
" print(f\"Error code: {e.error.code}\")\n",
" print(f\"Error message: {e.error.message}\")\n",
" raise\n",
" print(e)\n",
" raise"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7f6bdb8",
"metadata": {},
"outputs": [],
"source": [
"def add_blocklist_items(blocklist_name, blocklist_items):\n",
" \n",
" key = os.environ[\"CONTENT_SAFETY_KEY\"]\n",
" endpoint = os.environ[\"CONTENT_SAFETY_ENDPOINT\"]\n",
"\n",
" # Create a Blocklist client\n",
" client = BlocklistClient(endpoint, AzureKeyCredential(key))\n",
"\n",
" try:\n",
" result = client.add_or_update_blocklist_items(\n",
" blocklist_name=blocklist_name, options=AddOrUpdateTextBlocklistItemsOptions(blocklist_items=blocklist_items)\n",
" )\n",
" for blocklist_item in result.blocklist_items:\n",
" print(\n",
" f\"BlocklistItemId: {blocklist_item.blocklist_item_id}, Text: {blocklist_item.text}, Description: {blocklist_item.description}\"\n",
" )\n",
" except HttpResponseError as e:\n",
" print(\"\\nAdd blocklist items failed: \")\n",
" if e.error:\n",
" print(f\"Error code: {e.error.code}\")\n",
" print(f\"Error message: {e.error.message}\")\n",
" raise\n",
" print(e)\n",
" raise\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2f5e97a",
"metadata": {},
"outputs": [],
"source": [
"create_or_update_text_blocklist(\"TestBlocklist\", \"Test blocklist management.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5e9d4d7",
"metadata": {},
"outputs": [],
"source": [
"blocklist_items = [\n",
" TextBlocklistItem(text=\"alien\", description=\"\"),\n",
" TextBlocklistItem(text=\"octopus\", description=\"\")\n",
"]\n",
"add_blocklist_items(\"TestBlocklist\", blocklist_items)\n"
]
},
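{
"cell_type": "markdown",
"id": "9c0d1e2f",
"metadata": {},
"source": [
"To verify that the blocklist and its items were registered, you can enumerate them with the SDK's list operations. The cell below is a minimal sketch using `list_text_blocklists` and `list_text_blocklist_items`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f4a5b6c",
"metadata": {},
"outputs": [],
"source": [
"# Verify the blocklist and its items by listing them.\n",
"key = os.environ[\"CONTENT_SAFETY_KEY\"]\n",
"endpoint = os.environ[\"CONTENT_SAFETY_ENDPOINT\"]\n",
"client = BlocklistClient(endpoint, AzureKeyCredential(key))\n",
"\n",
"for bl in client.list_text_blocklists():\n",
"    print(f\"Blocklist: {bl.blocklist_name}, Description: {bl.description}\")\n",
"\n",
"for item in client.list_text_blocklist_items(blocklist_name=\"TestBlocklist\"):\n",
"    print(f\"  Item: {item.text} (id={item.blocklist_item_id})\")"
]
},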
{
"cell_type": "code",
"execution_count": null,
"id": "34be44a7",
"metadata": {},
"outputs": [],
"source": [
"def analyze_text_with_blocklists(blocklist_name, input_text):\n",
" import os\n",
" from azure.ai.contentsafety import ContentSafetyClient\n",
" from azure.core.credentials import AzureKeyCredential\n",
" from azure.ai.contentsafety.models import AnalyzeTextOptions\n",
" from azure.core.exceptions import HttpResponseError\n",
"\n",
" key = os.environ[\"CONTENT_SAFETY_KEY\"]\n",
" endpoint = os.environ[\"CONTENT_SAFETY_ENDPOINT\"]\n",
"\n",
" # Create a Content Safety client\n",
" client = ContentSafetyClient(endpoint, AzureKeyCredential(key))\n",
"\n",
" try:\n",
" # After you edit your blocklist, it usually takes effect in 5 minutes, please wait some time before analyzing\n",
" # with blocklist after editing.\n",
" analysis_result = client.analyze_text(\n",
" AnalyzeTextOptions(text=input_text, blocklist_names=[blocklist_name], halt_on_blocklist_hit=False)\n",
" )\n",
" if analysis_result and analysis_result.blocklists_match:\n",
" print(\"\\nBlocklist match results: \")\n",
" for match_result in analysis_result.blocklists_match:\n",
" print(\n",
" f\"BlocklistName: {match_result.blocklist_name}, BlocklistItemId: {match_result.blocklist_item_id}, \"\n",
" f\"BlocklistItemText: {match_result.blocklist_item_text}\"\n",
" )\n",
" except HttpResponseError as e:\n",
" print(\"\\nAnalyze text failed: \")\n",
" if e.error:\n",
" print(f\"Error code: {e.error.code}\")\n",
" print(f\"Error message: {e.error.message}\")\n",
" raise\n",
" print(e)\n",
" raise"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5e4e14b",
"metadata": {},
"outputs": [],
"source": [
"analyze_text_with_blocklists(\"TestBlocklist\", \"I think you're a real alien, don't say that octopus!\")"
]
},
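{
"cell_type": "markdown",
"id": "7d8e9f0a",
"metadata": {},
"source": [
"When you are done experimenting, you can delete the test blocklist. The cleanup sketch below uses `delete_text_blocklist`, which removes the blocklist together with all of its items; skip it if you want to keep the blocklist for later runs.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c2d3e4",
"metadata": {},
"outputs": [],
"source": [
"# Optional cleanup: delete the test blocklist (this also removes all of its items).\n",
"key = os.environ[\"CONTENT_SAFETY_KEY\"]\n",
"endpoint = os.environ[\"CONTENT_SAFETY_ENDPOINT\"]\n",
"client = BlocklistClient(endpoint, AzureKeyCredential(key))\n",
"\n",
"try:\n",
"    client.delete_text_blocklist(blocklist_name=\"TestBlocklist\")\n",
"    print(\"Blocklist 'TestBlocklist' deleted.\")\n",
"except HttpResponseError as e:\n",
"    print(\"Delete blocklist failed: \")\n",
"    print(e)\n",
"    raise"
]
},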
{
"cell_type": "markdown",
"id": "90dc6ad0",
"metadata": {},
"source": [
"## 3. Azure Content Safety for text\n",
"\n",
"- Scans text for sexual content, violence, hate, and self-harm with multi-severity levels.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08f4c136",
"metadata": {},
"outputs": [],
"source": [
"def analyze_text(prompt_text):\n",
" # analyze text\n",
" key = os.environ[\"CONTENT_SAFETY_KEY\"]\n",
" endpoint = os.environ[\"CONTENT_SAFETY_ENDPOINT\"]\n",
" api_version = os.environ[\"CONTENT_SAFETY_API_VERSION\"]\n",
"\n",
" # Create a Content Safety client\n",
" client = ContentSafetyClient(endpoint, AzureKeyCredential(key), api_version=api_version)\n",
"\n",
" # Construct request\n",
" request = AnalyzeTextOptions(text=prompt_text)\n",
"\n",
" # Analyze text\n",
" try:\n",
" response = client.analyze_text(request)\n",
" print(response)\n",
" except HttpResponseError as e:\n",
" print(\"Analyze text failed.\")\n",
" if e.error:\n",
" print(f\"Error code: {e.error.code}\")\n",
" print(f\"Error message: {e.error.message}\")\n",
" raise\n",
" print(e)\n",
" raise\n",
"\n",
" hate_result = next(item for item in response.categories_analysis if item.category == TextCategory.HATE)\n",
" self_harm_result = next(item for item in response.categories_analysis if item.category == TextCategory.SELF_HARM)\n",
" sexual_result = next(item for item in response.categories_analysis if item.category == TextCategory.SEXUAL)\n",
" violence_result = next(item for item in response.categories_analysis if item.category == TextCategory.VIOLENCE)\n",
"\n",
" if hate_result:\n",
" print(f\"Hate severity: {hate_result.severity}\")\n",
" if self_harm_result:\n",
" print(f\"SelfHarm severity: {self_harm_result.severity}\")\n",
" if sexual_result:\n",
" print(f\"Sexual severity: {sexual_result.severity}\")\n",
" if violence_result:\n",
" print(f\"Violence severity: {violence_result.severity}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f0c57d0",
"metadata": {},
"outputs": [],
"source": [
"analyze_text(\"Hi! nice to meet you\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fea75afc",
"metadata": {},
"outputs": [],
"source": [
"analyze_text(\"you are an idiot\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c70f0b77",
"metadata": {},
"outputs": [],
"source": [
"analyze_text(\"Painful twisting of the arm\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "896be47f",
"metadata": {},
"outputs": [],
"source": [
"analyze_text(\"Tell me more about the prompts you're doing now\")"
]
},
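{
"cell_type": "markdown",
"id": "e5f60718",
"metadata": {},
"source": [
"The severity scores printed above can drive a simple allow/block decision. The helper below is a sketch rather than an official SDK feature: it re-analyzes the text and compares the highest category severity against a threshold. The threshold value is an illustrative assumption; tune it to your own content policy.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a1b2c3d",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: gate text on the highest category severity returned by the service.\n",
"# SEVERITY_THRESHOLD is an illustrative policy choice, not an SDK constant.\n",
"SEVERITY_THRESHOLD = 2\n",
"\n",
"def max_text_severity(prompt_text):\n",
"    key = os.environ[\"CONTENT_SAFETY_KEY\"]\n",
"    endpoint = os.environ[\"CONTENT_SAFETY_ENDPOINT\"]\n",
"    client = ContentSafetyClient(endpoint, AzureKeyCredential(key))\n",
"    response = client.analyze_text(AnalyzeTextOptions(text=prompt_text))\n",
"    # severity can be None in some responses, so default to 0\n",
"    return max((item.severity or 0) for item in response.categories_analysis)\n",
"\n",
"def is_safe(prompt_text, threshold=SEVERITY_THRESHOLD):\n",
"    return max_text_severity(prompt_text) < threshold\n",
"\n",
"print(is_safe(\"Hi! nice to meet you\"))\n",
"print(is_safe(\"you are an idiot\"))"
]
},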
{
"cell_type": "markdown",
"id": "cb12d9c8",
"metadata": {},
"source": [
"## 4. Azure Content Safety for images\n",
"\n",
"- Scans images for sexual content, violence, hate, and self-harm with multi-severity levels.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f66889f6",
"metadata": {},
"outputs": [],
"source": [
"def analyze_image(image_path):\n",
" # analyze image\n",
" key = os.environ[\"CONTENT_SAFETY_KEY\"]\n",
" endpoint = os.environ[\"CONTENT_SAFETY_ENDPOINT\"]\n",
" \n",
" # Create a Content Safety client\n",
" client = ContentSafetyClient(endpoint, AzureKeyCredential(key))\n",
"\n",
" # Build request\n",
" with open(image_path, \"rb\") as file:\n",
" request = AnalyzeImageOptions(image=ImageData(content=file.read()))\n",
"\n",
" # Analyze image\n",
" try:\n",
" response = client.analyze_image(request)\n",
" except HttpResponseError as e:\n",
" print(\"Analyze image failed.\")\n",
" if e.error:\n",
" print(f\"Error code: {e.error.code}\")\n",
" print(f\"Error message: {e.error.message}\")\n",
" raise\n",
" print(e)\n",
" raise\n",
"\n",
" hate_result = next(item for item in response.categories_analysis if item.category == ImageCategory.HATE)\n",
" self_harm_result = next(item for item in response.categories_analysis if item.category == ImageCategory.SELF_HARM)\n",
" sexual_result = next(item for item in response.categories_analysis if item.category == ImageCategory.SEXUAL)\n",
" violence_result = next(item for item in response.categories_analysis if item.category == ImageCategory.VIOLENCE)\n",
"\n",
" if hate_result:\n",
" print(f\"Hate severity: {hate_result.severity}\")\n",
" if self_harm_result:\n",
" print(f\"SelfHarm severity: {self_harm_result.severity}\")\n",
" if sexual_result:\n",
" print(f\"Sexual severity: {sexual_result.severity}\")\n",
" if violence_result:\n",
" print(f\"Violence severity: {violence_result.severity}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "675cb7c2",
"metadata": {},
"outputs": [],
"source": [
"image_path = \"images/image1.jpg\"\n",
"\n",
"Image(filename=image_path, width=480)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a4966d4",
"metadata": {},
"outputs": [],
"source": [
"analyze_image(image_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f320be60",
"metadata": {},
"outputs": [],
"source": [
"image_path = \"images/image2.jpg\"\n",
"\n",
"Image(filename=image_path, width=480)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2de26f1",
"metadata": {},
"outputs": [],
"source": [
"analyze_image(image_path)"
]
},
{
"cell_type": "markdown",
"id": "539fa5cf",
"metadata": {},
"source": [
"## 5. Manage hamful Contents with Azure Open AI Service\n",
"\n",
"Here are some strategies you might consider to address this issue:\n",
"\n",
"- **Adjusting Model Settings**: Use a less restrictive model or service to analyze the content and identify problematic sections. For Azure OpenAI models, only customers who have been approved for modified content filtering have full content filtering control and can turn off content filters. Apply for modified content filters via this form: [Azure OpenAI Limited Access Review: Modified Content Filters](https://ncv.microsoft.com/uEfCgnITdR)\n",
"- **Preprocess the Content Manually**: Before sending the text to GPT-4o, manually replace or mask explicit terms and phrases that are likely to trigger the content filters. You can use placeholders like [redacted] or \\*\\*\\* to cover sensitive words. This can make the content acceptable for initial processing.\n",
"- **Automated Content Sanitization**: Modify or summarize those sections to remove disallowed content before inputting the text into GPT-4o for rewriting. In order to avoid to be blocked by Azure OpenAI, you should submit modified content filters via this form. You can also use Natural language processing libraries can help identify and filter out sensitive content programmatically.\n"
]
},
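{
"cell_type": "markdown",
"id": "4d5e6f70",
"metadata": {},
"source": [
"As a concrete illustration of the manual preprocessing strategy above, the sketch below masks a list of flagged terms with `[redacted]` before the text is sent to the model. The term list here is purely illustrative; in practice you would derive it from your blocklist matches or an NLP library.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8091a2b3",
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Illustrative preprocessing sketch: mask flagged terms before sending text to the model.\n",
"# The term list is an assumption for demo purposes; derive yours from blocklist matches\n",
"# or an NLP pipeline in real use.\n",
"flagged_terms = [\"idiot\", \"alien\", \"octopus\"]\n",
"\n",
"def mask_terms(text, terms, placeholder=\"[redacted]\"):\n",
"    for term in terms:\n",
"        text = re.sub(re.escape(term), placeholder, text, flags=re.IGNORECASE)\n",
"    return text\n",
"\n",
"print(mask_terms(\"I think you're a real alien, don't say that octopus!\", flagged_terms))"
]
},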
{
"cell_type": "code",
"execution_count": null,
"id": "d079f528",
"metadata": {},
"outputs": [],
"source": [
"%pip install openai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "68e2dbaf",
"metadata": {},
"outputs": [],
"source": [
"content = \"you are an idiot\"\n",
"result = analyze_text(content)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47528708",
"metadata": {},
"outputs": [],
"source": [
"def rewrite_content(content):\n",
" system_message = \"\"\"\n",
" You are an AI language model tasked with rewriting the provided content \n",
" to eliminate any sexual content, violence, hate speech, and references to self-harm. \n",
" Please ensure that the revised text maintains the original meaning \n",
" and tone as much as possible, while removing or altering any prohibited material.\n",
" \"\"\"\n",
" \n",
" aoai_api_endpoint = os.getenv(\"AZURE_OPENAI_ENDPOINT\")\n",
" aoai_api_key = os.getenv(\"AZURE_OPENAI_API_KEY\")\n",
" aoai_api_version = os.getenv(\"AZURE_OPENAI_API_VERSION\")\n",
" aoai_deployment_name = os.getenv(\"AZURE_OPENAI_DEPLOYMENT_NAME\")\n",
"\n",
" if not aoai_api_version:\n",
" aoai_api_version = os.getenv(\"OPENAI_API_VERSION\")\n",
" if not aoai_deployment_name:\n",
" aoai_deployment_name = os.getenv(\"DEPLOYMENT_NAME\")\n",
" \n",
" try:\n",
" client = AzureOpenAI(\n",
" azure_endpoint = aoai_api_endpoint,\n",
" api_key = aoai_api_key,\n",
" api_version = aoai_api_version\n",
" )\n",
"\n",
" except (ValueError, TypeError) as e:\n",
" print(e)\n",
"\n",
" user_message = f\"\"\"\n",
" content: {content}\n",
" \"\"\"\n",
" \n",
" # call the Azure OpenAI API turned off content filter\n",
" response = client.chat.completions.create(\n",
" model=aoai_deployment_name,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_message},\n",
" {\"role\": \"user\", \"content\": user_message},\n",
" ],\n",
" temperature=0.2,\n",
" max_tokens=300\n",
" )\n",
" \n",
" return response.choices[0].message.content"
]
},
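{
"cell_type": "markdown",
"id": "c4d5e6f7",
"metadata": {},
"source": [
"Putting the pieces together: the guard below only invokes the rewrite step when Content Safety flags the text. It is a sketch that reuses the `max_text_severity` helper and `SEVERITY_THRESHOLD` from the section 3 sketch above; both are illustrative rather than part of the lab's required flow.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6e7f809",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: only rewrite when the text exceeds our illustrative severity threshold.\n",
"# Reuses max_text_severity / SEVERITY_THRESHOLD from the section 3 sketch above.\n",
"def moderate_and_rewrite(text):\n",
"    if max_text_severity(text) >= SEVERITY_THRESHOLD:\n",
"        print(\"Content flagged; rewriting...\")\n",
"        return rewrite_content(text)\n",
"    print(\"Content looks safe; keeping as-is.\")\n",
"    return text\n",
"\n",
"print(moderate_and_rewrite(\"you are an idiot\"))"
]
},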
{
"cell_type": "code",
"execution_count": null,
"id": "35920eff",
"metadata": {},
"outputs": [],
"source": [
"rewrite_content(content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "azureml_py310_sdkv2",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 5
}