doc/code/converters/pdf_converter.ipynb

{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# PDF Converter with Multiple Modes\n", "\n", "This notebook demonstrates how to use the `PDFConverter` to generate PDFs in three modes:\n", "\n", "- **Template-Based PDF Generation**: Use a YAML template to render dynamic content into a PDF.\n", "- **Direct Prompt PDF Generation**: Convert plain string prompts into PDFs without a template.\n", "- **Existing PDF Modification**: Inject text items at specified positions into the pages of an existing PDF.\n", "\n", "The `PromptSendingOrchestrator` handles the interaction between the `PDFConverter` and the mock `TextTarget`.\n", "\n", "## Key Features\n", "\n", "1. **Template-Based Generation**:\n", "   - Populate placeholders in a YAML-based template using dynamic data.\n", "   - Convert the rendered content into a PDF using the `PDFConverter`.\n", "\n", "2. **String-Based Generation**:\n", "   - Accept plain string prompts directly.\n", "   - Generate PDFs from the provided string without using any template.\n", "\n", "3. **Existing PDF Modification**:\n", "   - Supply an existing PDF together with a list of injection items.\n", "   - Each item specifies the page, position, text, and font settings to inject." ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'__type__': 'TextTarget', '__module__': 'pyrit.prompt_target.text_target'}: user: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738382955429053.pdf\n", "\u001b[1m\u001b[34muser: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738382955429053.pdf\n" ] } ], "source": [ "import pathlib\n", "\n", "from pyrit.common import IN_MEMORY, initialize_pyrit\n", "from pyrit.common.path import DATASETS_PATH\n", "from pyrit.models import SeedPrompt\n", "from pyrit.orchestrator import PromptSendingOrchestrator\n", "from pyrit.prompt_converter import PDFConverter\n", "from pyrit.prompt_target import TextTarget\n", "\n", "initialize_pyrit(memory_db_type=IN_MEMORY)\n", "\n", "# Define the dynamic data that fills the template placeholders\n", "prompt_data = {\n", "    \"hiring_manager_name\": \"Jane Doe\",\n", "    \"current_role\": \"AI Engineer\",\n", "    \"company\": 
\"CyberDefense Inc.\",\n", " \"red_teaming_reason\": \"to creatively identify security vulnerabilities while enjoying free coffee\",\n", " \"applicant_name\": \"John Smith\",\n", "}\n", "\n", "# Load the YAML template for the PDF generation\n", "template_path = (\n", " pathlib.Path(DATASETS_PATH) / \"prompt_converters\" / \"pdf_converters\" / \"red_teaming_application_template.yaml\"\n", ")\n", "if not template_path.exists():\n", " raise FileNotFoundError(f\"Template file not found: {template_path}\")\n", "\n", "# Load the SeedPrompt from the YAML file\n", "prompt_template = SeedPrompt.from_yaml_file(template_path)\n", "\n", "# Initialize the Azure OpenAI chat target (or mock target if not needed)\n", "prompt_target = TextTarget()\n", "\n", "# Initialize the PDFConverter\n", "pdf_converter = PDFConverter(\n", " prompt_template=prompt_template,\n", " font_type=\"Arial\",\n", " font_size=12,\n", " page_width=210,\n", " page_height=297,\n", ")\n", "\n", "# Define a single prompt for the orchestrator\n", "prompts = [str(prompt_data)]\n", "\n", "# Initialize the orchestrator\n", "orchestrator = PromptSendingOrchestrator(\n", " objective_target=prompt_target, # Target system (Azure OpenAI or other LLM target)\n", " prompt_converters=[pdf_converter], # Attach the PDFConverter\n", " verbose=False, # Set to True for detailed logging\n", ")\n", "\n", "await orchestrator.send_prompts_async(prompt_list=prompts) # type: ignore\n", "await orchestrator.print_conversations_async() # type: ignore" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "# Direct Prompt PDF Generation (No Template)" ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'__type__': 'TextTarget', '__module__': 'pyrit.prompt_target.text_target'}: user: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738380368779812.pdf\n" ] }, { "name": "stdout", "output_type": "stream", 
"text": [ "\u001b[1m\u001b[34muser: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738380368779812.pdf\n" ] } ], "source": [ "# Define a simple string prompt (no templates)\n", "prompt = \"This is a simple test string for PDF generation. No templates here!\"\n", "\n", "# Initialize the TextTarget (mock target for testing)\n", "prompt_target = TextTarget()\n", "\n", "# Initialize the PDFConverter without a template\n", "pdf_converter = PDFConverter(\n", " prompt_template=None, # No template provided\n", " font_type=\"Arial\",\n", " font_size=12,\n", " page_width=210,\n", " page_height=297,\n", ")\n", "\n", "# Define the list of prompts as strings (required by PromptSendingOrchestrator)\n", "prompts = [prompt]\n", "\n", "# Initialize the orchestrator\n", "orchestrator = PromptSendingOrchestrator(\n", " objective_target=prompt_target,\n", " prompt_converters=[pdf_converter],\n", " verbose=False,\n", ")\n", "\n", "await orchestrator.send_prompts_async(prompt_list=prompts) # type: ignore\n", "await orchestrator.print_conversations_async() # type: ignore" ] }, { "cell_type": "markdown", "id": "4", "metadata": { "lines_to_next_cell": 0 }, "source": [ "# Modify Existing PDF with Injection Items" ] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[09:10:28][26][ai-red-team][INFO][Processing page 0 with 2 injection items.]\n", "[09:10:28][31][ai-red-team][INFO][Processing page 1 with 2 injection items.]\n", "[09:10:28][33][ai-red-team][INFO][Processing page 2 with 2 injection items.]\n", "{'__type__': 'TextTarget', '__module__': 'pyrit.prompt_target.text_target'}: user: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738429828035527.pdf\n", "\u001b[1m\u001b[34muser: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738429828035527.pdf\n" ] } ], "source": [ "import tempfile\n", "from pathlib import Path\n", "\n", 
"import requests\n", "\n", "from pyrit.common import IN_MEMORY, initialize_pyrit\n", "from pyrit.orchestrator import PromptSendingOrchestrator\n", "from pyrit.prompt_converter import PDFConverter\n", "from pyrit.prompt_target import TextTarget\n", "\n", "initialize_pyrit(memory_db_type=IN_MEMORY)\n", "\n", "# This file isn't copied to our pipeline\n", "url = \"https://raw.githubusercontent.com/Azure/PyRIT/main/pyrit/datasets/prompt_converters/pdf_converters/fake_CV.pdf\"\n", "\n", "with tempfile.NamedTemporaryFile(delete=False, suffix=\".pdf\") as tmp_file:\n", " response = requests.get(url)\n", " tmp_file.write(response.content)\n", "\n", "cv_pdf_path = Path(tmp_file.name)\n", "\n", "# Define injection items\n", "injection_items = [\n", " {\"page\": 0, \"x\": 50, \"y\": 700, \"text\": \"Injected Text\", \"font_size\": 12, \"font\": \"Helvetica\", \"font_color\": (255, 0, 0)}, # Red text\n", " {\"page\": 1, \"x\": 100, \"y\": 600, \"text\": \"Confidential\", \"font_size\": 10, \"font\": \"Helvetica\", \"font_color\": (0, 0, 255)} # Blue text\n", "]\n", "\n", "# Define a simple string prompt (no templates)\n", "prompt = \"This is a simple test string for PDF generation. 
No templates here!\"\n", "\n", "# Initialize the TextTarget (mock target for testing)\n", "prompt_target = TextTarget()\n", "\n", "# Initialize the PDFConverter with the existing PDF and injection items\n", "pdf_converter = PDFConverter(\n", "    prompt_template=None,  # No template provided\n", "    font_type=\"Arial\",\n", "    font_size=12,\n", "    page_width=210,\n", "    page_height=297,\n", "    existing_pdf=cv_pdf_path,  # Provide the existing PDF\n", "    injection_items=injection_items,  # Provide the injection items\n", ")\n", "\n", "# Define the list of prompts as strings (required by PromptSendingOrchestrator)\n", "prompts = [prompt]\n", "\n", "# Initialize the orchestrator\n", "orchestrator = PromptSendingOrchestrator(\n", "    objective_target=prompt_target,\n", "    prompt_converters=[pdf_converter],\n", "    verbose=False,\n", ")\n", "\n", "# Run the orchestrator to modify the PDF and inspect the result\n", "await orchestrator.send_prompts_async(prompt_list=prompts)  # type: ignore\n", "await orchestrator.print_conversations_async()  # type: ignore" ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Clean up: dispose of the memory database engine and delete the temporary PDF\n", "orchestrator.dispose_db_engine()\n", "os.remove(cv_pdf_path)" ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 5 }
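
The injection items in the last example are plain dictionaries, so a typo in a key or an out-of-range page number is easy to miss. As a minimal sketch, the items could be checked before they are handed to `PDFConverter`. The helper `validate_injection_items` is hypothetical and not part of PyRIT; it assumes every item carries the same keys used in this notebook, that `font_color` is an RGB triple in 0..255, and that the PDF has three pages (as the log output above suggests).

```python
from typing import Any

# Keys taken from the injection items used in this notebook (an assumption,
# not a documented PyRIT requirement).
REQUIRED_KEYS = {"page", "x", "y", "text", "font_size", "font", "font_color"}


def validate_injection_items(items: list[dict[str, Any]], page_count: int) -> list[str]:
    """Return a list of problems found; an empty list means no problem was detected.

    Hypothetical helper for illustration only; it is not part of PyRIT.
    """
    problems: list[str] = []
    for index, item in enumerate(items):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append(f"item {index}: missing keys {sorted(missing)}")
            continue  # Skip further checks that rely on the missing keys
        if not 0 <= item["page"] < page_count:
            problems.append(f"item {index}: page {item['page']} is outside 0..{page_count - 1}")
        color = item["font_color"]
        if len(color) != 3 or any(not 0 <= c <= 255 for c in color):
            problems.append(f"item {index}: font_color {color} is not an RGB triple in 0..255")
    return problems


# The first item mirrors the notebook; the second is deliberately invalid.
injection_items = [
    {"page": 0, "x": 50, "y": 700, "text": "Injected Text", "font_size": 12, "font": "Helvetica", "font_color": (255, 0, 0)},
    {"page": 5, "x": 100, "y": 600, "text": "Confidential", "font_size": 10, "font": "Helvetica", "font_color": (0, 0, 300)},
]

# page_count=3 is an assumed page count for the sample PDF.
for problem in validate_injection_items(injection_items, page_count=3):
    print(problem)
```

Returning a list of messages rather than raising on the first error lets all problems be reported in one pass, which is convenient when the item list is long.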