doc/code/converters/pdf_converter.ipynb (280 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"id": "0",
"metadata": {},
"source": [
"# PDF Converter with Multiple Modes:\n",
"\n",
"This script demonstrates the use of the `PDFConverter` for generating PDFs in two different modes:\n",
"\n",
"- **Template-Based PDF Generation**: Utilize a YAML template to render dynamic content into a PDF.\n",
"- **Direct Prompt PDF Generation**: Convert plain string prompts into PDFs without using a template.\n",
"\n",
"The `PromptSendingOrchestrator` is used to handle the interaction with the `PDFConverter` and the mock `TextTarget` target system.\n",
"\n",
"## Key Features\n",
"\n",
"1. **Template-Based Generation**:\n",
" - Populate placeholders in a YAML-based template using dynamic data.\n",
" - Convert the rendered content into a PDF using the `PDFConverter`.\n",
"\n",
"2. **String-Based Generation**:\n",
" - Accept plain string prompts directly.\n",
" - Generate PDFs from the provided string without using any template."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'__type__': 'TextTarget', '__module__': 'pyrit.prompt_target.text_target'}: user: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738382955429053.pdf\n",
"\u001b[1m\u001b[34muser: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738382955429053.pdf\n"
]
}
],
"source": [
"import pathlib\n",
"\n",
"from pyrit.common import IN_MEMORY, initialize_pyrit\n",
"from pyrit.common.path import DATASETS_PATH\n",
"from pyrit.models import SeedPrompt\n",
"from pyrit.orchestrator import PromptSendingOrchestrator\n",
"from pyrit.prompt_converter import PDFConverter\n",
"from pyrit.prompt_target import TextTarget\n",
"\n",
"initialize_pyrit(memory_db_type=IN_MEMORY)\n",
"\n",
"# Define dynamic data for injection\n",
"prompt_data = {\n",
" \"hiring_manager_name\": \"Jane Doe\",\n",
" \"current_role\": \"AI Engineer\",\n",
" \"company\": \"CyberDefense Inc.\",\n",
" \"red_teaming_reason\": \"to creatively identify security vulnerabilities while enjoying free coffee\",\n",
" \"applicant_name\": \"John Smith\",\n",
"}\n",
"\n",
"# Load the YAML template for the PDF generation\n",
"template_path = (\n",
" pathlib.Path(DATASETS_PATH) / \"prompt_converters\" / \"pdf_converters\" / \"red_teaming_application_template.yaml\"\n",
")\n",
"if not template_path.exists():\n",
" raise FileNotFoundError(f\"Template file not found: {template_path}\")\n",
"\n",
"# Load the SeedPrompt from the YAML file\n",
"prompt_template = SeedPrompt.from_yaml_file(template_path)\n",
"\n",
"# Initialize the Azure OpenAI chat target (or mock target if not needed)\n",
"prompt_target = TextTarget()\n",
"\n",
"# Initialize the PDFConverter\n",
"pdf_converter = PDFConverter(\n",
" prompt_template=prompt_template,\n",
" font_type=\"Arial\",\n",
" font_size=12,\n",
" page_width=210,\n",
" page_height=297,\n",
")\n",
"\n",
"# Define a single prompt for the orchestrator\n",
"prompts = [str(prompt_data)]\n",
"\n",
"# Initialize the orchestrator\n",
"orchestrator = PromptSendingOrchestrator(\n",
" objective_target=prompt_target, # Target system (Azure OpenAI or other LLM target)\n",
" prompt_converters=[pdf_converter], # Attach the PDFConverter\n",
" verbose=False, # Set to True for detailed logging\n",
")\n",
"\n",
"await orchestrator.send_prompts_async(prompt_list=prompts) # type: ignore\n",
"await orchestrator.print_conversations_async() # type: ignore"
]
},
{
"cell_type": "markdown",
"id": "2",
"metadata": {},
"source": [
"# Direct Prompt PDF Generation (No Template)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'__type__': 'TextTarget', '__module__': 'pyrit.prompt_target.text_target'}: user: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738380368779812.pdf\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1m\u001b[34muser: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738380368779812.pdf\n"
]
}
],
"source": [
"# Define a simple string prompt (no templates)\n",
"prompt = \"This is a simple test string for PDF generation. No templates here!\"\n",
"\n",
"# Initialize the TextTarget (mock target for testing)\n",
"prompt_target = TextTarget()\n",
"\n",
"# Initialize the PDFConverter without a template\n",
"pdf_converter = PDFConverter(\n",
" prompt_template=None, # No template provided\n",
" font_type=\"Arial\",\n",
" font_size=12,\n",
" page_width=210,\n",
" page_height=297,\n",
")\n",
"\n",
"# Define the list of prompts as strings (required by PromptSendingOrchestrator)\n",
"prompts = [prompt]\n",
"\n",
"# Initialize the orchestrator\n",
"orchestrator = PromptSendingOrchestrator(\n",
" objective_target=prompt_target,\n",
" prompt_converters=[pdf_converter],\n",
" verbose=False,\n",
")\n",
"\n",
"await orchestrator.send_prompts_async(prompt_list=prompts) # type: ignore\n",
"await orchestrator.print_conversations_async() # type: ignore"
]
},
{
"cell_type": "markdown",
"id": "4",
"metadata": {
"lines_to_next_cell": 0
},
"source": [
"# Modify Existing PDF with Injection Items"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[09:10:28][26][ai-red-team][INFO][Processing page 0 with 2 injection items.]\n",
"[09:10:28][31][ai-red-team][INFO][Processing page 1 with 2 injection items.]\n",
"[09:10:28][33][ai-red-team][INFO][Processing page 2 with 2 injection items.]\n",
"{'__type__': 'TextTarget', '__module__': 'pyrit.prompt_target.text_target'}: user: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738429828035527.pdf\n",
"\u001b[1m\u001b[34muser: D:\\git\\PyRIT-internal\\PyRIT\\dbdata\\prompt-memory-entries\\urls\\1738429828035527.pdf\n"
]
}
],
"source": [
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"import requests\n",
"\n",
"from pyrit.common import IN_MEMORY, initialize_pyrit\n",
"from pyrit.orchestrator import PromptSendingOrchestrator\n",
"from pyrit.prompt_converter import PDFConverter\n",
"from pyrit.prompt_target import TextTarget\n",
"\n",
"initialize_pyrit(memory_db_type=IN_MEMORY)\n",
"\n",
"# This file isn't copied to our pipeline\n",
"url = \"https://raw.githubusercontent.com/Azure/PyRIT/main/pyrit/datasets/prompt_converters/pdf_converters/fake_CV.pdf\"\n",
"\n",
"with tempfile.NamedTemporaryFile(delete=False, suffix=\".pdf\") as tmp_file:\n",
" response = requests.get(url)\n",
" tmp_file.write(response.content)\n",
"\n",
"cv_pdf_path = Path(tmp_file.name)\n",
"\n",
"# Define injection items\n",
"injection_items = [\n",
" {\"page\": 0, \"x\": 50, \"y\": 700, \"text\": \"Injected Text\", \"font_size\": 12, \"font\": \"Helvetica\", \"font_color\": (255, 0, 0)}, # Red text\n",
" {\"page\": 1, \"x\": 100, \"y\": 600, \"text\": \"Confidential\", \"font_size\": 10, \"font\": \"Helvetica\", \"font_color\": (0, 0, 255)} # Blue text\n",
"]\n",
"\n",
"# Define a simple string prompt (no templates)\n",
"prompt = \"This is a simple test string for PDF generation. No templates here!\"\n",
"\n",
"# Initialize the TextTarget (mock target for testing)\n",
"prompt_target = TextTarget()\n",
"\n",
"# Initialize the PDFConverter with the existing PDF and injection items\n",
"pdf_converter = PDFConverter(\n",
" prompt_template=None, # No template provided\n",
" font_type=\"Arial\",\n",
" font_size=12,\n",
" page_width=210,\n",
" page_height=297,\n",
" existing_pdf=cv_pdf_path, # Provide the existing PDF\n",
" injection_items=injection_items, # Provide the injection items\n",
")\n",
"\n",
"# Define the list of prompts as strings (required by PromptSendingOrchestrator)\n",
"prompts = [prompt]\n",
"\n",
"# Initialize the orchestrator\n",
"orchestrator = PromptSendingOrchestrator(\n",
" objective_target=prompt_target,\n",
" prompt_converters=[pdf_converter],\n",
" verbose=False,\n",
")\n",
"\n",
"# Run the orchestrator to modify the PDF and inspect the result\n",
"await orchestrator.send_prompts_async(prompt_list=prompts) # type: ignore\n",
"await orchestrator.print_conversations_async() # type: ignore"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"orchestrator.dispose_db_engine()\n",
"os.remove(cv_pdf_path)"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}