{
"cells": [
{
"cell_type": "markdown",
"id": "0",
"metadata": {
"lines_to_next_cell": 0
},
"source": [
"# Chat messages - optional\n",
"\n",
"This notebook gives an introduction to the concept of `ChatMessage` and `ChatMessageNormalizer` and how it can be helpful as you start to work with different models.\n",
"\n",
"\n",
"The main format PyRIT works with is the `PromptRequestPiece` paradigm. Any time a user wants to store or retrieve a chat message, they will use the `PromptRequestPiece` object. However, `ChatMessage` is very common, so there are a lot of times this is the most useful. Any `PromptRequestPiece` object can be translated into a `ChatMessage` object.\n",
"\n",
"However, different models may require different formats. For example, certain models may use chatml, or may not support system messages. This is handled\n",
"in from `ChatMessageNormalizer` and its subclasses.\n",
"\n",
"Below is an example that converts a list of chat messages to chatml format and back."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<|im_start|>system\n",
"You are a helpful AI assistant<|im_end|>\n",
"<|im_start|>user\n",
"Hello, how are you?<|im_end|>\n",
"<|im_start|>assistant\n",
"I'm doing well, thanks for asking.<|im_end|>\n",
"\n"
]
}
],
"source": [
"from pyrit.chat_message_normalizer import ChatMessageNormalizerChatML\n",
"from pyrit.models import ChatMessage\n",
"\n",
"messages = [\n",
" ChatMessage(role=\"system\", content=\"You are a helpful AI assistant\"),\n",
" ChatMessage(role=\"user\", content=\"Hello, how are you?\"),\n",
" ChatMessage(role=\"assistant\", content=\"I'm doing well, thanks for asking.\"),\n",
"]\n",
"\n",
"normalizer = ChatMessageNormalizerChatML()\n",
"chatml_messages = normalizer.normalize(messages)\n",
"# chatml_messages is a string in chatml format\n",
"\n",
"print(chatml_messages)"
]
},
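{
"cell_type": "markdown",
"id": "1a",
"metadata": {},
"source": [
"As noted above, any `PromptRequestPiece` object can be translated into a `ChatMessage` object. Below is a minimal sketch of that translation. It assumes the `to_chat_message` helper and the `role`/`original_value` constructor arguments on `PromptRequestPiece`; check the current PyRIT API if these have changed."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b",
"metadata": {},
"outputs": [],
"source": [
"from pyrit.models import PromptRequestPiece\n",
"\n",
"# A PromptRequestPiece is the object PyRIT uses to store a single message.\n",
"request_piece = PromptRequestPiece(\n",
"    role=\"user\",\n",
"    original_value=\"Hello, how are you?\",\n",
")\n",
"\n",
"# Translate the stored piece into the ChatMessage format (assumed helper).\n",
"chat_message = request_piece.to_chat_message()\n",
"print(chat_message)"
]
},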
{
"cell_type": "markdown",
"id": "2",
"metadata": {},
"source": [
"\n",
"If you wish you load a chatml-format conversation, you can use the `from_chatml` method in the `ChatMessageNormalizerChatML`. This will return a list of `ChatMessage` objects that you can then use."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ChatMessage(role='system', content='You are a helpful AI assistant', name=None, tool_calls=None, tool_call_id=None), ChatMessage(role='user', content='Hello, how are you?', name=None, tool_calls=None, tool_call_id=None), ChatMessage(role='assistant', content=\"I'm doing well, thanks for asking.\", name=None, tool_calls=None, tool_call_id=None)]\n"
]
}
],
"source": [
"chat_messages = normalizer.from_chatml(\n",
" \"\"\"\\\n",
" <|im_start|>system\n",
" You are a helpful AI assistant<|im_end|>\n",
" <|im_start|>user\n",
" Hello, how are you?<|im_end|>\n",
" <|im_start|>assistant\n",
" I'm doing well, thanks for asking.<|im_end|>\"\"\"\n",
")\n",
"\n",
"print(chat_messages)"
]
},
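{
"cell_type": "markdown",
"id": "3a",
"metadata": {},
"source": [
"The introduction also mentioned models that do not support system messages. One way to handle this is the `GenericSystemSquash` normalizer, which folds the system message into the first user message. The sketch below assumes `GenericSystemSquash` is exported from `pyrit.chat_message_normalizer`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3b",
"metadata": {},
"outputs": [],
"source": [
"from pyrit.chat_message_normalizer import GenericSystemSquash\n",
"from pyrit.models import ChatMessage\n",
"\n",
"messages = [\n",
"    ChatMessage(role=\"system\", content=\"You are a helpful AI assistant\"),\n",
"    ChatMessage(role=\"user\", content=\"Hello, how are you?\"),\n",
"]\n",
"\n",
"# Combine the system message with the first user message so the\n",
"# conversation can be sent to a model without a system role.\n",
"squashed_messages = GenericSystemSquash().normalize(messages)\n",
"print(squashed_messages)"
]
},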
{
"cell_type": "markdown",
"id": "4",
"metadata": {},
"source": [
"To see how to use this in action, check out the [aml endpoint](../targets/3_non_open_ai_chat_targets.ipynb) notebook. It takes a `chat_message_normalizer` parameter so that an AML model can support various chat message formats."
]
},
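{
"cell_type": "markdown",
"id": "4a",
"metadata": {},
"source": [
"For illustration, here is a hedged sketch of that wiring. It assumes `AzureMLChatTarget` (from `pyrit.prompt_target`) accepts a `chat_message_normalizer` parameter and reads its endpoint and key from environment variables; see the linked notebook for the authoritative usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b",
"metadata": {},
"outputs": [],
"source": [
"from pyrit.chat_message_normalizer import GenericSystemSquash\n",
"from pyrit.prompt_target import AzureMLChatTarget\n",
"\n",
"# Assumed wiring: the target runs every conversation through the given\n",
"# normalizer before sending it, here squashing system messages for a\n",
"# model that lacks a system role.\n",
"target = AzureMLChatTarget(chat_message_normalizer=GenericSystemSquash())"
]
},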
{
"cell_type": "markdown",
"id": "5",
"metadata": {
"lines_to_next_cell": 0
},
"source": [
"Besides chatml, there are many other chat templates that a model might be trained on. If you would like to apply the template stored in a Hugging Face tokenizer,\n",
"you can utilize `ChatMessageNormalizerTokenizerTemplate`. In the example below, we load the tokenizer for Mistral-7B-Instruct-v0.1 and apply its chat template to\n",
"the messages. Note that this template only adds `[INST]` and `[/INST]` tokens to the user messages for instruction fine-tuning."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<s> [INST] Hello, how are you? [/INST] I'm doing well, thanks for asking.</s> [INST] What is your favorite food? [/INST]\n"
]
}
],
"source": [
"import os\n",
"\n",
"from transformers import AutoTokenizer\n",
"\n",
"from pyrit.chat_message_normalizer import ChatMessageNormalizerTokenizerTemplate\n",
"\n",
"messages = [\n",
" ChatMessage(role=\"user\", content=\"Hello, how are you?\"),\n",
" ChatMessage(role=\"assistant\", content=\"I'm doing well, thanks for asking.\"),\n",
" ChatMessage(role=\"user\", content=\"What is your favorite food?\"),\n",
"]\n",
"\n",
"# Load the tokenizer. If you are not logged in via CLI (huggingface-cli login), you can pass in your access token here\n",
"# via the HUGGINGFACE_TOKEN environment variable to access the gated model.\n",
"tokenizer = AutoTokenizer.from_pretrained(\"mistralai/Mistral-7B-Instruct-v0.1\",\n",
" token = os.environ.get(\"HUGGINGFACE_TOKEN\"))\n",
"\n",
"# create the normalizer and pass in the tokenizer\n",
"tokenizer_normalizer = ChatMessageNormalizerTokenizerTemplate(tokenizer)\n",
"\n",
"tokenizer_template_messages = tokenizer_normalizer.normalize(messages)\n",
"print(tokenizer_template_messages)"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}