{ "cells": [ { "cell_type": "markdown", "id": "10a4f45b-83bc-4d30-b7d1-a2d6ec8017d3", "metadata": {}, "source": [ "# Generate Synthetic Test Data for Your Prompt Template\n", "\n", "Imagine you have a prompt roughly along these lines:\n", "\n", "\"\"\"Here are some things I want you to analyze:\n", "\n", "<thing>\n", "{{thing1}}\n", "</thing>\n", "<thing>\n", "{{thing2}}\n", "</thing>\n", "\n", "These things are [description of things]. Please read them carefully and [do some task].\"\"\"\n", "\n", "Here we'd call thing1 and thing2 the \"variables\" -- and you want your prompt to behave well for many different possible values of thing1 and thing2.\n", "\n", "How can you test this prompt template? Maybe you have some real-life values you can substitute in. But maybe you don't, or maybe you aren't allowed to test on the ones you do have for privacy reasons. No worries -- Claude can make them up! This cookbook demonstrates how to generate synthetic test data for your prompts using Claude and the Anthropic API. It includes functions for extracting variables from templates, constructing example blocks, generating test cases, and iteratively refining the results. The benefits of this are twofold:\n", "\n", "1. Prompt Evaluation\n", "You can use these test cases to see how Claude will perform on realistic examples.\n", "\n", "2. Prompt Improvement with Multishot Examples\n", "Giving Claude examples is perhaps the best way to improve its performance. This notebook can help you generate realistic inputs, which is half the battle in getting ideal input/output pairs." 
] }, { "cell_type": "code", "execution_count": null, "id": "a480851e", "metadata": {}, "outputs": [], "source": [ "%pip install anthropic IPython" ] }, { "cell_type": "code", "execution_count": 1, "id": "36537f92-33ca-4e6c-9f3c-e439fd103965", "metadata": {}, "outputs": [], "source": [ "import re\n", "import anthropic\n", "from IPython.display import display, HTML\n", "# Enter your API key here\n", "api_key = \"\"\n", "CLIENT = anthropic.Anthropic(api_key=api_key)\n", "MODEL_NAME = \"claude-3-5-sonnet-20241022\"" ] }, { "cell_type": "markdown", "id": "15fc567e", "metadata": {}, "source": [ "Let's start by defining some helper functions that we'll use throughout this notebook." ] }, { "cell_type": "code", "execution_count": 2, "id": "a3cf101a-8781-4dfd-bc12-79f39e6b579c", "metadata": {}, "outputs": [], "source": [ "# First, we have the `extract_variables` function.\n", "# It takes in a prompt template and extracts the double-mustache-bracketed \"variables\" it contains.\n", "def extract_variables(prompt_template):\n", "    \"\"\"Extract variables from a prompt template.\"\"\"\n", "    pattern = r'{{([^}]+)}}'\n", "    variables = re.findall(pattern, prompt_template)\n", "    return set(variables)\n", "\n", "# Next, we have `construct_variables_names`, which just joins the variable names with newlines.\n", "def construct_variables_names(prompt_template):\n", "    \"\"\"Construct a string of variable names from a prompt template.\"\"\"\n", "    variables = extract_variables(prompt_template)\n", "    return \"\\n\".join(variables)\n", "\n", "# The `construct_variables_block` function takes in a prompt template and constructs a \"variables block\".\n", "# The variables block might look like this, if the variables were 'animal' and 'topic':\n", "\"\"\"\n", "<animal>\n", "[a full, complete, value for the variable \"animal\"]\n", "</animal>\n", "<topic>\n", "[a full, complete, value for the variable \"topic\"]\n", "</topic>\n", "\"\"\"\n", "def construct_variables_block(prompt_template):\n", "    \"\"\"Construct a variables block for the synthetic test data prompt.\"\"\"\n", "    variables = extract_variables(prompt_template)\n", "    output = \"\"\n", "    for v in variables:\n", "        output += f\"<{v}>\\n\"\n", "        output += f\"[a full, complete, value for the variable \\\"{v}\\\". (You do not need to repeat the variable name inside the tags.)]\\n\"\n", "        output += f\"</{v}>\\n\"\n", "    return output.strip()\n", "\n", "# `construct_example_block` takes a dictionary of {variable: value} and constructs an XML-formatted example.\n", "# E.g. if the dict is\n", "# {'animal': 'cat', 'topic': 'movement patterns'}, then the example would be\n", "\"\"\"\n", "<example>\n", "<variables>\n", "<animal>\n", "cat\n", "</animal>\n", "<topic>\n", "movement patterns\n", "</topic>\n", "</variables>\n", "</example>\n", "\"\"\"\n", "def construct_example_block(variable_dict):\n", "    \"\"\"Construct an example block from a dictionary of variables.\"\"\"\n", "    output = \"<example>\\n<variables>\\n\"\n", "    for k, v in variable_dict.items():\n", "        output += f\"<{k}>\\n{v}\\n</{k}>\\n\"\n", "    output = output.strip()\n", "    output += \"\\n</variables>\\n</example>\"\n", "    return output" ] }, { "cell_type": "markdown", "id": "0af492ba", "metadata": {}, "source": [ "## Prompt Template for Generating the Data\n", "\n", "The general idea of these prompt templates is to take a user-submitted prompt template with variables, and construct some values for the variables to fill the template.\n", "\n", "There are actually two prompt templates below; one is formatted assuming that the user has already provided example variable values, and one does not assume that.\n", "\n", "What they have in common is that both templates start by giving Claude context about the situation, and directing Claude to carefully think through the specs of each variable individually as well as the user-provided prompt template as a whole before outputting the test cases." 
] }, { "cell_type": "code", "execution_count": 3, "id": "dbea4731-ba24-4854-b33c-45d699790d7b", "metadata": {}, "outputs": [], "source": [ "# Formatting Prompt Templates for Synthetic Evaluations\n", "\n", "# This function prepares the prompt template for generating synthetic test data.\n", "\n", "def format_prompt_template_for_synth_evals(prompt_template, examples=None):\n", "    \"\"\"Format a prompt template for synthetic evaluations.\"\"\"\n", "    synth_test_data_prompt_template_with_example = \"\"\"<Prompt Template>\n", "{{PROMPT_TEMPLATE}}\n", "</Prompt Template>\n", "\n", "Your job is to construct a test case for the prompt template above. This template contains \"variables\", which are placeholders to be filled in later. In this case, the variables are:\n", "\n", "<variables>\n", "{{CONSTRUCT_VARIABLES_NAMES}}\n", "</variables>\n", "\n", "Here are the example test cases provided by the user.\n", "<examples>\n", "{{EXAMPLES}}\n", "</examples>\n", "\n", "First, in <planning> tags, do the following:\n", "\n", "1. Summarize the prompt template. What is the goal of the user who created it?\n", "2. For each variable in <variables>, carefully consider what a paradigmatic, realistic example of that variable would look like. You'll want to note who will be responsible \"in prod\" for supplying values. Written by a human \"end user\"? Downloaded from a website? Extracted from a database? Think about things like length, format, and tone in addition to semantic content. Use the examples provided by the user to guide this exercise. The goal is to acquire a sense of the statistical distribution the examples are being drawn from. The example you write should be drawn from that same distribution, but sufficiently different from the examples that it provides additional signal. A tricky balancing act, but I have faith in you.\n", "\n", "Once you're done, output a test case for this prompt template with a full, complete, value for each variable. The output format should consist of a tagged block for each variable, with the value inside the block, like the below:\n", "\n", "<variables>\n", "{{CONSTRUCT_VARIABLES_BLOCK}}\n", "</variables>\"\"\"\n", "\n", "    synth_test_data_prompt_template_without_example = \"\"\"<Prompt Template>\n", "{{PROMPT_TEMPLATE}}\n", "</Prompt Template>\n", "\n", "Your job is to construct a test case for the prompt template above. This template contains \"variables\", which are placeholders to be filled in later. In this case, the variables are:\n", "\n", "<variables>\n", "{{CONSTRUCT_VARIABLES_NAMES}}\n", "</variables>\n", "\n", "First, in <planning> tags, do the following:\n", "\n", "1. Summarize the prompt template. What is the goal of the user who created it?\n", "2. For each variable in <variables>, carefully consider what a paradigmatic, realistic example of that variable would look like. You'll want to note who will be responsible \"in prod\" for supplying values. Written by a human \"end user\"? Downloaded from a website? Extracted from a database? Think about things like length, format, and tone in addition to semantic content.\n", "\n", "Then, output a test case for this prompt template with a full, complete, value for each variable. The output format should consist of a tagged block for each variable, with the value inside the block, like the below:\n", "<variables>\n", "{{CONSTRUCT_VARIABLES_BLOCK}}\n", "</variables>\"\"\"\n", "\n", "    if examples:\n", "        examples_block = \"\\n\".join([construct_example_block(example) for example in examples])\n", "        return synth_test_data_prompt_template_with_example.replace(\n", "            \"{{PROMPT_TEMPLATE}}\", prompt_template\n", "        ).replace(\n", "            \"{{CONSTRUCT_VARIABLES_NAMES}}\", construct_variables_names(prompt_template)\n", "        ).replace(\n", "            \"{{CONSTRUCT_VARIABLES_BLOCK}}\", construct_variables_block(prompt_template)\n", "        ).replace(\n", "            \"{{EXAMPLES}}\", examples_block\n", "        )\n", "    else:\n", "        return synth_test_data_prompt_template_without_example.replace(\n", "            \"{{PROMPT_TEMPLATE}}\", prompt_template\n", "        ).replace(\n", "            \"{{CONSTRUCT_VARIABLES_NAMES}}\", construct_variables_names(prompt_template)\n", "        ).replace(\n", "            \"{{CONSTRUCT_VARIABLES_BLOCK}}\", construct_variables_block(prompt_template)\n", "        )" ] }, { "cell_type": "markdown", "id": "7c29f3f6", "metadata": {}, "source": [ "Next, another quick helper function for filling in the appropriate prompt template and calling Claude." 
] }, { "cell_type": "code", "execution_count": 4, "id": "fb3d6193-51fa-469e-8e64-d860fc1657bf", "metadata": {}, "outputs": [], "source": [ "def get_test_data(prompt_template, examples, custom_planning=None):\n", "    \"\"\"Generate test data using the Anthropic API.\"\"\"\n", "    synth_eval_prompt_ready = format_prompt_template_for_synth_evals(prompt_template, examples)\n", "\n", "    messages = [\n", "        {\n", "            \"role\": \"user\",\n", "            \"content\": synth_eval_prompt_ready,\n", "        }\n", "    ]\n", "    # If planning text is supplied, prefill the assistant turn with it so that\n", "    # Claude continues from the planning instead of writing its own.\n", "    if custom_planning:\n", "        messages.append({\n", "            \"role\": \"assistant\",\n", "            \"content\": custom_planning,\n", "        })\n", "\n", "    message = CLIENT.messages.create(\n", "        max_tokens=4000,\n", "        messages=messages,\n", "        model=MODEL_NAME,\n", "        temperature=1,\n", "    ).content[0].text\n", "\n", "    return message" ] }, { "cell_type": "code", "execution_count": 5, "id": "9b5becbd-907d-4729-a37f-c892bde3a8e0", "metadata": {}, "outputs": [], "source": [ "# We'll use this function to sample Claude's response to the filled-in template,\n", "# once we have our example values/test case.\n", "\n", "def call_claude_with_template(prompt_template, variables):\n", "    \"\"\"Call Claude with a filled prompt template.\"\"\"\n", "    filled_template = prompt_template\n", "    # Substitute each {{VARIABLE}} placeholder with its value.\n", "    for var, value in variables.items():\n", "        filled_template = filled_template.replace(f\"{{{{{var}}}}}\", value)\n", "\n", "    message = CLIENT.messages.create(\n", "        max_tokens=4000,\n", "        messages=[\n", "            {\n", "                \"role\": \"user\",\n", "                \"content\": filled_template,\n", "            }\n", "        ],\n", "        model=MODEL_NAME,\n", "        temperature=0.7,\n", "    ).content[0].text\n", "\n", "    return message" ] }, { "cell_type": "markdown", "id": "087337fd", "metadata": {}, "source": [ "Now we can start to put the pieces together. To start, enter your prompt template here." 
] }, { "cell_type": "code", "execution_count": 6, "id": "feccf7db", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Identified variables:\n", "- DOCUMENTS\n", "- QUESTION\n" ] } ], "source": [ "# Replace this with your prompt template!\n", "# Use double curly braces to indicate variables\n", "# Here's an example:\n", "prompt_template = \"\"\"You are a customer support bot for Acme Corporation. \n", "Here is an FAQ with Acme's relevant policies:\n", "\n", "<documents>\n", "{{DOCUMENTS}}\n", "</documents>\n", "\n", "Please respond to this customer support question using details from the policies:\n", "\n", "<question>\n", "{{QUESTION}}\n", "</question>\"\"\"\n", "\n", "variables = extract_variables(prompt_template)\n", "print(\"\\nIdentified variables:\")\n", "for var in variables:\n", "    print(f\"- {var}\")" ] }, { "cell_type": "markdown", "id": "ded0bae4", "metadata": {}, "source": [ "Next, if you have any \"golden examples\" of values for your variables, you can enter those. The code is commented out for now." ] }, { "cell_type": "code", "execution_count": 7, "id": "0606eaa2", "metadata": { "scrolled": true }, "outputs": [], "source": [ "planning_text = None\n", "USER_EXAMPLES = []\n", "\n", "# if input(\"\\nDo you want to provide an example value for your variables? (y/n): \").lower() == 'y':\n", "#     example = {}\n", "#     for var in variables:\n", "#         example[var] = input(f\"Enter an example value for {var}: \")\n", "#     USER_EXAMPLES.append(example)" ] }, { "cell_type": "markdown", "id": "e5e987e5", "metadata": {}, "source": [ "Next, we can get the test case generation prompt template filled out with this information, and get a test case!" 
] }, { "cell_type": "code", "execution_count": 8, "id": "6da69c33", "metadata": {}, "outputs": [], "source": [ "result = get_test_data(prompt_template, USER_EXAMPLES, planning_text)" ] }, { "cell_type": "markdown", "id": "f930d99c", "metadata": {}, "source": [ "Now, let's take a look at both the test case and the planning that Claude used to generate it." ] }, { "cell_type": "code", "execution_count": 10, "id": "aa34e23d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "~~~~~~~~~~~\n", "Generated test case:\n", "~~~~~~~~~~~\n", "DOCUMENTS:\n", "Return Policy\n", "- Items may be returned within 30 days of purchase with original receipt\n", "- Items must be unused and in original packaging\n", "- Shipping costs are non-refundable\n", "- Gift cards are non-returnable\n", "\n", "Shipping Information\n", "- Standard shipping (5-7 business days): Free on orders over $50\n", "- Express shipping (2-3 business days): $12.99\n", "- Overnight shipping (next business day): $24.99\n", "- We ship to continental US only\n", "- Alaska and Hawaii orders incur additional $15 fee\n", "\n", "Payment Methods\n", "- We accept Visa, Mastercard, American Express, and PayPal\n", "- Payment is processed at time of order\n", "- Gift cards cannot be used for partial payment\n", "\n", "QUESTION:\n", "Hi, I ordered a sweater last week but it doesn't fit right. Can I return it? And will I get refunded for the shipping I paid? Thanks!\n", "\n", "~~~~~~~~~~~\n", "Planning:\n", "~~~~~~~~~~~\n", "<planning>\n", "1. Prompt Template Summary:\n", "This template creates a customer service chatbot for Acme Corporation that answers customer questions based on official company policies/FAQ documents. The goal is to ensure consistent, policy-compliant responses to customer inquiries.\n", "\n", "2. 
Variable Analysis:\n", "\n", "DOCUMENTS:\n", "- Would likely be maintained by Acme's policy/legal team\n", "- Stored in a knowledge base or content management system\n", "- Formatted as structured FAQ entries or policy statements\n", "- Professional, formal tone\n", "- Multiple paragraphs covering different topics\n", "- Clear headers and categories\n", "- Length: Several paragraphs (300-500 words)\n", "\n", "QUESTION:\n", "- Written by end users/customers\n", "- Informal, conversational tone\n", "- Usually 1-2 sentences\n", "- Often includes context about their specific situation\n", "- May contain typos or casual language\n", "- Length: 20-50 words\n", "</planning>\n" ] } ], "source": [ "planning_match = re.search(r'<planning>(.*?)</planning>', result, re.DOTALL)\n", "if planning_match and not planning_text:\n", "    planning_text = \"<planning>\\n\" + planning_match.group(1).strip() + \"\\n</planning>\"\n", "\n", "extracted_variables = {}\n", "for var in variables:\n", "    var_match = re.search(f'<{var}>(.*?)</{var}>', result[result.index(\"<variables>\"):], re.DOTALL)\n", "    if var_match:\n", "        extracted_variables[var] = var_match.group(1).strip()\n", "\n", "USER_EXAMPLES.append(extracted_variables)\n", "\n", "print(\"~~~~~~~~~~~\\nGenerated test case:\\n~~~~~~~~~~~\")\n", "for var, value in extracted_variables.items():\n", "    print(f\"{var}:\\n{value}\\n\")\n", "\n", "print(\"~~~~~~~~~~~\\nPlanning:\\n~~~~~~~~~~~\")\n", "print(planning_text)" ] }, { "cell_type": "markdown", "id": "2e217df0", "metadata": {}, "source": [ "From here, there are a few ways we can go. We could generate more test cases, or we could edit Claude's planning logic. Let's edit Claude's planning logic a little bit. Maybe we know that Acme's documentation uses numbered lines. Some other realistic changes could be:\n", "\n", "- Have Claude tell itself to make the documents longer and more detailed.\n", "- Have Claude tell itself to make the customer support query more or less formal." 
] }, { "cell_type": "code", "execution_count": 11, "id": "81ab8cda", "metadata": {}, "outputs": [], "source": [ "planning_text = planning_text.replace(\"each with a question and answer format\", \"each with a question and answer format and associated number.\")\n", "# str.replace() leaves the text unchanged if the target substring is absent, so adapt the target to the planning text you actually got." ] }, { "cell_type": "markdown", "id": "ea920b48", "metadata": {}, "source": [ "Let's reset our examples, but use this planning text as a prefill. (This saves a little bit of sampling time.)" ] }, { "cell_type": "code", "execution_count": 12, "id": "4a5b75de", "metadata": {}, "outputs": [], "source": [ "USER_EXAMPLES = []\n", "result = get_test_data(prompt_template, USER_EXAMPLES, planning_text)" ] }, { "cell_type": "markdown", "id": "0f2a1e11", "metadata": {}, "source": [ "Now let's see the new results." ] }, { "cell_type": "code", "execution_count": 13, "id": "2baacc4d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "~~~~~~~~~~~\n", "Generated test case:\n", "~~~~~~~~~~~\n", "DOCUMENTS:\n", "Return Policy\n", "- Items may be returned within 30 days of purchase with original receipt\n", "- Items must be unused and in original packaging\n", "- Shipping costs are non-refundable\n", "- Store credit will be issued for items returned without receipt\n", "\n", "Shipping Information\n", "- Standard shipping (5-7 business days): $5.99\n", "- Express shipping (2-3 business days): $12.99\n", "- Free standard shipping on orders over $50\n", "- We currently ship only within the continental United States\n", "- Alaska and Hawaii orders subject to additional fees\n", "\n", "Payment Methods\n", "- We accept Visa, Mastercard, American Express, and PayPal\n", "- Gift cards cannot be used for online purchases\n", "- Payment is processed at time of order\n", "- All prices are in USD\n", "\n", "QUESTION:\n", "Hi, I ordered a sweater last week but it doesn't fit right. Can I return it? 
I still have the tags on it but I threw away the receipt. Thanks!\n", "\n", "~~~~~~~~~~~\n", "Planning:\n", "~~~~~~~~~~~\n", "<planning>\n", "1. Prompt Template Summary:\n", "This template creates a customer service chatbot for Acme Corporation that answers customer questions based on official company policies/FAQ documents. The goal is to ensure consistent, policy-compliant responses to customer inquiries.\n", "\n", "2. Variable Analysis:\n", "\n", "DOCUMENTS:\n", "- Would likely be maintained by Acme's policy/legal team\n", "- Stored in a knowledge base or content management system\n", "- Formatted as structured FAQ entries or policy statements\n", "- Professional, formal tone\n", "- Multiple paragraphs covering different topics\n", "- Clear headers and categories\n", "- Length: Several paragraphs (300-500 words)\n", "\n", "QUESTION:\n", "- Written by end users/customers\n", "- Informal, conversational tone\n", "- Usually 1-2 sentences\n", "- Often includes context about their specific situation\n", "- May contain typos or casual language\n", "- Length: 20-50 words\n", "</planning>\n" ] } ], "source": [ "# Copied and pasted from a cell above.\n", "planning_match = re.search(r'<planning>(.*?)</planning>', result, re.DOTALL)\n", "if planning_match and not planning_text:\n", "    planning_text = \"<planning>\\n\" + planning_match.group(1).strip() + \"\\n</planning>\"\n", "\n", "extracted_variables = {}\n", "for var in variables:\n", "    var_match = re.search(f'<{var}>(.*?)</{var}>', result[result.index(\"<variables>\"):], re.DOTALL)\n", "    if var_match:\n", "        extracted_variables[var] = var_match.group(1).strip()\n", "\n", "USER_EXAMPLES.append(extracted_variables)\n", "\n", "print(\"~~~~~~~~~~~\\nGenerated test case:\\n~~~~~~~~~~~\")\n", "for var, value in extracted_variables.items():\n", "    print(f\"{var}:\\n{value}\\n\")\n", "\n", "print(\"~~~~~~~~~~~\\nPlanning:\\n~~~~~~~~~~~\")\n", "print(planning_text)" ] }, { "cell_type": "markdown", "id": "b26803e4", "metadata": 
{}, "source": [ "Great, it did the numbered Q and A!\n", "\n", "Let's make another example. This one will use the example we already have, so hopefully it will be interestingly different." ] }, { "cell_type": "code", "execution_count": 14, "id": "99e47007-9762-48f3-aead-1d5be5d322b5", "metadata": {}, "outputs": [], "source": [ "result = get_test_data(prompt_template, USER_EXAMPLES, planning_text)" ] }, { "cell_type": "code", "execution_count": 15, "id": "4e66867f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "~~~~~~~~~~~\n", "Generated test case:\n", "~~~~~~~~~~~\n", "DOCUMENTS:\n", "Product Warranty\n", "- All electronics come with a 1-year limited manufacturer warranty\n", "- Warranty covers defects in materials and workmanship\n", "- Warranty does not cover accidental damage or misuse\n", "- Extended warranty available for purchase within 30 days\n", "\n", "Price Match Policy\n", "- We match prices from authorized retailers\n", "- Item must be identical model/color/specification\n", "- Must be in stock at competitor's store\n", "- Online retailers excluded from price matching\n", "- Price match requests must be made at time of purchase\n", "\n", "Order Cancellation\n", "- Orders can be cancelled within 2 hours of placement\n", "- Once order is shipped, cancellation not possible\n", "- Cancelled orders refunded to original payment method\n", "- Processing time for refunds: 3-5 business days\n", "- Contact customer service for cancellation requests\n", "\n", "QUESTION:\n", "Hello, I bought a laptop from your store 3 weeks ago and it keeps shutting down randomly. It's still under warranty, right? What do I need to do to get it fixed? Thanks in advance!\n", "\n", "~~~~~~~~~~~\n", "Planning:\n", "~~~~~~~~~~~\n", "<planning>\n", "1. Prompt Template Summary:\n", "This template creates a customer service chatbot for Acme Corporation that answers customer questions based on official company policies/FAQ documents. 
The goal is to ensure consistent, policy-compliant responses to customer inquiries.\n", "\n", "2. Variable Analysis:\n", "\n", "DOCUMENTS:\n", "- Would likely be maintained by Acme's policy/legal team\n", "- Stored in a knowledge base or content management system\n", "- Formatted as structured FAQ entries or policy statements\n", "- Professional, formal tone\n", "- Multiple paragraphs covering different topics\n", "- Clear headers and categories\n", "- Length: Several paragraphs (300-500 words)\n", "\n", "QUESTION:\n", "- Written by end users/customers\n", "- Informal, conversational tone\n", "- Usually 1-2 sentences\n", "- Often includes context about their specific situation\n", "- May contain typos or casual language\n", "- Length: 20-50 words\n", "</planning>\n" ] } ], "source": [ "# Copied and pasted from a cell above.\n", "planning_match = re.search(r'<planning>(.*?)</planning>', result, re.DOTALL)\n", "if planning_match and not planning_text:\n", "    planning_text = \"<planning>\\n\" + planning_match.group(1).strip() + \"\\n</planning>\"\n", "\n", "extracted_variables = {}\n", "for var in variables:\n", "    var_match = re.search(f'<{var}>(.*?)</{var}>', result[result.index(\"<variables>\"):], re.DOTALL)\n", "    if var_match:\n", "        extracted_variables[var] = var_match.group(1).strip()\n", "\n", "USER_EXAMPLES.append(extracted_variables)\n", "\n", "print(\"~~~~~~~~~~~\\nGenerated test case:\\n~~~~~~~~~~~\")\n", "for var, value in extracted_variables.items():\n", "    print(f\"{var}:\\n{value}\\n\")\n", "\n", "print(\"~~~~~~~~~~~\\nPlanning:\\n~~~~~~~~~~~\")\n", "print(planning_text)" ] }, { "cell_type": "markdown", "id": "580cf64d", "metadata": {}, "source": [ "Still about Acme Corporation, but the question is different and so is the knowledge base." 
] }, { "cell_type": "markdown", "id": "6c83325e", "metadata": {}, "source": [ "From here, the world is your oyster -- you can generate more test cases by running the code in a loop, edit the planning more, evaluate Claude on these test cases, and put the test cases you make along with golden answers into your prompt as multishot examples.\n", "\n", "To get golden answers, you can either write them yourself from scratch, or have Claude write an answer and then edit it to taste. With the advent of prompt caching, there's never been a better time to add tons of examples to your prompt to improve performance." ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }