multimodal/reading_charts_graphs

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "HCuST1ZZKuXv" }, "source": [ "# Working with Charts, Graphs, and Slide Decks\n", "Claude is highly capable of working with charts, graphs, and broader slide decks. Depending on your use case, there are a number of tips and tricks that you may want to take advantage of. This recipe will show you common patterns for using Claude with these materials." ] }, { "cell_type": "markdown", "metadata": { "id": "yRzNf75nKuXw" }, "source": [ "## Charts and Graphs\n", "For the most part, using claude with charts and graphs is simple. Let's walk through how to ingest them and pass them to Claude, as well as some common tips to improve your results." ] }, { "cell_type": "markdown", "metadata": { "id": "oRUFAZcKKuXw" }, "source": [ "### Ingestion and calling the Claude API\n", "The best way to pass Claude charts and graphs is to take advantage of its vision capabilities and the PDF support feature. That is, give Claude a PDF document of the chart or graph, along with a text question about it.\n", "\n", "At the moment, only `claude-3-5-sonnet-20241022` supports the PDF feature. Since the feature is still in beta, you will need to provide it with the `pdfs-2024-09-25` beta header." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qZhhy2XqKuXw", "outputId": "5c791634-0fd1-4d16-92bf-aef5b5089d25" }, "outputs": [], "source": [ "# Install and create the Anthropic client.\n", "%pip install anthropic" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "FYeAUCZfKuXw" }, "outputs": [], "source": [ "import base64\n", "from anthropic import Anthropic\n", "# While PDF support is in beta, you must pass in the correct beta header\n", "client = Anthropic(default_headers={\n", " \"anthropic-beta\": \"pdfs-2024-09-25\"\n", " }\n", ")\n", "# For now, only claude-3-5-sonnet-20241022 supports PDFs\n", "MODEL_NAME = \"claude-3-5-sonnet-20241022\"" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "id": "ff40NSWcKuXw" }, "outputs": [], "source": [ "# Make a useful helper function.\n", "def get_completion(messages):\n", " response = client.messages.create(\n", " model=MODEL_NAME,\n", " max_tokens=8192,\n", " temperature=0,\n", " messages=messages\n", " )\n", " return response.content[0].text" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "HYu3rob2KuXx" }, "outputs": [], "source": [ "# To start, we'll need a PDF. We will be using the .pdf document located at cvna_2021_annual_report.pdf.\n", "# Start by reading in the PDF and encoding it as base64.\n", "with open(\"./documents/cvna_2021_annual_report.pdf\", \"rb\") as pdf_file:\n", " binary_data = pdf_file.read()\n", " base_64_encoded_data = base64.b64encode(binary_data)\n", " base64_string = base_64_encoded_data.decode('utf-8')" ] }, { "cell_type": "markdown", "metadata": { "id": "eWngQL60KuXx" }, "source": [ "Let's see how we can pass this document to the model alongside a simple question." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vmVJlTxoKuXx", "outputId": "07c4b704-3661-40d8-b90e-ca61392d4e6a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is a page from Carvana's 2021 Annual Report showing four key metrics: retail units sold, total revenue, total markets at year end, and car vending machines, all displaying significant growth from 2014 to 2021.\n" ] } ], "source": [ "messages = [\n", " {\n", " \"role\": 'user',\n", " \"content\": [\n", " {\"type\": \"document\", \"source\": {\"type\": \"base64\", \"media_type\": \"application/pdf\", \"data\": base64_string}},\n", " {\"type\": \"text\", \"text\": \"What's in this document? Answer in a single sentence.\"}\n", " ]\n", " }\n", "]\n", "\n", "print(get_completion(messages))" ] }, { "cell_type": "markdown", "metadata": { "id": "r9wZMStZKuXx" }, "source": [ "That's pretty good! Now let's ask it some more useful questions." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Wg7A0pcIKuXy", "outputId": "e965371e-576b-453f-b35a-eae82896cb4e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "----------Question 1----------\n", "According to the graph showing Total Revenue ($M), Carvana's revenue in 2020 was $5,587 million (or approximately $5.59 billion).\n", "\n", "----------Question 2----------\n", "According to the \"TOTAL MARKETS AT YEAR END\" graph, Carvana started with 4 markets in 2014 and grew to 311 markets by 2021. Therefore, Carvana added 307 markets since 2014 (311 - 4 = 307 additional markets).\n", "\n", "----------Question 3----------\n", "Let me calculate this for you:\n", "\n", "2016 Revenue: $365 million\n", "2016 Retail Units Sold: 18,761 units\n", "\n", "$365 million ÷ 18,761 units = $19,455 per unit (rounded to nearest dollar)\n", "\n", "So in 2016, Carvana's revenue per retail unit sold was approximately $19,455.\n" ] } ], "source": [ "questions = [\n", " \"What was CVNA revenue in 2020?\",\n", " \"How many additional markets has Carvana added since 2014?\",\n", " \"What was 2016 revenue per retail unit sold?\"\n", "]\n", "\n", "for index, question in enumerate(questions):\n", " messages = [\n", " {\n", "\n", " \"role\": 'user',\n", " \"content\": [\n", " {\"type\": \"document\", \"source\": {\"type\": \"base64\", \"media_type\": \"application/pdf\", \"data\": base64_string}},\n", " {\"type\": \"text\", \"text\": question}\n", " ]\n", " }\n", " ]\n", "\n", " print(f\"\\n----------Question {index+1}----------\")\n", " print(get_completion(messages))" ] }, { "cell_type": "markdown", "metadata": { "id": "40iNE9-3KuXy" }, "source": [ "As you can see, Claude is capable of answering fairly detailed questions about charts and graphs. However, there are some tips and tricks that will help you get the most out of it.\n", "- Sometimes Claude's arithmetic capabilities get in the way. You'll notice that if you sample the third question above it will occasionally output an incorrect final answer because it messes up the arithmetic. Consider providing Claude with a calculator tool to ensure it doesn't make these types of mistakes.\n", "- With super complicated charts and graphs, we can ask Claude to \"First describe every data point you see in the document\" as a way to elicit similar improvements to what we seen in traditional Chain of Thought.\n", "- Claude occasionally struggles with charts that depend on lots of colors to convey information, such as grouped bar charts with many groups. Asking Claude to first identify the colors in your graph using HEX codes can boost its accuracy." ] }, { "cell_type": "markdown", "metadata": { "id": "gkEFJcriKuXy" }, "source": [ "## Slide Decks\n", "Now that we know Claude is a charts and graphs wizard, it is only logical that we extend it to the true home of charts and graphs - slide decks!\n", "\n", "Slides represent a critical source of information for many domains, including financial services. While you *can* use packages like PyPDF to extract text from slide decks, their chart/graph heavy nature often makes this a poor choice as models will struggle to access the information they actually need.\n", "\n", "The PDF support feature can be a great replacement as a result. It uses both extracted text and vision in order when processing PDF documents. In this section we will go over how to use PDF documents in Claude to review slide decks, and how to deal with some common pitfalls of this approach." ] }, { "cell_type": "markdown", "metadata": { "id": "eElb_dFQKuXy" }, "source": [ "The best way to get a typical slide deck into claude is to download it as a PDF and provide it directly to Claude." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "5Jpt1L0WOVWa" }, "outputs": [], "source": [ "# Open the multi-page PDF document the same way we did earlier.\n", "with open(\"./documents/twilio_q4_2023.pdf\", \"rb\") as pdf_file:\n", " binary_data = pdf_file.read()\n", " base_64_encoded_data = base64.b64encode(binary_data)\n", " base64_string = base_64_encoded_data.decode('utf-8')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JWPLWxtTKuXy", "outputId": "d2a2839a-d95c-437a-8280-512549bd8be0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "According to the financial results shown in the presentation, Twilio's year-over-year revenue growth for fiscal year 2023 was 9%. This can be found in the \"Total Company Results Highlights\" section, which shows FY 2023 revenue growth of 9%.\n" ] } ], "source": [ "# Now let's pass the document directly to Claude. Note that Claude will process both the text and visual elements of the document.\n", "question = \"What was Twilio y/y revenue growth for fiscal year 2023?\"\n", "content = [\n", " {\"type\": \"document\", \"source\": {\"type\": \"base64\", \"media_type\": \"application/pdf\", \"data\": base64_string}},\n", " {\"type\": \"text\", \"text\": question}\n", "]\n", "\n", "messages = [\n", " {\n", " \"role\": 'user',\n", " \"content\": content\n", " }\n", "]\n", "\n", "print(get_completion(messages))" ] }, { "cell_type": "markdown", "metadata": { "id": "tbtz3BTXKuXy" }, "source": [ "This approach is a great way to get started, and for some use cases offers the best performance. However, there are some limitations.\n", "- You can only include a total of 100 pages across all provided documents in a request (we intend to increase this limit over time).\n", "- If you are using slide content as part of RAG, introducing multimodal PDFs into your embeddings can cause problems\n", "\n", "Luckily, we can take advantage of Claude's vision capabilities to get a much higher quality representation of the slide deck **in text form** than normal pdf text extraction allows.\n", "\n", "We find the best way to do this is to ask Claude to sequentially narrate the deck from start to finish, passing it the current slide and its prior narration. Let's see how." ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "id": "BQYUPFJ-QwtP" }, "outputs": [], "source": [ "# Define a prompt for narrating our slide deck. We would adjut this prompt based on the nature of the deck, but keep the structure largely the same.\n", "prompt = \"\"\"\n", "You are the Twilio CFO, narrating your Q4 2023 earnings presentation.\n", "\n", "The entire earnings presentation document is provided to you.\n", "Please narrate this presentation from Twilio's Q4 2023 Earnings as if you were the presenter. Do not talk about any things, especially acronyms, if you are not exactly sure you know what they mean.\n", "\n", "Do not leave any details un-narrated as some of your viewers are vision-impaired, so if you don't narrate every number they won't know the number.\n", "\n", "Structure your response like this:\n", "<narration>\n", " <page_narration id=1>\n", " [Your narration for page 1]\n", " </page_narration>\n", "\n", " <page_narration id=2>\n", " [Your narration for page 2]\n", " </page_narration>\n", "\n", " ... and so on for each page\n", "</narration>\n", "\n", "Use excruciating detail for each page, ensuring you describe every visual element and number present. Show the full response in a single message.\n", "\"\"\"\n", "messages = [\n", " {\n", " \"role\": 'user',\n", " \"content\": [\n", " {\"type\": \"document\", \"source\": {\"type\": \"base64\", \"media_type\": \"application/pdf\", \"data\": base64_string}},\n", " {\"type\": \"text\", \"text\": prompt}\n", " ]\n", " }\n", "]\n", "\n", "# Now we use our prompt to narrate the entire deck. Note that this may take a few minutes to run (often up to 10).\n", "completion = get_completion(messages)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "id": "LCrVi-B5UGUS" }, "outputs": [], "source": [ "import re\n", "\n", "# Next we'll parse the response from Claude using regex\n", "pattern = r\"<narration>(.*?)</narration>\"\n", "match = re.search(pattern, completion.strip(), re.DOTALL)\n", "if match:\n", " narration = match.group(1)\n", "else:\n", " raise ValueError(\"No narration available. Likely due to the model response being truncated.\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Fwx8_-4QKuXy" }, "source": [ "Now that we have a text-based narration (it's far from perfect but it's pretty good), we have the ability to use this deck with any text-only workflow. Including vector search!\n", "\n", "As a final sanity check, let's ask a few questions of our narration-only setup!" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "SQuTknpzKuXy", "outputId": "e2e178c0-f3ca-4610-a3ea-6008fbb86d07" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "----------Question 1----------\n", "Let me calculate this:\n", "\n", "Segment revenue in Q4 2023: $75 million\n", "Total revenue in Q4 2023: $1,076 million\n", "\n", "$75M ÷ $1,076M = 0.0697 or approximately 7%\n", "\n", "Therefore, the Segment business line represented approximately 7% of Twilio's total Q4 2023 revenue.\n", "\n", "----------Question 2----------\n", "Decreasing. The transcript shows Q4 2023 revenue growth was 5% year-over-year, while for the full year 2023 revenue growth was 9% year-over-year, indicating a slowing growth rate. Additionally, the Q1 2024 guidance projects even lower growth of 2-3% year-over-year, confirming the declining trend.\n", "\n", "----------Question 3----------\n", "Let me help calculate the acquisition revenue for 2023.\n", "\n", "From the transcript, we can see:\n", "- Total revenue for 2023: $4,154 million\n", "- Organic revenue for 2023: $4,146 million\n", "\n", "Therefore, acquisition revenue would be:\n", "Total Revenue - Organic Revenue = $4,154M - $4,146M = $8 million\n", "\n", "So the acquisition revenue for the year ended December 31, 2023 was $8 million.\n", "\n", "This can be verified by the fact that the difference between total revenue growth (9%) and organic revenue growth (10%) also suggests a small contribution from acquisitions.\n" ] } ], "source": [ "questions = [\n", " \"What percentage of q4 total revenue was the Segment business line?\",\n", " \"Has the rate of growth of quarterly revenue been increasing or decreasing? Give just an answer.\",\n", " \"What was acquisition revenue for the year ended december 31, 2023 (including negative revenues)?\"\n", "]\n", "\n", "for index, question in enumerate(questions):\n", " prompt = f\"\"\"You are an expert financial analyst analyzing a transcript of Twilio's earnings call.\n", "Here is the transcript:\n", "<transcript>\n", "{narration}\n", "</transcript>\n", "\n", "Please answer the following question:\n", "<question>\n", "{question}\n", "</question>\"\"\"\n", " messages = [\n", " {\n", "\n", " \"role\": 'user',\n", " \"content\": [\n", " {\"type\": \"text\", \"text\": prompt}\n", " ]\n", " }\n", " ]\n", "\n", " print(f\"\\n----------Question {index+1}----------\")\n", " print(get_completion(messages))" ] }, { "cell_type": "markdown", "metadata": { "id": "fHyZkSSaKuXy" }, "source": [ "Looks good! With these techniques at your side, you are ready to start applying models to chart and graph heavy content like slide decks." ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }

multimodal/reading_charts_graphs_powerpoints.ipynb (502 lines of code) (raw):