gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb (873 lines of code) (raw):
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ijGzTHJJUCPY"
},
"outputs": [],
"source": [
"# Copyright 2024 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VEqbX8OhE8y9"
},
"source": [
"# Getting Started with DeepEval & Gemini API in Vertex AI\n",
"\n",
"<table align=\"left\">\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretrieval-augmented-generation%2Frag-evaluation%2Fdeepeval_with_gemini.ipynb\">\n",
" <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb\">\n",
" <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
" </a>\n",
" </td>\n",
"</table>\n",
"\n",
"<div style=\"clear: both;\"></div>\n",
"\n",
"<b>Share to:</b>\n",
"\n",
"<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag-evaluation/deepeval_with_gemini.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
"</a> "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e57d479c5581"
},
"source": [
"| | |\n",
"|-|-|\n",
"| Author(s) | [Aditya Rane](https://github.com/Adi8885) |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CkHPv2myT2cx"
},
"source": [
"## Overview\n",
"\n",
"### [DeepEval](https://docs.confident-ai.com/docs/getting-started)\n",
"\n",
"DeepEval is an open-source evaluation framework for LLMs. DeepEval makes it extremely easy to build and iterate on LLM (applications) and was built with the following principles in mind:\n",
"\n",
"- Easily \"unit test\" LLM outputs in a similar way to Pytest.\n",
"- Plug-and-use 14+ LLM-evaluated metrics, most with research backing.\n",
"- Synthetic dataset generation with state-of-the-art evolution techniques.\n",
"- Metrics are simple to customize and covers all use cases.\n",
"- Real-time evaluations in production.\n",
"\n",
"### Gemini\n",
"\n",
"Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. The Gemini API gives you access to the Gemini models.\n",
"\n",
"### Gemini API in Vertex AI\n",
"\n",
"The Gemini API in Vertex AI provides a unified interface for interacting with Gemini models. \n",
"\n",
"You can interact with the Gemini API using the following methods:\n",
"\n",
"- Use the [Vertex AI Studio](https://cloud.google.com/generative-ai-studio) for quick testing and command generation\n",
"- Use cURL commands\n",
"- Use the Vertex AI SDK\n",
"\n",
"This notebook focuses on using the **Gemini model with DeepEval**\n",
"\n",
"For more information, see the [Generative AI on Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) documentation."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DrkcqHrrwMAo"
},
"source": [
"### Objectives\n",
"\n",
"In this notebook we will focus on using the Gemini API in Vertex AI with RAGAS\n",
"We will use the Gemini (`gemini-2.0-flash`) model for Q&A evaluation.\n",
"\n",
"You will complete the following tasks:\n",
"\n",
"- Install the Vertex AI SDK for Python\n",
"- Use the Gemini API in Vertex AI to interact with each model\n",
" - Gemini (`gemini-2.0-flash`) model:\n",
" - Q&A Generation\n",
" - Evaluate Q&A performance with RAGAS"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "C9nEPojogw-g"
},
"source": [
"### Costs\n",
"\n",
"This tutorial uses billable components of Google Cloud:\n",
"\n",
"- Vertex AI\n",
"\n",
"Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "r11Gu7qNgx1p"
},
"source": [
"## Getting Started"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "No17Cw5hgx12"
},
"source": [
"### Install Vertex AI SDK for Python"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tFy3H3aPgx12"
},
"outputs": [],
"source": [
"#This notebook was created and tested with below versions \n",
"%pip install --user deepeval==0.21.51 \\\n",
"datasets==2.18.0 \\\n",
"langchain==0.1.14 \\\n",
"langchain-google-vertexai==1.0.5 \\\n",
"langchain-chroma==0.1.1 \\\n",
"chromadb==0.5.0 \\\n",
"pypdf==4.2.0"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "j7UyNVSiyQ96"
},
"source": [
"### Restart current runtime\n",
"\n",
"To use the newly installed packages in this Jupyter runtime, it is recommended to restart the runtime. Run the following cell to restart the current kernel.\n",
"\n",
"The restart process might take a minute or so."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YmY9HVVGSBW5"
},
"outputs": [],
"source": [
"import IPython\n",
"\n",
"app = IPython.Application.instance()\n",
"app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4TCvupe_UQvS"
},
"source": [
"After the restart is complete, continue to the next step."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EXQZrM5hQeKb"
},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<b>⚠️ Wait for the kernel to finish restarting before you continue. ⚠️</b>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jXHfaVS66_01"
},
"source": [
"## Import libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lslYAvw37JGQ"
},
"outputs": [],
"source": [
"import itertools\n",
"\n",
"from deepeval import evaluate\n",
"from deepeval.metrics import AnswerRelevancyMetric\n",
"\n",
"# Base LLM for DeepEval\n",
"from deepeval.models.base_model import DeepEvalBaseLLM\n",
"from deepeval.test_case import LLMTestCase\n",
"from langchain.chains import RetrievalQA\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain_chroma import Chroma\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"\n",
"# LangChain package for Vertex AI\n",
"from langchain_google_vertexai import ( # type: ignore[import-untyped]\n",
" ChatVertexAI,\n",
" HarmBlockThreshold,\n",
" HarmCategory,\n",
" VertexAIEmbeddings,\n",
")\n",
"import vertexai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "812bdbe00afb"
},
"outputs": [],
"source": [
"# TODO(developer): Update the below lines\n",
"PROJECT_ID = \"<your_project>\"\n",
"LOCATION = \"<your_region>\"\n",
"\n",
"vertexai.init(project=PROJECT_ID, location=LOCATION)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4437b7608c8e"
},
"source": [
"## Use Vertex AI models\n",
"\n",
"The [Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview) models are designed to handle multimodal inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "efedae445710"
},
"outputs": [],
"source": [
"# Initialise safety filters for vertex model\n",
"safety_settings = {\n",
" HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE,\n",
" HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,\n",
" HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,\n",
" HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,\n",
" HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,\n",
"}\n",
"\n",
"generation_config = {\"temperature\": 0.0, \"topk\": 1}\n",
"\n",
"# Initialise the ChatVertexAI model\n",
"custom_chat_model_gemini = ChatVertexAI(\n",
" model_name=\"gemini-2.0-flash\",\n",
" safety_settings=safety_settings,\n",
" generation_config=generation_config,\n",
" project=PROJECT_ID,\n",
" location=LOCATION,\n",
" response_validation=False, # Important since deepeval cannot handle validation errors\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d28ebabd3f38"
},
"source": [
"The [Vertex AI Embeddings](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) models are designed to convert text to dense vector representations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "356471d32ab3"
},
"outputs": [],
"source": [
"# Load Embeddings Models\n",
"embeddings = VertexAIEmbeddings(model_name=\"text-embedding-005\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f2aa34a39acb"
},
"source": [
"## Create a local Vector DB\n",
"### Load the document"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "398e911a765b"
},
"outputs": [],
"source": [
"# source document\n",
"document_uri = \"https://arxiv.org/pdf/1706.03762\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "16aa29d18990"
},
"outputs": [],
"source": [
"# use PyPDF loaded to read and chunk the input document\n",
"loader = PyPDFLoader(document_uri)\n",
"docs = loader.load_and_split()\n",
"\n",
"# Verify if pages are loaded correctly\n",
"docs[0]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cdddcf1fe8d1"
},
"source": [
"### Create local Vector DB"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6648f7c42278"
},
"outputs": [],
"source": [
"# Create an in-memory Vector DB using Chroma\n",
"vectordb = Chroma.from_documents(docs, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "781b31cf93f9"
},
"outputs": [],
"source": [
"# Set Vector DB as retriever\n",
"retriever = vectordb.as_retriever()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "05c304041ced"
},
"source": [
"### Create Q&A Chain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "f4a30e562673"
},
"outputs": [],
"source": [
"# Create Q&A template for the Gemini Model\n",
"template = \"\"\"Your task is to answer questions related to documents.\n",
"Use the following context to answer the question at the end.\n",
"{context}\n",
"\n",
"Answers should be crisp.\n",
"\n",
"Question: {question}\n",
"Helpful Answer:\"\"\"\n",
"\n",
"# Create a prompt template for the q&a chain\n",
"PROMPT = PromptTemplate(\n",
" template=template,\n",
" input_variables=[\"context\", \"question\"],\n",
")\n",
"\n",
"# Pass prompts to q&a chain\n",
"chain_type_kwargs = {\"prompt\": PROMPT}\n",
"\n",
"# Retriever arguments\n",
"retriever.search_kwargs = {\"k\": 1}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5a0db3ef72ba"
},
"outputs": [],
"source": [
"# Setup a RetrievalQA Chain\n",
"qa = RetrievalQA.from_chain_type(\n",
" llm=custom_chat_model_gemini,\n",
" chain_type=\"stuff\",\n",
" retriever=retriever,\n",
" return_source_documents=True,\n",
" chain_type_kwargs=chain_type_kwargs,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9a5adc4734e6"
},
"outputs": [],
"source": [
"# Test the chain with a sample question\n",
"query = \"Who are the authors of paper on Attention is all you need?\"\n",
"result = qa({\"query\": query})\n",
"result"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "01ed0ed9fabe"
},
"source": [
"## Evaluation\n",
"### Create the evaluation set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7dff0bb8c87a"
},
"outputs": [],
"source": [
"# Evaluation set with questions and ground_truth\n",
"questions = [\n",
" \"What architecture is proposed in paper titled Attention is all you need?\",\n",
" \"Where do primary authors of paper titled Attention is all you need work?\",\n",
"]\n",
"ground_truth = [\"Transformers architecture\", \"Google Brain\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8e1c60b44b2c"
},
"source": [
"### Run the [Q&A chain](#create-qa-chain) on evaluation dataset "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "eae0927ed7e5"
},
"outputs": [],
"source": [
"contexts = []\n",
"answers = []\n",
"\n",
"# Generate contexts and answers for each question\n",
"for query in questions:\n",
" result = qa({\"query\": query})\n",
" contexts.append(\n",
" [document.page_content for document in result.get(\"source_documents\")]\n",
" )\n",
" answers.append(result.get(\"result\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "d1544319aa30"
},
"outputs": [],
"source": [
"# Convert into a dataset and prepare for consumption by DeepEval API\n",
"dataset = []\n",
"for q, a, c, g in itertools.zip_longest(questions, answers, contexts, ground_truth):\n",
" dataset.append({\"Question\": q, \"Answer\": g, \"Context\": c})\n",
"\n",
"# Inspect the dataset\n",
"dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5547f84f5198"
},
"source": [
"## IMPORTANT : Gemini with DeepEval\n",
"> DeepEval is designed to work with OpenAI Models by default. We must write a Wrapper to make it work with Gemini"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "34f368618ffa"
},
"outputs": [],
"source": [
"# Base LLM for DeepEval\n",
"\n",
"\n",
"class GoogleVertexAIDeepEval(DeepEvalBaseLLM):\n",
" \"\"\"Class to implement Vertex AI for DeepEval\"\"\"\n",
"\n",
" def __init__(self, model): # pylint: disable=W0231\n",
" self.model = model\n",
"\n",
" def load_model(self): # pylint: disable=W0221\n",
" return self.model\n",
"\n",
" def generate(self, prompt: str) -> str: # pylint: disable=W0221\n",
" chat_model = self.load_model()\n",
" return chat_model.invoke(prompt).content\n",
"\n",
" async def a_generate(self, prompt: str) -> str: # pylint: disable=W0221\n",
" chat_model = self.load_model()\n",
" res = await chat_model.ainvoke(prompt)\n",
" return res.content\n",
"\n",
" def get_model_name(self): # pylint: disable=W0236 , W0221\n",
" return \"Vertex AI Model\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6003be80b9d0"
},
"outputs": [],
"source": [
"# Initialise the DeepEval wrapper class\n",
"google_vertexai_gemini_deepeval = GoogleVertexAIDeepEval(model=custom_chat_model_gemini)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2e862113e393"
},
"source": [
"### Run the DeepEval Evaluation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "518d58f8d072"
},
"outputs": [],
"source": [
"answer_relevancy_metric = AnswerRelevancyMetric(\n",
" threshold=0.5, model=google_vertexai_gemini_deepeval, async_mode=False\n",
")\n",
"test_cases = []\n",
"for record in dataset:\n",
" test_cases.append(\n",
" LLMTestCase(\n",
" input=record[\"Question\"],\n",
" actual_output=record[\"Answer\"],\n",
" retrieval_context=record[\"Context\"],\n",
" )\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "053667e6ebb5"
},
"outputs": [],
"source": [
"# Evaluate test cases in bulk\n",
"evaluate(test_cases, [answer_relevancy_metric])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "d4675ad2118c"
},
"outputs": [],
"source": [
"# measure single instance\n",
"answer_relevancy_metric.measure(test_cases[0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "867299d6a903"
},
"source": [
"### To use DeepEval with Pytest with "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "a254c7b61e72"
},
"outputs": [],
"source": [
"%%writefile ./scripts/vertex_llm.py\n",
"\n",
"# Copyright 2024 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License.\n",
"\"\"\"Custom Class implementation\"\"\"\n",
"\n",
"# Base LLM for DeepEval\n",
"from deepeval.models.base_model import DeepEvalBaseLLM\n",
"\n",
"# LangChain package for Vertex AI\n",
"from langchain_google_vertexai import ChatVertexAI\n",
"\n",
"\n",
"class GoogleVertexAIDeepEval(DeepEvalBaseLLM):\n",
" \"\"\"Class to implement Vertex AI for DeepEval\"\"\"\n",
"\n",
" def __init__(self, model: ChatVertexAI) -> None: # pylint: disable=W0231\n",
" \"\"\"Initialise the model\"\"\"\n",
" self.model = model\n",
"\n",
" def load_model(self) -> ChatVertexAI: # pylint: disable=W0221\n",
" \"\"\"Loads the model\"\"\"\n",
" return self.model\n",
"\n",
" def generate(self, prompt: str) -> str: # pylint: disable=W0221\n",
" \"\"\"Invokes the model\"\"\"\n",
" chat_model = self.load_model()\n",
" return chat_model.invoke(prompt).content\n",
"\n",
" async def a_generate(self, prompt: str) -> str: # pylint: disable=W0221\n",
" \"\"\"Invokes the model async\"\"\"\n",
" chat_model = self.load_model()\n",
" res = await chat_model.ainvoke(prompt)\n",
" return res.content\n",
"\n",
" def get_model_name(self) -> str: # pylint: disable=W0236 , W0221\n",
" \"\"\"Returns the model name\"\"\"\n",
" return \"Vertex AI Model\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "76296281b97c"
},
"outputs": [],
"source": [
"%%writefile ./scripts/test_chatbot.py\n",
"\n",
"\"\"\"Test Script for DeepEval with Gemini\"\"\"\n",
"\n",
"import itertools\n",
"\n",
"from deepeval import assert_test\n",
"from deepeval.metrics import AnswerRelevancyMetric\n",
"from deepeval.test_case import LLMTestCase\n",
"\n",
"# LangChain package for Vertex AI\n",
"from langchain_google_vertexai import ChatVertexAI, HarmBlockThreshold, HarmCategory\n",
"import pytest\n",
"from vertex_llm import GoogleVertexAIDeepEval # pylint: disable=E0401\n",
"\n",
"# TODO(developer): Update the below lines\n",
"PROJECT_ID = \"<your_project>\"\n",
"LOCATION = \"<your_region>\"\n",
"\n",
"# Initialize safety filters for Gemini model\n",
"safety_settings = {\n",
" HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE,\n",
" HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,\n",
" HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,\n",
" HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,\n",
" HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,\n",
"}\n",
"\n",
"# Initialize the ChatVertexAI model\n",
"custom_model_gemini = ChatVertexAI(\n",
" model_name=\"gemini-2.0-flash\",\n",
" safety_settings=safety_settings,\n",
" project=PROJECT_ID,\n",
" location=LOCATION,\n",
" response_validation=False, # Important since deepeval cannot handle validation errors\n",
")\n",
"\n",
"# Initialize the DeepEval wrapper class\n",
"google_vertexai_gemini_deepeval = GoogleVertexAIDeepEval(model=custom_model_gemini)\n",
"\n",
"# Evaluation set with questions and ground_truth\n",
"questions = [\n",
" \"What architecture is proposed in paper titled Attention is all you need?\",\n",
" \"Where do primary authors of paper titled Attention is all you need work?\",\n",
"]\n",
"ground_truth = [\"Transformers architecture\", \"Google Brain\"]\n",
"\n",
"# Convert into a dataset and prepare for consumption by DeepEval API\n",
"test_set = []\n",
"for q, a in itertools.zip_longest(questions, ground_truth):\n",
" test_set.append({\"Question\": q, \"Answer\": a, \"Context\": None})\n",
"\n",
"\n",
"@pytest.mark.parametrize(\"record\", test_set)\n",
"def test_answer_relevancy(record: dict) -> None:\n",
" \"\"\"Function to test Answer relevancy\"\"\"\n",
" answer_relevancy_metric = AnswerRelevancyMetric(\n",
" threshold=0.5, model=google_vertexai_gemini_deepeval\n",
" )\n",
" test_case = LLMTestCase(\n",
" input=record[\"Question\"],\n",
" actual_output=record[\"Answer\"],\n",
" retrieval_context=record[\"Context\"],\n",
" )\n",
" assert_test(test_case, [answer_relevancy_metric])\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "e25133d8089f"
},
"outputs": [],
"source": [
"# run the pytest scripts\n",
"!pytest scripts/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "55046c75bd67"
},
"source": [
"# Conclusion\n",
"\n",
"In this notebook, you learned:\n",
"\n",
"1. DeepEval - Framework for evaluation .\n",
"2. Making DeepEval Work with Gemini API in Vertex AI\n",
"3. Integrating DeepEval with Pytest"
]
}
],
"metadata": {
"colab": {
"name": "deepeval_with_gemini.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}