# Evaluate multiple models with quantitative NLP evaluators

## Objective
This notebook demonstrates how to use NLP-based evaluators to assess the quality of generated text by comparing it to reference text. By the end of this tutorial, you'll be able to:
 - Understand different NLP evaluators such as `BleuScoreEvaluator`, `GleuScoreEvaluator`, `MeteorScoreEvaluator`, and `RougeScoreEvaluator`.
 - Evaluate a dataset with these evaluators across multiple models.
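All four evaluators score a generated response by its n-gram overlap with a reference. The sketch below shows the idea behind ROUGE-1 (unigram overlap) by hand. It is an illustration only and is not the library's implementation. It assumes plain lowercased whitespace tokenization, while `RougeScoreEvaluator` applies its own tokenization. The function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(response: str, ground_truth: str) -> dict:
    """Hypothetical helper: unigram-overlap precision, recall, and F1.

    Assumption: lowercased whitespace tokenization (the real evaluator may differ).
    """
    resp_tokens = response.lower().split()
    ref_tokens = ground_truth.lower().split()
    # Clipped overlap: each shared word counts at most min(count in response, count in reference)
    overlap = sum((Counter(resp_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(resp_tokens) if resp_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the cat sat on the mat", "the cat is on the mat")
print(scores)
```

Here 5 of the 6 response tokens also appear in the reference ("the" twice, plus "cat", "on", and "mat"), so precision, recall, and F1 all equal 5/6.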

## Time
You should expect to spend about 10 minutes running this notebook.

## Before you begin

### Installation
Install the following packages required to execute this notebook.

In [None]:
# Install the packages
%pip install azure-ai-evaluation

In [None]:
import os
from pprint import pprint
from dotenv import load_dotenv

# Load project settings and model endpoint/key values from the credentials file
load_dotenv("../.credentials.env")

## NLP Evaluators

In [None]:
# Initialize the Azure AI project configuration from your environment variables
azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("AZURE_RESOURCE_GROUP"),
    "project_name": os.environ.get("AZURE_PROJECT_NAME"),
}

## Set up env vars for model endpoints and keys

In [None]:
env_var = { 
    "gpt-35-turbo": {
        "endpoint": os.environ.get("AZURE_OPENAI_GPT35_ENDPOINT"),
        "key": os.environ.get("AZURE_OPENAI_GPT35_API_KEY"),
    },
    "gpt-4": {
        "endpoint": os.environ.get("AZURE_OPENAI_GPT4_ENDPOINT"),
        "key": os.environ.get("AZURE_OPENAI_GPT4_API_KEY"),
    },
    "gpt-4o": {
        "endpoint": os.environ.get("AZURE_OPENAI_GPT4o_ENDPOINT"),
        "key": os.environ.get("AZURE_OPENAI_GPT4o_API_KEY"),
    },
    "gpt-4o-mini": {
        "endpoint": os.environ.get("AZURE_OPENAI_GPT4o-mini_ENDPOINT"),
        "key": os.environ.get("AZURE_OPENAI_GPT4o-mini_API_KEY"),
    },
}
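`os.environ.get` returns `None` without raising an error when a variable is not set. As a result, a missing `.env` entry only shows up later as a failed model call. The minimal sketch below flags incomplete entries in a dictionary shaped like `env_var` before any evaluation runs. The helper name `find_unconfigured_models` and the example values are hypothetical placeholders.

```python
from typing import Dict, List, Optional


def find_unconfigured_models(env_var: Dict[str, Dict[str, Optional[str]]]) -> List[str]:
    """Hypothetical helper: return model names whose endpoint or key is missing or empty."""
    return [
        model
        for model, settings in env_var.items()
        if not settings.get("endpoint") or not settings.get("key")
    ]


# Hypothetical placeholder values: only "gpt-4" is fully configured here
example = {
    "gpt-4": {"endpoint": "https://example.openai.azure.com/", "key": "placeholder"},
    "gpt-4o": {"endpoint": None, "key": None},
}
print(find_unconfigured_models(example))  # ['gpt-4o']
```

In the notebook, you could call `find_unconfigured_models(env_var)` and remove any listed models from the evaluation loop.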

In [None]:
# Print the source of the target module that wraps the model endpoints
with open("target_nlp_api/target_nlp_api.py") as fin:
    print(fin.read())

In [None]:
from target_nlp_api.target_nlp_api import ModelEndpoints
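`ModelEndpoints` is passed to `evaluate()` as the `target`. The sketch below shows the general shape of such a target, under an assumed contract. The class is called once per data row, receives that row's input column(s) as keyword arguments, and returns a dictionary. Each returned key, such as `response`, can then be referenced as `${target.<key>}` in a column mapping. The class `EchoTarget` and the input column name `query` are hypothetical. Check the printed `target_nlp_api.py` source for the actual inputs and outputs.

```python
class EchoTarget:
    """Hypothetical stand-in for a target such as ModelEndpoints.

    Assumption: called once per data row with the row's "query" column as a
    keyword argument; the returned "response" key feeds ${target.response}.
    """

    def __init__(self, model: str):
        self.model = model

    def __call__(self, query: str) -> dict:
        # A real target would send `query` to the model endpoint; this one only echoes it
        return {"response": f"[{self.model}] {query}"}


target = EchoTarget("gpt-4o")
print(target(query="What is BLEU?"))  # {'response': '[gpt-4o] What is BLEU?'}
```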

In [None]:
from azure.ai.evaluation import BleuScoreEvaluator
from azure.ai.evaluation import GleuScoreEvaluator
from azure.ai.evaluation import MeteorScoreEvaluator
from azure.ai.evaluation import RougeScoreEvaluator, RougeType

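# BLEU and GLEU use their default settings
bleu = BleuScoreEvaluator()
gleu = GleuScoreEvaluator()
# METEOR: alpha weights precision vs. recall; beta and gamma shape the fragmentation penalty
meteor = MeteorScoreEvaluator(alpha=0.9, beta=3.0, gamma=0.5)
# ROUGE-1 measures unigram overlap with the reference
rouge = RougeScoreEvaluator(rouge_type=RougeType.ROUGE_1)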
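The METEOR parameters above are easier to interpret when you see the formula. The sketch below assumes that `MeteorScoreEvaluator` follows the standard METEOR formulation, as implemented in NLTK's `meteor_score`. The score is a recall-weighted harmonic mean of precision and recall, reduced by a penalty that grows when the matched words are split into many chunks. The function `meteor_from_counts` is hypothetical. It takes alignment counts directly and skips the word-alignment step, which includes stemming and synonym matching in the real metric.

```python
def meteor_from_counts(
    matches: int,
    response_len: int,
    reference_len: int,
    chunks: int,
    alpha: float = 0.9,
    beta: float = 3.0,
    gamma: float = 0.5,
) -> float:
    """Hypothetical helper: METEOR score from precomputed alignment counts."""
    if matches == 0:
        return 0.0
    precision = matches / response_len
    recall = matches / reference_len
    # Weighted harmonic mean; alpha close to 1 weights recall more heavily
    f_mean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: fewer, longer contiguous chunks give a smaller penalty
    penalty = gamma * (chunks / matches) ** beta
    return f_mean * (1 - penalty)


# Hypothetical counts for "the cat sat on the mat" vs. "the cat is on the mat":
# 5 matched words in 2 contiguous chunks ("the cat", "on the mat")
score = meteor_from_counts(matches=5, response_len=6, reference_len=6, chunks=2)
print(round(score, 4))  # 0.8067
```

With `gamma=0.5` and `beta=3.0`, the penalty can remove at most half of the score. The penalty stays small unless the matched words are heavily fragmented.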

In [None]:
from azure.ai.evaluation import evaluate
import random

models = ["gpt-35-turbo", "gpt-4", "gpt-4o", "gpt-4o-mini"]

for model in models:
    print(f"Evaluating NLP metrics - {model}")
    print("-----------------------------------")
    # Random suffix keeps run names distinct across repeated executions
    run_id = random.randint(1111, 9999)
    result = evaluate(
        azure_ai_project=azure_ai_project,  # log the run to the Azure AI project
        data="ai_data.jsonl",
        evaluation_name=f"NLP-{model.title()}_Run-{run_id}",
        target=ModelEndpoints(env_var, model),  # produces the response for each row
        evaluators={
            "bleu": bleu,
            "gleu": gleu,
            "meteor": meteor,
            "rouge": rouge,
        },
        # "default" applies this column mapping to every evaluator, not only BLEU
        evaluator_config={
            "default": {
                "column_mapping": {
                    "ground_truth": "${data.ground_truth}",
                    "response": "${target.response}",
                }
            },
        },
    )

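View the results for the last model evaluated, because `result` is overwritten on each loop iteration. Alternatively, because every run is logged to your Azure AI project, you can view the results for all models in Azure AI Foundry.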

In [None]:
import pandas as pd

pd.DataFrame(result["rows"])
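To compare models side by side, you can collect each run's aggregate metrics into a dictionary keyed by model name, for example with `metrics_by_model[model] = result["metrics"]` inside the loop. The minimal sketch below turns such a dictionary into one table with one row per model. The helper `compare_models`, the metric key names, and the numbers are hypothetical placeholders. The actual keys are whatever appear in your `result["metrics"]`.

```python
import pandas as pd


def compare_models(metrics_by_model: dict) -> pd.DataFrame:
    """Hypothetical helper: one row per model, one column per metric, columns sorted by name."""
    df = pd.DataFrame.from_dict(metrics_by_model, orient="index")
    return df.sort_index(axis=1)


# Hypothetical placeholder numbers and key names, only to show the table's shape
example = {
    "gpt-4o": {"rouge": 0.55, "bleu": 0.30},
    "gpt-4o-mini": {"rouge": 0.50, "bleu": 0.25},
}
table = compare_models(example)
print(table)
```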