- In our other videos, and as always there'll be links below if you want to check those out, we showed you how to initialize and fine-tune a transformer model in TensorFlow. So the question now is: what can we do with a model after we train it? The obvious thing to try is to use it to get predictions for new data, so let's see how to do that.

Again, if you're familiar with Keras, the good news is that because these are just standard Keras models, we can use the standard Keras predict method, as shown here. You simply pass tokenized text to this method, like you'd get from a tokenizer, and you get your results. Our models can output several different things, depending on the options you set, but most of the time the thing you want is the output logits. If you haven't come across them before, logits, sometimes pronounced "loh-jits" because no one's quite sure, are the outputs of the last layer of the network, before a softmax has been applied. So if you want to turn the logits into the model's probability outputs, you just apply a softmax, like so.

What if we want to turn those probabilities into class predictions? Again, it's very straightforward: we just pick the biggest probability for each output, and we can get that immediately with the argmax function. argmax will return the index of the largest probability in each row, which means that we'll get a vector of integers: zero if the largest probability was in position zero, one if it was in position one, and so on. These are our class predictions, indicating class zero, class one, and so on. In fact, if class predictions are all you want, you can skip the softmax step entirely, because the largest logit will always be the largest probability as well.

So if probabilities and class predictions are all you want, then you've seen everything you need at this point.
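To make that concrete, here is a minimal sketch of the steps just described, assuming a TensorFlow sequence classification model and its matching tokenizer. The checkpoint name, the two example sentences, and the two-label setup are purely illustrative stand-ins, not something from the video:

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Illustrative checkpoint; in practice this would be your fine-tuned model
checkpoint = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Tokenize some new text, padding so every input is the same length
inputs = tokenizer(
    ["This movie was great!", "I did not enjoy this at all."],
    padding=True,
    return_tensors="tf",
)

# The standard Keras predict method; we pull the logits out of the output
logits = model.predict(dict(inputs))["logits"]

# Softmax turns the logits into probabilities
probabilities = tf.nn.softmax(logits, axis=-1)

# argmax picks the highest-scoring class for each input; applying it to the
# logits directly gives the same answer, so the softmax step is optional here
class_preds = tf.math.argmax(logits, axis=-1)
print(class_preds.numpy())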
But if you're interested in benchmarking your model or using it for research, you might want to delve deeper into the results you get, and one way to do that is to compute some metrics for the model's predictions. If you're following along with our datasets and fine-tuning videos, we got our data from the MRPC dataset, which is part of the GLUE benchmark. Each of the GLUE datasets, as well as many of the other datasets on our Hub, has some predefined metrics, and we can load them easily with the load_metric function from the datasets library. For the MRPC dataset, the built-in metrics are accuracy, which just measures the percentage of the time the model's prediction was correct, and the F1 score, which is a slightly more complex measure of how well the model trades off precision and recall.

To compute those metrics and benchmark our model, we just pass them the model's predictions and the ground-truth labels, and we get our results back in a straightforward Python dict.

If you're familiar with Keras, though, you might notice that this is a slightly weird way to compute metrics, because we're only computing metrics at the very end of training. In Keras, you have a built-in ability to compute a wide range of metrics on the fly while you're training, which gives you very useful insight into how training is going. So if you want to use built-in metrics, it's very straightforward, and you use the standard Keras approach again: you just pass a metrics argument to the compile method. As with things like loss and optimizer, you can specify the metrics you want by string, or you can import the actual metric objects and pass specific arguments to them. But note that, unlike loss and optimizer, you have to supply metrics as a list, even if there's only one metric you want.

Once a model has been compiled with a metric, it will report that metric for training, validation, and predictions, as long as labels are passed along with those predictions.
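Here is a hedged sketch of both approaches, reusing the same kind of sequence classification model as before; the example predictions and labels are made up purely for illustration:

import numpy as np
import tensorflow as tf
from datasets import load_metric
from transformers import TFAutoModelForSequenceClassification

# End-of-training benchmarking with the datasets library
metric = load_metric("glue", "mrpc")
class_preds = np.array([1, 0, 1])   # made-up class predictions
labels = np.array([1, 0, 0])        # made-up ground-truth labels
print(metric.compute(predictions=class_preds, references=labels))
# -> a plain Python dict along the lines of {'accuracy': ..., 'f1': ...}

# On-the-fly metrics with Keras: pass a list to compile(), either as strings
# or as metric objects (illustrative checkpoint again)
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],  # must be a list
)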
You can even write your own metric classes. Although this is a bit beyond the scope of this course, I'll link to the relevant TF docs below, because it can be very handy if you want a metric that isn't supported by default in Keras, such as the F1 score.
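As a rough illustration of what such a metric class might look like, here is a binary F1 sketch of my own, not something from the video; it assumes integer labels and two-class logit outputs:

import tensorflow as tf

class F1Score(tf.keras.metrics.Metric):
    def __init__(self, name="f1_score", **kwargs):
        super().__init__(name=name, **kwargs)
        # Running counts needed to compute precision and recall
        self.true_positives = self.add_weight(name="tp", initializer="zeros")
        self.false_positives = self.add_weight(name="fp", initializer="zeros")
        self.false_negatives = self.add_weight(name="fn", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # y_pred arrives as logits; take the argmax to get class predictions
        preds = tf.argmax(y_pred, axis=-1)
        y_true = tf.cast(tf.reshape(y_true, [-1]), preds.dtype)
        self.true_positives.assign_add(
            tf.reduce_sum(tf.cast((preds == 1) & (y_true == 1), tf.float32))
        )
        self.false_positives.assign_add(
            tf.reduce_sum(tf.cast((preds == 1) & (y_true == 0), tf.float32))
        )
        self.false_negatives.assign_add(
            tf.reduce_sum(tf.cast((preds == 0) & (y_true == 1), tf.float32))
        )

    def result(self):
        # Small epsilon avoids division by zero before any positives are seen
        precision = self.true_positives / (self.true_positives + self.false_positives + 1e-7)
        recall = self.true_positives / (self.true_positives + self.false_negatives + 1e-7)
        return 2 * precision * recall / (precision + recall + 1e-7)

    def reset_state(self):
        for var in (self.true_positives, self.false_positives, self.false_negatives):
            var.assign(0.0)

# Used like any built-in metric, e.g. metrics=[F1Score()] in model.compile()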