subtitles/en/28_tensorflow-predictions-and-metrics.srt (356 lines of code) (raw):
1
00:00:00,269 --> 00:00:02,936
(air whooshing)
2
00:00:05,700 --> 00:00:07,110
- In our other videos,
3
00:00:07,110 --> 00:00:09,000
and as always, there'll be links below
4
00:00:09,000 --> 00:00:10,740
if you want to check those out,
5
00:00:10,740 --> 00:00:13,230
we showed you how to
initialize and fine-tune
6
00:00:13,230 --> 00:00:15,690
a transformer model in TensorFlow.
7
00:00:15,690 --> 00:00:18,600
So the question now is,
what can we do with a model
8
00:00:18,600 --> 00:00:20,070
after we train it?
9
00:00:20,070 --> 00:00:21,390
The obvious thing to try
10
00:00:21,390 --> 00:00:23,790
is to use it to get
predictions for new data,
11
00:00:23,790 --> 00:00:25,560
so let's see how to do that.
12
00:00:25,560 --> 00:00:28,320
Again, if you're familiar
with Keras, the good news
13
00:00:28,320 --> 00:00:31,860
is that because there are
just standard Keras models,
14
00:00:31,860 --> 00:00:34,770
we can use the standard
Keras predict method,
15
00:00:34,770 --> 00:00:35,883
as shown here.
16
00:00:36,990 --> 00:00:40,560
You simply pass in tokenized
text to this method,
17
00:00:40,560 --> 00:00:42,330
like you'd get from a tokenizer,
18
00:00:42,330 --> 00:00:44,280
and you get your results.
19
00:00:44,280 --> 00:00:46,740
Our models can output
several different things,
20
00:00:46,740 --> 00:00:48,510
depending on the options you set,
21
00:00:48,510 --> 00:00:50,310
but most of the time the thing you want
22
00:00:50,310 --> 00:00:52,290
is the output logits.
23
00:00:52,290 --> 00:00:54,900
If you haven't come
across them before logits,
24
00:00:54,900 --> 00:00:57,630
sometimes pronounced to
logits because no one's sure,
25
00:00:57,630 --> 00:01:00,390
are the outputs of the
last layer of the network
26
00:01:00,390 --> 00:01:03,150
because before a softmax has been applied.
27
00:01:03,150 --> 00:01:04,710
So if you want to turn the logits
28
00:01:04,710 --> 00:01:06,900
into the model's probability outputs,
29
00:01:06,900 --> 00:01:09,423
you just apply a softmax, like so.
30
00:01:10,981 --> 00:01:12,630
What if we want to turn
those probabilities
31
00:01:12,630 --> 00:01:14,370
into class predictions?
32
00:01:14,370 --> 00:01:16,410
Again, it's very straightforward.
33
00:01:16,410 --> 00:01:19,470
We just pick the biggest
probability for each output
34
00:01:19,470 --> 00:01:23,070
and you can get that immediately
with the argmax function.
35
00:01:23,070 --> 00:01:24,870
argmax will return the index
36
00:01:24,870 --> 00:01:27,120
of the largest probability in each row
37
00:01:27,120 --> 00:01:30,360
which means that we'll
get a vector of integers.
38
00:01:30,360 --> 00:01:34,950
So zero if the largest probability
was in the zero position,
39
00:01:34,950 --> 00:01:37,350
one in the first position, and so on.
40
00:01:37,350 --> 00:01:40,380
So these are our class
predictions indicating class zero,
41
00:01:40,380 --> 00:01:42,300
class one, and so on.
42
00:01:42,300 --> 00:01:45,090
In fact, if class
predictions are all you want,
43
00:01:45,090 --> 00:01:47,310
you can skip the softmax step entirely
44
00:01:47,310 --> 00:01:49,740
because the largest logit
will always be the largest
45
00:01:49,740 --> 00:01:51,303
probability as well.
46
00:01:52,500 --> 00:01:55,800
So if probabilities and class
predictions are all you want,
47
00:01:55,800 --> 00:01:58,350
then you've seen everything
you need at this point.
48
00:01:58,350 --> 00:02:00,630
But if you're interested
in benchmarking your model
49
00:02:00,630 --> 00:02:02,190
or using it for research,
50
00:02:02,190 --> 00:02:05,010
you might want to delve deeper
into the results you get.
51
00:02:05,010 --> 00:02:07,230
And one way to do that is
to compute some metrics
52
00:02:07,230 --> 00:02:09,060
for the model's predictions.
53
00:02:09,060 --> 00:02:10,920
If you're following
along with our datasets
54
00:02:10,920 --> 00:02:12,390
and fine tuning videos,
55
00:02:12,390 --> 00:02:14,850
we got our data from the MRPC dataset,
56
00:02:14,850 --> 00:02:17,130
which is part of the GLUE benchmark.
57
00:02:17,130 --> 00:02:19,050
Each of the GLUE datasets
58
00:02:19,050 --> 00:02:22,560
as well as many other datasets
in our dataset, Light Hub
59
00:02:22,560 --> 00:02:24,510
has some predefined metrics,
60
00:02:24,510 --> 00:02:26,940
and we can load them easily
61
00:02:26,940 --> 00:02:29,880
with the datasets load metric function.
62
00:02:29,880 --> 00:02:33,720
For the MRPC dataset, the
built-in metrics are accuracy
63
00:02:33,720 --> 00:02:35,790
which just measures the
percentage of the time
64
00:02:35,790 --> 00:02:37,830
the model's prediction was correct,
65
00:02:37,830 --> 00:02:39,780
and the F1 score,
66
00:02:39,780 --> 00:02:41,610
which is a slightly more complex measure
67
00:02:41,610 --> 00:02:43,920
that measures how well
the model trades off
68
00:02:43,920 --> 00:02:45,543
precision and recall.
69
00:02:46,470 --> 00:02:49,110
To compute those metrics
to benchmark our model,
70
00:02:49,110 --> 00:02:51,480
we just pass them the model's predictions,
71
00:02:51,480 --> 00:02:53,220
and to the ground truth labels,
72
00:02:53,220 --> 00:02:56,880
and we get our results in a
straightforward Python dict.
73
00:02:56,880 --> 00:02:58,740
If you're familiar with Keras though,
74
00:02:58,740 --> 00:03:00,870
you might notice that this
is a slightly weird way
75
00:03:00,870 --> 00:03:01,800
to compute metrics,
76
00:03:01,800 --> 00:03:02,970
because we're only computing metrics
77
00:03:02,970 --> 00:03:04,440
at the very end of training.
78
00:03:04,440 --> 00:03:06,480
But in Keras, you have
this built-in ability
79
00:03:06,480 --> 00:03:08,790
to compute a wide range of metrics
80
00:03:08,790 --> 00:03:10,470
on the fly while you're training,
81
00:03:10,470 --> 00:03:11,910
which gives you a very useful insight
82
00:03:11,910 --> 00:03:13,740
into how training is going.
83
00:03:13,740 --> 00:03:15,900
So if you want to use built-in metrics,
84
00:03:15,900 --> 00:03:17,280
it's very straightforward
85
00:03:17,280 --> 00:03:19,350
and you use the standard
Keras approach again.
86
00:03:19,350 --> 00:03:23,160
You just pass a metric
argument to the compile method.
87
00:03:23,160 --> 00:03:25,740
As with things like loss and optimizer,
88
00:03:25,740 --> 00:03:28,470
you can specify the
metrics you want by string
89
00:03:28,470 --> 00:03:30,810
or you can import the
actual metric objects
90
00:03:30,810 --> 00:03:33,240
and pass specific arguments to them.
91
00:03:33,240 --> 00:03:35,610
But note that unlike loss and accuracy,
92
00:03:35,610 --> 00:03:37,710
you have to supply metrics as a list
93
00:03:37,710 --> 00:03:39,760
even if there's only one metric you want.
94
00:03:40,770 --> 00:03:43,140
Once a model has been
compiled with a metric,
95
00:03:43,140 --> 00:03:45,360
it will report that metric for training,
96
00:03:45,360 --> 00:03:47,643
validation, and predictions.
97
00:03:48,480 --> 00:03:50,820
Assuming there are labels
passed to the predictions.
98
00:03:50,820 --> 00:03:53,400
You can even write your
own metric classes.
99
00:03:53,400 --> 00:03:55,920
Although this is a bit beyond
the scope of this course,
100
00:03:55,920 --> 00:03:58,200
I'll link to the relevant TF docs below
101
00:03:58,200 --> 00:03:59,580
because it can be very handy
102
00:03:59,580 --> 00:04:01,320
if you want a metric that isn't supported
103
00:04:01,320 --> 00:04:02,850
by default in Keras,
104
00:04:02,850 --> 00:04:04,473
such as the F1 score.
105
00:04:06,076 --> 00:04:08,743
(air whooshing)