subtitles/en/66_the-post-processing-step-in-question-answering-(pytorch).srt
1
00:00:00,315 --> 00:00:02,982
(air whooshing)
2
00:00:05,940 --> 00:00:08,913
- The post-processing step
in a question answering task.
3
00:00:10,440 --> 00:00:12,180
When doing question answering,
4
00:00:12,180 --> 00:00:14,550
the processing of the initial dataset
5
00:00:14,550 --> 00:00:17,370
means splitting the examples
into several features,
6
00:00:17,370 --> 00:00:19,773
which may or may not contain the answer.
7
00:00:21,000 --> 00:00:22,740
Passing those features through the model
8
00:00:22,740 --> 00:00:25,830
will give us logits for the
start and end positions,
9
00:00:25,830 --> 00:00:28,650
since our labels are
the indices of the tokens
10
00:00:28,650 --> 00:00:31,050
that correspond to the
start and end of the answer.
11
00:00:32,664 --> 00:00:35,490
We must then somehow convert
those logits into an answer,
12
00:00:35,490 --> 00:00:38,610
and then pick one of the various
answers each feature gives
13
00:00:38,610 --> 00:00:40,893
to be the answer for a given example.
14
00:00:42,300 --> 00:00:43,500
For the processing step,
15
00:00:43,500 --> 00:00:45,750
you should refer to
the video linked below.
16
00:00:45,750 --> 00:00:47,820
It's not very different for validation;
17
00:00:47,820 --> 00:00:50,820
we just need to add a few lines
to keep track of two things.
18
00:00:51,660 --> 00:00:54,960
Instead of discarding the
offset mappings, we keep them,
19
00:00:54,960 --> 00:00:55,793
and also include in them
20
00:00:55,793 --> 00:00:58,350
information about where the context is
21
00:00:58,350 --> 00:01:00,690
by setting the offsets
of the special tokens
22
00:01:00,690 --> 00:01:02,253
and the question to None.
23
00:01:03,480 --> 00:01:06,630
Then we also keep track of the
example ID for each feature,
24
00:01:06,630 --> 00:01:08,280
to be able to map each feature back
25
00:01:08,280 --> 00:01:10,503
to the example it originated from.
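
For reference, here is a minimal sketch of what such a validation preprocessing function can look like. The function name, the checkpoint, and the max_length/stride values are illustrative assumptions, not fixed by the video:

```python
from transformers import AutoTokenizer

# Assumed checkpoint; any extractive-QA tokenizer behaves the same way here.
model_checkpoint = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

max_length = 384  # maximum length of a feature (question + context), assumed
stride = 128      # overlap between two consecutive features, assumed

def preprocess_validation_examples(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=max_length,
        truncation="only_second",
        stride=stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    sample_map = inputs.pop("overflow_to_sample_mapping")
    example_ids = []

    for i in range(len(inputs["input_ids"])):
        # Keep track of which example each feature comes from.
        sample_idx = sample_map[i]
        example_ids.append(examples["id"][sample_idx])

        # Mark everything that is not context by setting its offsets to None,
        # so the post-processing can ignore the question and special tokens.
        sequence_ids = inputs.sequence_ids(i)
        offsets = inputs["offset_mapping"][i]
        inputs["offset_mapping"][i] = [
            o if sequence_ids[k] == 1 else None for k, o in enumerate(offsets)
        ]

    inputs["example_id"] = example_ids
    return inputs
```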
26
00:01:11,940 --> 00:01:14,100
If you don't want to
compute the validation loss,
27
00:01:14,100 --> 00:01:15,990
you won't need to include
all the special code
28
00:01:15,990 --> 00:01:18,420
that we used to create the labels.
29
00:01:18,420 --> 00:01:21,090
With this done, we can apply
that preprocessing function
30
00:01:21,090 --> 00:01:22,890
using the map method.
31
00:01:22,890 --> 00:01:24,090
We take the SQuAD dataset
32
00:01:24,090 --> 00:01:26,840
like in the preprocessing
for question-answering video.
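
As a sketch, applying the function with Dataset.map could look like this (assuming the SQuAD dataset is loaded with 🤗 Datasets and preprocess_validation_examples is the function sketched above):

```python
from datasets import load_dataset

raw_datasets = load_dataset("squad")

# One example may produce several features; drop the original columns.
validation_dataset = raw_datasets["validation"].map(
    preprocess_validation_examples,
    batched=True,
    remove_columns=raw_datasets["validation"].column_names,
)
```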
33
00:01:27,810 --> 00:01:30,540
Once this is done, the next
step is to create our model.
34
00:01:30,540 --> 00:01:31,710
We use the default model
35
00:01:31,710 --> 00:01:33,930
behind the question-answering
pipeline here,
36
00:01:33,930 --> 00:01:36,960
but you can use any
model you want to evaluate.
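
The default checkpoint behind the question-answering pipeline is distilbert-base-cased-distilled-squad; a sketch of loading it (swap in any question-answering model you want to evaluate):

```python
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained(
    "distilbert-base-cased-distilled-squad"
)
```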
37
00:01:36,960 --> 00:01:38,850
We'll run a manual evaluation loop,
38
00:01:38,850 --> 00:01:41,583
so we create a PyTorch
DataLoader with our features.
39
00:01:42,657 --> 00:01:44,520
With it, we can compute and gather
40
00:01:44,520 --> 00:01:46,650
all the start and end logits like this,
41
00:01:46,650 --> 00:01:49,653
with a standard PyTorch evaluation loop.
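
A minimal sketch of such a loop, under the assumptions above (the batch size and gathering the logits as NumPy arrays are illustrative choices):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from transformers import default_data_collator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# The model only needs the tensor inputs, so drop the bookkeeping columns.
eval_set = validation_dataset.remove_columns(["example_id", "offset_mapping"])
eval_set.set_format("torch")

eval_dataloader = DataLoader(
    eval_set, collate_fn=default_data_collator, batch_size=32
)

start_logits, end_logits = [], []
with torch.no_grad():
    for batch in eval_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        start_logits.append(outputs.start_logits.cpu().numpy())
        end_logits.append(outputs.end_logits.cpu().numpy())

# One row of logits per feature, in the same order as validation_dataset.
start_logits = np.concatenate(start_logits)
end_logits = np.concatenate(end_logits)
```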
42
00:01:49,653 --> 00:01:53,220
With this done, we can really
dive into the post-processing.
43
00:01:53,220 --> 00:01:56,340
First, we'll need a map
from example to features,
44
00:01:56,340 --> 00:01:57,873
which we can create like this.
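
A sketch of that mapping, assuming the validation_dataset and example_id column from the preprocessing sketch above:

```python
import collections

# For each example id, list the indices of the features it generated.
example_to_features = collections.defaultdict(list)
for idx, feature in enumerate(validation_dataset):
    example_to_features[feature["example_id"]].append(idx)
```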
45
00:01:58,800 --> 00:02:00,810
Now, for the main part
of the post-processing,
46
00:02:00,810 --> 00:02:04,230
let's see how to extract
an answer from the logits.
47
00:02:04,230 --> 00:02:05,760
We could just take the best index
48
00:02:05,760 --> 00:02:07,980
for the start and end logits and be done,
49
00:02:07,980 --> 00:02:10,380
but if our model predicts
something impossible,
50
00:02:10,380 --> 00:02:12,150
like tokens in the question,
51
00:02:12,150 --> 00:02:13,940
we'll look at more of the logits.
52
00:02:15,270 --> 00:02:17,070
Note that in the
question-answering pipeline,
53
00:02:17,070 --> 00:02:18,870
we attributed a score to each answer
54
00:02:18,870 --> 00:02:20,430
based on the probabilities,
55
00:02:20,430 --> 00:02:22,350
which we did not compute here.
56
00:02:22,350 --> 00:02:25,560
In terms of logits, the
multiplication we had in the scores
57
00:02:25,560 --> 00:02:26,853
becomes an addition.
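
A quick derivation (not spelled out in the video): for a candidate span with start index s and end index e, the pipeline's score is p_start(s) * p_end(e). Taking the log gives log(p_start(s) * p_end(e)) = start_logit_s + end_logit_e - log Z_start - log Z_end, and since the two softmax normalizers Z are the same for every candidate inside a feature, ranking spans by start_logit_s + end_logit_e gives exactly the same order as ranking by the product of probabilities.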
58
00:02:28,110 --> 00:02:29,010
To go fast,
59
00:02:29,010 --> 00:02:31,800
we don't look at all possible
start and end logits,
60
00:02:31,800 --> 00:02:34,050
but only at the 20 best ones.
61
00:02:34,050 --> 00:02:36,570
We ignore the logits that
would give impossible answers
62
00:02:36,570 --> 00:02:38,550
or answers that are too long.
63
00:02:38,550 --> 00:02:41,430
As we saw in the
preprocessing, the labels (0, 0)
64
00:02:41,430 --> 00:02:43,230
correspond to no answer.
65
00:02:43,230 --> 00:02:45,090
Otherwise we use the offsets
66
00:02:45,090 --> 00:02:46,940
to get the answer inside the context.
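
Putting those rules together, here is a sketch of extracting the best answer for the first example, building on the earlier sketches (example_to_features, validation_dataset, start_logits, end_logits). n_best = 20 comes from the video, while max_answer_length = 30 is an assumed cutoff; the no-answer case is left out since every SQuAD v1 question has an answer:

```python
import numpy as np

n_best = 20             # how many start/end candidates to consider
max_answer_length = 30  # assumed maximum answer length, in tokens

example = raw_datasets["validation"][0]
context = example["context"]
answers = []

for feature_index in example_to_features[example["id"]]:
    start_logit = start_logits[feature_index]
    end_logit = end_logits[feature_index]
    offsets = validation_dataset[feature_index]["offset_mapping"]

    start_indexes = np.argsort(start_logit)[-1 : -n_best - 1 : -1].tolist()
    end_indexes = np.argsort(end_logit)[-1 : -n_best - 1 : -1].tolist()
    for start_index in start_indexes:
        for end_index in end_indexes:
            # Skip candidates that are not fully inside the context
            # (their offsets were set to None during preprocessing).
            if offsets[start_index] is None or offsets[end_index] is None:
                continue
            # Skip candidates with end < start or that are too long.
            if (
                end_index < start_index
                or end_index - start_index + 1 > max_answer_length
            ):
                continue
            answers.append(
                {
                    "text": context[offsets[start_index][0] : offsets[end_index][1]],
                    "logit_score": start_logit[start_index] + end_logit[end_index],
                }
            )

best_answer = max(answers, key=lambda x: x["logit_score"])
print(best_answer["text"])
```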
67
00:02:47,910 --> 00:02:49,107
Let's have a look at the predicted answer
68
00:02:49,107 --> 00:02:50,370
for the first feature,
69
00:02:50,370 --> 00:02:51,930
which is the answer with the best score
70
00:02:51,930 --> 00:02:53,640
or the best logit score
71
00:02:53,640 --> 00:02:56,280
since the softmax is
an increasing function.
72
00:02:56,280 --> 00:02:58,230
The model got it right.
73
00:02:58,230 --> 00:03:00,690
Next, we just have to repeat this
for every example,
74
00:03:00,690 --> 00:03:03,720
picking for each one the answer
with the best logit score
75
00:03:03,720 --> 00:03:06,750
among all the features the example generated.
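
A sketch of that final loop, reusing the same candidate-extraction logic and constants as above:

```python
predicted_answers = []

for example in raw_datasets["validation"]:
    example_id = example["id"]
    context = example["context"]
    answers = []

    # Gather candidates from every feature this example generated.
    for feature_index in example_to_features[example_id]:
        start_logit = start_logits[feature_index]
        end_logit = end_logits[feature_index]
        offsets = validation_dataset[feature_index]["offset_mapping"]

        start_indexes = np.argsort(start_logit)[-1 : -n_best - 1 : -1].tolist()
        end_indexes = np.argsort(end_logit)[-1 : -n_best - 1 : -1].tolist()
        for start_index in start_indexes:
            for end_index in end_indexes:
                if offsets[start_index] is None or offsets[end_index] is None:
                    continue
                if (
                    end_index < start_index
                    or end_index - start_index + 1 > max_answer_length
                ):
                    continue
                answers.append(
                    {
                        "text": context[offsets[start_index][0] : offsets[end_index][1]],
                        "logit_score": start_logit[start_index] + end_logit[end_index],
                    }
                )

    # Keep the candidate with the best summed logit score for this example.
    best_answer = max(answers, key=lambda x: x["logit_score"])
    predicted_answers.append(
        {"id": example_id, "prediction_text": best_answer["text"]}
    )
```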
76
00:03:06,750 --> 00:03:09,700
Now you know how to get answers
from your model's predictions.
77
00:03:11,007 --> 00:03:13,674
(air whooshing)