1
00:00:00,315 --> 00:00:02,982
(air whooshing)

2
00:00:05,940 --> 00:00:08,913
- The post-processing step in a question answering task.

3
00:00:10,440 --> 00:00:12,180
When doing question answering,

4
00:00:12,180 --> 00:00:14,550
the processing of the initial dataset

5
00:00:14,550 --> 00:00:17,370
implies splitting examples into several features,

6
00:00:17,370 --> 00:00:19,773
which may or may not contain the answer.

7
00:00:21,000 --> 00:00:22,740
Passing those features through the model

8
00:00:22,740 --> 00:00:25,830
will give us logits for the start and end positions,

9
00:00:25,830 --> 00:00:28,650
since our labels are the indices of the tokens

10
00:00:28,650 --> 00:00:31,050
that correspond to the start and end of the answer.

11
00:00:32,664 --> 00:00:35,490
We must then somehow convert those logits into an answer,

12
00:00:35,490 --> 00:00:38,610
and then pick one of the various answers each feature gives

13
00:00:38,610 --> 00:00:40,893
to be the answer for a given example.

14
00:00:42,300 --> 00:00:43,500
For the preprocessing step,

15
00:00:43,500 --> 00:00:45,750
you should refer to the video linked below.

16
00:00:45,750 --> 00:00:47,820
It's not very different for validation,

17
00:00:47,820 --> 00:00:50,820
we just need to add a few lines to keep track of two things.

18
00:00:51,660 --> 00:00:54,960
Instead of discarding the offset mappings, we keep them,

19
00:00:54,960 --> 00:00:55,793
and we also include in them

20
00:00:55,793 --> 00:00:58,350
the information of where the context is

21
00:00:58,350 --> 00:01:00,690
by setting the offsets of the special tokens

22
00:01:00,690 --> 00:01:02,253
and the question to None.

23
00:01:03,480 --> 00:01:06,630
Then we also keep track of the example ID for each feature,

24
00:01:06,630 --> 00:01:08,280
to be able to map features back

25
00:01:08,280 --> 00:01:10,503
to the examples they originated from.
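[The two tracking steps described above can be sketched as follows. This is a minimal, self-contained illustration: the `sequence_ids` and `offsets` values mimic what a fast tokenizer returns for one feature, and the helper name `mask_non_context_offsets` is our own, not a library function.]

```python
# Sketch of the validation preprocessing: keep the offset mappings,
# but set the offsets of special tokens and question tokens to None,
# so that post-processing can tell which tokens belong to the context.

def mask_non_context_offsets(offsets, sequence_ids, context_index=1):
    """Keep an offset only when the token belongs to the context."""
    return [
        off if seq_id == context_index else None
        for off, seq_id in zip(offsets, sequence_ids)
    ]

# Toy feature layout: [CLS] question tokens [SEP] context tokens [SEP]
sequence_ids = [None, 0, 0, None, 1, 1, 1, None]
offsets = [(0, 0), (0, 4), (5, 9), (0, 0), (0, 3), (4, 8), (9, 14), (0, 0)]

masked = mask_non_context_offsets(offsets, sequence_ids)
print(masked)
# → [None, None, None, None, (0, 3), (4, 8), (9, 14), None]
```

Alongside this, each feature would also store the ID of the example it was created from, so predictions can later be regrouped per example.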
26
00:01:11,940 --> 00:01:14,100
If you don't want to compute the validation loss,

27
00:01:14,100 --> 00:01:15,990
you won't need to include all the special code

28
00:01:15,990 --> 00:01:18,420
that we used to create the labels.

29
00:01:18,420 --> 00:01:21,090
With this done, we can apply that preprocessing function

30
00:01:21,090 --> 00:01:22,890
using the map method.

31
00:01:22,890 --> 00:01:24,090
We take the SQuAD dataset

32
00:01:24,090 --> 00:01:26,840
like in the preprocessing for question-answering video.

33
00:01:27,810 --> 00:01:30,540
Once this is done, the next step is to create our model.

34
00:01:30,540 --> 00:01:31,710
We use the default model

35
00:01:31,710 --> 00:01:33,930
behind the question-answering pipeline here,

36
00:01:33,930 --> 00:01:36,960
but you can use any model you want to evaluate.

37
00:01:36,960 --> 00:01:38,850
We'll run a manual evaluation loop,

38
00:01:38,850 --> 00:01:41,583
so we create a PyTorch DataLoader with our features.

39
00:01:42,657 --> 00:01:44,520
With it, we can compute and gather

40
00:01:44,520 --> 00:01:46,650
all the start and end logits like this,

41
00:01:46,650 --> 00:01:49,653
with a standard PyTorch evaluation loop.

42
00:01:49,653 --> 00:01:53,220
With this done, we can really dive into the post-processing.

43
00:01:53,220 --> 00:01:56,340
First, we'll need a map from example to features,

44
00:01:56,340 --> 00:01:57,873
which we can create like this.

45
00:01:58,800 --> 00:02:00,810
Now, for the main part of the post-processing,

46
00:02:00,810 --> 00:02:04,230
let's see how to extract an answer from the logits.

47
00:02:04,230 --> 00:02:05,760
We could just take the best index

48
00:02:05,760 --> 00:02:07,980
for the start and end logits and be done,

49
00:02:07,980 --> 00:02:10,380
but if our model predicts something impossible,

50
00:02:10,380 --> 00:02:12,150
like tokens in the question,

51
00:02:12,150 --> 00:02:13,940
we'll look at more of the logits.
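[The example-to-features map mentioned above can be sketched like this. The `example_id` values are synthetic stand-ins for the IDs stored during preprocessing; in practice they come from the processed dataset.]

```python
import collections

# Each example may have been split into several features, so we record,
# for each example ID, the indices of the features it generated. This
# lets us regroup per-feature predictions back into per-example answers.
features_example_ids = ["ex0", "ex0", "ex1", "ex2", "ex2", "ex2"]

example_to_features = collections.defaultdict(list)
for feature_index, example_id in enumerate(features_example_ids):
    example_to_features[example_id].append(feature_index)

print(dict(example_to_features))
# → {'ex0': [0, 1], 'ex1': [2], 'ex2': [3, 4, 5]}
```

The start and end logits themselves would be gathered beforehand by iterating over the DataLoader in a standard no-grad evaluation loop and concatenating the model outputs.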
52
00:02:15,270 --> 00:02:17,070
Note that in the question-answering pipeline,

53
00:02:17,070 --> 00:02:18,870
we attributed a score to each answer

54
00:02:18,870 --> 00:02:20,430
based on the probabilities,

55
00:02:20,430 --> 00:02:22,350
which we did not compute here.

56
00:02:22,350 --> 00:02:25,560
In terms of logits, the multiplication we had in the scores

57
00:02:25,560 --> 00:02:26,853
becomes an addition.

58
00:02:28,110 --> 00:02:29,010
To go fast,

59
00:02:29,010 --> 00:02:31,800
we don't look at all possible start and end logits,

60
00:02:31,800 --> 00:02:34,050
but the 20 best ones are enough.

61
00:02:34,050 --> 00:02:36,570
We ignore the logits that span impossible answers

62
00:02:36,570 --> 00:02:38,550
or answers that are too long.

63
00:02:38,550 --> 00:02:41,430
As we saw in the preprocessing, the labels (0, 0)

64
00:02:41,430 --> 00:02:43,230
correspond to no answer.

65
00:02:43,230 --> 00:02:45,090
Otherwise we use the offsets

66
00:02:45,090 --> 00:02:46,940
to get the answer inside the context.

67
00:02:47,910 --> 00:02:49,107
Let's have a look at the predicted answer

68
00:02:49,107 --> 00:02:50,370
for the first feature,

69
00:02:50,370 --> 00:02:51,930
which is the answer with the best score,

70
00:02:51,930 --> 00:02:53,640
or the best logit score,

71
00:02:53,640 --> 00:02:56,280
since the softmax is an increasing function.

72
00:02:56,280 --> 00:02:58,230
The model got it right.

73
00:02:58,230 --> 00:03:00,690
Next we just have to loop this for every example,

74
00:03:00,690 --> 00:03:03,720
picking for each the answer with the best logit score

75
00:03:03,720 --> 00:03:06,750
among all the features the example generated.

76
00:03:06,750 --> 00:03:09,700
Now you know how to get answers from your model predictions.

77
00:03:11,007 --> 00:03:13,674
(air whooshing)
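[The extraction step described above can be sketched for a single feature as follows. The logits, offsets, and context are synthetic; in practice they come from the model outputs and the validation preprocessing. Note how scores are additions of logits rather than multiplications of probabilities.]

```python
import numpy as np

n_best = 20           # only consider the 20 best start/end indices
max_answer_length = 30

context = "The quick brown fox jumps over the lazy dog"
# One entry per token; None marks special tokens and question tokens.
offsets = [None, None, (0, 3), (4, 9), (10, 15), (16, 19), None]
start_logits = np.array([0.1, 0.2, 2.0, 0.5, 0.1, 0.3, 0.1])
end_logits = np.array([0.1, 0.1, 0.2, 0.4, 2.5, 0.2, 0.1])

# Indices of the n_best highest logits, best first.
start_indexes = np.argsort(start_logits)[-1 : -n_best - 1 : -1].tolist()
end_indexes = np.argsort(end_logits)[-1 : -n_best - 1 : -1].tolist()

answers = []
for start_index in start_indexes:
    for end_index in end_indexes:
        # Skip spans outside the context, reversed spans,
        # and spans that are too long.
        if (
            offsets[start_index] is None
            or offsets[end_index] is None
            or end_index < start_index
            or end_index - start_index + 1 > max_answer_length
        ):
            continue
        answers.append(
            {
                "text": context[offsets[start_index][0] : offsets[end_index][1]],
                "score": start_logits[start_index] + end_logits[end_index],
            }
        )

best = max(answers, key=lambda a: a["score"])
print(best["text"])
# → The quick brown
```

Looping this over every example, and keeping the answer with the best logit score among all of the example's features, gives the final predictions.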