1
00:00:00,315 --> 00:00:02,982
(air whooshing)

2
00:00:05,940 --> 00:00:08,913
- The post-processing step in a question answering task.

3
00:00:10,440 --> 00:00:12,180
When doing question answering,

4
00:00:12,180 --> 00:00:14,550
the processing of the initial dataset

5
00:00:14,550 --> 00:00:17,370
implies splitting examples into several features,

6
00:00:17,370 --> 00:00:19,773
which may or may not contain the answer.

7
00:00:21,000 --> 00:00:22,740
Passing those features through the model

8
00:00:22,740 --> 00:00:25,830
will give us logits for the start and end positions,

9
00:00:25,830 --> 00:00:28,650
since our labels are the indices of the tokens

10
00:00:28,650 --> 00:00:31,050
that correspond to the start and end of the answer.

11
00:00:32,664 --> 00:00:35,490
We must then somehow convert those logits into an answer,

12
00:00:35,490 --> 00:00:38,610
and then pick one of the various answers each feature gives

13
00:00:38,610 --> 00:00:40,893
to be the answer for a given example.

14
00:00:42,300 --> 00:00:43,500
For the preprocessing step,

15
00:00:43,500 --> 00:00:45,750
you should refer to the video linked below.

16
00:00:45,750 --> 00:00:47,820
It's not very different for validation,

17
00:00:47,820 --> 00:00:50,820
we just need to add a few lines to keep track of two things.

18
00:00:51,660 --> 00:00:54,960
Instead of discarding the offset mappings, we keep them,

19
00:00:54,960 --> 00:00:55,793
and we also include in them

20
00:00:55,793 --> 00:00:58,350
the information of where the context is

21
00:00:58,350 --> 00:01:00,690
by setting the offsets of the special tokens

22
00:01:00,690 --> 00:01:02,253
and the question to None.

23
00:01:03,480 --> 00:01:06,630
Then we also keep track of the example ID for each feature,

24
00:01:06,630 --> 00:01:08,280
to be able to map features back

25
00:01:08,280 --> 00:01:10,503
to the examples they originated from.
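[The two tracking steps described above can be sketched as follows. This is a minimal, self-contained illustration: the `sequence_ids` and `offsets` values mimic what a fast tokenizer returns for one feature, and the helper name `mask_non_context_offsets` is our own, not a library function.]

```python
# Sketch of the validation preprocessing: keep the offset mappings,
# but set the offsets of special tokens and question tokens to None,
# so that post-processing can tell which tokens belong to the context.

def mask_non_context_offsets(offsets, sequence_ids, context_index=1):
    """Keep an offset only when the token belongs to the context."""
    return [
        off if seq_id == context_index else None
        for off, seq_id in zip(offsets, sequence_ids)
    ]

# Toy feature layout: [CLS] question tokens [SEP] context tokens [SEP]
sequence_ids = [None, 0, 0, None, 1, 1, 1, None]
offsets = [(0, 0), (0, 4), (5, 9), (0, 0), (0, 3), (4, 8), (9, 14), (0, 0)]

masked = mask_non_context_offsets(offsets, sequence_ids)
print(masked)
# → [None, None, None, None, (0, 3), (4, 8), (9, 14), None]
```

Alongside this, each feature would also store the ID of the example it was created from, so predictions can later be regrouped per example.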
26
00:01:11,940 --> 00:01:14,100
If you don't want to compute the validation loss,

27
00:01:14,100 --> 00:01:15,990
you won't need to include all the special code

28
00:01:15,990 --> 00:01:18,420
that we used to create the labels.

29
00:01:18,420 --> 00:01:21,090
With this done, we can apply that preprocessing function

30
00:01:21,090 --> 00:01:22,890
using the map method.

31
00:01:22,890 --> 00:01:24,090
We take the SQuAD dataset

32
00:01:24,090 --> 00:01:26,840
like in the preprocessing for question-answering video.

33
00:01:27,810 --> 00:01:30,540
Once this is done, the next step is to create our model.

34
00:01:30,540 --> 00:01:31,710
We use the default model

35
00:01:31,710 --> 00:01:33,930
behind the question-answering pipeline here,

36
00:01:33,930 --> 00:01:36,960
but you can use any model you want to evaluate.

37
00:01:36,960 --> 00:01:38,850
We'll run a manual evaluation loop,

38
00:01:38,850 --> 00:01:41,583
so we create a PyTorch DataLoader with our features.

39
00:01:42,657 --> 00:01:44,520
With it, we can compute and gather

40
00:01:44,520 --> 00:01:46,650
all the start and end logits like this,

41
00:01:46,650 --> 00:01:49,653
with a standard PyTorch evaluation loop.

42
00:01:49,653 --> 00:01:53,220
With this done, we can really dive into the post-processing.

43
00:01:53,220 --> 00:01:56,340
First, we'll need a map from example to features,

44
00:01:56,340 --> 00:01:57,873
which we can create like this.

45
00:01:58,800 --> 00:02:00,810
Now, for the main part of the post-processing,

46
00:02:00,810 --> 00:02:04,230
let's see how to extract an answer from the logits.

47
00:02:04,230 --> 00:02:05,760
We could just take the best index

48
00:02:05,760 --> 00:02:07,980
for the start and end logits and be done,

49
00:02:07,980 --> 00:02:10,380
but if our model predicts something impossible,

50
00:02:10,380 --> 00:02:12,150
like tokens in the question,

51
00:02:12,150 --> 00:02:13,940
we'll look at more of the logits.
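[The example-to-features map mentioned above can be sketched like this. The `example_id` values are synthetic stand-ins for the IDs stored during preprocessing; in practice they come from the processed dataset.]

```python
import collections

# Each example may have been split into several features, so we record,
# for each example ID, the indices of the features it generated. This
# lets us regroup per-feature predictions back into per-example answers.
features_example_ids = ["ex0", "ex0", "ex1", "ex2", "ex2", "ex2"]

example_to_features = collections.defaultdict(list)
for feature_index, example_id in enumerate(features_example_ids):
    example_to_features[example_id].append(feature_index)

print(dict(example_to_features))
# → {'ex0': [0, 1], 'ex1': [2], 'ex2': [3, 4, 5]}
```

The start and end logits themselves would be gathered beforehand by iterating over the DataLoader in a standard no-grad evaluation loop and concatenating the model outputs.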
52
00:02:15,270 --> 00:02:17,070
Note that in the question-answering pipeline,

53
00:02:17,070 --> 00:02:18,870
we attributed a score to each answer

54
00:02:18,870 --> 00:02:20,430
based on the probabilities,

55
00:02:20,430 --> 00:02:22,350
which we did not compute here.

56
00:02:22,350 --> 00:02:25,560
In terms of logits, the multiplication we had in the scores

57
00:02:25,560 --> 00:02:26,853
becomes an addition.

58
00:02:28,110 --> 00:02:29,010
To go fast,

59
00:02:29,010 --> 00:02:31,800
we don't look at all possible start and end logits,

60
00:02:31,800 --> 00:02:34,050
but the 20 best ones are enough.

61
00:02:34,050 --> 00:02:36,570
We ignore the logits that span impossible answers

62
00:02:36,570 --> 00:02:38,550
or answers that are too long.

63
00:02:38,550 --> 00:02:41,430
As we saw in the preprocessing, the labels (0, 0)

64
00:02:41,430 --> 00:02:43,230
correspond to no answer.

65
00:02:43,230 --> 00:02:45,090
Otherwise we use the offsets

66
00:02:45,090 --> 00:02:46,940
to get the answer inside the context.

67
00:02:47,910 --> 00:02:49,107
Let's have a look at the predicted answer

68
00:02:49,107 --> 00:02:50,370
for the first feature,

69
00:02:50,370 --> 00:02:51,930
which is the answer with the best score,

70
00:02:51,930 --> 00:02:53,640
or the best logit score,

71
00:02:53,640 --> 00:02:56,280
since the softmax is an increasing function.

72
00:02:56,280 --> 00:02:58,230
The model got it right.

73
00:02:58,230 --> 00:03:00,690
Next we just have to loop this for every example,

74
00:03:00,690 --> 00:03:03,720
picking for each the answer with the best logit score

75
00:03:03,720 --> 00:03:06,750
among all the features the example generated.

76
00:03:06,750 --> 00:03:09,700
Now you know how to get answers from your model predictions.

77
00:03:11,007 --> 00:03:13,674
(air whooshing)
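[The extraction step described above can be sketched for a single feature as follows. The logits, offsets, and context are synthetic; in practice they come from the model outputs and the validation preprocessing. Note how scores are additions of logits rather than multiplications of probabilities.]

```python
import numpy as np

n_best = 20           # only consider the 20 best start/end indices
max_answer_length = 30

context = "The quick brown fox jumps over the lazy dog"
# One entry per token; None marks special tokens and question tokens.
offsets = [None, None, (0, 3), (4, 9), (10, 15), (16, 19), None]
start_logits = np.array([0.1, 0.2, 2.0, 0.5, 0.1, 0.3, 0.1])
end_logits = np.array([0.1, 0.1, 0.2, 0.4, 2.5, 0.2, 0.1])

# Indices of the n_best highest logits, best first.
start_indexes = np.argsort(start_logits)[-1 : -n_best - 1 : -1].tolist()
end_indexes = np.argsort(end_logits)[-1 : -n_best - 1 : -1].tolist()

answers = []
for start_index in start_indexes:
    for end_index in end_indexes:
        # Skip spans outside the context, reversed spans,
        # and spans that are too long.
        if (
            offsets[start_index] is None
            or offsets[end_index] is None
            or end_index < start_index
            or end_index - start_index + 1 > max_answer_length
        ):
            continue
        answers.append(
            {
                "text": context[offsets[start_index][0] : offsets[end_index][1]],
                "score": start_logits[start_index] + end_logits[end_index],
            }
        )

best = max(answers, key=lambda a: a["score"])
print(best["text"])
# → The quick brown
```

Looping this over every example, and keeping the answer with the best logit score among all of the example's features, gives the final predictions.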