
1
00:00:00,367 --> 00:00:02,950
(subtle blast)

2
00:00:05,850 --> 00:00:08,913
- The post-processing step
in a question-answering task.

3
00:00:10,830 --> 00:00:11,790
When doing question answering,

4
00:00:11,790 --> 00:00:14,670
the processing of the initial dataset

5
00:00:14,670 --> 00:00:18,090
implies splitting examples into several features,

6
00:00:18,090 --> 00:00:20,850
which may or may not contain the answer.

7
00:00:20,850 --> 00:00:22,530
Passing those features through the model

8
00:00:22,530 --> 00:00:25,860
will give us logits for the start and end positions,

9
00:00:25,860 --> 00:00:28,620
since our labels are the indices of the tokens

10
00:00:28,620 --> 00:00:31,020
that correspond to the start and end of the answer.

11
00:00:31,860 --> 00:00:34,740
We must then somehow convert those logits into an answer,

12
00:00:34,740 --> 00:00:38,070
and then pick one of the various answers each feature gives

13
00:00:38,070 --> 00:00:40,473
to be the answer for a given example.

14
00:00:41,683 --> 00:00:43,200
For the preprocessing step,

15
00:00:43,200 --> 00:00:45,450
you should refer to the video linked below.

16
00:00:45,450 --> 00:00:47,310
It's not very different for validation;

17
00:00:47,310 --> 00:00:50,053
we just need to add a few lines to keep track of two things:

18
00:00:50,053 --> 00:00:52,620
instead of discarding the offset mappings,

19
00:00:52,620 --> 00:00:55,380
we keep them, and also include in them the information

20
00:00:55,380 --> 00:00:58,410
of where the context is by setting the offsets

21
00:00:58,410 --> 00:01:01,821
of the special tokens and the question to None.

22
00:01:01,821 --> 00:01:05,370
Then we also keep track of the example ID for each feature,

23
00:01:05,370 --> 00:01:07,020
to be able to map each feature

24
00:01:07,020 --> 00:01:09,243
back to the example it originated from.

25
00:01:10,470 --> 00:01:12,660
If you don't want to compute the validation loss,

26
00:01:12,660 --> 00:01:14,610
you won't need to include all the special code

27
00:01:14,610 --> 00:01:17,010
that we used to create the labels.

28
00:01:17,010 --> 00:01:19,650
With this done, we can apply that preprocessing function

29
00:01:19,650 --> 00:01:21,480
using the map method.

30
00:01:21,480 --> 00:01:23,610
We take the SQuAD dataset like in the preprocessing

31
00:01:23,610 --> 00:01:25,060
for question-answering video.

32
00:01:26,400 --> 00:01:29,310
Once this is done, the next step is to create our model.

33
00:01:29,310 --> 00:01:30,570
We use the default model behind

34
00:01:30,570 --> 00:01:32,640
the question-answering pipeline here,

35
00:01:32,640 --> 00:01:35,880
but you can use any model you want to evaluate.

36
00:01:35,880 --> 00:01:37,680
With the to_tf_dataset method,

37
00:01:37,680 --> 00:01:41,370
we can just send our processed dataset to model.predict,

38
00:01:41,370 --> 00:01:43,350
and we directly get our start and end logits

39
00:01:43,350 --> 00:01:45,930
for the whole dataset as NumPy arrays.

40
00:01:45,930 --> 00:01:49,230
With this done, we can really dive into the post-processing.

41
00:01:49,230 --> 00:01:52,380
First, we'll need a map from example to features,

42
00:01:52,380 --> 00:01:53,883
which we can create like this.

43
00:01:54,780 --> 00:01:56,700
Now, for the main part of the post-processing,

44
00:01:56,700 --> 00:02:00,270
let's see how to extract an answer from the logits.
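(For reference, a minimal sketch of the steps narrated so far: the validation preprocessing that keeps the offset mappings and example IDs, the map call, model.predict via to_tf_dataset, and the example-to-features map. It follows the Hugging Face course approach for SQuAD; the checkpoint and names such as preprocess_validation_examples and eval_set are illustrative, not the exact code shown on screen.)

```python
from collections import defaultdict

from datasets import load_dataset
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering

# Default checkpoint behind the question-answering pipeline.
model_checkpoint = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)


def preprocess_validation_examples(examples):
    inputs = tokenizer(
        examples["question"],
        examples["context"],
        max_length=384,
        truncation="only_second",
        stride=128,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    sample_map = inputs.pop("overflow_to_sample_mapping")
    example_ids = []
    for i in range(len(inputs["input_ids"])):
        # Keep track of the example each feature originated from.
        example_ids.append(examples["id"][sample_map[i]])
        # Keep the offsets, but set those of the question and the
        # special tokens to None so we know where the context is.
        sequence_ids = inputs.sequence_ids(i)
        offsets = inputs["offset_mapping"][i]
        inputs["offset_mapping"][i] = [
            o if sequence_ids[k] == 1 else None for k, o in enumerate(offsets)
        ]

    inputs["example_id"] = example_ids
    return inputs


raw_datasets = load_dataset("squad")
eval_set = raw_datasets["validation"].map(
    preprocess_validation_examples,
    batched=True,
    remove_columns=raw_datasets["validation"].column_names,
)

model = TFAutoModelForQuestionAnswering.from_pretrained(model_checkpoint)
tf_eval_set = eval_set.to_tf_dataset(
    columns=["input_ids", "attention_mask"],
    shuffle=False,
    batch_size=16,
)
# Start and end logits for the whole dataset, as NumPy arrays.
outputs = model.predict(tf_eval_set)
start_logits, end_logits = outputs.start_logits, outputs.end_logits

# Map each example ID to the indices of the features it generated.
example_to_features = defaultdict(list)
for idx, feature in enumerate(eval_set):
    example_to_features[feature["example_id"]].append(idx)
```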
45
00:02:00,270 --> 00:02:01,650
We could just take the best index

46
00:02:01,650 --> 00:02:03,690
for the start and end logits and be done,

47
00:02:03,690 --> 00:02:06,180
but if our model predicts something impossible,

48
00:02:06,180 --> 00:02:07,920
like tokens in the question,

49
00:02:07,920 --> 00:02:09,670
we need to look at more of the logits.

50
00:02:10,800 --> 00:02:12,570
Note that in the question-answering pipeline,

51
00:02:12,570 --> 00:02:14,160
we attributed a score to each answer

52
00:02:14,160 --> 00:02:17,880
based on the probabilities, which we did not compute here.

53
00:02:17,880 --> 00:02:19,860
In terms of logits, the multiplication we had

54
00:02:19,860 --> 00:02:21,663
in the scores becomes an addition.

55
00:02:22,650 --> 00:02:23,910
To go fast, we don't look

56
00:02:23,910 --> 00:02:25,343
at all possible start and end logits,

57
00:02:25,343 --> 00:02:26,973
but the 20 best ones.

58
00:02:27,810 --> 00:02:30,386
We ignore the logits that correspond to impossible answers

59
00:02:30,386 --> 00:02:32,370
or answers that are too long.

60
00:02:32,370 --> 00:02:33,720
As we saw in the preprocessing,

61
00:02:33,720 --> 00:02:36,240
the label "0, 0" corresponds to no answer;

62
00:02:36,240 --> 00:02:37,440
otherwise we use the offsets

63
00:02:37,440 --> 00:02:39,290
to get the answer inside the context.

64
00:02:40,260 --> 00:02:41,580
Let's have a look at the predicted answer

65
00:02:41,580 --> 00:02:43,200
for the first feature,

66
00:02:43,200 --> 00:02:44,790
which is the answer with the best score,

67
00:02:44,790 --> 00:02:46,860
or the best logit score, since the softmax

68
00:02:46,860 --> 00:02:48,810
is an increasing function.

69
00:02:48,810 --> 00:02:49,960
The model got it right.

70
00:02:51,210 --> 00:02:54,180
Next, we just have to loop this for every example,

71
00:02:54,180 --> 00:02:56,700
picking for each the answer with the best logit score

72
00:02:56,700 --> 00:02:59,133
among all the features the example generated.

73
00:03:00,030 --> 00:03:03,030
Now you know how to get answers from your model predictions.

74
00:03:04,214 --> 00:03:06,797
(subtle blast)
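(And a hedged sketch of the extraction loop just described, continuing from the previous block and reusing raw_datasets, eval_set, start_logits, end_logits, and example_to_features: candidate spans are scored by adding the start and end logits of the 20 best indices, impossible or overly long spans are skipped, and the answer text is read from the offsets. n_best and max_answer_length are illustrative choices, not values confirmed by the video.)

```python
import numpy as np

n_best = 20             # only consider the 20 best start/end logits
max_answer_length = 30  # skip candidate answers longer than 30 tokens
predicted_answers = {}

for example in raw_datasets["validation"]:
    example_id = example["id"]
    context = example["context"]
    candidates = []

    # Look at every feature this example generated.
    for feature_index in example_to_features[example_id]:
        start_logit = start_logits[feature_index]
        end_logit = end_logits[feature_index]
        offsets = eval_set[feature_index]["offset_mapping"]

        start_indexes = np.argsort(start_logit)[-n_best:][::-1].tolist()
        end_indexes = np.argsort(end_logit)[-n_best:][::-1].tolist()
        for start_index in start_indexes:
            for end_index in end_indexes:
                # Offsets set to None mark the question and the special
                # tokens: answers there are impossible.
                if offsets[start_index] is None or offsets[end_index] is None:
                    continue
                # Skip spans that end before they start or are too long.
                if not 0 <= end_index - start_index < max_answer_length:
                    continue
                candidates.append(
                    {
                        # Multiplying probabilities becomes adding logits.
                        "logit_score": start_logit[start_index]
                        + end_logit[end_index],
                        "text": context[
                            offsets[start_index][0] : offsets[end_index][1]
                        ],
                    }
                )

    # Pick the best-scoring answer among all the example's features.
    if candidates:
        best = max(candidates, key=lambda c: c["logit_score"])
        predicted_answers[example_id] = best["text"]
    else:
        predicted_answers[example_id] = ""
```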