
1
00:00:00,000 --> 00:00:01,402
(air whooshing)

2
00:00:01,402 --> 00:00:02,720
(smiley snapping)

3
00:00:02,720 --> 00:00:05,910
(air whooshing)

4
00:00:05,910 --> 00:00:07,923
- What is domain adaptation?

5
00:00:09,540 --> 00:00:12,540
When fine-tuning a pre-trained model on a new dataset,

6
00:00:12,540 --> 00:00:15,480
the fine-tuned model we obtain will make predictions

7
00:00:15,480 --> 00:00:17,433
that are attuned to this new dataset.

8
00:00:18,840 --> 00:00:21,840
When the two models are trained on the same task,

9
00:00:21,840 --> 00:00:25,320
we can then compare their predictions on the same input.

10
00:00:25,320 --> 00:00:27,870
The predictions of the two models will be different

11
00:00:27,870 --> 00:00:29,790
in a way that reflects the differences

12
00:00:29,790 --> 00:00:31,680
between the two datasets,

13
00:00:31,680 --> 00:00:34,053
a phenomenon we call domain adaptation.

14
00:00:35,310 --> 00:00:38,640
Let's look at an example with masked language modeling

15
00:00:38,640 --> 00:00:41,910
by comparing the outputs of the pre-trained DistilBERT model

16
00:00:41,910 --> 00:00:43,080
with the version fine-tuned

17
00:00:43,080 --> 00:00:45,273
in chapter 7 of the course, linked below.

18
00:00:46,500 --> 00:00:49,140
The pre-trained model makes generic predictions,

19
00:00:49,140 --> 00:00:50,580
whereas the fine-tuned model

20
00:00:50,580 --> 00:00:53,253
has its first two predictions linked to cinema.

21
00:00:54,390 --> 00:00:57,210
Since it was fine-tuned on a movie reviews dataset,

22
00:00:57,210 --> 00:00:58,680
it's perfectly normal to see

23
00:00:58,680 --> 00:01:01,440
it adapt its suggestions like this.

24
00:01:01,440 --> 00:01:03,090
Notice how it keeps the same prediction

25
00:01:03,090 --> 00:01:05,220
as the pre-trained model afterward.

26
00:01:05,220 --> 00:01:08,100
Even if the fine-tuned model adapts to the new dataset,

27
00:01:08,100 --> 00:01:10,450
it's not forgetting what it was pre-trained on.

28
00:01:11,490 --> 00:01:14,220
Here is another example, on a translation task.

29
00:01:14,220 --> 00:01:17,310
On top, we use a pre-trained French/English model,

30
00:01:17,310 --> 00:01:21,330
and at the bottom, the version we fine-tuned in chapter 7.

31
00:01:21,330 --> 00:01:23,610
The top model is pre-trained on lots of texts,

32
00:01:23,610 --> 00:01:25,170
and leaves technical English terms,

33
00:01:25,170 --> 00:01:28,350
like plugin and email, unchanged in the translation.

34
00:01:28,350 --> 00:01:31,350
Both are perfectly understood by French people.

35
00:01:31,350 --> 00:01:33,780
The dataset picked for the fine-tuning is a dataset

36
00:01:33,780 --> 00:01:36,660
of technical texts where special attention was paid

37
00:01:36,660 --> 00:01:39,150
to translating everything into French.

38
00:01:39,150 --> 00:01:42,090
As a result, the fine-tuned model picked up that habit

39
00:01:42,090 --> 00:01:44,193
and translated both plugin and email.

40
00:01:45,942 --> 00:01:49,181
(air whooshing)

41
00:01:49,181 --> 00:01:50,592
(air whooshing)
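
The masked language modeling comparison described in cues 14 to 27 can be reproduced with a few lines of code. This is a minimal sketch, not shown in the video itself: the input sentence and the fine-tuned checkpoint name (huggingface-course/distilbert-base-uncased-finetuned-imdb, the model trained on movie reviews in chapter 7) are assumptions based on the course material.

# Minimal sketch of the fill-mask comparison (checkpoint names and input
# sentence assumed, see the note above).
from transformers import pipeline

text = "This is a great [MASK]."

pretrained = pipeline("fill-mask", model="distilbert-base-uncased")
finetuned = pipeline(
    "fill-mask",
    model="huggingface-course/distilbert-base-uncased-finetuned-imdb",
)

# The pre-trained model's top tokens are generic; the fine-tuned model's
# first suggestions lean toward cinema, reflecting the movie reviews dataset.
for name, mlm in (("pre-trained", pretrained), ("fine-tuned", finetuned)):
    print(name, [pred["token_str"] for pred in mlm(text)])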
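
The translation comparison from cues 28 to 39 can be sketched the same way. The video does not name the checkpoints; chapter 7 of the course fine-tunes Helsinki-NLP/opus-mt-en-fr on the KDE4 technical dataset and publishes the result as huggingface-course/marian-finetuned-kde4-en-to-fr, so those names are assumed here, as is the example sentence.

# Hedged sketch of the translation comparison (checkpoint names and input
# sentence assumed).
from transformers import pipeline

sentence = "Unable to load the plugin, please send an email to the maintainer."

pretrained = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
finetuned = pipeline(
    "translation",
    model="huggingface-course/marian-finetuned-kde4-en-to-fr",
)

# The pre-trained model typically keeps "plugin" and "email" in English,
# while the model fine-tuned on technical texts translates them into French.
print("pre-trained:", pretrained(sentence)[0]["translation_text"])
print("fine-tuned: ", finetuned(sentence)[0]["translation_text"])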