1
00:00:00,000 --> 00:00:01,402
(air whooshing)
2
00:00:01,402 --> 00:00:02,720
(smiley snapping)
3
00:00:02,720 --> 00:00:05,910
(air whooshing)
4
00:00:05,910 --> 00:00:07,923
- What is domain adaptation?
5
00:00:09,540 --> 00:00:12,540
When fine-tuning a pre-trained
model on a new dataset,
6
00:00:12,540 --> 00:00:15,480
the fine-tuned model we
obtain will make predictions
7
00:00:15,480 --> 00:00:17,433
that are attuned to this new dataset.
8
00:00:18,840 --> 00:00:21,840
When the two models are
trained on the same task,
9
00:00:21,840 --> 00:00:25,320
we can then compare their
predictions on the same input.
10
00:00:25,320 --> 00:00:27,870
The predictions of the two
models will be different
11
00:00:27,870 --> 00:00:29,790
in a way that reflects the differences
12
00:00:29,790 --> 00:00:31,680
between the two datasets,
13
00:00:31,680 --> 00:00:34,053
a phenomenon we call domain adaptation.
14
00:00:35,310 --> 00:00:38,640
Let's look at an example
with masked language modeling
15
00:00:38,640 --> 00:00:41,910
by comparing the outputs of the
pre-trained DistilBERT model
16
00:00:41,910 --> 00:00:43,080
with the version fine-tuned
17
00:00:43,080 --> 00:00:45,273
in chapter 7 of the course, linked below.
18
00:00:46,500 --> 00:00:49,140
The pre-trained model
makes generic predictions,
19
00:00:49,140 --> 00:00:50,580
whereas the fine-tuned model
20
00:00:50,580 --> 00:00:53,253
has its first two
predictions linked to cinema.
21
00:00:54,390 --> 00:00:57,210
Since it was fine-tuned on
a movie reviews dataset,
22
00:00:57,210 --> 00:00:58,680
it's perfectly normal to see
23
00:00:58,680 --> 00:01:01,440
it adapt its suggestions like this.
24
00:01:01,440 --> 00:01:03,090
Notice how it keeps the same prediction
25
00:01:03,090 --> 00:01:05,220
as the pre-trained model afterward.
26
00:01:05,220 --> 00:01:08,100
Even if the fine-tuned model
adapts to the new dataset,
27
00:01:08,100 --> 00:01:10,450
it's not forgetting what
it was pre-trained on.
28
00:01:11,490 --> 00:01:14,220
Here is another example,
on a translation task.
29
00:01:14,220 --> 00:01:17,310
On top, we use a pre-trained
French/English model,
30
00:01:17,310 --> 00:01:21,330
and at the bottom, the version
we fine-tuned in chapter 7.
31
00:01:21,330 --> 00:01:23,610
The top model is pre-trained
on lots of texts,
32
00:01:23,610 --> 00:01:25,170
and leaves technical English terms,
33
00:01:25,170 --> 00:01:28,350
like plugin and email,
unchanged in the translation.
34
00:01:28,350 --> 00:01:31,350
Both terms are perfectly
understood by French speakers.
35
00:01:31,350 --> 00:01:33,780
The dataset picked for the
fine-tuning is a dataset
36
00:01:33,780 --> 00:01:36,660
of technical texts where
special attention was paid
37
00:01:36,660 --> 00:01:39,150
to translating everything into French.
38
00:01:39,150 --> 00:01:42,090
As a result, the fine-tuned
model picked up that habit
39
00:01:42,090 --> 00:01:44,193
and translated both plugin and email.
40
00:01:45,942 --> 00:01:49,181
(air whooshing)
41
00:01:49,181 --> 00:01:50,592
(air whooshing)