subtitles/en/tasks_03_🤗-tasks-masked-language-modeling.srt
1
00:00:04,660 --> 00:00:07,589
Welcome to the Hugging Face tasks series!
2
00:00:07,589 --> 00:00:13,730
In this video we’ll take a look at Masked
Language Modeling.
3
00:00:13,730 --> 00:00:20,720
Masked language modeling is the task of predicting
which words should fill in the blanks of a
4
00:00:20,720 --> 00:00:23,500
sentence.
5
00:00:23,500 --> 00:00:32,870
These models take masked text as input
and output possible values for that mask.
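For instance, here is a minimal sketch using the transformers fill-mask pipeline; the checkpoint and example sentence are illustrative choices, not taken from the video.

from transformers import pipeline

# Load a masked language model behind the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT-style models mark the blank with the [MASK] token.
predictions = fill_mask("The goal of life is [MASK].")

for pred in predictions:
    # Each candidate comes with the filled-in token and its probability.
    print(pred["token_str"], round(pred["score"], 3))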
6
00:00:32,870 --> 00:00:37,550
Masked language modeling is handy before fine-tuning
your model for your task.
7
00:00:37,550 --> 00:00:43,579
For example, if you need to use a model in
a specific domain, say, biomedical documents,
8
00:00:43,579 --> 00:00:49,050
models like BERT will treat your domain-specific
words as rare tokens.
9
00:00:49,050 --> 00:00:54,220
If you train a masked language model using
your biomedical corpus and then fine-tune
10
00:00:54,220 --> 00:01:02,929
your model on a downstream task, you will
get better performance.
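A minimal sketch of that recipe with the transformers Trainer follows, assuming a hypothetical domain_corpus.txt file with one biomedical document per line; the checkpoint and hyperparameters are illustrative, not prescribed by the video.

from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"  # any masked language model checkpoint works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# domain_corpus.txt is a placeholder name for your plain-text biomedical corpus.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# The collator randomly replaces a fraction of tokens with [MASK] at every step.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-biomedical-mlm", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()  # then fine-tune the adapted checkpoint on the downstream task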
11
00:01:02,929 --> 00:01:07,799
Classification metrics can’t be used as
there’s no single correct answer for the mask
12
00:01:07,799 --> 00:01:08,799
values.
13
00:01:08,799 --> 00:01:12,900
Instead, we evaluate the distribution of the
mask values.
14
00:01:12,900 --> 00:01:16,590
A common metric to do so is the cross-entropy
loss.
15
00:01:16,590 --> 00:01:22,010
Perplexity is also a widely used metric and
it is calculated as the exponential of the
16
00:01:22,010 --> 00:01:27,240
cross-entropy loss.
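As a quick sketch of that relationship, assuming the Trainer above was also given an eval_dataset: the eval_loss it reports is the average per-token cross-entropy, and exponentiating it gives the perplexity.

import math

eval_results = trainer.evaluate()          # returns a dict including "eval_loss"
cross_entropy = eval_results["eval_loss"]  # mean cross-entropy over the evaluation set

# Perplexity is the exponential of the cross-entropy loss.
perplexity = math.exp(cross_entropy)
print(f"cross-entropy: {cross_entropy:.3f}, perplexity: {perplexity:.2f}")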
17
00:01:27,240 --> 00:01:35,680
You can use any dataset with plain text:
tokenize the text, then mask some of the tokens.
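Here is a minimal sketch of that preprocessing step; the sentence is made up, and because the collator masks tokens at random, the output changes from run to run.

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# Tokenize plain text, then let the collator mask a random subset of tokens.
encoding = tokenizer("Masked language modeling predicts missing words.")
batch = collator([encoding])

print(tokenizer.decode(batch["input_ids"][0]))  # some tokens may appear as [MASK]
print(batch["labels"][0])  # original ids at masked positions, -100 everywhere else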
18
00:01:35,680 --> 00:01:44,710
For more information about Masked Language
Modeling, check out the Hugging Face course.