Welcome to the Hugging Face tasks series! In this video we’ll take a look at Masked Language Modeling.

Masked language modeling is the task of predicting which words should fill in the blanks of a sentence. These models take masked text as input and output the possible values for that mask.

Masked language modeling is handy before fine-tuning your model for your task. For example, if you need to use a model in a specific domain, say, biomedical documents, models like BERT will treat your domain-specific words as rare tokens. If you train a masked language model using your biomedical corpus and then fine-tune your model on a downstream task, you will get better performance.

Classification metrics can’t be used, as there’s no single correct answer for the mask values. Instead, we evaluate the distribution of the mask values. A common metric to do so is the cross-entropy loss. Perplexity is also a widely used metric, and it is calculated as the exponential of the cross-entropy loss.

You can use any dataset with plain text and tokenize the text to mask the data.

For more information about Masked Language Modeling, check out the Hugging Face course.
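As a minimal sketch of that input/output behavior, here is how it might look with the `transformers` fill-mask pipeline; the checkpoint name and the example sentence are illustrative choices, not ones from the video:

```python
from transformers import pipeline

# Load a fill-mask pipeline; "distilbert-base-uncased" is just an illustrative checkpoint.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

# The input contains the tokenizer's mask token ([MASK] for this checkpoint);
# the output is a list of candidate fills, each with a score.
results = fill_mask("The goal of life is [MASK].")
for r in results:
    print(f"{r['token_str']!r}  score={r['score']:.3f}")
```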
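To make the relationship between the two metrics concrete, a tiny illustration of turning a cross-entropy loss into perplexity; the loss value below is hypothetical, standing in for whatever your evaluation reports:

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss.
cross_entropy_loss = 2.3  # hypothetical evaluation loss
perplexity = math.exp(cross_entropy_loss)
print(f"perplexity = {perplexity:.2f}")  # ~9.97
```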
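One possible sketch of tokenizing plain text and masking it for training, using `AutoTokenizer` and `DataCollatorForLanguageModeling` from `transformers`; the checkpoint name and sentences are assumptions, and the 15% masking rate is the library's default rather than something prescribed by the video:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize plain-text examples (the sentences here are just placeholders).
texts = [
    "Biomedical corpora contain many domain-specific terms.",
    "Masked language modeling predicts the hidden tokens.",
]
encodings = [tokenizer(t) for t in texts]

# The collator randomly masks 15% of the tokens and sets the labels
# so the model is trained to recover the original tokens.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
batch = collator(encodings)
print(batch["input_ids"].shape, batch["labels"].shape)
```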