subtitles/en/tasks_02_🤗-tasks-causal-language-modeling.srt

1
00:00:04,560 --> 00:00:06,640
Welcome to the Hugging Face tasks series!

2
00:00:07,200 --> 00:00:10,400
In this video we’ll take a look at Causal Language Modeling.

3
00:00:13,600 --> 00:00:16,880
Causal language modeling is the task of predicting the next

4
00:00:16,880 --> 00:00:21,920
word in a sentence, given all the previous words. This task is very

5
00:00:21,920 --> 00:00:29,920
similar to the autocorrect function that you might have on your phone.

6
00:00:29,920 --> 00:00:34,720
These models take a sequence to be completed and output the complete sequence.

7
00:00:38,640 --> 00:00:44,160
Classification metrics can’t be used as there’s no single correct answer for completion.

8
00:00:44,960 --> 00:00:49,280
Instead, we evaluate the distribution of the text completed by the model.

9
00:00:50,800 --> 00:00:55,440
A common metric to do so is the cross-entropy loss. Perplexity is

10
00:00:55,440 --> 00:01:01,280
also a widely used metric and it is calculated as the exponential of the cross-entropy loss.

11
00:01:05,200 --> 00:01:11,840
You can use any dataset with plain text and tokenize the text to prepare the data.

12
00:01:15,040 --> 00:01:18,240
Causal language models can be used to generate code.

13
00:01:22,480 --> 00:01:33,200
For more information about the Causal Language Modeling task, check out the Hugging Face course.