subtitles/en/tasks_04_🤗-tasks-summarization.srt

1
00:00:04,560 --> 00:00:06,640
Welcome to the Hugging Face tasks series.

2
00:00:07,280 --> 00:00:10,720
In this video, we will take a look at the Text Summarization task.

3
00:00:13,680 --> 00:00:16,480
Summarization is a task of producing a shorter version

4
00:00:16,480 --> 00:00:21,600
of a document while preserving the relevant and important information in the document.

5
00:00:25,040 --> 00:00:29,840
Summarization models take a document to be summarized and output the summarized text.

6
00:00:33,360 --> 00:00:40,240
This task is evaluated on the ROUGE score. It’s based on the overlap between the produced sequence

7
00:00:40,240 --> 00:00:48,000
and the correct sequence. You might see this as ROUGE-1,

8
00:00:48,000 --> 00:00:55,600
which is the overlap of single tokens and ROUGE-2, the overlap of subsequent token pairs. ROUGE-N

9
00:00:55,600 --> 00:01:02,960
refers to the overlap of n subsequent tokens. Here we see an example of how overlaps take place.

10
00:01:06,160 --> 00:01:11,280
An example dataset used for this task is called Extreme Summarization, XSUM. This

11
00:01:11,280 --> 00:01:14,480
dataset contains texts and their summarized versions.

12
00:01:17,680 --> 00:01:21,280
You can use summarization models to summarize research papers which

13
00:01:21,280 --> 00:01:25,680
would enable researchers to easily pick papers for their reading list.

14
00:01:29,040 --> 00:01:39,520
For more information about the Summarization task, check out the Hugging Face course.
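
The ROUGE-N overlap described in the video can be illustrated with a small sketch. This is not the official ROUGE implementation: it assumes lowercased whitespace tokenization with no stemming, and the names `ngrams` and `rouge_n` are hypothetical helpers chosen for this example. It counts how many n-grams the produced summary shares with the reference summary and reports precision, recall, and F1.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: n-gram overlap between a produced and a reference summary.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


produced = "the cat sat on the mat"
correct = "the cat lay on the mat"
print(rouge_n(produced, correct, n=1))  # single-token overlap (ROUGE-1)
print(rouge_n(produced, correct, n=2))  # token-pair overlap (ROUGE-2)
```

In this toy pair, five of the six single tokens match, while only three of the five token pairs do, so ROUGE-2 comes out lower than ROUGE-1. Production evaluations typically rely on an established ROUGE package, which may tokenize and stem differently from this sketch.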
