1
00:00:04,560 --> 00:00:06,640
Welcome to the Hugging Face tasks series.
2
00:00:07,280 --> 00:00:10,720
In this video, we will take a look
at the Text Summarization task.
3
00:00:13,680 --> 00:00:16,480
Summarization is the task of
producing a shorter version
4
00:00:16,480 --> 00:00:21,600
of a document while preserving the relevant
and important information in the document.
5
00:00:25,040 --> 00:00:29,840
Summarization models take a document to be
summarized and output the summarized text.
6
00:00:33,360 --> 00:00:40,240
This task is evaluated using the ROUGE score. It’s
based on the overlap between the produced sequence
7
00:00:40,240 --> 00:00:48,000
and the reference sequence.
You might see this as ROUGE-1,
8
00:00:48,000 --> 00:00:55,600
which is the overlap of single tokens, and ROUGE-2,
the overlap of consecutive token pairs. ROUGE-N
9
00:00:55,600 --> 00:01:02,960
refers to the overlap of n consecutive tokens.
Here we see an example of how overlaps take place.
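As a rough illustration of the overlap idea, here is a minimal sketch of ROUGE-N in plain Python. It assumes whitespace tokenization and lowercasing, and the function names `ngrams` and `rouge_n` are hypothetical helpers made up for this example. Library implementations of ROUGE typically add steps such as stemming, so their scores may differ from this sketch.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens (empty if the text is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Sketch of ROUGE-N: n-gram overlap between a produced and a reference summary.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is only credited as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


produced = "the cat sat on the mat"
reference = "the cat is on the mat"
print(rouge_n(produced, reference, n=1))  # single-token overlap (ROUGE-1)
print(rouge_n(produced, reference, n=2))  # token-pair overlap (ROUGE-2)
```

In this pair of sentences, five of the six single tokens match, but only three of the five consecutive token pairs do, so ROUGE-2 comes out lower than ROUGE-1.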
10
00:01:06,160 --> 00:01:11,280
An example dataset used for this task is
called Extreme Summarization (XSum). This
11
00:01:11,280 --> 00:01:14,480
dataset contains texts and
their summarized versions.
12
00:01:17,680 --> 00:01:21,280
You can use summarization models
to summarize research papers, which
13
00:01:21,280 --> 00:01:25,680
would enable researchers to easily
pick papers for their reading list.
14
00:01:29,040 --> 00:01:39,520
For more information about the Summarization
task, check out the Hugging Face course.