subtitles/en/tasks_00_🤗-tasks-token-classification.srt (92 lines of code) (raw):
1
00:00:04,520 --> 00:00:07,400
Welcome to the Hugging Face tasks series!
2
00:00:07,400 --> 00:00:11,870
In this video we’ll take a look at the token
classification task.
3
00:00:11,870 --> 00:00:17,900
Token classification is the task of assigning
a label to each token in a sentence.
4
00:00:17,900 --> 00:00:23,310
There are various token classification tasks
and the most common are Named Entity Recognition
5
00:00:23,310 --> 00:00:26,430
and Part-of-Speech Tagging.
6
00:00:26,430 --> 00:00:31,640
Let’s take a quick look at the Named Entity
Recognition task.
7
00:00:31,640 --> 00:00:38,400
The goal of this task is to find the entities
in a piece of text, such as person, location,
8
00:00:38,400 --> 00:00:40,210
or organization.
9
00:00:40,210 --> 00:00:45,250
This task is formulated as labelling each
token with one class per entity type, and
10
00:00:45,250 --> 00:00:51,719
another class for tokens that have no entity.
11
00:00:51,719 --> 00:00:55,670
Another token classification task is part-of-speech
tagging.
12
00:00:55,670 --> 00:01:01,399
The goal of this task is to label each word
with its part of speech, such as
13
00:01:01,399 --> 00:01:05,900
noun, pronoun, adjective, verb and so on.
14
00:01:05,900 --> 00:01:11,270
This task is formulated as labelling each
token with its part of speech.
15
00:01:11,270 --> 00:01:19,659
Token classification models are evaluated
on Accuracy, Recall, Precision and F1-Score.
16
00:01:19,659 --> 00:01:22,950
The metrics are calculated for each of the
classes.
17
00:01:22,950 --> 00:01:28,040
We count true positives, false positives
and false negatives to calculate precision
18
00:01:28,040 --> 00:01:31,829
and recall, and take their harmonic mean to
get F1-Score.
19
00:01:31,829 --> 00:01:42,329
Then we calculate it for every class and take
the overall average to evaluate our model.
20
00:01:42,329 --> 00:01:45,680
An example dataset used for this task is CoNLL-2003.
21
00:01:45,680 --> 00:01:51,750
Here, each token belongs to a certain named
entity class, denoted by its index in the
22
00:01:51,750 --> 00:01:55,380
list containing the labels.
23
00:01:55,380 --> 00:02:00,720
You can extract important information from
invoices using named entity recognition models,
24
00:02:00,720 --> 00:02:07,070
such as date, organization name or address.
25
00:02:07,070 --> 00:02:16,840
For more information about the token classification
task, check out the Hugging Face course.