subtitles/en/tasks_00_🤗-tasks-token-classification.srt

1
00:00:04,520 --> 00:00:07,400
Welcome to the Hugging Face tasks series!

2
00:00:07,400 --> 00:00:11,870
In this video we’ll take a look at the token classification task.

3
00:00:11,870 --> 00:00:17,900
Token classification is the task of assigning a label to each token in a sentence.

4
00:00:17,900 --> 00:00:23,310
There are various token classification tasks, and the most common are Named Entity Recognition

5
00:00:23,310 --> 00:00:26,430
and Part-of-Speech Tagging.

6
00:00:26,430 --> 00:00:31,640
Let’s take a quick look at the Named Entity Recognition task.

7
00:00:31,640 --> 00:00:38,400
The goal of this task is to find the entities in a piece of text, such as a person, location,

8
00:00:38,400 --> 00:00:40,210
or organization.

9
00:00:40,210 --> 00:00:45,250
This task is formulated as labelling each token with one class for each entity type, and

10
00:00:45,250 --> 00:00:51,719
another class for tokens that belong to no entity.

11
00:00:51,719 --> 00:00:55,670
Another token classification task is part-of-speech tagging.

12
00:00:55,670 --> 00:01:01,399
The goal of this task is to label each word with a particular part of speech, such as

13
00:01:01,399 --> 00:01:05,900
noun, pronoun, adjective, verb and so on.

14
00:01:05,900 --> 00:01:11,270
This task is formulated as labelling each token with its part of speech.

15
00:01:11,270 --> 00:01:19,659
Token classification models are evaluated on Accuracy, Recall, Precision and F1-Score.

16
00:01:19,659 --> 00:01:22,950
The metrics are calculated for each of the classes.

17
00:01:22,950 --> 00:01:28,040
We count true positives, false positives and false negatives to calculate precision

18
00:01:28,040 --> 00:01:31,829
and recall, and take their harmonic mean to get the F1-Score.

19
00:01:31,829 --> 00:01:42,329
Then we calculate these for every class and take the overall average to evaluate our model.

20
00:01:42,329 --> 00:01:45,680
An example dataset used for this task is CoNLL-2003.

21
00:01:45,680 --> 00:01:51,750
Here, each token belongs to a certain named entity class, denoted by its index in the

22
00:01:51,750 --> 00:01:55,380
list of labels.

23
00:01:55,380 --> 00:02:00,720
You can extract important information from invoices using named entity recognition models,

24
00:02:00,720 --> 00:02:07,070
such as the date, organization name or address.

25
00:02:07,070 --> 00:02:16,840
For more information about the token classification task, check out the Hugging Face course.
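
As a quick illustration of the named entity recognition use case described in the subtitles above, here is a minimal sketch using the transformers token-classification pipeline; the checkpoint "dslim/bert-base-NER" and the example sentence are just placeholders, and any token-classification model from the Hub would work.

# Minimal NER sketch with the Hugging Face transformers pipeline.
from transformers import pipeline

# "token-classification" is the pipeline task name; aggregation_strategy="simple"
# groups sub-word tokens back into whole entity spans.
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",  # example checkpoint, not prescribed by the video
    aggregation_strategy="simple",
)

text = "Hugging Face was founded in New York."
for entity in ner(text):
    # Each result carries the entity group (e.g. PER, ORG, LOC), the matched text and a score.
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))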