subtitles/en/33_the-push-to-hub-api-(pytorch).srt

1
00:00:00,321 --> 00:00:01,497
(air whooshing)

2
00:00:01,497 --> 00:00:02,330
(smiley face popping)

3
00:00:02,330 --> 00:00:05,130
(air whooshing)

4
00:00:05,130 --> 00:00:06,830
- [Instructor] So, the push_to_hub API.

5
00:00:08,310 --> 00:00:10,533
Let's have a look at the push_to_hub API.

6
00:00:11,730 --> 00:00:14,640
You will need to be logged in with your Hugging Face account,

7
00:00:14,640 --> 00:00:17,400
which you can do by executing this first cell,

8
00:00:17,400 --> 00:00:21,123
or by typing huggingface-cli login in a terminal.

9
00:00:21,990 --> 00:00:26,640
Just enter your username and password, then click login;

10
00:00:26,640 --> 00:00:28,620
this will store an authentication token

11
00:00:28,620 --> 00:00:30,670
in the cache of the machine you're using.

12
00:00:31,890 --> 00:00:35,790
Now, let's launch a fine-tuning of a BERT model

13
00:00:35,790 --> 00:00:37,920
on the GLUE CoLA dataset.

14
00:00:37,920 --> 00:00:39,600
We won't go over the fine-tuning code,

15
00:00:39,600 --> 00:00:42,270
because you can find it in any Transformers tutorial,

16
00:00:42,270 --> 00:00:44,670
or by looking at the videos linked below.

17
00:00:44,670 --> 00:00:46,470
What interests us here is

18
00:00:46,470 --> 00:00:48,970
how we can leverage the Model Hub during training.

19
00:00:49,860 --> 00:00:52,980
This is done with the push_to_hub=True argument

20
00:00:52,980 --> 00:00:55,530
passed in your TrainingArguments.

21
00:00:55,530 --> 00:00:57,240
This will automatically upload your model

22
00:00:57,240 --> 00:00:59,400
to the Hub each time it is saved,

23
00:00:59,400 --> 00:01:01,323
so every epoch in our case.

24
00:01:02,280 --> 00:01:04,860
This allows you to resume training from a different machine

25
00:01:04,860 --> 00:01:06,873
if the current one gets interrupted.

26
00:01:08,220 --> 00:01:10,440
The model will be uploaded to your namespace

27
00:01:10,440 --> 00:01:14,640
with the name of the output directory you picked, by default.

28
00:01:14,640 --> 00:01:16,020
You can choose another name

29
00:01:16,020 --> 00:01:19,113
by passing it to the hub_model_id argument.

30
00:01:20,070 --> 00:01:23,370
You can also push inside an organization you are a member of

31
00:01:23,370 --> 00:01:25,740
by passing a full repository name:

32
00:01:25,740 --> 00:01:28,933
the name of the organization, a slash,

33
00:01:28,933 --> 00:01:30,433
then the model ID you want to pick.

34
00:01:32,250 --> 00:01:34,650
With that done, we can just launch training

35
00:01:34,650 --> 00:01:36,093
and wait a little bit.

36
00:01:36,960 --> 00:01:39,033
I'll cut the waiting time from the video.

37
00:01:43,260 --> 00:01:46,350
Note that the model is pushed asynchronously,

38
00:01:46,350 --> 00:01:47,730
meaning that the training continues

39
00:01:47,730 --> 00:01:49,730
while your model is uploaded to the Hub.

40
00:01:51,060 --> 00:01:52,950
When your first commit is finished,

41
00:01:52,950 --> 00:01:55,650
you can go inspect your model on the Hub

42
00:01:55,650 --> 00:01:57,960
by looking inside your namespace,

43
00:01:57,960 --> 00:01:59,943
and you'll find it at the very top.

44
00:02:01,980 --> 00:02:04,200
You can even start playing with its inference widget

45
00:02:04,200 --> 00:02:06,630
while the training is still running.

46
00:02:06,630 --> 00:02:09,270
The CoLA dataset tasks the model with determining

47
00:02:09,270 --> 00:02:11,970
if a sentence is grammatically correct or not,

48
00:02:11,970 --> 00:02:15,510
so we pick an example of an incorrect sentence to test it.
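As a minimal sketch of the setup described above, in Python; the checkpoint name, output directory, and hub_model_id value are illustrative placeholders, not taken from the video:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-fine-tuned-cola",  # default repo name on the Hub
    save_strategy="epoch",              # the model is pushed at every save
    push_to_hub=True,                   # upload to the Hub during training
    # To push inside an organization instead of your own namespace:
    # hub_model_id="my-organization/bert-fine-tuned-cola",
)
# These arguments are then passed to a Trainer, and training is
# launched as usual with trainer.train().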
49
00:02:15,510 --> 00:02:16,950
Note that it'll take a bit of time

50
00:02:16,950 --> 00:02:18,750
to load your model inside the Inference API

51
00:02:18,750 --> 00:02:20,880
the first time you try to use it.

52
00:02:20,880 --> 00:02:23,280
We'll cut that time from the video.

53
00:02:23,280 --> 00:02:24,870
There is something wrong with the labels,

54
00:02:24,870 --> 00:02:27,360
but we'll fix it later in this video.

55
00:02:27,360 --> 00:02:29,520
Once your training is finished,

56
00:02:29,520 --> 00:02:31,770
you should do one last push with the trainer's

57
00:02:31,770 --> 00:02:33,840
push_to_hub method.

58
00:02:33,840 --> 00:02:35,430
This is for two reasons.

59
00:02:35,430 --> 00:02:36,750
First, this will make sure

60
00:02:36,750 --> 00:02:39,180
you are pushing the final version of your model

61
00:02:39,180 --> 00:02:40,680
if you didn't already,

62
00:02:40,680 --> 00:02:42,480
for instance, if you used the save

63
00:02:42,480 --> 00:02:46,980
every n steps strategy instead of every epoch. Second,

64
00:02:46,980 --> 00:02:48,180
this will draft a model card

65
00:02:48,180 --> 00:02:51,120
that will be the landing page of your model repo.

66
00:02:51,120 --> 00:02:52,260
Once the commit is done,

67
00:02:52,260 --> 00:02:54,810
let's go back to our model page and refresh.

68
00:02:54,810 --> 00:02:56,820
We can see the drafted model card,

69
00:02:56,820 --> 00:02:58,080
which includes information

70
00:02:58,080 --> 00:03:00,381
about which model we fine-tuned,

71
00:03:00,381 --> 00:03:03,570
the final evaluation loss and metric,

72
00:03:03,570 --> 00:03:06,300
the training hyperparameters used,

73
00:03:06,300 --> 00:03:08,670
the intermediate training results,

74
00:03:08,670 --> 00:03:10,320
and the framework versions we used,

75
00:03:10,320 --> 00:03:13,173
so that other people can easily reproduce our results.

76
00:03:15,270 --> 00:03:16,860
On top of all that information,

77
00:03:16,860 --> 00:03:19,740
the trainer also included some metadata that is interpreted

78
00:03:19,740 --> 00:03:22,650
by the Hugging Face website in the model card.

79
00:03:22,650 --> 00:03:26,010
You get the value of the metrics reported in a nice widget,

80
00:03:26,010 --> 00:03:29,640
as well as a link to a leaderboard on Papers with Code.

81
00:03:29,640 --> 00:03:32,550
The TensorBoard runs have also been pushed

82
00:03:32,550 --> 00:03:34,560
to this repo, and we can look at them

83
00:03:34,560 --> 00:03:36,000
directly from the Model Hub

84
00:03:36,000 --> 00:03:38,850
by clicking on the training metrics submenu.

85
00:03:38,850 --> 00:03:39,795
If you are not using the Trainer API

86
00:03:39,795 --> 00:03:42,510
to fine-tune your model,

87
00:03:42,510 --> 00:03:43,770
you can use the push_to_hub method

88
00:03:43,770 --> 00:03:46,427
on the model and tokenizer directly.

89
00:03:46,427 --> 00:03:50,160
Let's test this to fix the labels in the inference widget.

90
00:03:50,160 --> 00:03:52,740
The inference widget was using different names for the labels

91
00:03:52,740 --> 00:03:54,810
because we did not indicate the correspondence

92
00:03:54,810 --> 00:03:57,030
between integers and label names.

93
00:03:57,030 --> 00:03:58,740
We can fix this in the configuration

94
00:03:58,740 --> 00:04:01,350
by setting the label2id

95
00:04:01,350 --> 00:04:04,170
and id2label fields to the proper values

96
00:04:04,170 --> 00:04:06,933
when pushing the model config to the Hub.
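A rough sketch of those two steps, continuing the hypothetical names from the previous snippet; the label strings are an assumption based on the CoLA task, where 0 marks an ungrammatical sentence and 1 a grammatical one:

# One last push once training is finished: this uploads the final
# version of the model and drafts the model card.
trainer.push_to_hub()

# Without the Trainer API, the same method exists on the model
# and tokenizer directly:
# model.push_to_hub("bert-fine-tuned-cola")
# tokenizer.push_to_hub("bert-fine-tuned-cola")

# Fix the inference widget labels by recording the correspondence
# between integers and label names in the config, then pushing it.
model.config.id2label = {0: "unacceptable", 1: "acceptable"}
model.config.label2id = {"unacceptable": 0, "acceptable": 1}
model.config.push_to_hub("bert-fine-tuned-cola")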
97
00:04:07,950 --> 00:04:10,620
Once this is done, we can check on the website,

98
00:04:10,620 --> 00:04:13,380
and the model is now showing the proper labels.

99
00:04:13,380 --> 00:04:15,240
Now that the model is on the Hub,

100
00:04:15,240 --> 00:04:17,370
we can use it from anywhere,

101
00:04:17,370 --> 00:04:19,920
as we would any other Transformers model,

102
00:04:19,920 --> 00:04:21,113
with the from_pretrained method

103
00:04:21,113 --> 00:04:22,923
or with the pipeline function.

104
00:04:34,350 --> 00:04:36,780
We just have to use the identifier from the Hub,

105
00:04:36,780 --> 00:04:39,450
and we can see that the model configuration and weights,

106
00:04:39,450 --> 00:04:42,483
as well as the tokenizer files, are automatically downloaded.

107
00:04:53,880 --> 00:04:55,950
Try out the push_to_hub API in your next training

108
00:04:55,950 --> 00:04:58,650
to easily share your model with the rest of the world.

109
00:05:01,151 --> 00:05:03,818
(air whooshing)
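For reference, a minimal sketch of reusing the pushed model from anywhere; "my-username/bert-fine-tuned-cola" is a placeholder Hub identifier:

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

# Download and run the model through the pipeline function.
classifier = pipeline(
    "text-classification", model="my-username/bert-fine-tuned-cola"
)
print(classifier("This sentence are wrong."))

# Or load the weights and tokenizer files directly with from_pretrained.
model = AutoModelForSequenceClassification.from_pretrained(
    "my-username/bert-fine-tuned-cola"
)
tokenizer = AutoTokenizer.from_pretrained("my-username/bert-fine-tuned-cola")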