subtitles/en/34_the-push-to-hub-api-(tensorflow).srt

1
00:00:00,587 --> 00:00:02,670
(swoosh)

2
00:00:05,100 --> 00:00:07,080
- [Narrator] Hi, this is going to be a video

3
00:00:07,080 --> 00:00:09,420
about the push_to_hub API

4
00:00:09,420 --> 00:00:10,670
for TensorFlow and Keras.

5
00:00:11,820 --> 00:00:14,850
So, to get started, we'll open up our notebook.

6
00:00:14,850 --> 00:00:16,920
And the first thing you'll need to do is log in to

7
00:00:16,920 --> 00:00:18,170
your HuggingFace account,

8
00:00:19,043 --> 00:00:20,663
for example with the notebook login function.

9
00:00:21,570 --> 00:00:24,630
So to use that, you simply call the function,

10
00:00:24,630 --> 00:00:26,010
and the popup will emerge.

11
00:00:26,010 --> 00:00:28,800
You will enter your username and password,

12
00:00:28,800 --> 00:00:31,425
which I'm going to pull out of my password manager here,

13
00:00:31,425 --> 00:00:33,108
and you log in.

14
00:00:33,108 --> 00:00:35,670
The next two cells are just

15
00:00:35,670 --> 00:00:37,080
getting everything ready for training.

16
00:00:37,080 --> 00:00:38,940
So we're just going to load a dataset,

17
00:00:38,940 --> 00:00:41,100
we're going to tokenize that dataset,

18
00:00:41,100 --> 00:00:42,990
and then we're going to load our model and compile

19
00:00:42,990 --> 00:00:45,660
it with the standard Adam optimizer.

20
00:00:45,660 --> 00:00:47,560
So I'm just going to run all of those.

21
00:00:49,830 --> 00:00:52,080
We'll wait a few seconds,

22
00:00:52,080 --> 00:00:54,280
and everything should be ready for training.

23
00:00:57,983 --> 00:00:58,816
Okay.

24
00:00:58,816 --> 00:01:01,440
So now we're ready to train.

25
00:01:01,440 --> 00:01:03,030
I'm going to show you the two ways

26
00:01:03,030 --> 00:01:05,130
you can push your model to the Hub.

27
00:01:05,130 --> 00:01:08,190
So the first is with the PushToHubCallback.

28
00:01:08,190 --> 00:01:10,107
So a callback in Keras

29
00:01:10,107 --> 00:01:13,710
is a function that's called regularly during training.
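The setup the narration describes (log in, load a dataset, tokenize it, load and compile a model with Adam) might look roughly like the sketch below. The "glue"/"cola" dataset and the "bert-base-cased" checkpoint are assumptions based on the CoLA example shown later in the video, and everything is wrapped in a function so that nothing downloads until you actually call it.

```python
# A sketch of the notebook setup described above. The dataset and
# checkpoint names are illustrative guesses; defining this function
# does not download anything.
def prepare_for_training():
    from huggingface_hub import notebook_login
    from datasets import load_dataset
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
    import tensorflow as tf

    notebook_login()  # pops up the username/password prompt in a notebook

    dataset = load_dataset("glue", "cola")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

    def tokenize(batch):
        # CoLA examples have a single "sentence" field
        return tokenizer(batch["sentence"], truncation=True)

    tokenized = dataset.map(tokenize, batched=True)

    model = TFAutoModelForSequenceClassification.from_pretrained(
        "bert-base-cased", num_labels=2
    )
    # Compile with the standard Adam optimizer, as in the video.
    model.compile(optimizer=tf.keras.optimizers.Adam())
    return tokenized, tokenizer, model
```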
30
00:01:13,710 --> 00:01:17,400
You can set it to be called after a certain number of steps,

31
00:01:17,400 --> 00:01:21,427
or every epoch, or even just once at the end of training.

32
00:01:21,427 --> 00:01:25,080
So a lot of callbacks in Keras, for example,

33
00:01:25,080 --> 00:01:28,050
control learning rate decay on plateau,

34
00:01:28,050 --> 00:01:30,047
and things like that.

35
00:01:30,047 --> 00:01:32,520
So this callback, by default,

36
00:01:32,520 --> 00:01:35,760
will save your model to the Hub once every epoch.

37
00:01:35,760 --> 00:01:37,080
And that's really helpful,

38
00:01:37,080 --> 00:01:38,790
especially if your training is very long,

39
00:01:38,790 --> 00:01:40,800
because that means you can resume from that save,

40
00:01:40,800 --> 00:01:43,290
so you get this automatic cloud-saving of your model.

41
00:01:43,290 --> 00:01:45,027
And you can even run inference

42
00:01:45,027 --> 00:01:47,730
with the checkpoints of your model

43
00:01:47,730 --> 00:01:50,208
that have been uploaded by this callback.

44
00:01:50,208 --> 00:01:52,260
And that means you can,

45
00:01:52,260 --> 00:01:54,150
y'know, run some test inputs

46
00:01:54,150 --> 00:01:56,100
and actually see how your model works

47
00:01:56,100 --> 00:01:57,990
at various stages during training,

48
00:01:57,990 --> 00:01:59,540
which is a really nice feature.

49
00:02:00,390 --> 00:02:03,960
So we're going to add the PushToHubCallback,

50
00:02:03,960 --> 00:02:05,670
and it takes just a few arguments.

51
00:02:05,670 --> 00:02:08,250
So the first argument is the temporary directory

52
00:02:08,250 --> 00:02:10,260
that files are going to be saved to

53
00:02:10,260 --> 00:02:12,150
before they're uploaded to the Hub.

54
00:02:12,150 --> 00:02:14,127
The second argument is the tokenizer,

55
00:02:14,127 --> 00:02:15,808
and the third argument here

56
00:02:15,808 --> 00:02:19,080
is the keyword argument hub_model_id.
57
00:02:19,080 --> 00:02:21,330
So that's the name it's going to be saved under

58
00:02:21,330 --> 00:02:23,006
on the HuggingFace Hub.

59
00:02:23,006 --> 00:02:26,267
You can also upload to an organization account

60
00:02:26,267 --> 00:02:29,370
just by adding the organization name

61
00:02:29,370 --> 00:02:32,460
before the repository name with a slash, like this.

62
00:02:32,460 --> 00:02:34,020
So you probably don't have permissions

63
00:02:34,020 --> 00:02:36,000
to upload to the HuggingFace organization,

64
00:02:36,000 --> 00:02:37,170
if you do, please file a bug

65
00:02:37,170 --> 00:02:38,973
and let us know extremely urgently.

66
00:02:40,830 --> 00:02:42,960
But if you do have access to your own organization,

67
00:02:42,960 --> 00:02:44,730
then you can use that same approach

68
00:02:44,730 --> 00:02:46,650
to upload models to their account

69
00:02:46,650 --> 00:02:50,760
instead of to your own personal set of models.

70
00:02:50,760 --> 00:02:53,520
So, once you've made your callback,

71
00:02:53,520 --> 00:02:56,310
you simply add it to the callbacks list

72
00:02:56,310 --> 00:02:58,080
when you're calling model.fit.

73
00:02:58,080 --> 00:03:01,110
And everything is uploaded for you from there,

74
00:03:01,110 --> 00:03:02,610
there's nothing else to worry about.

75
00:03:02,610 --> 00:03:04,530
The second way to upload a model, though,

76
00:03:04,530 --> 00:03:07,020
is to call model.push_to_hub.

77
00:03:07,020 --> 00:03:09,086
So this is more of a once-off method.

78
00:03:09,086 --> 00:03:11,550
It's not called regularly during training.

79
00:03:11,550 --> 00:03:13,680
You can just call this manually whenever you want to

80
00:03:13,680 --> 00:03:15,240
upload a model to the Hub.
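The two upload paths just described can be sketched as follows. The output directory and epoch count are illustrative, and "bert-fine-tuned-cola" is the repo name shown later in the video; the code is wrapped in a function with lazy imports so that nothing runs, uploads, or authenticates until you call it yourself.

```python
# Sketch of both upload methods from the video: the per-epoch
# PushToHubCallback, and a one-off model.push_to_hub call after training.
# Nothing executes until train_and_push is called with real objects.
def train_and_push(model, tokenizer, train_data):
    from transformers import PushToHubCallback

    callback = PushToHubCallback(
        "model_output/",                      # local staging dir before upload
        tokenizer=tokenizer,                  # uploaded alongside the weights
        hub_model_id="bert-fine-tuned-cola",  # repo name on the Hub;
        # use "my-org/bert-fine-tuned-cola" to target an organization instead
    )

    # By default the callback pushes a checkpoint to the Hub once per epoch.
    model.fit(train_data, epochs=2, callbacks=[callback])

    # One-off upload after training, so the final weights are on the Hub
    # with their own commit.
    model.push_to_hub("bert-fine-tuned-cola")
```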
81
00:03:15,240 --> 00:03:18,949
So we recommend running this after the end of training,

82
00:03:18,949 --> 00:03:21,870
just to make sure that you have a commit message

83
00:03:21,870 --> 00:03:24,060
to guarantee that this was the final version

84
00:03:24,060 --> 00:03:26,143
of the model at the end of training.

85
00:03:26,143 --> 00:03:27,930
And it just makes sure that, you know,

86
00:03:27,930 --> 00:03:30,480
you're working with the definitive end-of-training model

87
00:03:30,480 --> 00:03:32,190
and not accidentally using a checkpoint

88
00:03:32,190 --> 00:03:34,224
from somewhere along the way.

89
00:03:34,224 --> 00:03:37,173
So I'm going to run both of these cells.

90
00:03:39,299 --> 00:03:41,716
And then I'm going to cut the video here,

91
00:03:41,716 --> 00:03:43,080
just because training is going to take a couple of minutes.

92
00:03:43,080 --> 00:03:44,580
So I'll skip forward to the end of that,

93
00:03:44,580 --> 00:03:46,320
when the models have all been uploaded,

94
00:03:46,320 --> 00:03:48,390
and I'm gonna show you how you can

95
00:03:48,390 --> 00:03:50,010
access the models in the Hub,

96
00:03:50,010 --> 00:03:52,713
and the other things you can do with them from there.

97
00:03:55,440 --> 00:03:56,700
Okay, we're back,

98
00:03:56,700 --> 00:03:59,160
and our model was uploaded,

99
00:03:59,160 --> 00:04:00,750
both by the PushToHubCallback

100
00:04:00,750 --> 00:04:04,251
and also by our call to model.push_to_hub after training.

101
00:04:04,251 --> 00:04:05,910
So everything's looking good.

102
00:04:05,910 --> 00:04:09,960
So now if we drop over to my profile on HuggingFace,

103
00:04:09,960 --> 00:04:12,630
which you can get to just by clicking the profile button

104
00:04:12,630 --> 00:04:13,680
in the dropdown,

105
00:04:13,680 --> 00:04:16,860
we can see that the bert-fine-tuned-cola model is here,

106
00:04:16,860 --> 00:04:18,369
and was updated 3 minutes ago.
107
00:04:18,369 --> 00:04:20,520
So it'll always be at the top of your list,

108
00:04:20,520 --> 00:04:23,340
because they're sorted by how recently they were updated.

109
00:04:23,340 --> 00:04:25,740
And we can start querying our model immediately.

110
00:04:30,564 --> 00:04:32,939
So the dataset we were training on

111
00:04:32,939 --> 00:04:34,320
is the GLUE CoLA dataset,

112
00:04:34,320 --> 00:04:36,210
and CoLA is an acronym standing for

113
00:04:36,210 --> 00:04:39,420
the Corpus of Linguistic Acceptability.

114
00:04:39,420 --> 00:04:42,480
So what that means is the model is being trained to decide

115
00:04:42,480 --> 00:04:46,350
if a sentence is grammatically or linguistically okay,

116
00:04:46,350 --> 00:04:48,171
or if there's a problem with it.

117
00:04:48,171 --> 00:04:52,890
For example, we could say, "This is a legitimate sentence."

118
00:04:52,890 --> 00:04:54,180
And hopefully it realizes that

119
00:04:54,180 --> 00:04:56,080
this is in fact a legitimate sentence.

120
00:04:57,630 --> 00:05:00,240
So it might take a couple of seconds for the model to load

121
00:05:00,240 --> 00:05:03,060
when you call it for the first time.

122
00:05:03,060 --> 00:05:05,960
So I might cut a couple of seconds out of this video here.

123
00:05:07,860 --> 00:05:09,060
Okay, we're back.

124
00:05:09,060 --> 00:05:12,407
So the model loaded and we got an output,

125
00:05:12,407 --> 00:05:14,340
but there's an obvious problem here.

126
00:05:14,340 --> 00:05:16,888
So these labels aren't really telling us

127
00:05:16,888 --> 00:05:19,740
what categories the model has actually assigned

128
00:05:19,740 --> 00:05:21,655
to this input sentence.

129
00:05:21,655 --> 00:05:23,520
So if we want to fix that,

130
00:05:23,520 --> 00:05:26,010
we want to make sure the model config

131
00:05:26,010 --> 00:05:28,980
has the correct names for each of the label classes,

132
00:05:28,980 --> 00:05:30,707
and then we want to upload that config.
133
00:05:30,707 --> 00:05:32,220
So we can do that down here.

134
00:05:32,220 --> 00:05:34,050
To get the label names,

135
00:05:34,050 --> 00:05:36,547
we can get them from the dataset we loaded,

136
00:05:36,547 --> 00:05:39,627
from the features attribute it has.

137
00:05:39,627 --> 00:05:42,217
And then we can create dictionaries

138
00:05:42,217 --> 00:05:44,865
"id2label" and "label2id",

139
00:05:44,865 --> 00:05:47,452
and just assign them to the model config.

140
00:05:47,452 --> 00:05:50,790
And then we can just push our updated config,

141
00:05:50,790 --> 00:05:54,690
and that'll override the existing config in the Hub repo.

142
00:05:54,690 --> 00:05:56,368
So that's just been done.

143
00:05:56,368 --> 00:05:58,320
So now, if we go back here,

144
00:05:58,320 --> 00:06:00,000
I'm going to use a slightly different sentence

145
00:06:00,000 --> 00:06:03,540
because the outputs for sentences are sometimes cached.

146
00:06:03,540 --> 00:06:06,030
And so, if we want to generate new results,

147
00:06:06,030 --> 00:06:07,590
I'm going to use something slightly different.

148
00:06:07,590 --> 00:06:09,783
So let's try an incorrect sentence.

149
00:06:10,830 --> 00:06:12,640
So this is not valid English grammar

150
00:06:13,538 --> 00:06:15,030
and hopefully the model will see that.

151
00:06:15,030 --> 00:06:16,958
It's going to reload here,

152
00:06:16,958 --> 00:06:18,630
so I'm going to cut a couple of seconds here,

153
00:06:18,630 --> 00:06:20,933
and then we'll see what the model is going to say.

154
00:06:22,860 --> 00:06:23,820
Okay.

155
00:06:23,820 --> 00:06:26,580
So the model's confidence isn't very good,

156
00:06:26,580 --> 00:06:28,830
because of course we didn't really optimize

157
00:06:28,830 --> 00:06:30,630
our hyperparameters at all.

158
00:06:30,630 --> 00:06:32,190
But it has decided that this sentence

159
00:06:32,190 --> 00:06:35,094
is more likely to be unacceptable than acceptable.
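The label-name fix described above can be sketched with plain dictionaries. In the notebook the names would come from the dataset's label feature (for GLUE CoLA the two classes are "unacceptable" and "acceptable", in that order); the config assignments and push are shown as comments because they need the trained model from earlier.

```python
# Build the id2label / label2id mappings described above. In the notebook
# you would read the names from dataset["train"].features["label"].names;
# for GLUE CoLA those are the two classes below.
label_names = ["unacceptable", "acceptable"]

id2label = {i: name for i, name in enumerate(label_names)}
label2id = {name: i for i, name in enumerate(label_names)}

# Assign them to the model config and push just the updated config,
# overwriting the config file in the Hub repo:
# model.config.id2label = id2label
# model.config.label2id = label2id
# model.config.push_to_hub("bert-fine-tuned-cola")
```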
160
00:06:35,094 --> 00:06:38,160
Presumably if we tried a bit harder with training

161
00:06:38,160 --> 00:06:40,080
we could get a much lower validation loss,

162
00:06:40,080 --> 00:06:43,830
and therefore the model's predictions would be more precise.

163
00:06:43,830 --> 00:06:46,260
But let's try our original sentence again.

164
00:06:46,260 --> 00:06:49,140
Of course, because of the caching issue,

165
00:06:49,140 --> 00:06:52,740
we're seeing that the original answers are unchanged.

166
00:06:52,740 --> 00:06:55,196
So let's try a different, valid sentence.

167
00:06:55,196 --> 00:06:58,767
So let's try, "This is a valid English sentence".

168
00:07:00,150 --> 00:07:02,100
And we see that now the model correctly decides

169
00:07:02,100 --> 00:07:04,290
that it has a very high probability of being acceptable,

170
00:07:04,290 --> 00:07:06,900
and a very low probability of being unacceptable.

171
00:07:06,900 --> 00:07:09,930
So you can use this inference API

172
00:07:09,930 --> 00:07:12,810
even with the checkpoints that are uploaded during training,

173
00:07:12,810 --> 00:07:14,546
so it can be very interesting to see how

174
00:07:14,546 --> 00:07:17,690
the model's predictions for sample inputs change

175
00:07:17,690 --> 00:07:20,579
with each epoch of training.

176
00:07:20,579 --> 00:07:23,370
Also, the model we've uploaded

177
00:07:23,370 --> 00:07:25,740
is going to be accessible to you and,

178
00:07:25,740 --> 00:07:28,046
if it's shared publicly, to anyone else.

179
00:07:28,046 --> 00:07:29,788
So if you want to load that model,

180
00:07:29,788 --> 00:07:32,500
all you or anyone else needs to do

181
00:07:34,290 --> 00:07:37,440
is just to load it in either a pipeline,

182
00:07:37,440 --> 00:07:40,925
or you can just load it with, for example,

183
00:07:40,925 --> 00:07:43,203
TFAutoModelForSequenceClassification.
184
00:07:46,920 --> 00:07:49,989
And then for the name you would simply pass

185
00:07:49,989 --> 00:07:53,325
the path to the repo you want to upload.

186
00:07:53,325 --> 00:07:55,890
Or to download, excuse me.

187
00:07:55,890 --> 00:07:58,710
So if I want to use this model again,

188
00:07:58,710 --> 00:08:00,667
if I want to load it from the Hub,

189
00:08:00,667 --> 00:08:01,763
I just run this one line of code.

190
00:08:02,813 --> 00:08:03,773
The model will be downloaded.

191
00:08:07,757 --> 00:08:10,080
And, with any luck, it'll be ready to

192
00:08:10,080 --> 00:08:12,450
fine-tune on a different dataset, make predictions with,

193
00:08:12,450 --> 00:08:14,340
or do anything else you wanna do.

194
00:08:14,340 --> 00:08:17,700
So that was a quick overview of how,

195
00:08:17,700 --> 00:08:19,470
after your training or during your training,

196
00:08:19,470 --> 00:08:21,420
you can upload models to the Hub,

197
00:08:21,420 --> 00:08:22,440
you can checkpoint there,

198
00:08:22,440 --> 00:08:24,240
you can resume training from there,

199
00:08:24,240 --> 00:08:26,790
and you can get inference results

200
00:08:26,790 --> 00:08:28,384
from the models you've uploaded.

201
00:08:28,384 --> 00:08:31,084
So thank you, and I hope to see you in a future video.

202
00:08:32,852 --> 00:08:34,935
(swoosh)
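The loading step described near the end might look like the sketch below, either through a pipeline or directly as a TF model. The "your-username/bert-fine-tuned-cola" repo path is a placeholder for wherever your model actually lives, and the code is wrapped in a function so the download only happens when you call it.

```python
# Sketch of loading the uploaded model back from the Hub, as described
# at the end of the video. The repo path is a placeholder; defining this
# function does not download anything.
def load_from_hub(repo_id="your-username/bert-fine-tuned-cola"):
    from transformers import pipeline, TFAutoModelForSequenceClassification

    # Option 1: a ready-to-use text-classification pipeline.
    classifier = pipeline("text-classification", model=repo_id)

    # Option 2: the bare model, ready to fine-tune on a different dataset
    # or make predictions with.
    model = TFAutoModelForSequenceClassification.from_pretrained(repo_id)
    return classifier, model
```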