subtitles/en/33_the-push-to-hub-api-(pytorch).srt (370 lines of code) (raw):
1
00:00:00,321 --> 00:00:01,497
(air whooshing)
2
00:00:01,497 --> 00:00:02,330
(smiley face popping)
3
00:00:02,330 --> 00:00:05,130
(air whooshing)
4
00:00:05,130 --> 00:00:06,830
- [Instructor] The push_to_hub API.
5
00:00:08,310 --> 00:00:10,533
Let's have a look at the push_to_hub API.
6
00:00:11,730 --> 00:00:14,640
You will need to be logged in
with your Hugging Face account
7
00:00:14,640 --> 00:00:17,400
which you can do by
executing this first cell,
8
00:00:17,400 --> 00:00:21,123
or by typing huggingface-cli
login in a terminal.
9
00:00:21,990 --> 00:00:26,640
Just enter your username and
password, then click login;
10
00:00:26,640 --> 00:00:28,620
this will store an authentication token
11
00:00:28,620 --> 00:00:30,670
in the cache of the machine you're using.
12
00:00:31,890 --> 00:00:35,790
Now, let's launch the
fine-tuning of a BERT model
13
00:00:35,790 --> 00:00:37,920
on the GLUE CoLA dataset.
14
00:00:37,920 --> 00:00:39,600
We won't go over the fine-tuning code
15
00:00:39,600 --> 00:00:42,270
because you can find it in
any Transformers tutorial,
16
00:00:42,270 --> 00:00:44,670
or by looking at the videos linked below.
17
00:00:44,670 --> 00:00:46,470
What interests us here is
18
00:00:46,470 --> 00:00:48,970
how we can leverage the
model hub during training.
19
00:00:49,860 --> 00:00:52,980
This is done with the
"push_to_hub=True" argument
20
00:00:52,980 --> 00:00:55,530
passed in your TrainingArguments.
21
00:00:55,530 --> 00:00:57,240
This will automatically upload your model
22
00:00:57,240 --> 00:00:59,400
to the Hub each time it is saved,
23
00:00:59,400 --> 00:01:01,323
so every epoch in our case.
24
00:01:02,280 --> 00:01:04,860
This allows you to resume
training from a different machine
25
00:01:04,860 --> 00:01:06,873
if the current one gets interrupted.
26
00:01:08,220 --> 00:01:10,440
The model will be uploaded
in your namespace
27
00:01:10,440 --> 00:01:14,640
with the name of the output
directory you picked by default.
28
00:01:14,640 --> 00:01:16,020
You can choose another name
29
00:01:16,020 --> 00:01:19,113
by passing it to the
hub_model_id argument.
30
00:01:20,070 --> 00:01:23,370
You can also push inside an
organization you are a member of
31
00:01:23,370 --> 00:01:25,740
by passing a full repository name,
32
00:01:25,740 --> 00:01:28,933
with the name of the organization, a slash,
33
00:01:28,933 --> 00:01:30,433
and the model ID you want to pick.
34
00:01:32,250 --> 00:01:34,650
With that done, we can
just launch training,
35
00:01:34,650 --> 00:01:36,093
and wait a little bit.
36
00:01:36,960 --> 00:01:39,033
I'll cut the waiting time from the video.
37
00:01:43,260 --> 00:01:46,350
Note that the model is
pushed asynchronously,
38
00:01:46,350 --> 00:01:47,730
meaning that the training continues
39
00:01:47,730 --> 00:01:49,730
while your model is uploaded to the hub.
40
00:01:51,060 --> 00:01:52,950
When your first commit is finished,
41
00:01:52,950 --> 00:01:55,650
you can go inspect your model on the Hub
42
00:01:55,650 --> 00:01:57,960
by looking inside your namespace,
43
00:01:57,960 --> 00:01:59,943
and you'll find it at the very top.
44
00:02:01,980 --> 00:02:04,200
You can even start playing
with its inference widget
45
00:02:04,200 --> 00:02:06,630
while it's continuing the training.
46
00:02:06,630 --> 00:02:09,270
The CoLA dataset tasks
the model with determining
47
00:02:09,270 --> 00:02:11,970
if the sentence is
grammatically correct or not.
48
00:02:11,970 --> 00:02:15,510
So we pick an example of an
incorrect sentence to test it.
49
00:02:15,510 --> 00:02:16,950
Note that it'll take a bit of time
50
00:02:16,950 --> 00:02:18,750
to load your model inside
the Inference API
51
00:02:18,750 --> 00:02:20,880
the first time you try to use it.
52
00:02:20,880 --> 00:02:23,280
We'll cut that time from the video.
53
00:02:23,280 --> 00:02:24,870
There is something wrong with the labels,
54
00:02:24,870 --> 00:02:27,360
but we'll fix it later in this video.
55
00:02:27,360 --> 00:02:29,520
Once your training is finished,
56
00:02:29,520 --> 00:02:31,770
you should do one last
push with the
57
00:02:31,770 --> 00:02:33,840
trainer.push_to_hub method.
58
00:02:33,840 --> 00:02:35,430
This is for two reasons.
59
00:02:35,430 --> 00:02:36,750
First, this will make sure
60
00:02:36,750 --> 00:02:39,180
you are uploading the
final version of your model
61
00:02:39,180 --> 00:02:40,680
if you didn't already.
62
00:02:40,680 --> 00:02:42,480
For instance, if you used a save
63
00:02:42,480 --> 00:02:46,980
every n steps strategy
instead of every epoch.
64
00:02:46,980 --> 00:02:48,180
Second, this will draft a model card
65
00:02:48,180 --> 00:02:51,120
that will be the landing
page of your model repo.
66
00:02:51,120 --> 00:02:52,260
Once the commit is done,
67
00:02:52,260 --> 00:02:54,810
let's go back on our
model page and refresh.
68
00:02:54,810 --> 00:02:56,820
We can see the drafted model card
69
00:02:56,820 --> 00:02:58,080
which includes information
70
00:02:58,080 --> 00:03:00,381
on which model we fine-tuned,
71
00:03:00,381 --> 00:03:03,570
the final evaluation loss and metric,
72
00:03:03,570 --> 00:03:06,300
the training hyperparameters used,
73
00:03:06,300 --> 00:03:08,670
the intermediate training results,
74
00:03:08,670 --> 00:03:10,320
and the framework versions we used
75
00:03:10,320 --> 00:03:13,173
so that other people can
easily reproduce our results.
76
00:03:15,270 --> 00:03:16,860
On top of all that information,
77
00:03:16,860 --> 00:03:19,740
the trainer also included some
metadata that is interpreted
78
00:03:19,740 --> 00:03:22,650
by the Hugging Face
website in the model card.
79
00:03:22,650 --> 00:03:26,010
You get the value of the metrics
reported in a nice widget
80
00:03:26,010 --> 00:03:29,640
as well as a link to a
leaderboard on Papers with Code.
81
00:03:29,640 --> 00:03:32,550
The TensorBoard runs
have also been pushed
82
00:03:32,550 --> 00:03:34,560
to this repo, and we can look at them
83
00:03:34,560 --> 00:03:36,000
directly from the model hub
84
00:03:36,000 --> 00:03:38,850
by clicking on the
Training metrics submenu.
85
00:03:38,850 --> 00:03:39,795
If you are not using the Trainer API
86
00:03:39,795 --> 00:03:42,510
to fine-tune your model,
87
00:03:42,510 --> 00:03:43,770
you can use the push_to_hub method
88
00:03:43,770 --> 00:03:46,427
on the model and tokenizer directly.
89
00:03:46,427 --> 00:03:50,160
Let's test this to fix the
labels in the inference widget.
90
00:03:50,160 --> 00:03:52,740
The inference widget was using
different names for labels
91
00:03:52,740 --> 00:03:54,810
because we did not
indicate the correspondence
92
00:03:54,810 --> 00:03:57,030
between integers and label names.
93
00:03:57,030 --> 00:03:58,740
We can fix this in the configuration
94
00:03:58,740 --> 00:04:01,350
by setting the label2id
95
00:04:01,350 --> 00:04:04,170
and id2label fields
to the proper values
96
00:04:04,170 --> 00:04:06,933
when pushing the model config to the hub.
97
00:04:07,950 --> 00:04:10,620
Once this is done, we
can check on the website,
98
00:04:10,620 --> 00:04:13,380
and the model is now
showing the proper labels.
99
00:04:13,380 --> 00:04:15,240
Now that the model is on the hub,
100
00:04:15,240 --> 00:04:17,370
we can use it from anywhere
101
00:04:17,370 --> 00:04:19,920
as we would any other Transformer model
102
00:04:19,920 --> 00:04:21,113
with the from_pretrained method
103
00:04:21,113 --> 00:04:22,923
or with the pipeline function.
104
00:04:34,350 --> 00:04:36,780
We just have to use the
identifier from the hub,
105
00:04:36,780 --> 00:04:39,450
and we can see that the model
configuration and weights
106
00:04:39,450 --> 00:04:42,483
as well as the tokenizer files
are automatically downloaded.
107
00:04:53,880 --> 00:04:55,950
Try the push_to_hub API
in your next training
108
00:04:55,950 --> 00:04:58,650
to easily share your model
with the rest of the world.
109
00:05:01,151 --> 00:05:03,818
(air whooshing)