1
00:00:04,200 --> 00:00:06,210
- [Instructor] In this video,
we're going to understand how
2
00:00:06,210 --> 00:00:08,280
to manage a model repository
3
00:00:08,280 --> 00:00:10,053
on the Hugging Face Model Hub.
4
00:00:10,920 --> 00:00:13,020
In order to handle a repository
5
00:00:13,020 --> 00:00:15,450
you should first have
a Hugging Face account.
6
00:00:15,450 --> 00:00:17,610
A link to create a new
account is available
7
00:00:17,610 --> 00:00:18,573
in the description.
8
00:00:20,130 --> 00:00:22,980
Once you are logged in, you
can create a new repository
9
00:00:22,980 --> 00:00:25,890
by clicking on the new model option.
10
00:00:25,890 --> 00:00:29,400
You should see a modal
similar to the following.
11
00:00:29,400 --> 00:00:33,240
In the owner input, you can
put either your own namespace
12
00:00:33,240 --> 00:00:35,703
or any of your organizations' namespaces.
13
00:00:36,660 --> 00:00:39,330
The model name is the model identifier
14
00:00:39,330 --> 00:00:40,320
that will then be used
15
00:00:40,320 --> 00:00:43,143
to identify your model
on the chosen namespace.
16
00:00:44,130 --> 00:00:47,700
The final choice is
between public and private.
17
00:00:47,700 --> 00:00:49,950
Public models are accessible by anyone.
18
00:00:49,950 --> 00:00:51,840
This is the recommended free option,
19
00:00:51,840 --> 00:00:54,960
as this makes your model easily
accessible and shareable.
20
00:00:54,960 --> 00:00:57,630
The owners of your
namespace are the only ones
21
00:00:57,630 --> 00:00:59,523
who can update and change your model.
22
00:01:00,450 --> 00:01:03,660
A more advanced option
is the private option.
23
00:01:03,660 --> 00:01:04,560
In this case,
24
00:01:04,560 --> 00:01:06,000
only the owners of your namespace
25
00:01:06,000 --> 00:01:08,280
will have visibility over your model.
26
00:01:08,280 --> 00:01:10,260
Other users won't know it exists
27
00:01:10,260 --> 00:01:11,810
and will not be able to use it.
28
00:01:15,030 --> 00:01:17,030
Let's create a dummy model to play with.
29
00:01:18,180 --> 00:01:19,710
Once your model is created,
30
00:01:19,710 --> 00:01:22,230
the next step is managing that model.
31
00:01:22,230 --> 00:01:24,360
Three tabs are available to you.
32
00:01:24,360 --> 00:01:27,960
You're facing the first one,
which is the model card page.
33
00:01:27,960 --> 00:01:29,970
This is the page you use
to showcase your model
34
00:01:29,970 --> 00:01:31,110
to the world.
35
00:01:31,110 --> 00:01:33,260
We'll see how it can
be completed in a bit.
36
00:01:34,500 --> 00:01:37,503
The second one is the
files and versions tab.
37
00:01:38,340 --> 00:01:40,920
Your model itself is a Git repository.
38
00:01:40,920 --> 00:01:43,230
If you're unfamiliar with
what a Git repository is,
39
00:01:43,230 --> 00:01:46,320
you can think of it as a
folder containing files.
40
00:01:46,320 --> 00:01:48,120
If you have never used Git before,
41
00:01:48,120 --> 00:01:50,100
we recommend looking at an introduction
42
00:01:50,100 --> 00:01:52,600
like the one provided in
this video's description.
43
00:01:53,850 --> 00:01:56,910
The Git repository allows you
to see the changes happening
44
00:01:56,910 --> 00:02:00,900
over time in this folder,
hence the term versions.
45
00:02:00,900 --> 00:02:03,453
We'll see how to add files
and versions in a bit.
46
00:02:07,020 --> 00:02:09,570
The final tab is the settings tab,
47
00:02:09,570 --> 00:02:12,120
which allows you to manage
your model's visibility
48
00:02:12,120 --> 00:02:13,203
and availability.
49
00:02:14,790 --> 00:02:17,673
Let's start by adding
files to the repository.
50
00:02:18,540 --> 00:02:19,560
Files can be added
51
00:02:19,560 --> 00:02:23,340
through the web interface
thanks to the add file button.
52
00:02:23,340 --> 00:02:27,060
The added files can be of
any type: Python, JSON, text,
53
00:02:27,060 --> 00:02:27,893
you name it.
54
00:02:28,740 --> 00:02:31,170
Alongside your added file and its content,
55
00:02:31,170 --> 00:02:33,363
you should name your change or commit.
56
00:02:36,330 --> 00:02:38,400
Generally, adding files is simpler
57
00:02:38,400 --> 00:02:40,770
by using the Hugging
Face Hub Python library
58
00:02:40,770 --> 00:02:43,050
or by using the command line.
59
00:02:43,050 --> 00:02:44,310
We'll showcase how to do this
60
00:02:44,310 --> 00:02:46,290
using the Hugging Face Hub Python library,
61
00:02:46,290 --> 00:02:48,060
and there is a link in the description
62
00:02:48,060 --> 00:02:49,800
to the previous version of this video,
63
00:02:49,800 --> 00:02:52,743
showcasing how to do this
using Git and the command line.
64
00:02:53,610 --> 00:02:54,840
First, make sure you're logged
65
00:02:54,840 --> 00:02:56,460
into your Hugging Face account,
66
00:02:56,460 --> 00:02:59,523
either through the command line
or in a Python runtime.
67
00:03:04,634 --> 00:03:06,390
The first approach we'll take a look at
68
00:03:06,390 --> 00:03:08,880
is using the upload_file method.
69
00:03:08,880 --> 00:03:10,770
This offers an extremely simple API
70
00:03:10,770 --> 00:03:12,630
to upload files to the Hub.
71
00:03:12,630 --> 00:03:14,190
The three required parameters
72
00:03:14,190 --> 00:03:16,083
are the current location of the file,
73
00:03:18,570 --> 00:03:21,300
the path of that file in the repository,
74
00:03:21,300 --> 00:03:24,050
and the ID of the repository
to which you're pushing.
75
00:03:25,650 --> 00:03:27,930
There are a few additional parameters.
76
00:03:27,930 --> 00:03:29,100
The token parameter,
77
00:03:29,100 --> 00:03:31,200
if you would like to
specify a different token
78
00:03:31,200 --> 00:03:33,650
than the one saved in your
cache with your login,
79
00:03:34,830 --> 00:03:36,750
the repo type parameter,
80
00:03:36,750 --> 00:03:40,503
if you would like to push
to a dataset or a Space.
81
00:03:42,300 --> 00:03:45,690
We'll upload a file called
README.md to the repository
82
00:03:45,690 --> 00:03:47,190
using this method.
83
00:03:47,190 --> 00:03:49,710
We start by saving
a file with that name,
84
00:03:49,710 --> 00:03:51,210
which contains some information
85
00:03:51,210 --> 00:03:52,920
about the repository itself.
86
00:03:52,920 --> 00:03:54,243
Here, a title.
87
00:03:55,950 --> 00:03:57,420
Now that the file is saved,
88
00:03:57,420 --> 00:04:00,513
let's use the upload_file
method to upload it to the Hub.
89
00:04:01,560 --> 00:04:03,540
If we switch to the web
interface for a second
90
00:04:03,540 --> 00:04:07,080
and refresh the page, we'll
see that the README is shown.
91
00:04:07,080 --> 00:04:08,883
The file upload was a success.
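
The save-and-upload step just described could look roughly like the sketch below. It assumes the huggingface_hub package is installed and that you are already logged in. The helper names write_readme and upload_readme are hypothetical and not taken from the video.

```python
from pathlib import Path


def write_readme(folder, title):
    """Save a README.md that contains only a Markdown title."""
    path = Path(folder) / "README.md"
    path.write_text(f"# {title}\n", encoding="utf-8")
    return path


def upload_readme(local_path, repo_id, token=None, repo_type=None):
    """Upload the local README to the Hub (needs network access and a login)."""
    # Imported lazily so write_readme works even without the package installed.
    from huggingface_hub import upload_file

    return upload_file(
        path_or_fileobj=str(local_path),  # current location of the file
        path_in_repo="README.md",         # path of the file in the repository
        repo_id=repo_id,                  # ID of the repository to push to
        token=token,                      # optional: overrides the cached token
        repo_type=repo_type,              # optional: "dataset" or "space"
    )
```

For example, upload_readme(write_readme(".", "Dummy model"), "your-username/dummy") would push the file, where "your-username/dummy" is a placeholder for your own namespace and model name.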
92
00:04:10,170 --> 00:04:13,500
Alongside this method
exists a delete_file method
93
00:04:13,500 --> 00:04:16,170
so that you may manage
your repository fully.
94
00:04:16,170 --> 00:04:18,820
We'll use it to delete the
file we have just created.
95
00:04:22,860 --> 00:04:25,320
If we refresh the page once again, good,
96
00:04:25,320 --> 00:04:26,973
the file was indeed deleted.
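
The deletion just described can be sketched as follows, again assuming huggingface_hub is installed and you are logged in. The wrapper delete_from_repo and its delete_fn parameter are hypothetical additions that make the call easy to try without touching a real repository.

```python
def delete_from_repo(path_in_repo, repo_id, token=None, delete_fn=None):
    """Delete one file from a Hub repository and return the call's result."""
    if delete_fn is None:
        # Imported lazily; the real call needs network access and a login.
        from huggingface_hub import delete_file as delete_fn
    return delete_fn(path_in_repo=path_in_repo, repo_id=repo_id, token=token)
```

Calling delete_from_repo("README.md", "your-username/dummy") would remove the file uploaded earlier, with "your-username/dummy" standing in for your own repository ID.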
97
00:04:29,070 --> 00:04:32,730
This approach using only these
two methods is super simple.
98
00:04:32,730 --> 00:04:35,400
It doesn't need Git or Git LFS installed,
99
00:04:35,400 --> 00:04:37,650
but it does come with a limitation.
100
00:04:37,650 --> 00:04:39,600
The maximum file size one can upload
101
00:04:39,600 --> 00:04:41,313
is limited to five gigabytes.
102
00:04:42,360 --> 00:04:43,890
To overcome this limit,
103
00:04:43,890 --> 00:04:45,540
let's take a look at the second method
104
00:04:45,540 --> 00:04:47,643
which is the Repository utility.
105
00:04:48,600 --> 00:04:51,840
This class is a wrapper over
Git and Git LFS methods,
106
00:04:51,840 --> 00:04:53,850
which abstracts most of the complexity
107
00:04:53,850 --> 00:04:55,500
and offers a flexible API
108
00:04:55,500 --> 00:04:57,990
to manage your online repositories.
109
00:04:57,990 --> 00:04:59,690
Let's take a look at how it works.
110
00:05:03,870 --> 00:05:08,369
We start by instantiating
the Repository utility.
111
00:05:08,369 --> 00:05:10,380
We provide the clone_from parameter,
112
00:05:10,380 --> 00:05:13,383
in order to clone the
repository we just created.
113
00:05:14,400 --> 00:05:18,750
The repository is now
cloned in the local folder.
114
00:05:18,750 --> 00:05:22,200
The repo object that we
have just initialized
115
00:05:22,200 --> 00:05:24,873
offers quite a few methods
which are useful for us.
116
00:05:25,920 --> 00:05:28,800
We're interested in
pushing a model to the Hub.
117
00:05:28,800 --> 00:05:31,170
I'll start by loading
a model and tokenizer
118
00:05:31,170 --> 00:05:32,643
I trained a few hours ago.
119
00:05:34,380 --> 00:05:36,810
We'll now follow the
traditional Git approach
120
00:05:36,810 --> 00:05:38,670
by first pulling the latest changes
121
00:05:38,670 --> 00:05:40,053
using the git_pull method.
122
00:05:40,980 --> 00:05:43,170
We just cloned the repository,
123
00:05:43,170 --> 00:05:45,780
so unless this is a
super active repository,
124
00:05:45,780 --> 00:05:48,660
it's unlikely that new
changes are available.
125
00:05:48,660 --> 00:05:51,000
But it's always a good idea
to pull the latest changes
126
00:05:51,000 --> 00:05:52,300
before doing anything new.
127
00:05:53,220 --> 00:05:55,200
Now that we have pulled the repository,
128
00:05:55,200 --> 00:05:58,500
I'll save the model and
tokenizer inside that folder.
129
00:05:58,500 --> 00:06:01,200
This includes the model
weights, configuration file,
130
00:06:01,200 --> 00:06:02,673
and tokenizer files.
131
00:06:04,440 --> 00:06:05,820
Now that the model is saved,
132
00:06:05,820 --> 00:06:07,890
we'll continue with the
traditional Git approach
133
00:06:07,890 --> 00:06:10,620
and push it to the remote repository.
134
00:06:10,620 --> 00:06:12,150
If we were using the command line,
135
00:06:12,150 --> 00:06:14,250
there are a few Git LFS-specific commands
136
00:06:14,250 --> 00:06:15,600
we would have to invoke.
137
00:06:15,600 --> 00:06:17,940
But here, the huggingface_hub package
138
00:06:17,940 --> 00:06:20,070
takes care of all of that.
139
00:06:20,070 --> 00:06:24,420
We'll start by staging the
files using the git_add method.
140
00:06:24,420 --> 00:06:27,600
We'll then commit these changes
using the git_commit method,
141
00:06:27,600 --> 00:06:30,690
and providing a helpful commit message.
142
00:06:30,690 --> 00:06:33,210
Finally, we'll push the
changes to the remote,
143
00:06:33,210 --> 00:06:34,953
using the git_push method.
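
The clone, pull, save, add, commit, and push sequence just described can be sketched as below. It assumes a huggingface_hub version that still ships the Repository class, which recent releases have deprecated in favor of HTTP-based methods. The helper names clone_repo and push_checkpoint are hypothetical, and model and tokenizer stand for any objects with a save_pretrained method, such as those from the Transformers library.

```python
def clone_repo(local_dir, repo_id):
    """Clone a Hub repository into local_dir (needs Git and Git LFS installed)."""
    # Imported lazily; Repository is deprecated in recent huggingface_hub releases.
    from huggingface_hub import Repository

    return Repository(local_dir, clone_from=repo_id)


def push_checkpoint(repo, model, tokenizer, commit_message):
    """Pull, save the model and tokenizer into the clone, then add, commit, and push."""
    repo.git_pull()                            # fetch the latest remote changes
    model.save_pretrained(repo.local_dir)      # model weights and configuration
    tokenizer.save_pretrained(repo.local_dir)  # tokenizer files
    repo.git_add(auto_lfs_track=True)          # stage files, tracking large ones with LFS
    repo.git_commit(commit_message)            # record the change locally
    repo.git_push()                            # send the commit to the Hub
```

A call such as push_checkpoint(clone_repo("dummy", "your-username/dummy"), model, tokenizer, "Add model and tokenizer") would run the whole sequence, with the folder name and repository ID being placeholders.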
144
00:06:45,090 --> 00:06:47,430
If we go back to the
files and versions tab,
145
00:06:47,430 --> 00:06:49,950
we can now see the newly committed files.
146
00:06:49,950 --> 00:06:52,600
We can even play with the
model in the Inference API.
147
00:06:53,790 --> 00:06:55,770
Unfortunately, the front page of our model
148
00:06:55,770 --> 00:06:57,540
is still very empty.
149
00:06:57,540 --> 00:06:59,280
Let's add a README markdown file
150
00:06:59,280 --> 00:07:00,753
to complete it a little bit.
151
00:07:01,710 --> 00:07:04,200
This README is known as the model card
152
00:07:04,200 --> 00:07:06,030
and it's arguably as important
153
00:07:06,030 --> 00:07:09,330
as the model and tokenizer
files in the model repository.
154
00:07:09,330 --> 00:07:11,280
It is the central definition
155
00:07:11,280 --> 00:07:13,200
and documentation of your model,
156
00:07:13,200 --> 00:07:16,440
ensuring reusability by
fellow community members
157
00:07:16,440 --> 00:07:18,480
and reproducibility of results.
158
00:07:18,480 --> 00:07:20,760
It also provides a platform
on which other members
159
00:07:20,760 --> 00:07:22,293
may build their artifacts.
160
00:07:23,220 --> 00:07:25,590
We'll only add a title and
a small description here
161
00:07:25,590 --> 00:07:27,060
for simplicity's sake,
162
00:07:27,060 --> 00:07:29,370
but we encourage you to
add information relevant
163
00:07:29,370 --> 00:07:30,990
to how the model was trained,
164
00:07:30,990 --> 00:07:33,120
its intended use and limitations,
165
00:07:33,120 --> 00:07:36,180
as well as its identified
potential biases,
166
00:07:36,180 --> 00:07:37,440
evaluation results,
167
00:07:37,440 --> 00:07:39,843
and code samples on how to use your model.
168
00:07:41,460 --> 00:07:44,130
Great work contributing
a model to the Model Hub.
169
00:07:44,130 --> 00:07:46,440
This model can now be used
in downstream libraries
170
00:07:46,440 --> 00:07:48,783
simply by specifying
your model identifier.