subtitles/en/10_instantiate-a-transformers-model-(pytorch).srt
1
00:00:00,519 --> 00:00:03,186
(logo swooshes)
2
00:00:05,310 --> 00:00:08,483
- How to instantiate a Transformers model.
3
00:00:08,483 --> 00:00:11,790
In this video, we'll look at
how we can create and use a model
4
00:00:11,790 --> 00:00:13,290
from the Transformers library.
5
00:00:14,310 --> 00:00:17,100
As we have seen before,
the AutoModel class allows
6
00:00:17,100 --> 00:00:19,140
you to instantiate a pretrained model
7
00:00:19,140 --> 00:00:21,513
from any checkpoint on
the Hugging Face Hub.
8
00:00:22,350 --> 00:00:23,910
It'll pick the right model class
9
00:00:23,910 --> 00:00:26,654
from the library to instantiate
the proper architecture
10
00:00:26,654 --> 00:00:29,793
and load the weights of the
pretrained model inside it.
11
00:00:30,690 --> 00:00:33,810
As we can see, when
given a BERT checkpoint
12
00:00:33,810 --> 00:00:38,043
we end up with a BertModel and
similarly, for GPT-2 or BART.
13
00:00:40,020 --> 00:00:42,360
Behind the scenes, this
API can take the name
14
00:00:42,360 --> 00:00:44,250
of a checkpoint on the Hub
15
00:00:44,250 --> 00:00:46,980
in which case it will download
and cache the configuration
16
00:00:46,980 --> 00:00:48,843
file as well as the model weights file.
17
00:00:49,698 --> 00:00:52,710
You can also specify the
path to a local folder
18
00:00:52,710 --> 00:00:55,290
that contains a valid
configuration file and a
19
00:00:55,290 --> 00:00:56,390
model weights file.
20
00:00:57,600 --> 00:00:59,479
To instantiate the pretrained model,
21
00:00:59,479 --> 00:01:01,950
the AutoModel API will
first open the configuration
22
00:01:01,950 --> 00:01:05,403
file to look at the configuration
class that should be used.
23
00:01:06,420 --> 00:01:08,580
The configuration class
depends on the type
24
00:01:08,580 --> 00:01:12,663
of the model: BERT, GPT-2
or BART, for instance.
25
00:01:13,680 --> 00:01:15,930
Once it has a proper configuration class,
26
00:01:15,930 --> 00:01:18,390
it can instantiate that configuration
27
00:01:18,390 --> 00:01:21,900
which is a blueprint to know
how to create the model.
28
00:01:21,900 --> 00:01:24,240
It also uses this configuration class to
29
00:01:24,240 --> 00:01:27,150
find the proper model class,
which is then combined
30
00:01:27,150 --> 00:01:29,823
with the loaded configuration
to load the model.
31
00:01:30,904 --> 00:01:33,210
This model is not yet a pretrained model
32
00:01:33,210 --> 00:01:35,883
as it has just been initialized
with random weights.
33
00:01:36,840 --> 00:01:39,810
The last step is to load the
weights from the model file
34
00:01:39,810 --> 00:01:40,923
inside this model.
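A rough sketch of those steps done by hand; AutoModel.from_config builds the architecture with random weights, while from_pretrained also loads the checkpoint weights:

from transformers import AutoConfig, AutoModel

# Step 1: download and parse the configuration file.
config = AutoConfig.from_pretrained("bert-base-cased")

# Step 2: build the architecture from the blueprint (weights are random here).
model = AutoModel.from_config(config)

# Step 3: from_pretrained repeats steps 1-2, then loads the trained weights.
model = AutoModel.from_pretrained("bert-base-cased")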
35
00:01:42,330 --> 00:01:44,250
To easily load the
configuration of a model
36
00:01:44,250 --> 00:01:46,410
from any checkpoint or folder containing
37
00:01:46,410 --> 00:01:48,210
the configuration file,
38
00:01:48,210 --> 00:01:50,373
we can use the AutoConfig class.
39
00:01:51,240 --> 00:01:52,693
Like the AutoModel class,
40
00:01:52,693 --> 00:01:55,693
it will pick the right configuration
class from the library.
41
00:01:57,060 --> 00:01:59,220
We can also use a specific
class corresponding
42
00:01:59,220 --> 00:02:01,470
to a checkpoint, but
we will need to change
43
00:02:01,470 --> 00:02:03,000
the code each time we want to try
44
00:02:03,000 --> 00:02:04,550
a different model architecture.
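For instance, both lines below load the same BertConfig, but only the first adapts automatically if the checkpoint name changes:

from transformers import AutoConfig, BertConfig

config = AutoConfig.from_pretrained("bert-base-cased")  # picks BertConfig
config = BertConfig.from_pretrained("bert-base-cased")  # tied to BERT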
45
00:02:06,030 --> 00:02:07,860
As we said before, the configuration
46
00:02:07,860 --> 00:02:10,350
of a model is a blueprint
that contains all the
47
00:02:10,350 --> 00:02:13,830
information necessary to
create the model architecture.
48
00:02:13,830 --> 00:02:15,990
For instance, the BERT model associated
49
00:02:15,990 --> 00:02:19,980
with the bert-base-cased
checkpoint has 12 layers,
50
00:02:19,980 --> 00:02:24,980
a hidden size of 768 and a
vocabulary size of 28,996.
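Those values can be read straight off the configuration object (attribute names as defined by BertConfig):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-cased")
print(config.num_hidden_layers)  # 12
print(config.hidden_size)        # 768
print(config.vocab_size)         # 28996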
51
00:02:28,020 --> 00:02:29,910
Once we have the configuration,
52
00:02:29,910 --> 00:02:31,950
we can create a model that
has the same architecture
53
00:02:31,950 --> 00:02:35,280
as our checkpoint, but
is randomly initialized.
54
00:02:35,280 --> 00:02:36,660
We can then train it from scratch,
55
00:02:36,660 --> 00:02:38,010
like any other PyTorch module.
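A minimal sketch of that step; passing a configuration to the model class builds the architecture without loading any checkpoint weights:

from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-cased")
model = BertModel(config)  # same architecture, randomly initialized

# model is a regular torch.nn.Module and can be trained from scratch.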
56
00:02:39,497 --> 00:02:40,380
We can also change any part
57
00:02:40,380 --> 00:02:43,200
of the configuration by
using keyword arguments.
58
00:02:43,200 --> 00:02:46,138
The second snippet of code instantiates
59
00:02:46,138 --> 00:02:48,360
a randomly initialized BERT model
60
00:02:48,360 --> 00:02:50,403
with 10 layers instead of 12.
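For example, keyword arguments passed to from_pretrained override the matching fields of the loaded configuration:

from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-cased", num_hidden_layers=10)
model = BertModel(config)  # randomly initialized, 10 layers instead of 12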
61
00:02:51,409 --> 00:02:55,051
Saving a model once it's trained
or fine-tuned is very easy.
62
00:02:55,051 --> 00:02:57,603
We just have to use the
save_pretrained method.
63
00:02:58,500 --> 00:03:01,417
Here the model will be
saved in a folder named
64
00:03:01,417 --> 00:03:04,473
"my-bert-model" inside the
current working directory.
65
00:03:05,400 --> 00:03:08,255
Such a model can then be
reloaded using the
66
00:03:08,255 --> 00:03:09,596
from_pretrained method.
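The round trip might look like this, using the folder name from the video:

from transformers import BertModel

# Writes the configuration file and the weights file into the folder.
model.save_pretrained("my-bert-model")

# Reads both files back to rebuild and fill the model.
model = BertModel.from_pretrained("my-bert-model")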
67
00:03:09,596 --> 00:03:11,250
To learn how to easily push this model
68
00:03:11,250 --> 00:03:13,473
to the Hub, check out the push_to_hub video.