subtitles/en/11_instantiate-a-transformers-model-(tensorflow).srt
1
00:00:00,125 --> 00:00:02,958
(whooshing sound)
2
00:00:05,463 --> 00:00:08,820
- How to instantiate
a Transformers model?
3
00:00:08,820 --> 00:00:11,250
In this video, we will
look at how we can create
4
00:00:11,250 --> 00:00:13,550
and use a model from the
Transformers library.
5
00:00:15,000 --> 00:00:17,850
As we've seen before,
the TFAutoModel class
6
00:00:17,850 --> 00:00:20,100
allows you to instantiate
a pre-trained model
7
00:00:20,100 --> 00:00:22,503
from any checkpoint on
the Hugging Face Hub.
8
00:00:23,430 --> 00:00:25,620
It will pick the right
model class from the library
9
00:00:25,620 --> 00:00:27,750
to instantiate the proper architecture
10
00:00:27,750 --> 00:00:31,200
and load the weights of the
pre-trained model inside.
11
00:00:31,200 --> 00:00:34,020
As we can see, when
given a BERT checkpoint,
12
00:00:34,020 --> 00:00:36,090
we end up with a TFBertModel,
13
00:00:36,090 --> 00:00:38,553
and similarly for GPT2 or BART.
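A minimal sketch of that behavior (all three checkpoint names are real Hub checkpoints):

from transformers import TFAutoModel

# The Auto class resolves the architecture from each checkpoint's configuration
bert = TFAutoModel.from_pretrained("bert-base-cased")
print(type(bert).__name__)  # TFBertModel

gpt2 = TFAutoModel.from_pretrained("gpt2")
print(type(gpt2).__name__)  # TFGPT2Model

bart = TFAutoModel.from_pretrained("facebook/bart-base")
print(type(bart).__name__)  # TFBartModel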
14
00:00:40,170 --> 00:00:42,510
Behind the scenes, this
API can take the name
15
00:00:42,510 --> 00:00:44,040
of a checkpoint on the Hub,
16
00:00:44,040 --> 00:00:45,810
in which case it will download and cache
17
00:00:45,810 --> 00:00:48,660
the configuration file as well
as the model weights file.
18
00:00:49,590 --> 00:00:52,020
You can also specify the
path to a local folder
19
00:00:52,020 --> 00:00:54,090
that contains a valid configuration file
20
00:00:54,090 --> 00:00:55,340
and a model weights file.
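For instance, assuming a hypothetical local folder ./my-local-model that holds a config.json and a tf_model.h5, a sketch of loading from disk instead of the Hub:

from transformers import TFAutoModel

# "./my-local-model" is a placeholder path containing config.json and tf_model.h5
model = TFAutoModel.from_pretrained("./my-local-model")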
21
00:00:56,670 --> 00:00:58,167
To instantiate the pre-trained model,
22
00:00:58,167 --> 00:01:02,400
the TFAutoModel API will first
open the configuration file
23
00:01:02,400 --> 00:01:05,253
to look at the configuration
class that should be used.
24
00:01:06,390 --> 00:01:09,660
The configuration class depends
on the type of the model,
25
00:01:09,660 --> 00:01:12,333
BERT, GPT2 or BART for instance.
26
00:01:13,320 --> 00:01:15,720
Once it has the proper
configuration class,
27
00:01:15,720 --> 00:01:18,000
it can instantiate that configuration,
28
00:01:18,000 --> 00:01:21,090
which is the blueprint for
how to create the model.
29
00:01:21,090 --> 00:01:22,770
It also uses this configuration class
30
00:01:22,770 --> 00:01:24,750
to find the proper model class,
31
00:01:24,750 --> 00:01:27,120
which is combined with
the loaded configuration
32
00:01:27,120 --> 00:01:28,143
to instantiate the model.
33
00:01:29,250 --> 00:01:31,800
This model is not yet
our pre-trained model
34
00:01:31,800 --> 00:01:34,560
as it has just been initialized
with random weights.
35
00:01:34,560 --> 00:01:36,690
The last step is to load the weights
36
00:01:36,690 --> 00:01:38,973
from the model file inside this model.
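A sketch of roughly how those steps decompose into public API calls (from_pretrained does all of this in one go):

from transformers import AutoConfig, TFAutoModel

# Step 1: load the configuration (resolves to BertConfig for a BERT checkpoint)
config = AutoConfig.from_pretrained("bert-base-cased")

# Step 2: build the architecture from the config; weights are still random here
model = TFAutoModel.from_config(config)

# Step 3: from_pretrained performs both steps, then loads the trained weights
model = TFAutoModel.from_pretrained("bert-base-cased")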
37
00:01:40,230 --> 00:01:42,270
To easily load the
configuration of a model
38
00:01:42,270 --> 00:01:44,220
from any checkpoint or a folder
39
00:01:44,220 --> 00:01:46,170
containing the configuration file,
40
00:01:46,170 --> 00:01:47,790
we can use the AutoConfig class.
41
00:01:47,790 --> 00:01:50,460
Like the TFAutoModel class,
42
00:01:50,460 --> 00:01:54,210
it will pick the right configuration
class from the library.
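For example:

from transformers import AutoConfig

# AutoConfig resolves the checkpoint to its configuration class
config = AutoConfig.from_pretrained("bert-base-cased")
print(type(config).__name__)  # BertConfig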
43
00:01:54,210 --> 00:01:56,040
We can also use the specific class
44
00:01:56,040 --> 00:01:57,840
corresponding to a checkpoint,
45
00:01:57,840 --> 00:01:59,430
but we will need to change the code
46
00:01:59,430 --> 00:02:02,230
each time we want to try a
different model architecture.
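A sketch of that model-specific alternative:

from transformers import BertConfig

# Tied to BERT: trying GPT-2 instead would require editing this line
config = BertConfig.from_pretrained("bert-base-cased")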
47
00:02:03,180 --> 00:02:05,353
As we said before, the
configuration of a model
48
00:02:05,353 --> 00:02:08,610
is a blueprint that contains
all the information necessary
49
00:02:08,610 --> 00:02:11,070
to create the model architecture.
50
00:02:11,070 --> 00:02:12,750
For instance, the BERT model
51
00:02:12,750 --> 00:02:15,510
associated with the
bert-base-cased checkpoint
52
00:02:15,510 --> 00:02:19,710
has 12 layers, a hidden size of 768,
53
00:02:19,710 --> 00:02:23,403
and a vocabulary size of 28,996.
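Those values can be read straight off the configuration object:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-cased")
print(config.num_hidden_layers)  # 12
print(config.hidden_size)        # 768
print(config.vocab_size)         # 28996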
54
00:02:24,810 --> 00:02:26,670
Once we have the configuration,
55
00:02:26,670 --> 00:02:28,890
we can create a model that
has the same architecture
56
00:02:28,890 --> 00:02:32,160
as our checkpoint but
is randomly initialized.
57
00:02:32,160 --> 00:02:36,030
We can then train it from scratch
like any TensorFlow model.
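A minimal sketch of that pattern:

from transformers import BertConfig, TFBertModel

# Same architecture as bert-base-cased, but randomly initialized weights
config = BertConfig.from_pretrained("bert-base-cased")
model = TFBertModel(config)
# model can now be compiled and trained from scratch like any Keras model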
58
00:02:36,030 --> 00:02:38,063
We can also change any
part of the configuration
59
00:02:38,063 --> 00:02:40,770
by using keyword arguments.
60
00:02:40,770 --> 00:02:43,110
The second snippet of code instantiates
61
00:02:43,110 --> 00:02:44,970
a randomly initialized BERT model
62
00:02:44,970 --> 00:02:46,983
with 10 layers instead of 12.
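A sketch of that second snippet:

from transformers import BertConfig, TFBertModel

# The keyword argument overrides the value from the checkpoint's configuration
config = BertConfig.from_pretrained("bert-base-cased", num_hidden_layers=10)
model = TFBertModel(config)  # randomly initialized, 10 layers instead of 12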
63
00:02:48,240 --> 00:02:51,360
Saving a model once it's trained
or fine-tuned is very easy.
64
00:02:51,360 --> 00:02:53,880
We just have to use the
save_pretrained method.
65
00:02:53,880 --> 00:02:55,980
Here, the model will be saved in a folder
66
00:02:55,980 --> 00:02:59,463
named my-bert-model inside
the current working directory.
67
00:03:00,480 --> 00:03:02,250
Such a model can then be reloaded
68
00:03:02,250 --> 00:03:04,500
using the from_pretrained method.
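For example:

from transformers import TFBertModel

# Reload from the folder created by save_pretrained above
model = TFBertModel.from_pretrained("my-bert-model")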
69
00:03:04,500 --> 00:03:06,600
To upload a model to the Hub,
70
00:03:06,600 --> 00:03:08,350
check out the push to Hub video.
71
00:03:09,355 --> 00:03:12,188
(whooshing sound)