(logo swooshes)

- How to instantiate a Transformers model. In this video, we'll look at how we can create and use a model from the Transformers library.

As we have seen before, the AutoModel class allows you to instantiate a pretrained model from any checkpoint on the Hugging Face Hub. It'll pick the right model class from the library to instantiate the proper architecture and load the weights of the pretrained model inside it. As we can see, when given a BERT checkpoint we end up with a BertModel, and similarly for GPT-2 or BART.

Behind the scenes, this API can take the name of a checkpoint on the Hub, in which case it will download and cache the configuration file as well as the model weights file. You can also specify the path to a local folder that contains a valid configuration file and a model weights file.

To instantiate the pretrained model, the AutoModel API will first open the configuration file to look at the configuration class that should be used. The configuration class depends on the type of the model: BERT, GPT-2 or BART, for instance. Once it has the proper configuration class, it can instantiate that configuration, which is a blueprint of how to create the model. It also uses this configuration class to find the proper model class, which is then combined with the loaded configuration to load the model. This model is not yet a pretrained model, as it has just been initialized with random weights. The last step is to load the weights from the model file inside this model.

To easily load the configuration of a model from any checkpoint, or a folder containing the configuration file, we can use the AutoConfig class. Like the AutoModel class, it will pick the right configuration class from the library. We can also use the specific class corresponding to a checkpoint, but we will need to change the code each time we want to try a different model architecture.
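As a minimal sketch of what this looks like in code (bert-base-cased is the checkpoint named in this video; gpt2 and facebook/bart-base are stand-in examples for the GPT-2 and BART checkpoints, not necessarily the exact ones shown on screen):

```python
from transformers import AutoModel

# AutoModel reads each checkpoint's configuration and picks
# the matching model class behind the scenes.
bert_model = AutoModel.from_pretrained("bert-base-cased")     # a BertModel
gpt2_model = AutoModel.from_pretrained("gpt2")                # a GPT2Model
bart_model = AutoModel.from_pretrained("facebook/bart-base")  # a BartModel
```

Loading just the configuration works the same way; the architecture-specific class is shown alongside for comparison:

```python
from transformers import AutoConfig, BertConfig

# AutoConfig picks the right configuration class for the checkpoint:
config = AutoConfig.from_pretrained("bert-base-cased")  # a BertConfig

# The specific class works too, but ties the code to one architecture:
bert_config = BertConfig.from_pretrained("bert-base-cased")
```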
As we said before, the configuration of a model is a blueprint that contains all the information necessary to create the model architecture. For instance, the BERT model associated with the bert-base-cased checkpoint has 12 layers, a hidden size of 768 and a vocabulary size of 28,996.

Once we have the configuration, we can create a model that has the same architecture as our checkpoint but is randomly initialized. It can then be trained from scratch, like any PyTorch module. We can also change any part of the configuration by using keyword arguments. The second snippet of code below instantiates a randomly initialized BERT model with 10 layers instead of 12.

Saving a model once it's trained or fine-tuned is very easy: we just have to use the save_pretrained method. Here, the model will be saved in a folder named "my-bert-model" inside the current working directory. Such a model can then be reloaded using the from_pretrained method. To learn how to easily push this model to the Hub, check out the push to Hub video.
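A sketch of those two snippets, assuming the 12-layer bert-base-cased configuration described above:

```python
from transformers import BertConfig, BertModel

# First snippet: build a randomly initialized model from the
# bert-base-cased configuration.
bert_config = BertConfig.from_pretrained("bert-base-cased")
print(bert_config.num_hidden_layers)  # 12
print(bert_config.hidden_size)        # 768
print(bert_config.vocab_size)         # 28996
bert_model = BertModel(bert_config)   # random weights, ready to train

# Second snippet: override part of the configuration with a keyword
# argument to get a 10-layer BERT instead of the default 12.
small_config = BertConfig.from_pretrained(
    "bert-base-cased", num_hidden_layers=10
)
small_model = BertModel(small_config)
```

And saving then reloading, with "my-bert-model" being the folder name used in the video:

```python
# Save the trained or fine-tuned model into the current working directory:
bert_model.save_pretrained("my-bert-model")

# Reload it later from that folder:
reloaded_model = BertModel.from_pretrained("my-bert-model")
```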