(logo swooshes)

- How to instantiate a Transformers model. In this video, we'll look at how we can create and use a model from the Transformers library.

As we have seen before, the AutoModel class allows you to instantiate a pretrained model from any checkpoint on the Hugging Face Hub. It'll pick the right model class from the library to instantiate the proper architecture and load the weights of the pretrained model inside it. As we can see, when given a BERT checkpoint we end up with a BertModel, and similarly for GPT-2 or BART.

Behind the scenes, this API can take the name of a checkpoint on the Hub, in which case it will download and cache the configuration file as well as the model weights file. You can also specify the path to a local folder that contains a valid configuration file and a model weights file.

To instantiate the pretrained model, the AutoModel API will first open the configuration file to look at the configuration class that should be used. The configuration class depends on the type of the model: BERT, GPT-2 or BART, for instance. Once it has the proper configuration class, it can instantiate that configuration, which is a blueprint of how to create the model. It also uses this configuration class to find the proper model class, which is then combined with the loaded configuration to load the model. This model is not yet a pretrained model, as it has just been initialized with random weights. The last step is to load the weights from the model file inside this model.

To easily load the configuration of a model from any checkpoint, or a folder containing the configuration file, we can use the AutoConfig class. Like the AutoModel class, it will pick the right configuration class from the library. We can also use the specific class corresponding to a checkpoint, but we will need to change the code each time we want to try a different model architecture.
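As a minimal sketch of what this looks like in code (bert-base-cased is the checkpoint named in this video; gpt2 and facebook/bart-base are stand-in examples for the GPT-2 and BART checkpoints, not necessarily the exact ones shown on screen):

```python
from transformers import AutoModel

# AutoModel reads each checkpoint's configuration and picks
# the matching model class behind the scenes.
bert_model = AutoModel.from_pretrained("bert-base-cased")     # a BertModel
gpt2_model = AutoModel.from_pretrained("gpt2")                # a GPT2Model
bart_model = AutoModel.from_pretrained("facebook/bart-base")  # a BartModel
```

Loading just the configuration works the same way; the architecture-specific class is shown alongside for comparison:

```python
from transformers import AutoConfig, BertConfig

# AutoConfig picks the right configuration class for the checkpoint:
config = AutoConfig.from_pretrained("bert-base-cased")  # a BertConfig

# The specific class works too, but ties the code to one architecture:
bert_config = BertConfig.from_pretrained("bert-base-cased")
```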
As we said before, the configuration of a model is a blueprint that contains all the information necessary to create the model architecture. For instance, the BERT model associated with the bert-base-cased checkpoint has 12 layers, a hidden size of 768 and a vocabulary size of 28,996.

Once we have the configuration, we can create a model that has the same architecture as our checkpoint but is randomly initialized. It can then be trained from scratch, like any PyTorch module. We can also change any part of the configuration by using keyword arguments. The second snippet of code below instantiates a randomly initialized BERT model with 10 layers instead of 12.

Saving a model once it's trained or fine-tuned is very easy: we just have to use the save_pretrained method. Here, the model will be saved in a folder named "my-bert-model" inside the current working directory. Such a model can then be reloaded using the from_pretrained method. To learn how to easily push this model to the Hub, check out the push to Hub video.
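A sketch of those two snippets, assuming the 12-layer bert-base-cased configuration described above:

```python
from transformers import BertConfig, BertModel

# First snippet: build a randomly initialized model from the
# bert-base-cased configuration.
bert_config = BertConfig.from_pretrained("bert-base-cased")
print(bert_config.num_hidden_layers)  # 12
print(bert_config.hidden_size)        # 768
print(bert_config.vocab_size)         # 28996
bert_model = BertModel(bert_config)   # random weights, ready to train

# Second snippet: override part of the configuration with a keyword
# argument to get a 10-layer BERT instead of the default 12.
small_config = BertConfig.from_pretrained(
    "bert-base-cased", num_hidden_layers=10
)
small_model = BertModel(small_config)
```

And saving then reloading, with "my-bert-model" being the folder name used in the video:

```python
# Save the trained or fine-tuned model into the current working directory:
bert_model.save_pretrained("my-bert-model")

# Reload it later from that folder:
reloaded_model = BertModel.from_pretrained("my-bert-model")
```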