1
00:00:00,125 --> 00:00:02,958
(whooshing sound)

2
00:00:05,463 --> 00:00:08,820
- How to instantiate a Transformers model?

3
00:00:08,820 --> 00:00:11,250
In this video, we will look at how we can create

4
00:00:11,250 --> 00:00:13,550
and use a model from the Transformers library.

5
00:00:15,000 --> 00:00:17,850
As we've seen before, the TFAutoModel class

6
00:00:17,850 --> 00:00:20,100
allows you to instantiate a pre-trained model

7
00:00:20,100 --> 00:00:22,503
from any checkpoint on the Hugging Face Hub.

8
00:00:23,430 --> 00:00:25,620
It will pick the right model class from the library

9
00:00:25,620 --> 00:00:27,750
to instantiate the proper architecture

10
00:00:27,750 --> 00:00:31,200
and load the weights of the pre-trained model inside.

11
00:00:31,200 --> 00:00:34,020
As we can see, when given a BERT checkpoint,

12
00:00:34,020 --> 00:00:36,090
we end up with a TFBertModel,

13
00:00:36,090 --> 00:00:38,553
and similarly for GPT2 or BART.

14
00:00:40,170 --> 00:00:42,510
Behind the scenes, this API can take the name

15
00:00:42,510 --> 00:00:44,040
of a checkpoint on the Hub,

16
00:00:44,040 --> 00:00:45,810
in which case it will download and cache

17
00:00:45,810 --> 00:00:48,660
the configuration file as well as the model weights file.

18
00:00:49,590 --> 00:00:52,020
You can also specify the path to a local folder

19
00:00:52,020 --> 00:00:54,090
that contains a valid configuration file

20
00:00:54,090 --> 00:00:55,340
and a model weights file.

21
00:00:56,670 --> 00:00:58,167
To instantiate the pre-trained model,

22
00:00:58,167 --> 00:01:02,400
the TFAutoModel API will first open the configuration file

23
00:01:02,400 --> 00:01:05,253
to look at the configuration class that should be used.

24
00:01:06,390 --> 00:01:09,660
The configuration class depends on the type of the model,

25
00:01:09,660 --> 00:01:12,333
BERT, GPT2 or BART for instance.

26
00:01:13,320 --> 00:01:15,720
Once it has the proper configuration class,

27
00:01:15,720 --> 00:01:18,000
it can instantiate that configuration,

28
00:01:18,000 --> 00:01:21,090
which is a blueprint to know how to create the model.

29
00:01:21,090 --> 00:01:22,770
It also uses this configuration class

30
00:01:22,770 --> 00:01:24,750
to find the proper model class,

31
00:01:24,750 --> 00:01:27,120
which is combined with the loaded configuration

32
00:01:27,120 --> 00:01:28,143
to load the model.

33
00:01:29,250 --> 00:01:31,800
This model is not yet our pre-trained model

34
00:01:31,800 --> 00:01:34,560
as it has just been initialized with random weights.

35
00:01:34,560 --> 00:01:36,690
The last step is to load the weights

36
00:01:36,690 --> 00:01:38,973
from the model file inside this model.

37
00:01:40,230 --> 00:01:42,270
To easily load the configuration of a model

38
00:01:42,270 --> 00:01:44,220
from any checkpoint or a folder

39
00:01:44,220 --> 00:01:46,170
containing the configuration file,

40
00:01:46,170 --> 00:01:47,790
we can use the AutoConfig class.

41
00:01:47,790 --> 00:01:50,460
Like the TFAutoModel class,

42
00:01:50,460 --> 00:01:54,210
it will pick the right configuration class from the library.

43
00:01:54,210 --> 00:01:56,040
We can also use the specific class

44
00:01:56,040 --> 00:01:57,840
corresponding to a checkpoint,

45
00:01:57,840 --> 00:01:59,430
but we will need to change the code

46
00:01:59,430 --> 00:02:02,230
each time we want to try a different model architecture.
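As a companion to the narration so far, here is a minimal sketch of the loading paths it describes, using the standard Transformers API. The checkpoint names come from the narration; the local folder path is a placeholder for illustration:

```python
from transformers import TFAutoModel, AutoConfig, BertConfig

# TFAutoModel reads the checkpoint's config to pick the right architecture:
bert = TFAutoModel.from_pretrained("bert-base-cased")  # -> TFBertModel
gpt2 = TFAutoModel.from_pretrained("gpt2")             # -> TFGPT2Model

# A local folder with a valid configuration file and weights file works too
# ("./my-local-checkpoint" is a placeholder path for this sketch):
# local = TFAutoModel.from_pretrained("./my-local-checkpoint")

# AutoConfig loads only the configuration, picking the right config class:
config = AutoConfig.from_pretrained("bert-base-cased")  # -> BertConfig
print(type(config).__name__)

# The checkpoint-specific class also works, but ties the code
# to a single model architecture:
same_config = BertConfig.from_pretrained("bert-base-cased")
```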
47
00:02:03,180 --> 00:02:05,353
As we said before, the configuration of a model

48
00:02:05,353 --> 00:02:08,610
is a blueprint that contains all the information necessary

49
00:02:08,610 --> 00:02:11,070
to create the model architecture.

50
00:02:11,070 --> 00:02:12,750
For instance, the BERT model

51
00:02:12,750 --> 00:02:15,510
associated with the bert-base-cased checkpoint

52
00:02:15,510 --> 00:02:19,710
has 12 layers, a hidden size of 768,

53
00:02:19,710 --> 00:02:23,403
and a vocabulary size of 28,996.

54
00:02:24,810 --> 00:02:26,670
Once we have the configuration,

55
00:02:26,670 --> 00:02:28,890
we can create a model that has the same architecture

56
00:02:28,890 --> 00:02:32,160
as our checkpoint but is randomly initialized.

57
00:02:32,160 --> 00:02:36,030
We can then train it from scratch like any TensorFlow model.

58
00:02:36,030 --> 00:02:38,063
We can also change any part of the configuration

59
00:02:38,063 --> 00:02:40,770
by using keyword arguments.

60
00:02:40,770 --> 00:02:43,110
The second snippet of code instantiates

61
00:02:43,110 --> 00:02:44,970
a randomly initialized BERT model

62
00:02:44,970 --> 00:02:46,983
with 10 layers instead of 12.

63
00:02:48,240 --> 00:02:51,360
Saving a model once it's trained or fine-tuned is very easy.

64
00:02:51,360 --> 00:02:53,880
We just have to use the save_pretrained method.

65
00:02:53,880 --> 00:02:55,980
Here, the model will be saved in a folder

66
00:02:55,980 --> 00:02:59,463
named my-bert-model inside the current working directory.

67
00:03:00,480 --> 00:03:02,250
Such a model can then be reloaded

68
00:03:02,250 --> 00:03:04,500
using the from_pretrained method.

69
00:03:04,500 --> 00:03:06,600
To learn how to push a model to the Hub,

70
00:03:06,600 --> 00:03:08,350
check out the push to Hub video.

71
00:03:09,355 --> 00:03:12,188
(whooshing sound)
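And a minimal sketch of this second half: creating a randomly initialized model from a configuration, overriding part of it with a keyword argument, then saving and reloading. The my-bert-model folder name comes from the narration itself:

```python
from transformers import AutoConfig, TFAutoModel, TFBertModel

# Same architecture as the checkpoint, but randomly initialized weights:
config = AutoConfig.from_pretrained("bert-base-cased")
model = TFBertModel(config)  # untrained; train it from scratch like any TF model

# Keyword arguments override parts of the configuration,
# here a BERT with 10 layers instead of 12:
small_config = AutoConfig.from_pretrained("bert-base-cased", num_hidden_layers=10)
small_model = TFBertModel(small_config)

# Saving and reloading once trained or fine-tuned:
model.save_pretrained("my-bert-model")  # writes config + weights to ./my-bert-model
reloaded = TFAutoModel.from_pretrained("my-bert-model")
```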