1 00:00:00,430 --> 00:00:03,013 (upbeat music) 2 00:00:05,160 --> 00:00:07,080 - In this video, I'm going to give you 3 00:00:07,080 --> 00:00:10,350 a very quick introduction to how our transformer models 4 00:00:10,350 --> 00:00:14,040 work together with TensorFlow and Keras. 5 00:00:14,040 --> 00:00:15,510 The very short explanation 6 00:00:15,510 --> 00:00:17,310 is that all of our TensorFlow models 7 00:00:17,310 --> 00:00:19,470 are also Keras model objects, 8 00:00:19,470 --> 00:00:22,950 and so they have the standard Keras model API. 9 00:00:22,950 --> 00:00:24,960 If you're an experienced machine learning engineer 10 00:00:24,960 --> 00:00:28,230 who's used Keras a lot, that's probably all you need to know 11 00:00:28,230 --> 00:00:29,610 to start working with them. 12 00:00:29,610 --> 00:00:30,900 But for everyone else, 13 00:00:30,900 --> 00:00:34,170 including the prodigal PyTorch engineers out there 14 00:00:34,170 --> 00:00:35,910 who are returning to the fold, 15 00:00:35,910 --> 00:00:38,430 I'm going to quickly introduce Keras models, 16 00:00:38,430 --> 00:00:40,440 and how we work with them. 17 00:00:40,440 --> 00:00:43,080 In other videos, which I'll link below, 18 00:00:43,080 --> 00:00:46,440 I'll run through training with Keras models in more detail. 19 00:00:46,440 --> 00:00:50,820 But first, at a high level, what is a Keras model? 20 00:00:50,820 --> 00:00:54,810 So your model basically contains your entire network. 21 00:00:54,810 --> 00:00:58,230 It contains the layers, and the weights for those layers, 22 00:00:58,230 --> 00:01:00,690 and it also specifies what to do with them, 23 00:01:00,690 --> 00:01:02,880 so it defines the whole path all the way 24 00:01:02,880 --> 00:01:05,460 from your inputs to your outputs.
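As an aside to the description above, here is a minimal sketch of a Keras model acting as a container for layers, their weights, and the path from inputs to outputs. The layer sizes and the all-zeros input batch are arbitrary assumptions chosen for illustration; nothing here is taken from the video itself.

```python
import numpy as np
import tensorflow as tf

# A tiny hand-built network: 4 input features -> 8 hidden units -> 2 outputs.
# The sizes are illustrative assumptions only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

# The model holds its layers...
layer_names = [layer.name for layer in model.layers]

# ...and the weights belonging to those layers (a kernel and a bias per Dense layer).
weight_shapes = [tuple(w.shape) for w in model.weights]

# Calling the model runs the whole path from inputs to outputs.
outputs = model(np.zeros((3, 4), dtype="float32"))

print(layer_names)
print(weight_shapes)
print(outputs.shape)
```

The same three views (layers, weights, and a forward call) should be available on any Keras model, however large it is.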
25 00:01:05,460 --> 00:01:07,380 If you've used Keras before, 26 00:01:07,380 --> 00:01:09,480 you probably started using model objects 27 00:01:09,480 --> 00:01:11,850 by building them out by hand, 28 00:01:11,850 --> 00:01:14,250 adding one layer after another, 29 00:01:14,250 --> 00:01:18,690 maybe using model.add() or the functional approach. 30 00:01:18,690 --> 00:01:20,490 And there's nothing wrong with that. 31 00:01:21,390 --> 00:01:23,430 Lots of great models are built that way, 32 00:01:23,430 --> 00:01:26,970 but you can also pre-load an entire model, weights and all. 33 00:01:26,970 --> 00:01:29,994 And this is really helpful, because, 34 00:01:29,994 --> 00:01:32,490 as you can see here, if you try reading the paper 35 00:01:32,490 --> 00:01:34,110 or if you try looking at the code, 36 00:01:34,110 --> 00:01:37,350 you'll see the inside of a transformer is pretty complex, 37 00:01:37,350 --> 00:01:40,110 and writing it all out from scratch and getting it right 38 00:01:40,110 --> 00:01:41,850 would be hard even for an experienced 39 00:01:41,850 --> 00:01:43,500 machine learning engineer. 40 00:01:43,500 --> 00:01:45,870 But because it's all packed inside a model, 41 00:01:45,870 --> 00:01:48,150 you don't need to worry about that complexity 42 00:01:48,150 --> 00:01:49,140 if you don't want to. 43 00:01:49,140 --> 00:01:51,570 If you're a researcher, if you want to really dig in there 44 00:01:51,570 --> 00:01:55,650 you can, but you can also just load a pre-trained, 45 00:01:55,650 --> 00:01:59,013 pre-configured transformer model in just one line of code.
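The one-line load mentioned above might look roughly like the sketch below. It assumes the `transformers` library is installed in a version that still ships its TensorFlow classes. The helper name `load_pretrained_classifier`, the checkpoint `bert-base-cased`, and the `num_labels` value are hypothetical choices for illustration; the video does not name a specific model or task.

```python
def load_pretrained_classifier(checkpoint="bert-base-cased", num_labels=2):
    """Load a pre-trained transformer as a TensorFlow/Keras model.

    The checkpoint name and label count are illustrative assumptions.
    The import sits inside the function so that merely defining it does
    not require transformers to be installed or a download to happen.
    """
    from transformers import TFAutoModelForSequenceClassification

    # This is the "one line": it fetches the configuration and the
    # pre-trained weights, and returns a ready-to-use model object.
    return TFAutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
```

Calling `load_pretrained_classifier()` downloads the weights on first use, so it needs network access. Per the video, the returned object is also a Keras model, so it should expose the standard Keras methods.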
46 00:02:00,150 --> 00:02:03,480 And as I mentioned earlier about the Keras API, 47 00:02:03,480 --> 00:02:04,560 the advantage of it is that 48 00:02:04,560 --> 00:02:06,690 whether you write your own model from scratch 49 00:02:06,690 --> 00:02:09,510 or load a pre-trained one, you interact with the model 50 00:02:09,510 --> 00:02:11,850 through that same API, so you use exactly 51 00:02:11,850 --> 00:02:13,950 the same few methods and you're gonna see them 52 00:02:13,950 --> 00:02:16,380 again and again, these methods like fit, 53 00:02:16,380 --> 00:02:19,650 compile and predict, and like I've mentioned 54 00:02:19,650 --> 00:02:22,530 we'll cover concrete examples of how to use those methods 55 00:02:22,530 --> 00:02:24,330 in the videos I'll link below. 56 00:02:24,330 --> 00:02:27,000 For now the key thing to take away from this video, 57 00:02:27,000 --> 00:02:28,950 if you've never seen Keras before, 58 00:02:28,950 --> 00:02:30,870 is that this neat encapsulation means 59 00:02:30,870 --> 00:02:33,090 that all the complexity of a huge neural net 60 00:02:33,090 --> 00:02:35,430 becomes manageable, because you interact with it 61 00:02:35,430 --> 00:02:39,000 in exactly the same way, using exactly the same methods, 62 00:02:39,000 --> 00:02:41,700 whether it's a huge pre-trained language model 63 00:02:41,700 --> 00:02:43,950 or a simple model that you wrote out by hand. 64 00:02:45,466 --> 00:02:48,049 (upbeat music)
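To make the `compile`, `fit` and `predict` methods mentioned above concrete, here is a minimal sketch on a small hand-written model. The random data, layer sizes, optimizer, loss and epoch count are all assumptions picked to keep the example fast; they are not settings from the video.

```python
import numpy as np
import tensorflow as tf

# Synthetic data, purely for illustration: 32 examples, 4 features, 2 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 4)).astype("float32")
labels = rng.integers(0, 2, size=(32,))

# A simple model written out by hand.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),  # raw logits, one per class
])

# compile: choose the optimizer and the loss.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# fit: train on the data.
history = model.fit(features, labels, epochs=2, batch_size=8, verbose=0)

# predict: get outputs for new inputs.
predictions = model.predict(features[:4], verbose=0)
print(predictions.shape)
```

Following the point made in the video, a pre-trained transformer loaded as a Keras model would be driven through these same three calls, with tokenized text in place of the random features.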