1
00:00:00,430 --> 00:00:03,013
(upbeat music)

2
00:00:05,160 --> 00:00:07,080
- In this video, I'm going to give you

3
00:00:07,080 --> 00:00:10,350
a very quick introduction to
how our transformer models

4
00:00:10,350 --> 00:00:14,040
work together with TensorFlow and Keras.

5
00:00:14,040 --> 00:00:15,510
The very short explanation

6
00:00:15,510 --> 00:00:17,310
is that all of our TensorFlow models

7
00:00:17,310 --> 00:00:19,470
are also Keras model objects,

8
00:00:19,470 --> 00:00:22,950
and so they have the
standard Keras model API.

9
00:00:22,950 --> 00:00:24,960
If you're an experienced
machine learning engineer

10
00:00:24,960 --> 00:00:28,230
who's used Keras a lot, that's
probably all you need to know

11
00:00:28,230 --> 00:00:29,610
to start working with them.

12
00:00:29,610 --> 00:00:30,900
But for everyone else,

13
00:00:30,900 --> 00:00:34,170
including the prodigal
PyTorch engineers out there

14
00:00:34,170 --> 00:00:35,910
who are returning to the fold,

15
00:00:35,910 --> 00:00:38,430
I'm going to quickly
introduce Keras models,

16
00:00:38,430 --> 00:00:40,440
and how we work with them.

17
00:00:40,440 --> 00:00:43,080
In other videos, which I'll link below,

18
00:00:43,080 --> 00:00:46,440
I'll run through training with
Keras models in more detail.

19
00:00:46,440 --> 00:00:50,820
But first, at a high level,
what is a Keras model?

20
00:00:50,820 --> 00:00:54,810
So your model basically
contains your entire network.

21
00:00:54,810 --> 00:00:58,230
It contains the layers, and
the weights for those layers,

22
00:00:58,230 --> 00:01:00,690
and it also specifies
what to do with them,

23
00:01:00,690 --> 00:01:02,880
so it defines the whole path all the way

24
00:01:02,880 --> 00:01:05,460
from your inputs to your outputs.

25
00:01:05,460 --> 00:01:07,380
If you've used Keras before,

26
00:01:07,380 --> 00:01:09,480
you probably started using model objects

27
00:01:09,480 --> 00:01:11,850
by building them out by hand,

28
00:01:11,850 --> 00:01:14,250
adding one layer after another,

29
00:01:14,250 --> 00:01:18,690
maybe using model.add()
or the functional approach.

30
00:01:18,690 --> 00:01:20,490
And there's nothing wrong with that.

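As a rough sketch of the two hand-built styles just mentioned, here is the same tiny network written once with model.add() and once with the functional approach. The layer sizes and variable names are made up for illustration, and the snippet assumes TensorFlow 2.x is installed.

```python
import tensorflow as tf

# Style 1: a Sequential model, adding one layer after another with add().
# The sizes (4 inputs, 8 hidden units, 2 outputs) are arbitrary.
sequential_model = tf.keras.Sequential()
sequential_model.add(tf.keras.Input(shape=(4,)))
sequential_model.add(tf.keras.layers.Dense(8, activation="relu"))
sequential_model.add(tf.keras.layers.Dense(2))

# Style 2: the functional approach, where each layer is called on the
# output of the previous one and the model is defined by its endpoints.
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(2)(hidden)
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)
```

Both objects are Keras models, so each one holds its layers, its weights, and the path from inputs to outputs.
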
31
00:01:21,390 --> 00:01:23,430
Lots of great models are built that way,

32
00:01:23,430 --> 00:01:26,970
but you can also pre-load an
entire model, weights and all.

33
00:01:26,970 --> 00:01:29,994
And this is really
helpful because,

34
00:01:29,994 --> 00:01:32,490
as you can see here, if
you try reading the paper

35
00:01:32,490 --> 00:01:34,110
or if you try looking at the code,

36
00:01:34,110 --> 00:01:37,350
you'll see the inside of a
Transformer is pretty complex,

37
00:01:37,350 --> 00:01:40,110
and writing it all out from
scratch and getting it right

38
00:01:40,110 --> 00:01:41,850
would be hard even for an experienced

39
00:01:41,850 --> 00:01:43,500
machine learning engineer.

40
00:01:43,500 --> 00:01:45,870
But because it's all
packed inside a model,

41
00:01:45,870 --> 00:01:48,150
you don't need to worry
about that complexity

42
00:01:48,150 --> 00:01:49,140
if you don't want to.

43
00:01:49,140 --> 00:01:51,570
If you're a researcher and you
want to really dig in there,

44
00:01:51,570 --> 00:01:55,650
you can, but you can also
just load a pre-trained,

45
00:01:55,650 --> 00:01:59,013
pre-configured transformer
model in just one line of code.

46
00:02:00,150 --> 00:02:03,480
And as I mentioned
earlier about the Keras API,

47
00:02:03,480 --> 00:02:04,560
the advantage of it is that

48
00:02:04,560 --> 00:02:06,690
whether you write your
own model from scratch

49
00:02:06,690 --> 00:02:09,510
or load a pre-trained one,
you interact with the model

50
00:02:09,510 --> 00:02:11,850
through that same API, so you use exactly

51
00:02:11,850 --> 00:02:13,950
the same few methods, and
you're going to see them

52
00:02:13,950 --> 00:02:16,380
again and again, these methods like fit,

53
00:02:16,380 --> 00:02:19,650
compile and predict. And, like I've mentioned,

54
00:02:19,650 --> 00:02:22,530
we'll cover concrete examples
of how to use those methods

55
00:02:22,530 --> 00:02:24,330
in the videos I'll link below.

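To make the three methods named above a little more concrete, here is a minimal sketch of compile, fit and predict on a small hand-written Keras model. The random toy data, layer sizes, optimizer and loss are all assumptions chosen for illustration, not the settings used in the linked videos. The transcript's point is that a pre-trained transformer model is driven through these same calls.

```python
import numpy as np
import tensorflow as tf

# Toy data (illustrative only): 32 examples, 4 features, binary labels.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 4)).astype("float32")
labels = rng.integers(0, 2, size=(32,))

# A small model written out by hand; it outputs one logit per class.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

# compile: choose the optimizer and the loss used for training.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# fit: train on the data for a couple of epochs.
history = model.fit(features, labels, epochs=2, batch_size=8, verbose=0)

# predict: run the trained model on inputs to get the output logits.
predictions = model.predict(features, verbose=0)
```
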
56
00:02:24,330 --> 00:02:27,000
For now, the key thing to
take away from this video,

57
00:02:27,000 --> 00:02:28,950
if you've never seen Keras before,

58
00:02:28,950 --> 00:02:30,870
is that this neat encapsulation means

59
00:02:30,870 --> 00:02:33,090
that all the complexity
of a huge neural net

60
00:02:33,090 --> 00:02:35,430
becomes manageable, because
you interact with it

61
00:02:35,430 --> 00:02:39,000
in exactly the same way, using
exactly the same methods,

62
00:02:39,000 --> 00:02:41,700
whether it's a huge
pre-trained language model

63
00:02:41,700 --> 00:02:43,950
or a simple model that
you wrote out by hand.

64
00:02:45,466 --> 00:02:48,049
(upbeat music)