subtitles/en/25_keras-introduction.srt
1
00:00:00,430 --> 00:00:03,013
(upbeat music)
2
00:00:05,160 --> 00:00:07,080
- In this video, I'm going to give you
3
00:00:07,080 --> 00:00:10,350
a very quick introduction to
how our transformer models
4
00:00:10,350 --> 00:00:14,040
work together with TensorFlow and Keras.
5
00:00:14,040 --> 00:00:15,510
The very short explanation
6
00:00:15,510 --> 00:00:17,310
is that all of our TensorFlow models
7
00:00:17,310 --> 00:00:19,470
are also Keras model objects,
8
00:00:19,470 --> 00:00:22,950
and so they have the
standard Keras model API.
9
00:00:22,950 --> 00:00:24,960
If you're an experienced
machine learning engineer
10
00:00:24,960 --> 00:00:28,230
who's used Keras a lot, that's
probably all you need to know
11
00:00:28,230 --> 00:00:29,610
to start working with them.
12
00:00:29,610 --> 00:00:30,900
But for everyone else,
13
00:00:30,900 --> 00:00:34,170
including the prodigal
PyTorch engineers out there
14
00:00:34,170 --> 00:00:35,910
who are returning to the fold,
15
00:00:35,910 --> 00:00:38,430
I'm going to quickly
introduce Keras models,
16
00:00:38,430 --> 00:00:40,440
and how we work with them.
17
00:00:40,440 --> 00:00:43,080
In other videos, which I'll link below,
18
00:00:43,080 --> 00:00:46,440
I'll run through training with
Keras models in more detail.
19
00:00:46,440 --> 00:00:50,820
But first, at a high level,
what is a Keras model?
20
00:00:50,820 --> 00:00:54,810
So your model basically
contains your entire network.
21
00:00:54,810 --> 00:00:58,230
It contains the layers, and
the weights for those layers,
22
00:00:58,230 --> 00:01:00,690
and also specifies what
to do with them,
23
00:01:00,690 --> 00:01:02,880
so it defines the whole path all the way
24
00:01:02,880 --> 00:01:05,460
from your inputs to your outputs.
25
00:01:05,460 --> 00:01:07,380
If you've used Keras before,
26
00:01:07,380 --> 00:01:09,480
you probably started using model objects
27
00:01:09,480 --> 00:01:11,850
by building them out by hand,
28
00:01:11,850 --> 00:01:14,250
adding one layer after another,
29
00:01:14,250 --> 00:01:18,690
maybe using the model.add() method
or the functional approach.
30
00:01:18,690 --> 00:01:20,490
And there's nothing wrong with that.
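As a rough sketch of that hand-built style (the layer sizes below are made up, purely for illustration):

import tensorflow as tf

# Building a model by hand with model.add(), one layer after another
# (arbitrary layer sizes, just to show the pattern)
sequential_model = tf.keras.Sequential()
sequential_model.add(tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)))
sequential_model.add(tf.keras.layers.Dense(2))

# The same network written with the functional approach
inputs = tf.keras.Input(shape=(16,))
hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(2)(hidden)
functional_model = tf.keras.Model(inputs, outputs)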
31
00:01:21,390 --> 00:01:23,430
Lots of great models are built that way
32
00:01:23,430 --> 00:01:26,970
but you can also pre-load an
entire model, weights and all.
33
00:01:26,970 --> 00:01:29,994
And this is really
helpful, because,
34
00:01:29,994 --> 00:01:32,490
as you can see here, if
you try reading the paper
35
00:01:32,490 --> 00:01:34,110
or if you try looking at the code,
36
00:01:34,110 --> 00:01:37,350
you'll see the inside of a
Transformer is pretty complex,
37
00:01:37,350 --> 00:01:40,110
and writing it all out from
scratch and getting it right
38
00:01:40,110 --> 00:01:41,850
would be hard even for an experienced
39
00:01:41,850 --> 00:01:43,500
machine learning engineer.
40
00:01:43,500 --> 00:01:45,870
But because it's all
packed inside a model,
41
00:01:45,870 --> 00:01:48,150
you don't need to worry
about any of that complexity
42
00:01:48,150 --> 00:01:49,140
if you don't want to.
43
00:01:49,140 --> 00:01:51,570
If you're a researcher and you
really want to dig in there,
44
00:01:51,570 --> 00:01:55,650
you can, but you can also
just load a pre-trained,
45
00:01:55,650 --> 00:01:59,013
pre-configured transformer
model in just one line of code.
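As a minimal sketch of that one-liner (the checkpoint name here is just an example):

from transformers import TFAutoModelForSequenceClassification

# One line loads the full architecture plus its pretrained weights;
# "bert-base-cased" is only an example checkpoint
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)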
46
00:02:00,150 --> 00:02:03,480
And coming back to the
Keras API I mentioned earlier,
47
00:02:03,480 --> 00:02:04,560
the advantage of it is that
48
00:02:04,560 --> 00:02:06,690
whether you write your
own model from scratch
49
00:02:06,690 --> 00:02:09,510
or load a pre-trained one,
you interact with the model
50
00:02:09,510 --> 00:02:11,850
through that same API, so you use exactly
51
00:02:11,850 --> 00:02:13,950
the same few methods and
you're gonna see them
52
00:02:13,950 --> 00:02:16,380
again and again, these methods like fit,
53
00:02:16,380 --> 00:02:19,650
compile and predict,
and like I've mentioned
54
00:02:19,650 --> 00:02:22,530
we'll cover concrete examples
of how to use those methods
55
00:02:22,530 --> 00:02:24,330
in the videos I'll link below.
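As a quick, non-definitive sketch of how those methods fit together (example checkpoint and toy data, just to show the shape of the workflow):

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Example checkpoint and a tiny toy dataset
checkpoint = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

tokens = dict(tokenizer(["I love this!", "This is terrible."],
                        padding=True, return_tensors="tf"))
labels = tf.constant([1, 0])

# compile, fit and predict: the standard Keras model API
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(tokens, labels, epochs=1)
predictions = model.predict(tokens)["logits"]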
56
00:02:24,330 --> 00:02:27,000
For now the key thing to
take away from this video,
57
00:02:27,000 --> 00:02:28,950
if you've never seen Keras before,
58
00:02:28,950 --> 00:02:30,870
is that this neat encapsulation means
59
00:02:30,870 --> 00:02:33,090
that all the complexity
of a huge neural net
60
00:02:33,090 --> 00:02:35,430
becomes manageable, because
you interact with it
61
00:02:35,430 --> 00:02:39,000
in exactly the same way, using
exactly the same methods,
62
00:02:39,000 --> 00:02:41,700
whether it's a huge
pre-trained language model
63
00:02:41,700 --> 00:02:43,950
or a simple model that
you wrote out by hand.
64
00:02:45,466 --> 00:02:48,049
(upbeat music)