1
00:00:00,253 --> 00:00:02,920
（空气呼啸）
(air whooshing)

2
00:00:06,060 --> 00:00:08,070
- 在这段视频中，我们将看到
- In this video, we're going to see

3
00:00:08,070 --> 00:00:11,430
如何加载和微调预训练模型。
how to load and fine-tune a pre-trained model.

4
00:00:11,430 --> 00:00:12,510
它非常快。
It's very quick.

5
00:00:12,510 --> 00:00:14,490
如果你看过我们关于 pipeline 的视频，
And if you've watched our pipeline videos,

6
00:00:14,490 --> 00:00:18,150
我将在下面链接之，过程非常相似。
which I'll link below, the process is very similar.

7
00:00:18,150 --> 00:00:20,940
不过这一次，我们将使用迁移学习
This time, though, we're going to be using transfer learning

8
00:00:20,940 --> 00:00:23,040
并自己做一些训练，
and doing some training ourselves,

9
00:00:23,040 --> 00:00:26,400
而不是仅仅加载模型并按原样使用它。
rather than just loading a model and using it as is.

10
00:00:26,400 --> 00:00:28,710
所以要了解更多关于迁移学习的信息，
So to learn more about transfer learning,

11
00:00:28,710 --> 00:00:31,320
前往 “What is transfer learning?” 视频，
head to the 'What is transfer learning?' video,

12
00:00:31,320 --> 00:00:33,420
我们也会在下面链接它。
and we'll link that below as well.

13
00:00:33,420 --> 00:00:35,610
但现在，让我们看看这段代码。
But for now, let's look at this code.

14
00:00:35,610 --> 00:00:38,730
首先，我们选择我们想要开始的模型。
To start, we pick which model we want to start with.

15
00:00:38,730 --> 00:00:40,920
在这种情况下，我们将使用著名的，
In this case, we're going to use the famous,

16
00:00:40,920 --> 00:00:42,060
原始的 BERT，
the original BERT,

17
00:00:42,060 --> 00:00:44,850
作为我们今天训练的基础。
as the foundation for our training today.

18
00:00:44,850 --> 00:00:46,770
但这条怪物般的一行是什么，
But what is this monstrosity line,

19
00:00:46,770 --> 00:00:48,797
这个 “TFAutoModelForSequenceClassification”？
this 'TFAutoModelForSequenceClassification'?

20
00:00:49,860 --> 00:00:51,180
这意味着什么？
What does that mean?

21
00:00:51,180 --> 00:00:53,130
好吧，TF 代表 TensorFlow。
Well, the TF stands for TensorFlow.

22
00:00:53,130 --> 00:00:54,660
剩下的意味着，
And the rest means,

23
00:00:54,660 --> 00:00:55,950
采用语言模型，
take a language model,

24
00:00:55,950 --> 00:00:58,380
并在上面贴上一个序列分类头
and stick a sequence classification head onto it

25
00:00:58,380 --> 00:01:00,750
如果它还没有的话。
if it doesn't have one already.

26
00:01:00,750 --> 00:01:02,880
所以这行代码加载了 BERT，
So this line of code loads BERT,

27
00:01:02,880 --> 00:01:05,040
这是一个通用语言模型，
which is a general-purpose language model,

28
00:01:05,040 --> 00:01:07,650
它会加载权重、架构等所有内容
it loads its weights, architecture, and all

29
00:01:07,650 --> 00:01:10,920
然后在上面添加一个新的序列分类头
and then adds a new sequence classification head onto it

30
00:01:10,920 --> 00:01:13,440
具有随机初始化的权重。
with randomly initialized weights.

31
00:01:13,440 --> 00:01:15,870
所以这个方法需要知道两件事。
So this method needs to know two things.

32
00:01:15,870 --> 00:01:18,270
首先，它需要知道模型的名称
Firstly, it needs to know the name of the model

33
00:01:18,270 --> 00:01:21,060
你想要加载的架构和权重。
you want to load the architecture and weights for.

34
00:01:21,060 --> 00:01:23,940
其次，它需要知道有多少类
And secondly, it needs to know how many classes

35
00:01:23,940 --> 00:01:26,693
你的问题是有的，因为这将决定大小，
your problem has, because that will determine the size,

36
00:01:26,693 --> 00:01:29,610
即输出头中的神经元数量。
the number of neurons in the output head.

37
00:01:29,610 --> 00:01:31,530
所以如果你想跟随数据
So if you want to follow along with the data

38
00:01:31,530 --> 00:01:34,500
来自我们有关数据集的视频，我将在下面链接，
from our datasets videos, which I'll link below,

39
00:01:34,500 --> 00:01:37,440
那么你将有两个类别，积极的和消极的，
then you'll have two classes, positive and negative,

40
00:01:37,440 --> 00:01:39,723
因此 num_labels 等于二。
and thus num_labels equals two.
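The loading step described here might look like the following minimal sketch. The checkpoint name `bert-base-cased` is an assumption (the video only says "the original BERT"), and the helper name `load_classifier` is hypothetical; `num_labels=2` matches the positive/negative task mentioned above.

```python
# Assumed checkpoint: the transcript only says "the original BERT".
CHECKPOINT = "bert-base-cased"
# Two classes (positive / negative), as stated in the video.
NUM_LABELS = 2


def load_classifier(checkpoint=CHECKPOINT, num_labels=NUM_LABELS):
    """Load pretrained weights and attach a freshly initialized classification head."""
    # Imported lazily so this sketch can be read without TensorFlow installed.
    from transformers import TFAutoModelForSequenceClassification

    return TFAutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
```

Calling `load_classifier()` downloads the checkpoint on first use. The library typically warns that some weights are newly initialized, which is expected here because the classification head has not been trained yet.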

41
00:01:40,830 --> 00:01:43,230
但是这个编译行呢？
But what about this compile line?

42
00:01:43,230 --> 00:01:44,970
好吧，如果你熟悉 Keras，
Well, if you're familiar with Keras,

43
00:01:44,970 --> 00:01:46,920
你可能已经看过了。
you've probably seen this already.

44
00:01:46,920 --> 00:01:49,800
但如果不是，这是 Keras 中的核心方法之一
But if not, this is one of the core methods in Keras

45
00:01:49,800 --> 00:01:51,450
你会一次又一次地看到。
that you're gonna see again and again.

46
00:01:51,450 --> 00:01:54,900
你总是需要在训练之前编译你的模型。
You always need to compile your model before you train it.

47
00:01:54,900 --> 00:01:57,870
编译需要知道两件事。
And compile needs to know two things.

48
00:01:57,870 --> 00:02:00,090
首先，它需要知道损失函数，
Firstly, it needs to know the loss function,

49
00:02:00,090 --> 00:02:02,340
这就是你要优化的内容。
which is what you're trying to optimize.

50
00:02:02,340 --> 00:02:05,910
所以在这里，我们导入 SparseCategoricalCrossentropy
So here, we import the SparseCategoricalCrossentropy

51
00:02:05,910 --> 00:02:07,260
损失函数。
loss function.

52
00:02:07,260 --> 00:02:09,930
这个名字有点拗口，但它是标准的损失函数
So that's a mouthful, but it's the standard loss function

53
00:02:09,930 --> 00:02:13,260
对于任何正在执行分类任务的神经网络。
for any neural network that's doing a classification task.

54
00:02:13,260 --> 00:02:14,970
它基本上鼓励网络
It basically encourages the network

55
00:02:14,970 --> 00:02:17,730
为正确的类输出大的值，
to output large values for the right class,

56
00:02:17,730 --> 00:02:20,910
以及为错误的类别输出低值。
and low values for the wrong classes.

57
00:02:20,910 --> 00:02:24,150
请注意，你可以将损失函数指定为字符串，
Note that you can specify the loss function as a string,

58
00:02:24,150 --> 00:02:26,010
就像我们对优化器所做的那样。
like we did with the optimizer.

59
00:02:26,010 --> 00:02:27,600
但这里有一个问题，
But there's a risk there,

60
00:02:27,600 --> 00:02:30,090
人们容易掉入一个非常普遍的陷阱，
there's a very common trap people fall into,

61
00:02:30,090 --> 00:02:32,580
就是默认情况下，这种损失假设
which is that by default, this loss assumes

62
00:02:32,580 --> 00:02:36,510
输出是 softmax 层之后的概率。
the output is probabilities after a softmax layer.

63
00:02:36,510 --> 00:02:38,310
但是我们的模型实际输出了什么
But what our model has actually output

64
00:02:38,310 --> 00:02:40,770
是 softmax 之前的值，
is the values before the softmax,

65
00:02:40,770 --> 00:02:43,800
通常称为 logits，有时称为 logits。
often called the logits, sometimes logits.

66
00:02:43,800 --> 00:02:46,110
没有人十分确定如何发音。
No one's quite sure how to pronounce that one.

67
00:02:46,110 --> 00:02:47,790
但你可能以前见过这些
But you've probably seen these before

68
00:02:47,790 --> 00:02:49,950
在关于 pipeline 的视频中。
in the video about pipelines.

69
00:02:49,950 --> 00:02:52,320
所以如果你弄错了，你的模型就不会训练
So if you get this wrong, your model won't train

70
00:02:52,320 --> 00:02:54,723
弄清楚原因会很烦人。
and it'll be very annoying to figure out why.

71
00:02:55,590 --> 00:02:57,540
在以后的视频中，我们会看到
In future videos, we're gonna see

72
00:02:57,540 --> 00:03:00,540
如何使用模型的内部损失计算，
how to use the model's internal loss computations,

73
00:03:00,540 --> 00:03:02,910
这样你就不必自己指定损失
so that you don't have to specify the loss yourself

74
00:03:02,910 --> 00:03:05,340
而且你不必担心这些细节。
and you don't have to worry about these details.

75
00:03:05,340 --> 00:03:09,480
但是现在，请记住将 from_logits 设置为 true。
But for now, remember to set from_logits to true.
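To make the logits-versus-probabilities distinction concrete, here is a small plain-Python illustration. It is not the Keras implementation, only a sketch of the underlying arithmetic; the function names and the example logits are made up for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy_from_logits(logits, label):
    """Cross-entropy for one example, given logits and an integer class label."""
    return -math.log(softmax(logits)[label])


# Example logits for a two-class problem. They are not probabilities:
# one is negative and they do not sum to 1.
logits = [2.0, -1.0]
probs = softmax(logits)

# The model strongly favors class 0, so the loss is small if the
# true label is 0 and large if the true label is 1.
loss_if_label_0 = sparse_cross_entropy_from_logits(logits, 0)
loss_if_label_1 = sparse_cross_entropy_from_logits(logits, 1)
```

In Keras, the corresponding choice is `SparseCategoricalCrossentropy(from_logits=True)`, which tells the loss to apply the softmax step itself instead of assuming the model already did.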

76
00:03:09,480 --> 00:03:11,430
编译需要知道的第二件事
The second thing compile needs to know

77
00:03:11,430 --> 00:03:13,230
是你想要的优化器。
is the optimizer you want.

78
00:03:13,230 --> 00:03:15,120
在我们的例子中，我们使用 adam ，
In our case, we use adam,

79
00:03:15,120 --> 00:03:16,830
这是一种标准的优化器
which is sort of the standard optimizer

80
00:03:16,830 --> 00:03:18,720
用于现代深度学习。
for deep learning these days.

81
00:03:18,720 --> 00:03:20,520
你可能想要改变的一件事
The one thing you might want to change

82
00:03:20,520 --> 00:03:21,780
是学习率。
is the learning rate.

83
00:03:21,780 --> 00:03:24,630
为此，我们需要导入实际的优化器
And to do that, we'll need to import the actual optimizer

84
00:03:24,630 --> 00:03:26,910
而不仅仅是通过字符串调用它。
rather than just calling it by string.

85
00:03:26,910 --> 00:03:28,680
但我们会在另一个视频中讨论这个，
But we'll talk about that in another video,

86
00:03:28,680 --> 00:03:30,090
我将在下面链接。
which I'll link below.

87
00:03:30,090 --> 00:03:33,360
现在，让我们尝试训练模型。
For now, let's just try training the model.

88
00:03:33,360 --> 00:03:35,580
那么，你如何训练模型呢？
Well, so how do you train the model?

89
00:03:35,580 --> 00:03:37,950
同样，如果你以前使用过 Keras，
Again, if you've used Keras before,

90
00:03:37,950 --> 00:03:40,350
这一切对你来说都很熟悉。
this is all going to be very familiar to you.

91
00:03:40,350 --> 00:03:42,210
但如果没有，让我们快速看看
But if not, let's very quickly look

92
00:03:42,210 --> 00:03:43,710
我们在这里做什么。
at what we're doing here.

93
00:03:43,710 --> 00:03:47,010
fit 几乎是 Keras 模型的核心方法。
fit is pretty much the central method for Keras models.

94
00:03:47,010 --> 00:03:49,983
它告诉模型根据我们传入的数据进行训练。
It tells the model to train on the data we're passing in.

95
00:03:50,820 --> 00:03:52,920
所以这里我们传递我们制作的数据集
So here we pass the datasets we made

96
00:03:52,920 --> 00:03:54,510
在上一节中，
in the previous section,

97
00:03:54,510 --> 00:03:57,990
数据集包含我们的输入和标签。
the dataset contains both our inputs and our labels.

98
00:03:57,990 --> 00:04:00,420
所以我们不需要指定单独的标签，
So we don't need to specify separate labels,

99
00:04:00,420 --> 00:04:01,570
当我们调用 fit 时。
when we're calling fit.

100
00:04:02,490 --> 00:04:05,340
然后我们对 validation_data 做同样的事情。
Then we do the same thing with the validation_data.

101
00:04:05,340 --> 00:04:08,190
然后，如果我们愿意，我们可以指定细节，
And then, if we want, we can specify details,

102
00:04:08,190 --> 00:04:09,900
比如训练的轮数 (epochs)
like the number of epochs for training

103
00:04:09,900 --> 00:04:12,420
你还可以传递一些其他参数给 fit 。
or there are some other arguments you can pass to fit.

104
00:04:12,420 --> 00:04:15,240
但最后，你只需将所有这些传递给 model.fit
But in the end, you just pass all of this to model.fit

105
00:04:15,240 --> 00:04:16,440
然后你让它运行。
and you let it run.
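Put together, the compile and fit steps described above could be sketched as follows. This assumes the model outputs logits and that the two datasets are `tf.data.Dataset` objects yielding `(inputs, labels)` batches; the function name, the argument names, and the default of 3 epochs are illustrative choices, not taken from the video.

```python
def compile_and_fit(model, train_dataset, validation_dataset, epochs=3):
    """Compile a Keras model for logits output, then train it."""
    # Imported lazily so this sketch can be read without TensorFlow installed.
    from tensorflow.keras.losses import SparseCategoricalCrossentropy

    model.compile(
        optimizer="adam",  # optimizer chosen by its string name
        loss=SparseCategoricalCrossentropy(from_logits=True),
    )
    # The datasets already contain labels, so none are passed separately.
    return model.fit(
        train_dataset,
        validation_data=validation_dataset,
        epochs=epochs,
    )
```

`model.fit` returns a history object, so the per-epoch losses can be inspected after training finishes.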

106
00:04:16,440 --> 00:04:17,520
如果一切顺利，
If everything works out,

107
00:04:17,520 --> 00:04:19,320
你应该看到一个小训练栏
you should see a little training bar

108
00:04:19,320 --> 00:04:21,300
随着损失下降而向前推进。
progressing along as your loss goes down.

109
00:04:21,300 --> 00:04:22,290
就是这样。
And that's it.

110
00:04:22,290 --> 00:04:23,123
在运行时，
While that's running,

111
00:04:23,123 --> 00:04:25,380
你知道，你可以打电话给你的老板并告诉他们
you know, you can call your boss and tell them

112
00:04:25,380 --> 00:04:27,810
你现在是高级 NLP 机器学习工程师
you're a senior NLP machine learning engineer now

113
00:04:27,810 --> 00:04:30,900
你会想要下个季度的薪水审查。
and you're gonna want a salary review next quarter.

114
00:04:30,900 --> 00:04:32,880
这几行代码真的够用了
These few lines of code are really all it takes

115
00:04:32,880 --> 00:04:34,500
就能发挥一个大规模
to apply the power of a massive

116
00:04:34,500 --> 00:04:36,510
预训练语言问题，
pre-trained language problem,

117
00:04:36,510 --> 00:04:38,250
大规模的预训练语言模型，不好意思，
massive pre-trained language model, excuse me,

118
00:04:38,250 --> 00:04:40,080
到你的 NLP 问题。
to your NLP problem.

119
00:04:40,080 --> 00:04:42,150
但我们能做得更好吗？
But could we do better than this?

120
00:04:42,150 --> 00:04:43,920
我的意思是，我们当然可以。
I mean, we certainly could.

121
00:04:43,920 --> 00:04:45,720
具有一些更高级的 Keras 功能
With a few more advanced Keras features

122
00:04:45,720 --> 00:04:47,730
就像一个经过调整的、预定的学习率，
like a tuned, scheduled learning rate,

123
00:04:47,730 --> 00:04:49,290
我们可以获得更低的损失
we can get an even lower loss

124
00:04:49,290 --> 00:04:51,990
以及一个更准确、更有用的模型。
and an even more accurate, more useful model.

125
00:04:51,990 --> 00:04:54,120
训练模型后，我们如何处理模型？
And what do we do with our model after we train it?

126
00:04:54,120 --> 00:04:55,950
所以所有这些都将包含在视频中
So all of this is going to be covered in the videos

127
00:04:55,950 --> 00:04:57,963
即将推出，敬请期待。
that are coming up, so stay tuned.

128
00:04:59,220 --> 00:05:01,887
（空气呼啸）
(air whooshing)