subtitles/zh-CN/29_write-your-training-loop-in-pytorch.srt
1
00:00:00,298 --> 00:00:01,511
(空气呼啸)
(air whooshing)
2
00:00:01,511 --> 00:00:02,769
(笑脸弹出)
(smiley face popping)
3
00:00:02,769 --> 00:00:05,460
(空气呼啸)
(air whooshing)
4
00:00:05,460 --> 00:00:08,486
使用 PyTorch 编写你自己的训练循环。
Write your own training loop with PyTorch.
5
00:00:08,486 --> 00:00:09,960
在本视频中,我们将了解
In this video, we'll look at
6
00:00:09,960 --> 00:00:12,750
我们如何进行与训练器视频中相同的微调,
how we can do the same fine-tuning as in the Trainer video,
7
00:00:12,750 --> 00:00:14,760
但不依赖那个类。
but without relying on that class.
8
00:00:14,760 --> 00:00:17,790
这样,你就可以根据你的需要轻松自定义
This way, you'll be able to easily customize each step
9
00:00:17,790 --> 00:00:20,310
训练循环的每个步骤。
of the training loop to your needs.
10
00:00:20,310 --> 00:00:21,660
这同样非常有用,
This is also very useful
11
00:00:21,660 --> 00:00:22,740
可以用来手动调试
to manually debug something
12
00:00:22,740 --> 00:00:24,590
Trainer API 出现的问题。
that went wrong with the Trainer API.
13
00:00:26,220 --> 00:00:28,020
在我们深入研究代码之前,
Before we dive into the code,
14
00:00:28,020 --> 00:00:30,481
这是训练循环的草图。
here is a sketch of a training loop.
15
00:00:30,481 --> 00:00:33,381
我们获取一批训练数据并将其提供给模型。
We take a batch of training data and feed it to the model.
16
00:00:34,223 --> 00:00:36,960
有了标签,我们就可以计算损失。
With the labels, we can then compute a loss.
17
00:00:36,960 --> 00:00:39,316
这个数字本身没有用,
That number is not useful on its own,
18
00:00:39,316 --> 00:00:40,260
而是用来计算
but is used to compute
19
00:00:40,260 --> 00:00:42,150
我们模型权重的梯度,
the gradients of our model weights,
20
00:00:42,150 --> 00:00:43,440
即,损失关于
that is the derivative of the loss
21
00:00:44,610 --> 00:00:47,160
每个模型权重的导数。
with respect to each model weight.
22
00:00:47,160 --> 00:00:49,800
然后优化器使用这些梯度
Those gradients are then used by the optimizer
23
00:00:49,800 --> 00:00:51,210
更新模型权重,
to update the model weights,
24
00:00:51,210 --> 00:00:53,550
让它们变得更好一点。
and make them a little bit better.
25
00:00:53,550 --> 00:00:54,510
然后我们重复这个过程
We then repeat the process
26
00:00:54,510 --> 00:00:56,880
使用一批新的训练数据。
with a new batch of training data.
27
00:00:56,880 --> 00:00:58,620
如果有任何不清楚的地方,
If any of this isn't clear,
28
00:00:58,620 --> 00:01:00,270
不要犹豫,去回顾一下
don't hesitate to take a refresher
29
00:01:00,270 --> 00:01:02,170
你最喜欢的深度学习课程。
on your favorite deep learning course.
30
00:01:03,210 --> 00:01:06,000
我们将在这里再次使用 GLUE MRPC 数据集,
We'll use the GLUE MRPC data set here again,
31
00:01:06,000 --> 00:01:07,680
我们已经看到了如何预处理数据
and we've seen how to preprocess the data
32
00:01:07,680 --> 00:01:11,130
使用 Datasets 库和动态填充。
using the Datasets library with dynamic padding.
33
00:01:11,130 --> 00:01:12,630
查看下方链接的视频
Check out the videos linked below
34
00:01:12,630 --> 00:01:14,280
如果你还没有看过它们。
if you haven't seen them already.
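As a reference for this step, here is a minimal preprocessing sketch, not shown on screen, assuming the GLUE MRPC dataset and a bert-base-uncased checkpoint (the actual checkpoint used in the video may differ):

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"  # assumed checkpoint; any encoder suitable for classification works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    # Tokenize the sentence pairs; padding is deferred to the data collator (dynamic padding).
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)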
35
00:01:15,480 --> 00:01:18,930
完成后,我们只需要定义 PyTorch DataLoaders
With this done, we only have to define PyTorch DataLoaders
36
00:01:18,930 --> 00:01:20,610
这将负责转换
which will be responsible for converting
37
00:01:20,610 --> 00:01:23,253
我们数据集的元素到批次数据中。
the elements of our dataset into batches.
38
00:01:24,450 --> 00:01:27,960
我们将 DataCollatorWithPadding 用作整理函数,
We use our DataCollatorWithPadding as a collate function,
39
00:01:27,960 --> 00:01:29,460
并打乱训练集的次序
and shuffle the training set
40
00:01:29,460 --> 00:01:31,080
确保我们不会每个纪元
to make sure we don't go over the samples
41
00:01:31,080 --> 00:01:33,870
以相同的顺序遍历样本。
in the same order at each epoch.
42
00:01:33,870 --> 00:01:36,390
要检查一切是否按预期工作,
To check that everything works as intended,
43
00:01:36,390 --> 00:01:38,883
我们尝试获取一批数据,并对其进行检查。
we try to grab a batch of data, and inspect it.
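A possible sketch of this DataLoader setup and batch inspection, assuming the tokenized_datasets and data_collator from the preprocessing sketch above and an arbitrary batch size of 8:

from torch.utils.data import DataLoader

# Keep only the columns the model expects and rename "label" to "labels".
tokenized_datasets = tokenized_datasets.remove_columns(["sentence1", "sentence2", "idx"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")

train_dataloader = DataLoader(
    tokenized_datasets["train"], shuffle=True, batch_size=8, collate_fn=data_collator
)
eval_dataloader = DataLoader(
    tokenized_datasets["validation"], batch_size=8, collate_fn=data_collator
)

# Grab one batch: every value should be a tensor of shape (batch_size, sequence_length).
batch = next(iter(train_dataloader))
print({k: v.shape for k, v in batch.items()})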
44
00:01:40,080 --> 00:01:43,050
就像我们的数据集元素一样,它是一个字典,
Like our dataset elements, it's a dictionary,
45
00:01:43,050 --> 00:01:46,260
但这一次,值不再是单个整数列表
but this time the values are not a single list of integers
46
00:01:46,260 --> 00:01:49,053
而是形状为批量大小乘以序列长度的张量。
but a tensor of shape batch size by sequence length.
47
00:01:50,460 --> 00:01:53,580
下一步是将训练数据送入我们的模型。
The next step is to send the training data to our model.
48
00:01:53,580 --> 00:01:56,730
为此,我们需要实际创建一个模型。
For that, we'll need to actually create a model.
49
00:01:56,730 --> 00:01:58,740
如模型 API 视频中所示,
As seen in the Model API video,
50
00:01:58,740 --> 00:02:00,540
我们使用 from_pretrained 方法,
we use the from_pretrained method,
51
00:02:00,540 --> 00:02:03,270
并将标签数量调整为这个数据集
and adjust the number of labels to the number of classes
52
00:02:03,270 --> 00:02:06,810
拥有的类别数量,这里是 2。
we have in this dataset, here two.
53
00:02:06,810 --> 00:02:08,940
再次确保一切顺利,
Again to be sure everything is going well,
54
00:02:08,940 --> 00:02:11,100
我们将我们抓取的批次传递给我们的模型,
we pass the batch we grabbed to our model,
55
00:02:11,100 --> 00:02:13,320
并检查没有错误。
and check there is no error.
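A sketch of the model creation and this sanity-check forward pass, assuming the checkpoint and the batch from the sketches above; when labels are present in the batch, the output carries a loss:

from transformers import AutoModelForSequenceClassification

# MRPC has two classes: equivalent / not equivalent.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

outputs = model(**batch)
print(outputs.loss, outputs.logits.shape)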
56
00:02:13,320 --> 00:02:14,940
如果提供了标签,
If the labels are provided,
57
00:02:14,940 --> 00:02:16,590
Transformers 库的模型
the models of the Transformers library
58
00:02:16,590 --> 00:02:18,273
总是直接返回损失。
always return a loss directly.
59
00:02:19,525 --> 00:02:21,090
我们将能够调用 loss.backward()
We will be able to do loss.backward()
60
00:02:21,090 --> 00:02:22,860
以计算所有梯度,
to compute all the gradients,
61
00:02:22,860 --> 00:02:26,460
然后需要一个优化器来完成训练步骤。
and will then need an optimizer to do the training step.
62
00:02:26,460 --> 00:02:28,860
我们在这里使用 AdamW 优化器,
We use the AdamW optimizer here,
63
00:02:28,860 --> 00:02:31,440
这是具有适当权重衰减的 Adam 变体,
which is a variant of Adam with proper weight decay,
64
00:02:31,440 --> 00:02:33,840
但你可以选择任何你喜欢的 PyTorch 优化器。
but you can pick any PyTorch optimizer you like.
65
00:02:34,830 --> 00:02:36,150
使用之前的损失,
Using the previous loss,
66
00:02:36,150 --> 00:02:39,060
并使用 loss.backward() 计算梯度,
and computing the gradients with loss.backward(),
67
00:02:39,060 --> 00:02:41,130
我们检查我们是否可以无误
we check that we can do the optimizer step
68
00:02:41,130 --> 00:02:42,030
执行优化器步骤。
without any error.
69
00:02:43,380 --> 00:02:45,870
之后不要忘记将梯度归零,
Don't forget to zero your gradient afterwards,
70
00:02:45,870 --> 00:02:46,890
或者在下一步,
or at the next step,
71
00:02:46,890 --> 00:02:49,343
它们将被添加到你计算的梯度中。
they will get added to the gradients you computed.
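A minimal sketch of one optimization step with AdamW, using the loss from the forward pass above; the learning rate is an assumed value:

from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=5e-5)  # assumed learning rate

loss = model(**batch).loss
loss.backward()        # compute gradients of the loss w.r.t. every model weight
optimizer.step()       # update the weights
optimizer.zero_grad()  # reset gradients so they don't accumulate into the next step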
72
00:02:50,490 --> 00:02:52,080
我们已经可以编写我们的训练循环,
We could already write our training loop,
73
00:02:52,080 --> 00:02:53,220
但我们还要再做两件事
but we will add two more things
74
00:02:53,220 --> 00:02:55,620
使它尽可能好。
to make it as good as it can be.
75
00:02:55,620 --> 00:02:57,690
第一个是学习率调度器,
The first one is a learning rate scheduler,
76
00:02:57,690 --> 00:03:00,140
逐步将我们的学习率降低到零。
to progressively decay our learning rate to zero.
77
00:03:01,195 --> 00:03:04,590
Transformers 库中的 get_scheduler 函数
The get_scheduler function from the Transformers library
78
00:03:04,590 --> 00:03:06,150
只是一个便捷函数,
is just a convenience function
79
00:03:06,150 --> 00:03:07,800
轻松构建这样的调度器。
to easily build such a scheduler.
80
00:03:08,850 --> 00:03:09,683
你同样可以改用
You can again use
81
00:03:09,683 --> 00:03:11,860
任何 PyTorch 学习率调度器。
any PyTorch learning rate scheduler instead.
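A sketch of building a linear decay schedule with get_scheduler, assuming three epochs; the total number of training steps is the epoch count times the number of batches:

from transformers import get_scheduler

num_epochs = 3  # assumed value
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)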
82
00:03:13,110 --> 00:03:14,850
最后,如果我们想让我们的训练
Finally, if we want our training
83
00:03:14,850 --> 00:03:17,610
花几分钟而不是几个小时,
to take a couple of minutes instead of a few hours,
84
00:03:17,610 --> 00:03:19,530
我们需要使用 GPU。
we will need to use a GPU.
85
00:03:19,530 --> 00:03:21,270
第一步是得到一个,
The first step is to get one,
86
00:03:21,270 --> 00:03:23,283
例如使用 Colab 笔记本。
for instance by using a Colab notebook.
87
00:03:24,180 --> 00:03:26,040
然后你需要实际把你的模型
Then you need to actually send your model,
88
00:03:26,040 --> 00:03:28,923
和训练数据通过 torch device 发送到它上面。
and training data on it by using a torch device.
89
00:03:29,790 --> 00:03:30,840
仔细检查下面这几行代码
Double-check that the following lines
90
00:03:30,840 --> 00:03:32,340
是否为你打印出了一个 CUDA 设备,
print a CUDA device for you,
91
00:03:32,340 --> 00:03:35,640
否则要做好训练耗时超过一小时的准备。
or be prepared for your training to last more than an hour.
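A sketch of the device check and move described here; if this prints a CPU device, training will be much slower:

import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
print(device)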
92
00:03:35,640 --> 00:03:37,390
我们现在可以把所有东西放在一起。
We can now put everything together.
93
00:03:38,550 --> 00:03:40,860
首先,我们将模型置于训练模式
First, we put our model in training mode
94
00:03:40,860 --> 00:03:42,240
这将为某些层激活训练行为,
which will activate the training behavior
95
00:03:42,240 --> 00:03:44,790
例如 Dropout。
for some layers, like Dropout.
96
00:03:44,790 --> 00:03:46,860
然后遍历我们选择的纪元数,
Then go through the number of epochs we picked,
97
00:03:46,860 --> 00:03:50,070
以及我们训练数据加载器中的所有数据。
and all the data in our training dataloader.
98
00:03:50,070 --> 00:03:52,410
然后我们完成我们已经看到的所有步骤;
Then we go through all the steps we have seen already;
99
00:03:52,410 --> 00:03:54,240
将数据发送到 GPU,
send the data to the GPU,
100
00:03:54,240 --> 00:03:55,560
计算模型输出,
compute the model outputs,
101
00:03:55,560 --> 00:03:57,720
尤其是损失。
and in particular the loss.
102
00:03:57,720 --> 00:03:59,850
使用损失来计算梯度,
Use the loss to compute gradients,
103
00:03:59,850 --> 00:04:02,880
然后用优化器执行一步训练。
then make a training step with the optimizer.
104
00:04:02,880 --> 00:04:04,500
在我们的调度器中更新学习率
Update the learning rate in our scheduler
105
00:04:04,500 --> 00:04:05,970
对于下一次迭代,
for the next iteration,
106
00:04:05,970 --> 00:04:07,763
并将优化器的梯度归零。
and zero the gradients of the optimizer.
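Putting it together, a sketch of the full training loop under the assumptions of the sketches above (three epochs, linear schedule, AdamW); the progress bar is optional:

from tqdm.auto import tqdm

progress_bar = tqdm(range(num_training_steps))

model.train()  # enable training behavior for layers like Dropout
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)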
107
00:04:09,240 --> 00:04:10,500
一旦完成,
Once this is finished,
108
00:04:10,500 --> 00:04:12,150
我们可以很容易地评估我们的模型
we can evaluate our model very easily
109
00:04:12,150 --> 00:04:14,283
使用 Datasets 库中的指标。
with a metric from the Datasets library.
110
00:04:15,180 --> 00:04:17,880
首先,我们将模型置于评估模式,
First, we put our model in evaluation mode,
111
00:04:17,880 --> 00:04:20,550
停用像 Dropout 这样的层,
to deactivate layers like Dropout,
112
00:04:20,550 --> 00:04:23,850
然后遍历评估数据加载器中的所有数据。
then go through all the data in the evaluation data loader.
113
00:04:23,850 --> 00:04:25,530
正如我们在训练器视频中看到的那样,
As we have seen in the Trainer video,
114
00:04:25,530 --> 00:04:26,850
模型输出 logits,
the model outputs logits,
115
00:04:26,850 --> 00:04:28,530
我们需要应用 argmax 函数
and we need to apply the argmax function
116
00:04:28,530 --> 00:04:30,213
将它们转化为预测。
to convert them into predictions.
117
00:04:31,350 --> 00:04:33,420
然后度量对象有一个 add_batch 方法,
The metric object then has an add_batch method,
118
00:04:33,420 --> 00:04:36,810
我们可以用来向它发送那些中间预测。
we can use to send it those intermediate predictions.
119
00:04:36,810 --> 00:04:38,700
一旦评估循环完成,
Once the evaluation loop is finished,
120
00:04:38,700 --> 00:04:40,320
我们只需要调用 compute 方法
we just have to call the compute method
121
00:04:40,320 --> 00:04:42,180
得到我们的最终结果。
to get our final results.
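A sketch of the evaluation loop with a GLUE MRPC metric; load_metric lived in the Datasets library at the time of this video, while newer releases move it to the separate evaluate package:

from datasets import load_metric  # in newer versions: import evaluate; evaluate.load("glue", "mrpc")

metric = load_metric("glue", "mrpc")

model.eval()  # disable training behavior such as Dropout
for batch in eval_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(**batch)
    predictions = outputs.logits.argmax(dim=-1)
    metric.add_batch(predictions=predictions, references=batch["labels"])

print(metric.compute())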
122
00:04:42,180 --> 00:04:44,490
恭喜,你现在已经微调了一个模型
Congratulations, you have now fine-tuned a model
123
00:04:44,490 --> 00:04:45,633
全靠你自己。
all by yourself.
124
00:04:47,253 --> 00:04:49,920
(空气呼啸)
(air whooshing)