Supercharge your PyTorch training loop with Hugging Face Accelerate.

There are multiple setups on which you can run your training: it could be on CPU, GPUs, or TPUs, distributed on one machine with several devices, or even on several machines, often called nodes, each with multiple devices. On top of that, there are new tweaks to make your training faster or more efficient, like mixed precision and DeepSpeed. Each of those setups or training tweaks requires you to change the code of your training loop in one way or another and to learn a new API.

All those setups are handled by the Trainer API, and there are several third-party libraries that can help. The problem with those is that they can feel like a black box, and it might not be easy to implement the tweak you need in your training loop.

Accelerate has been designed specifically to let you retain full control over your training loop and to be as non-intrusive as possible. With just four lines of code added to your training loop, shown here on the example from the training loop video, Accelerate will handle all the setups and training tweaks mentioned on the first slide. It's only one API to learn and master instead of ten different ones.

More specifically, you have to import and instantiate an Accelerator object, which will handle all the necessary code for your specific setup. Then you have to pass the model, optimizer, and dataloaders you are using to the prepare method, which is the main method to remember. Accelerate handles device placement, so you don't need to put your batch on the specific device you are using.
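Here is a minimal sketch of what those four additions can look like on a plain PyTorch loop. The toy dataset, model, and optimizer below are placeholders just to keep the snippet self-contained; in practice you would use the ones from the training loop video.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator  # addition 1: the import

# Placeholder data, model and optimizer so the sketch runs on its own
train_dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

accelerator = Accelerator()  # addition 2: instantiate the accelerator

# addition 3: prepare() wraps everything for the current setup; keep the returned objects
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

model.train()
for inputs, labels in train_dataloader:
    # no inputs.to(device) / labels.to(device): Accelerate already placed the batch
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # addition 4: replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```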
Finally, as in the sketch above, you have to replace the loss.backward() line with accelerator.backward(loss), and that's all you need!

Accelerate also handles distributed evaluation. You can still use a classic evaluation loop such as the one we saw in the training loop video, in which case all processes will perform the full evaluation. To use distributed evaluation, you just have to adapt your evaluation loop like this: pass the evaluation dataloader to the accelerator.prepare method, like for training. Then you can dismiss the line that places the batch on the proper device, and, just before passing your predictions and labels to your metric, use accelerator.gather to gather together the predictions and labels from each process.

A distributed training script has to be launched several times on different processes, for instance, one per GPU you are using. You can use the PyTorch tools to do that if you are familiar with them, but Accelerate also provides an easy API to configure your setup and launch your training script. In a terminal, run accelerate config and answer the small questionnaire to generate a configuration file with all the relevant information, then you can just run accelerate launch followed by the path to your training script. In a notebook, you can use the notebook_launcher function to launch your training.
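As a rough sketch of what that distributed evaluation looks like (again with a placeholder toy model and dataset so the snippet is self-contained; a real script would reuse the prepared model from training and feed the gathered tensors to your metric of choice):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Placeholder evaluation data and model so the sketch runs on its own
eval_dataset = TensorDataset(torch.randn(32, 16), torch.randint(0, 2, (32,)))
eval_dataloader = DataLoader(eval_dataset, batch_size=8)
model = torch.nn.Linear(16, 2)

accelerator = Accelerator()
model, eval_dataloader = accelerator.prepare(model, eval_dataloader)

model.eval()
all_predictions, all_labels = [], []
for inputs, labels in eval_dataloader:
    # no .to(device) line here either: Accelerate handles placement
    with torch.no_grad():
        predictions = model(inputs).argmax(dim=-1)
    # gather predictions and labels from every process before computing the metric
    all_predictions.append(accelerator.gather(predictions))
    all_labels.append(accelerator.gather(labels))

accuracy = (torch.cat(all_predictions) == torch.cat(all_labels)).float().mean().item()
print(f"accuracy: {accuracy:.3f}")
```

To actually run such a script on several processes, the terminal commands are the ones from the video: accelerate config once to answer the questionnaire, then accelerate launch followed by the path to your script. In a notebook, the notebook_launcher function from accelerate plays the same role, for instance notebook_launcher(training_function, num_processes=2), where training_function is whatever function wraps your training loop (a hypothetical name here).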