subtitles/zh-CN/02_the-carbon-footprint-of-transformers.srt
1
00:00:05,580 --> 00:00:08,820
- 让我们谈谈 transformer 的碳足迹。
- So let's talk about the carbon footprint of transformers.
2
00:00:08,820 --> 00:00:10,530
也许您看过这样的头条新闻
Maybe you've seen headlines such as this one
3
00:00:10,530 --> 00:00:13,530
训练单个 AI 模型排放的碳含量
that training a single AI model can emit as much carbon
4
00:00:13,530 --> 00:00:16,020
相当于五辆汽车生命周期的总排放量。
as five cars in their lifetimes.
5
00:00:16,020 --> 00:00:19,440
那么这种说法什么时候是真的?它总是真的吗?
So when is this true and is it always true?
6
00:00:19,440 --> 00:00:21,803
其实呢,这实际上取决于几个因素。
Well, it actually depends on several things.
7
00:00:21,803 --> 00:00:23,430
最重要的一点,这取决于
Most importantly, it depends
8
00:00:23,430 --> 00:00:24,960
您所使用的能源类型。
on the type of energy you're using.
9
00:00:24,960 --> 00:00:26,267
如果您使用的是可再生能源,例如
If you're using renewable energy such as
10
00:00:26,267 --> 00:00:30,670
太阳能、风能、水力发电,那么
solar, wind, hydroelectricity, you're really
11
00:00:30,670 --> 00:00:33,810
几乎不会排放任何碳,排放量非常非常少。
not emitting any carbon at all, very, very little.
12
00:00:33,810 --> 00:00:36,769
如果您使用的是煤炭等不可再生能源
If you're using non-renewable energy sources such as coal
13
00:00:36,769 --> 00:00:39,570
那么它们的碳足迹要高得多
then their carbon footprint is a lot higher
14
00:00:39,570 --> 00:00:43,260
因为本质上您正在排放大量的温室气体。
'cuz essentially you are emitting a lot of greenhouse gases.
15
00:00:43,260 --> 00:00:44,670
另一个因素是训练时间。
Another aspect is training time.
16
00:00:44,670 --> 00:00:47,232
所以您训练的时间越长,消耗的能量就越多
So the longer you train, the more energy you use
17
00:00:47,232 --> 00:00:50,250
您使用的能源越多,您排放的碳就越多,对吗?
the more energy you use, the more carbon you emit, right?
18
00:00:50,250 --> 00:00:51,270
所以这些消耗真的会不断累积
So this really adds up
19
00:00:51,270 --> 00:00:53,520
特别是如果您正在训练大型模型
especially if you're training large models
20
00:00:53,520 --> 00:00:56,460
且持续了数小时、数天或数周的时间。
for hours and days and weeks.
21
00:00:56,460 --> 00:00:58,380
您使用的硬件也很重要
The hardware you use also matters
22
00:00:58,380 --> 00:01:00,930
例如某些 GPU 效率更高
because some GPUs, for example, are more efficient
23
00:01:00,930 --> 00:01:05,460
相比其他硬件效率更高,而妥善地利用它们
than others, and utilizing them efficiently and properly.
24
00:01:05,460 --> 00:01:07,500
始终让它们保持百分之百的利用率
So using them a hundred percent all the time
25
00:01:07,500 --> 00:01:10,650
可以真正减少您的能源消耗。
can really reduce the energy consumption that you have.
26
00:01:10,650 --> 00:01:13,290
进一步减少您的碳足迹。
And then once again, reduce your carbon footprint.
27
00:01:13,290 --> 00:01:15,870
还有其他因素比如 IO
There's also other aspects such as IO
28
00:01:15,870 --> 00:01:17,730
比如数据,等等。
such as data, et cetera, et cetera.
29
00:01:17,730 --> 00:01:20,940
但这三点是您应该关注的主要因素。
But these are the main three that you should focus on.
30
00:01:20,940 --> 00:01:23,340
所以当我谈论能源和碳强度时
So when I talk about energy sources and carbon intensity
31
00:01:23,340 --> 00:01:24,420
那个的真实意义是什么?
what does that really mean?
32
00:01:24,420 --> 00:01:27,480
所以如果您看屏幕顶部
So if you look at the top of the screen
33
00:01:27,480 --> 00:01:30,480
您可以看到印度孟买的云计算实例
you have a carbon footprint
34
00:01:30,480 --> 00:01:33,860
所产生的碳足迹
of a cloud computing instance in Mumbai, India
35
00:01:33,860 --> 00:01:38,700
每千瓦时排放 920 克二氧化碳。
which emits 920 grams of CO2 per kilowatt hour.
36
00:01:38,700 --> 00:01:40,110
这差不多有一公斤
This is almost one kilogram
37
00:01:40,110 --> 00:01:43,680
每千瓦时电力使用的二氧化碳排放量。
of CO2 per kilowatt hour of electricity used.
38
00:01:43,680 --> 00:01:45,150
如果您把它与加拿大蒙特利尔,
If you compare that with Canada, Montreal
39
00:01:45,150 --> 00:01:48,720
也就是我现在所在的位置,每千瓦时只排放 20 克二氧化碳。
where I am right now, 20 grams of CO2 per kilowatt hour.
40
00:01:48,720 --> 00:01:50,040
所以它们有着非常大的区别。
So that's a really, really big difference.
41
00:01:50,040 --> 00:01:54,240
排放的碳几乎要多出 40 倍以上
Almost more than 40 times more carbon emitted
42
00:01:54,240 --> 00:01:55,950
孟买与蒙特利尔相比。
in Mumbai versus Montreal.
43
00:01:55,950 --> 00:01:57,720
所以这些排放真的会不断累积。
And so this can really, really add up.
44
00:01:57,720 --> 00:01:59,820
例如,如果您要训练一个模型数周
If you're training a model for weeks, for example
45
00:01:59,820 --> 00:02:01,920
就相当于把您排放的碳
you're multiplying times 40
46
00:02:01,920 --> 00:02:03,450
乘以了 40 倍。
the carbon that you're emitting.
47
00:02:03,450 --> 00:02:05,070
因此选择合适的实例
So choosing the right instance
48
00:02:05,070 --> 00:02:07,080
选择低碳计算实例
choosing a low carbon compute instance
49
00:02:07,080 --> 00:02:09,690
这真的是您能做的最有影响力的事情。
is really the most impactful thing that you can do.
50
00:02:09,690 --> 00:02:13,020
而这正是排放会大量累积的地方
And this is where it can really add up
51
00:02:13,020 --> 00:02:15,930
如果您正在一个碳密集的地区
if you're training in a very intensive
52
00:02:15,930 --> 00:02:17,580
进行非常密集的训练
in a very carbon intensive region
53
00:02:19,170 --> 00:02:21,750
其他要考虑的因素,例如
other elements to consider, for example
54
00:02:21,750 --> 00:02:22,770
使用预训练模型
using pre-trained models
55
00:02:22,770 --> 00:02:25,590
这相当于机器学习领域的回收利用。
that's the machine learning equivalent of recycling.
56
00:02:25,590 --> 00:02:28,292
当有可用的预训练模型时,直接使用它们
When you have pre-trained models available using them
57
00:02:28,292 --> 00:02:30,120
您根本没有排放任何碳,对吧?
you're not emitting any carbon at all, right?
58
00:02:30,120 --> 00:02:31,230
因为您没有重新训练任何东西。
You're not retraining anything.
59
00:02:31,230 --> 00:02:33,450
所以这也需要您先做足功课
So that's also doing your homework
60
00:02:33,450 --> 00:02:35,574
四处看看已经存在哪些现成的模型。
and kind of looking around what already exists.
61
00:02:35,574 --> 00:02:37,890
微调而不是从头开始训练。
Fine tuning instead of training from scratch.
62
00:02:37,890 --> 00:02:38,723
所以再一次
So once again
63
00:02:38,723 --> 00:02:40,590
如果您找到几乎是您需要的模型
if you find a model that is almost what you need
64
00:02:40,590 --> 00:02:43,530
但又不完全符合,那么微调最后几层
but not quite, fine tuning the last couple of layers
65
00:02:43,530 --> 00:02:45,210
让它真正贴合您的用途
making it really fit your purpose instead
66
00:02:45,210 --> 00:02:46,500
而不是从头训练一个大型 transformer
of training a large transformer
67
00:02:46,500 --> 00:02:48,810
这样做真的会很有帮助,
from scratch can really help,
68
00:02:48,810 --> 00:02:51,270
从较小的实验开始
starting with smaller experiments
69
00:02:51,270 --> 00:02:52,800
并边调试边工作。
and debugging as you go.
70
00:02:52,800 --> 00:02:54,630
这意味着,例如,在正式训练前
So that means, for example, training
71
00:02:54,630 --> 00:02:58,770
弄清楚数据编码
figuring out data encoding, figuring out, you know
72
00:02:58,770 --> 00:03:01,170
确保代码里没有那种
making sure that there's no small bugs, that you'll
73
00:03:01,170 --> 00:03:03,840
您训练了 16 个小时后才会发现的小错误
you'll realize, you know, 16 hours into training
74
00:03:03,840 --> 00:03:05,820
从小事做起,真正确保
starting small and really making sure
75
00:03:05,820 --> 00:03:08,760
您所做的事情和您的代码是稳定可靠的。
that what you're doing, what your code is, is stable.
76
00:03:08,760 --> 00:03:11,430
最后做一个文献综述
And then finally doing a literature review to
77
00:03:11,430 --> 00:03:13,740
来选择超参数范围,然后再
choose hyper parameter ranges and then following
78
00:03:13,740 --> 00:03:15,900
使用随机搜索而不是网格搜索。
up with a random search instead of a grid search.
79
00:03:15,900 --> 00:03:18,420
事实证明,随机搜索超参数组合
So random searches for hyper parameters
80
00:03:18,420 --> 00:03:21,300
在寻找最佳配置方面
combinations have actually shown to be as efficient
81
00:03:21,300 --> 00:03:24,000
实际上与网格搜索同样高效。
in finding the optimal configuration as grid search.
82
00:03:24,000 --> 00:03:27,510
但显然您并没有尝试所有可能的组合
But obviously you're not trying all possible combinations
83
00:03:27,510 --> 00:03:29,520
您只是在尝试其中的一部分。
you're only trying a subset of them.
84
00:03:29,520 --> 00:03:31,800
所以这也很有帮助。
So this can really help as well.
85
00:03:31,800 --> 00:03:32,760
所以现在如果我们回去
So now if we go back
86
00:03:32,760 --> 00:03:36,300
2019 年 Strubell 等人的原始论文
to the original paper by Strubell et al. in 2019
87
00:03:36,300 --> 00:03:39,180
也就是那篇著名的五辆汽车生命周期论文。
the infamous five cars in their lifetimes paper.
88
00:03:39,180 --> 00:03:40,013
如果您只是考虑
If you just look
89
00:03:40,013 --> 00:03:43,606
一个拥有 2 亿参数的 transformer
at a 200 million parameter transformer
90
00:03:43,606 --> 00:03:46,950
它的碳足迹约为 200 磅二氧化碳
its carbon footprint is around 200 pounds of CO2
91
00:03:46,950 --> 00:03:47,940
这已经相当可观了
which is significant
92
00:03:47,940 --> 00:03:49,980
但它离五辆汽车还差得很远,对吧?
but it's nowhere near five cars, right?
93
00:03:49,980 --> 00:03:52,893
这甚至还比不上一次跨大西洋航班。
It's not even a transatlantic flight.
94
00:03:52,893 --> 00:03:55,020
真正让排放累积起来的情况,是当您在做
How it really adds up is when you're doing
95
00:03:55,020 --> 00:03:56,190
神经架构搜索的时候
neural architecture search
96
00:03:56,190 --> 00:03:58,560
当您进行超参数调整时,以及
when you're doing hyper parameter tuning, and
97
00:03:58,560 --> 00:04:00,930
在尝试所有可能的组合的时候
this is trying all possible combinations
98
00:04:00,930 --> 00:04:01,763
等等,等等。
et cetera, et cetera.
99
00:04:01,763 --> 00:04:02,596
这就是
And this is where
100
00:04:02,596 --> 00:04:05,400
600,000 磅二氧化碳这个数字的来源。
like the 600,000 pounds of CO2 came from.
101
00:04:05,400 --> 00:04:08,490
所以这才是排放真正累积起来的地方。
So this is really where things add up.
102
00:04:08,490 --> 00:04:11,880
但是,如果您做事时用心且尽责
So, but if you're doing things mindfully and conscientiously
103
00:04:11,880 --> 00:04:16,410
那么您的碳足迹就不会像
then your carbon footprint won't be as big as,
104
00:04:16,410 --> 00:04:20,040
论文所暗示的那样大。有一些工具可以帮您算
as the paper implied, some tools to figure
105
00:04:20,040 --> 00:04:22,111
出您排放的 CO2 量。
out how much CO2 exactly you're emitting.
106
00:04:22,111 --> 00:04:24,270
有一个基于网页的工具,叫做 machine
There's a web-based tool called machine
107
00:04:24,270 --> 00:04:26,430
learning emissions calculator(机器学习排放计算器),它可以让您
learning emissions calculator, which allows you
108
00:04:26,430 --> 00:04:29,010
手动输入,例如,您使用的硬件
to manually input, for example, which hardware you used
109
00:04:29,010 --> 00:04:30,488
您用了多少小时
how many hours you used it for
110
00:04:30,488 --> 00:04:34,260
以及它是在本地还是在云端运行。
where it was located, locally or in the cloud.
111
00:04:34,260 --> 00:04:35,640
然后它会给您一个估计
And then it's gonna give you an estimate
112
00:04:35,640 --> 00:04:37,560
您排放了多少二氧化碳。
of how much CO2 you emitted.
113
00:04:37,560 --> 00:04:40,200
另一个以编程方式执行此操作的工具,
Another tool that does this programmatically,
114
00:04:40,200 --> 00:04:41,190
叫做 code carbon。
is called code carbon.
115
00:04:41,190 --> 00:04:45,112
您可以通过 pip 安装它,也可以去它的 GitHub 页面
So you can pip install it, you can go to the GitHub
116
00:04:45,112 --> 00:04:48,120
它与您的代码并行运行。
and essentially it runs in parallel to your code.
117
00:04:48,120 --> 00:04:49,085
所以基本上您调用它
So essentially you call it
118
00:04:49,085 --> 00:04:51,060
然后照常进行您的所有训练。
and then you do all your training.
119
00:04:51,060 --> 00:04:53,760
最后它会给您一个估计
And then at the end it's gonna give you an estimate
120
00:04:53,760 --> 00:04:57,210
包含排放量估算值的 CSV 文件。
a CSV file with an estimate of your emissions.
121
00:04:57,210 --> 00:04:59,250
它会给您一些比较。
And it's gonna give you some comparisons.
122
00:04:59,250 --> 00:05:01,230
它有一个可视化用户界面,您可以在其中真正看到
It's got a visual UI where you can really look
123
00:05:01,230 --> 00:05:04,680
这与开车或看电视相比如何。
at how this compares to driving a car or watching TV.
124
00:05:04,680 --> 00:05:06,060
所以它可以给您一个想法
So it can give you an idea
125
00:05:06,060 --> 00:05:07,740
也就是您排放量的大致规模。
of the scope of your emissions as well.
126
00:05:07,740 --> 00:05:09,930
实际上,code carbon 已经集成到了 Auto
And actually, code carbon is already integrated into Auto
127
00:05:09,930 --> 00:05:12,270
NLP 中,希望人们能够
NLP and hopefully people will be using it
128
00:05:12,270 --> 00:05:15,240
开箱即用,轻松跟踪所有训练和部署 transformer
out of the box and easily tracking their emissions all
129
00:05:15,240 --> 00:05:17,523
的碳排放。
through training and deploying transformers.
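The Mumbai-versus-Montreal comparison in the video comes down to a simple calculation. A minimal sketch, using the carbon-intensity figures quoted in the subtitles (the function and variable names here are illustrative, not part of any library):

```python
# Carbon intensity of electricity, in grams of CO2 per kilowatt-hour,
# as quoted in the video for the two example regions.
CARBON_INTENSITY_G_PER_KWH = {
    "Mumbai": 920,    # coal-heavy grid
    "Montreal": 20,   # mostly hydroelectric
}

def training_emissions_kg(energy_kwh: float, region: str) -> float:
    """Estimate the CO2 (kg) emitted by a training run drawing `energy_kwh`."""
    return energy_kwh * CARBON_INTENSITY_G_PER_KWH[region] / 1000.0

# A hypothetical long training run drawing 1,000 kWh:
print(training_emissions_kg(1000, "Mumbai"))    # 920.0 kg
print(training_emissions_kg(1000, "Montreal"))  # 20.0 kg
print(920 / 20)  # 46.0 -> "more than 40 times more carbon"
```

The same energy draw emits roughly 46 times more CO2 on the Mumbai grid than in Montreal, which is why choosing a low-carbon compute region is the single most impactful lever mentioned.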