subtitles/zh-CN/02_the-carbon-footprint-of-transformers.srt

1
00:00:05,580 --> 00:00:08,820
- 让我们谈谈 transformer 的碳足迹。
- So let's talk about the carbon footprint of transformers.

2
00:00:08,820 --> 00:00:10,530
也许您看过这样的头条新闻
Maybe you've seen headlines such as this one

3
00:00:10,530 --> 00:00:13,530
训练单个 AI 模型排放的碳
that training a single AI model can emit as much carbon

4
00:00:13,530 --> 00:00:16,020
相当于五辆汽车生命周期的总排放量。
as five cars in their lifetimes.

5
00:00:16,020 --> 00:00:19,440
那么这在什么情况下是真的?它总是真的吗?
So when is this true and is it always true?

6
00:00:19,440 --> 00:00:21,803
其实,这取决于几个因素。
Well, it actually depends on several things.

7
00:00:21,803 --> 00:00:23,430
最重要的是,这取决于
Most importantly, it depends

8
00:00:23,430 --> 00:00:24,960
您所使用的能源类型。
on the type of energy you're using.

9
00:00:24,960 --> 00:00:26,267
如果您使用的是可再生能源,例如
If you're using renewable energy such as

10
00:00:26,267 --> 00:00:30,670
太阳能、风能、水力发电,那么
solar, wind, hydroelectricity, you're really

11
00:00:30,670 --> 00:00:33,810
您几乎不会排放任何碳,排放量非常非常少。
not emitting any carbon at all, very, very little.

12
00:00:33,810 --> 00:00:36,769
如果您使用的是煤炭等不可再生能源
If you're using non-renewable energy sources such as coal

13
00:00:36,769 --> 00:00:39,570
那么它们的碳足迹要高得多
then their carbon footprint is a lot higher

14
00:00:39,570 --> 00:00:43,260
因为本质上您正在排放大量的温室气体。
'cause essentially you are emitting a lot of greenhouse gases.

15
00:00:43,260 --> 00:00:44,670
另一个因素是训练时间。
Another aspect is training time.

16
00:00:44,670 --> 00:00:47,232
您训练的时间越长,消耗的能源就越多
So the longer you train, the more energy you use

17
00:00:47,232 --> 00:00:50,250
消耗的能源越多,排放的碳也就越多,对吗?
the more energy you use, the more carbon you emit, right?

18
00:00:50,250 --> 00:00:51,270
所以这些排放会不断累积
So this really adds up

19
00:00:51,270 --> 00:00:53,520
特别是如果您正在训练大型模型
especially if you're training large models

20
00:00:53,520 --> 00:00:56,460
且持续数小时、数天甚至数周。
for hours and days and weeks.
21
00:00:56,460 --> 00:00:58,380
您使用的硬件也很重要
The hardware you use also matters

22
00:00:58,380 --> 00:01:00,930
例如,某些 GPU 的效率
because some GPUs, for example, are more efficient

23
00:01:00,930 --> 00:01:05,460
比其他 GPU 更高,而妥善地利用它们,
than others, and utilizing them properly,

24
00:01:05,460 --> 00:01:07,500
即让它们始终保持百分之百的利用率
so using them a hundred percent all the time,

25
00:01:07,500 --> 00:01:10,650
可以真正减少您的能源消耗。
can really reduce the energy consumption that you have.

26
00:01:10,650 --> 00:01:13,290
进而再次减少您的碳足迹。
And then once again, reduce your carbon footprint.

27
00:01:13,290 --> 00:01:15,870
还有其他一些因素,比如 IO
There's also other aspects such as IO

28
00:01:15,870 --> 00:01:17,730
比如数据,等等。
such as data, et cetera, et cetera.

29
00:01:17,730 --> 00:01:20,940
但这三点是您应该关注的主要因素。
But these are the main three that you should focus on.

30
00:01:20,940 --> 00:01:23,340
那么当我谈到能源和碳强度时
So when I talk about energy sources and carbon intensity

31
00:01:23,340 --> 00:01:24,420
这到底是什么意思?
what does that really mean?

32
00:01:24,420 --> 00:01:27,480
如果您看屏幕顶部
So if you look at the top of the screen

33
00:01:27,480 --> 00:01:30,480
您可以看到印度孟买的一个云计算实例
you have a carbon footprint

34
00:01:30,480 --> 00:01:33,860
所产生的碳足迹
of a cloud computing instance in Mumbai, India

35
00:01:33,860 --> 00:01:38,700
每千瓦时排放 920 克二氧化碳。
which emits 920 grams of CO2 per kilowatt hour.

36
00:01:38,700 --> 00:01:40,110
这差不多相当于每千瓦时用电
This is almost one kilogram

37
00:01:40,110 --> 00:01:43,680
排放一公斤二氧化碳。
of CO2 per kilowatt hour of electricity used.

38
00:01:43,680 --> 00:01:45,150
如果您将它与我现在所在的
If you compare that with Canada, Montreal

39
00:01:45,150 --> 00:01:48,720
加拿大蒙特利尔相比:每千瓦时只排放 20 克二氧化碳。
where I am right now, 20 grams of CO2 per kilowatt hour.

40
00:01:48,720 --> 00:01:50,040
所以这是非常非常大的区别。
So that's a really, really big difference.

41
00:01:50,040 --> 00:01:54,240
孟买的碳排放量几乎是
Almost more than 40 times more carbon emitted

42
00:01:54,240 --> 00:01:55,950
蒙特利尔的 40 多倍。
in Mumbai versus Montreal.
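上面孟买与蒙特利尔的对比可以归结为一个简单的换算:排放量 = 用电量 × 电网碳强度。下面是一个最小的示意,其中 10 kWh 的训练用电量是假设的示例值,并非视频中的数字。
The Mumbai-versus-Montreal comparison above boils down to simple arithmetic: emissions = energy used × grid carbon intensity. A minimal sketch, where the 10 kWh of training energy is an illustrative assumption, not a figure from the video:

```python
# 碳强度(视频中引用的数值):克二氧化碳 / 千瓦时
# Carbon intensity quoted in the video, in grams of CO2 per kilowatt-hour.
MUMBAI_G_PER_KWH = 920
MONTREAL_G_PER_KWH = 20

def emissions_kg(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """用电量 × 碳强度,换算为千克二氧化碳。Energy times intensity, in kg of CO2."""
    return energy_kwh * intensity_g_per_kwh / 1000

# 假设某次训练消耗 10 kWh(示例值 / an illustrative figure):
mumbai = emissions_kg(10, MUMBAI_G_PER_KWH)      # 9.2 kg
montreal = emissions_kg(10, MONTREAL_G_PER_KWH)  # 0.2 kg
print(mumbai / montreal)  # 约 46 倍,即视频所说的 "40 多倍" / roughly 46x
```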
43
00:01:55,950 --> 00:01:57,720
所以这真的会不断累积。
And so this can really, really add up.

44
00:01:57,720 --> 00:01:59,820
例如,如果您要训练一个模型数周
If you're training a model for weeks, for example

45
00:01:59,820 --> 00:02:01,920
您就是在把您排放的碳
you're multiplying times 40

46
00:02:01,920 --> 00:02:03,450
乘以 40。
the carbon that you're emitting.

47
00:02:03,450 --> 00:02:05,070
因此,选择合适的实例
So choosing the right instance

48
00:02:05,070 --> 00:02:07,080
选择低碳的计算实例
choosing a low carbon compute instance

49
00:02:07,080 --> 00:02:09,690
确实是您能做的最有影响力的事情。
is really the most impactful thing that you can do.

50
00:02:09,690 --> 00:02:13,020
而这正是排放真正累积起来的地方
And this is where it can really add up

51
00:02:13,020 --> 00:02:15,930
如果您是在一个碳强度
if you're training in a very intensive

52
00:02:15,930 --> 00:02:17,580
非常高的地区进行训练。
in a very carbon intensive region.

53
00:02:19,170 --> 00:02:21,750
其他要考虑的因素,例如
Other elements to consider, for example

54
00:02:21,750 --> 00:02:22,770
使用预训练模型
using pre-trained models

55
00:02:22,770 --> 00:02:25,590
这相当于机器学习中的回收利用。
that's the machine learning equivalent of recycling.

56
00:02:25,590 --> 00:02:28,292
当您有可用的预训练模型时,使用它们
When you have pre-trained models available, using them

57
00:02:28,292 --> 00:02:30,120
您根本不会排放任何碳,对吧?
you're not emitting any carbon at all, right?

58
00:02:30,120 --> 00:02:31,230
因为您没有重新训练任何东西。
You're not retraining anything.

59
00:02:31,230 --> 00:02:33,450
所以这也需要您做足功课
So that's also doing your homework

60
00:02:33,450 --> 00:02:35,574
看看当前已经存在哪些模型。
and kind of looking around what already exists.

61
00:02:35,574 --> 00:02:37,890
微调而不是从头开始训练。
Fine-tuning instead of training from scratch.
62
00:02:37,890 --> 00:02:38,723
所以再说一次
So once again

63
00:02:38,723 --> 00:02:40,590
如果您找到一个几乎满足您需求
if you find a model that is almost what you need

64
00:02:40,590 --> 00:02:43,530
但又不完全符合的模型,那么微调最后几层
but not quite, fine-tuning the last couple of layers

65
00:02:43,530 --> 00:02:45,210
让它真正符合您的目的
making it really fit your purpose instead

66
00:02:45,210 --> 00:02:46,500
而不是从头训练一个大型 transformer
of training a large transformer

67
00:02:46,500 --> 00:02:48,810
确实会很有帮助。
from scratch can really help,

68
00:02:48,810 --> 00:02:51,270
从较小的实验开始
starting with smaller experiments

69
00:02:51,270 --> 00:02:52,800
并边调试边推进。
and debugging as you go.

70
00:02:52,800 --> 00:02:54,630
这意味着,例如,在训练时
So that means, for example, training

71
00:02:54,630 --> 00:02:58,770
先弄清楚数据编码等问题
figuring out data encoding, figuring out, you know

72
00:02:58,770 --> 00:03:01,170
确保没有那些您可能
making sure that there's no small bugs that you'll

73
00:03:01,170 --> 00:03:03,840
在训练了 16 个小时之后才会发现的小 bug
realize, you know, 16 hours into training

74
00:03:03,840 --> 00:03:05,820
从小处着手,真正确保
starting small and really making sure

75
00:03:05,820 --> 00:03:08,760
您所做的事情、您的代码是稳定的。
that what you're doing, what your code is, is stable.

76
00:03:08,760 --> 00:03:11,430
最后,做一下文献综述来
And then finally doing a literature review to

77
00:03:11,430 --> 00:03:13,740
选择超参数范围,然后
choose hyperparameter ranges and then following

78
00:03:13,740 --> 00:03:15,900
使用随机搜索而不是网格搜索。
up with a random search instead of a grid search.

79
00:03:15,900 --> 00:03:18,420
事实证明,随机搜索超参数组合
So random searches for hyperparameter combinations

80
00:03:18,420 --> 00:03:21,300
在寻找最优配置方面
have actually been shown to be as efficient

81
00:03:21,300 --> 00:03:24,000
与网格搜索同样有效。
in finding the optimal configuration as grid search.

82
00:03:24,000 --> 00:03:27,510
但显然您并没有尝试所有可能的组合
But obviously you're not trying all possible combinations

83
00:03:27,510 --> 00:03:29,520
您只是在尝试其中的一部分。
you're only trying a subset of them.
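上面提到的随机搜索与网格搜索的区别,可以用几行代码示意。这是一个玩具示例,其中的超参数范围是假设的,并非视频中的内容。
The random-versus-grid-search point above can be sketched in a few lines. This is a toy example; the hyperparameter ranges are assumptions, not anything from the video.

```python
import itertools
import random

# 假设的超参数范围(例如来自文献综述)
# Hypothetical ranges, e.g. coming out of a literature review.
learning_rates = [1e-5, 3e-5, 1e-4, 3e-4]
batch_sizes = [8, 16, 32, 64]
warmup_steps = [0, 100, 500, 1000]

# 网格搜索:尝试所有组合,4 * 4 * 4 = 64 次训练
# Grid search: every combination, i.e. 64 training runs.
grid = list(itertools.product(learning_rates, batch_sizes, warmup_steps))

# 随机搜索:只尝试其中的一个子集,这里是 10 次训练
# Random search: only a subset of them, here 10 runs.
random.seed(0)
subset = random.sample(grid, k=10)

print(len(grid), len(subset))  # 64 10
```

每少跑一次训练,就少消耗一份能源,这正是视频中建议随机搜索的原因。
Every run you skip is energy you don't spend, which is why the video recommends random search.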
84
00:03:29,520 --> 00:03:31,800
所以这也很有帮助。
So this can really help as well.

85
00:03:31,800 --> 00:03:32,760
现在,如果我们回到
So now if we go back

86
00:03:32,760 --> 00:03:36,300
Strubell 等人 2019 年的原始论文
to the original paper by Strubell et al. in 2019

87
00:03:36,300 --> 00:03:39,180
也就是那篇著名的"五辆汽车终身排放"论文
the infamous five cars in their lifetimes paper.

88
00:03:39,180 --> 00:03:40,013
如果您只看
If you just look

89
00:03:40,013 --> 00:03:43,606
一个 2 亿参数的 transformer
at a 200 million parameter transformer

90
00:03:43,606 --> 00:03:46,950
它的碳足迹约为 200 磅二氧化碳
its carbon footprint is around 200 pounds of CO2

91
00:03:46,950 --> 00:03:47,940
这不算少
which is significant

92
00:03:47,940 --> 00:03:49,980
但远远达不到五辆汽车的量,对吧?
but it's nowhere near five cars, right?

93
00:03:49,980 --> 00:03:52,893
甚至还不及一次跨大西洋航班。
It's not even a transatlantic flight.

94
00:03:52,893 --> 00:03:55,020
排放真正累积起来,是在您进行
How it really adds up is when you're doing

95
00:03:55,020 --> 00:03:56,190
神经架构搜索的时候
neural architecture search

96
00:03:56,190 --> 00:03:58,560
在您进行超参数调优的时候
when you're doing hyperparameter tuning, and

97
00:03:58,560 --> 00:04:00,930
也就是尝试所有可能的组合
this is trying all possible combinations

98
00:04:00,930 --> 00:04:01,763
等等,等等。
et cetera, et cetera.

99
00:04:01,763 --> 00:04:02,596
这就是
And this is where

100
00:04:02,596 --> 00:04:05,400
那 600,000 磅二氧化碳的来源。
the 600,000 pounds of CO2 came from.

101
00:04:05,400 --> 00:04:08,490
所以这才是排放真正累积起来的地方。
So this is really where things add up.

102
00:04:08,490 --> 00:04:11,880
但是,如果您用心、认真地做事
So, but if you're doing things mindfully and conscientiously

103
00:04:11,880 --> 00:04:16,410
那么您的碳足迹就不会
then your carbon footprint won't be as big

104
00:04:16,410 --> 00:04:20,040
像那篇论文所暗示的那么大。下面是一些工具
as the paper implied. Some tools to figure

105
00:04:20,040 --> 00:04:22,111
可以帮您计算出您到底排放了多少二氧化碳。
out how much CO2 exactly you're emitting.
106
00:04:22,111 --> 00:04:24,270
有一个基于网页的工具,叫做
There's a web-based tool called machine

107
00:04:24,270 --> 00:04:26,430
机器学习排放计算器(machine learning emissions calculator),它可以让您
learning emissions calculator, which allows you

108
00:04:26,430 --> 00:04:29,010
手动输入,例如,您使用的硬件
to manually input, for example, which hardware you used

109
00:04:29,010 --> 00:04:30,488
您使用了多少小时
how many hours you used it for

110
00:04:30,488 --> 00:04:34,260
以及它位于本地还是云端。
where it was located, locally or in the cloud.

111
00:04:34,260 --> 00:04:35,640
然后它会给您一个
And then it's gonna give you an estimate

112
00:04:35,640 --> 00:04:37,560
您排放了多少二氧化碳的估计值。
of how much CO2 you emitted.

113
00:04:37,560 --> 00:04:40,200
另一个以编程方式执行此操作的工具
Another tool that does this programmatically

114
00:04:40,200 --> 00:04:41,190
叫做 Code Carbon。
is called Code Carbon.

115
00:04:41,190 --> 00:04:45,112
您可以通过 pip 安装它,也可以访问它的 GitHub
So you can pip install it, you can go to the GitHub

116
00:04:45,112 --> 00:04:48,120
它本质上与您的代码并行运行。
and essentially it runs in parallel to your code.

117
00:04:48,120 --> 00:04:49,085
所以基本上您调用它
So essentially you call it

118
00:04:49,085 --> 00:04:51,060
然后进行您的所有训练。
and then you do all your training.

119
00:04:51,060 --> 00:04:53,760
最后它会给您一个估计:
And then at the end it's gonna give you an estimate

120
00:04:53,760 --> 00:04:57,210
一个包含排放量估算值的 CSV 文件。
a CSV file with an estimate of your emissions.

121
00:04:57,210 --> 00:04:59,250
它还会给您一些比较。
And it's gonna give you some comparisons.

122
00:04:59,250 --> 00:05:01,230
它有一个可视化 UI,您可以在其中查看
It's got a visual UI where you can really look

123
00:05:01,230 --> 00:05:04,680
您的排放与开车或看电视相比如何。
at how this compares to driving a car or watching TV.

124
00:05:04,680 --> 00:05:06,060
所以它也可以让您了解
So it can give you an idea

125
00:05:06,060 --> 00:05:07,740
您的排放的规模。
of the scope of your emissions as well.
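上面计算器所需的输入(硬件、使用时长、所在地区)对应着一个粗略的手工估算:功率 × 时间 × 电网碳强度。下面是一个假设性的示意,其中 300 W 的 GPU 功耗和一周的训练时长都是示例假设,并非该工具的实际系数。
The calculator's inputs above (hardware, hours, location) map onto a rough by-hand estimate: power × time × grid carbon intensity. A hypothetical sketch, where the 300 W GPU draw and the one-week run are illustrative assumptions, not factors from the tool.

```python
def estimate_co2_kg(gpu_power_watts: float, hours: float,
                    grid_g_per_kwh: float) -> float:
    """粗略估算:功率 * 时间 * 碳强度,结果为千克二氧化碳。
    Rough estimate: power * time * carbon intensity, in kg of CO2."""
    energy_kwh = gpu_power_watts * hours / 1000  # W * h -> kWh
    return energy_kwh * grid_g_per_kwh / 1000    # g -> kg

# 假设在一块约 300 W 的 GPU 上训练一周(168 小时):
# A hypothetical one-week (168 h) run on a single ~300 W GPU:
print(estimate_co2_kg(300, 168, 920))  # 类似孟买的电网 / Mumbai-like grid: ~46.4 kg
print(estimate_co2_kg(300, 168, 20))   # 类似蒙特利尔的电网 / Montreal-like grid: ~1.0 kg
```

对于实际测量,视频中提到的 Code Carbon 会在训练时自动记录这类数据并输出 CSV 文件。
For real measurements, the Code Carbon tool described in the video records this automatically during training and writes out a CSV file.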
126
00:05:07,740 --> 00:05:09,930
实际上,Code Carbon 已经集成到 AutoNLP 中
And actually, Code Carbon is already integrated into AutoNLP

127
00:05:09,930 --> 00:05:12,270
希望人们能够开箱即用地使用它
and hopefully people will be using it out of the box

128
00:05:12,270 --> 00:05:15,240
在训练和部署 transformer 的全过程中
and easily tracking their emissions all

129
00:05:15,240 --> 00:05:17,523
轻松地跟踪他们的碳排放。
through training and deploying transformers.