1 00:00:00,253 --> 00:00:02,920 (空气呼啸) (air whooshing) 2 00:00:06,060 --> 00:00:08,070 - 在这段视频中,我们将看到 - In this video, we're going to see 3 00:00:08,070 --> 00:00:11,430 如何加载和微调预训练模型。 how to load and fine tune a pre-trained model. 4 00:00:11,430 --> 00:00:12,510 它非常快。 It's very quick. 5 00:00:12,510 --> 00:00:14,490 如果你看过我们关于 pipeline 的视频, And if you've watched our pipeline videos, 6 00:00:14,490 --> 00:00:18,150 我将在下面链接之,过程非常相似。 which I'll link below, the process is very similar. 7 00:00:18,150 --> 00:00:20,940 不过这一次,我们将使用迁移学习 This time, though, we're going to be using transfer learning 8 00:00:20,940 --> 00:00:23,040 并自己做一些训练, and doing some training ourselves, 9 00:00:23,040 --> 00:00:26,400 而不是仅仅加载模型并按原样使用它。 rather than just loading a model and using it as is. 10 00:00:26,400 --> 00:00:28,710 所以要了解更多关于迁移学习的信息, So to learn more about transfer learning, 11 00:00:28,710 --> 00:00:31,320 前往 “What is transfer learning?” 视频, head to the 'What is transfer learning?' video, 12 00:00:31,320 --> 00:00:33,420 我们也会在下面链接它。 and we'll link that below as well. 13 00:00:33,420 --> 00:00:35,610 但现在,让我们看看这段代码。 But for now, let's look at this code. 14 00:00:35,610 --> 00:00:38,730 首先,我们选择我们想要开始的模型。 To start, we pick which model we want to start with. 15 00:00:38,730 --> 00:00:40,920 在这种情况下,我们将使用著名的, In this case, we're going to use the famous, 16 00:00:40,920 --> 00:00:42,060 原始的 BERT, the original BERT, 17 00:00:42,060 --> 00:00:44,850 作为我们今天训练的基础。 as the foundation for our training today. 18 00:00:44,850 --> 00:00:46,770 但这条怪物般的一行是什么, But what is this monstrosity line, 19 00:00:46,770 --> 00:00:48,797 这个 “TFAutoModelForSequenceClassification”? this 'TFAutoModelForSequenceClassification'? 20 00:00:49,860 --> 00:00:51,180 这意味着什么? What does that mean? 21 00:00:51,180 --> 00:00:53,130 好吧,TF 代表 TensorFlow。 Well, the TF stands for TensorFlow. 22 00:00:53,130 --> 00:00:54,660 剩下的意味着, And the rest means, 23 00:00:54,660 --> 00:00:55,950 采用语言模型, take a language model, 24 00:00:55,950 --> 00:00:58,380 并在上面贴上一个序列分类头 and stick a sequence classification head onto it 25 00:00:58,380 --> 00:01:00,750 如果它还没有的话。 if it doesn't have one already. 26 00:01:00,750 --> 00:01:02,880 所以这行代码加载了 BERT, So this line of code loads BERT, 27 00:01:02,880 --> 00:01:05,040 这是一个通用语言模型, which is a general purpose language model, 28 00:01:05,040 --> 00:01:07,650 它加载重量、架构和所有 it loads at weights, architecture, and all 29 00:01:07,650 --> 00:01:10,920 然后在上面添加一个新的序列分类头 and then adds a new sequence classification head onto it 30 00:01:10,920 --> 00:01:13,440 具有随机初始化的权重。 with randomly initialized weights. 31 00:01:13,440 --> 00:01:15,870 所以这个方法需要知道两件事。 So this method needs to know two things. 32 00:01:15,870 --> 00:01:18,270 首先,它需要知道模型的名称 Firstly, it needs to know the name of the model 33 00:01:18,270 --> 00:01:21,060 你想要加载的架构和权重。 you wanted to load, the architecture and weights for. 34 00:01:21,060 --> 00:01:23,940 其次,它需要知道有多少类 And secondly, it needs to know how many classes 35 00:01:23,940 --> 00:01:26,693 你的问题是有的,因为这将决定大小, your problem has, because that will determine the size, 36 00:01:26,693 --> 00:01:29,610 对输出头中的神经元数量。 the number of neurons in the output head. 37 00:01:29,610 --> 00:01:31,530 所以如果你想跟随数据 So if you want to follow along with the data 38 00:01:31,530 --> 00:01:34,500 来自我们有关数据集的视频,我将在下面链接, from our datasets videos, which I'll link below, 39 00:01:34,500 --> 00:01:37,440 那么你将有两个类别,积极的和消极的, then you'll have two classes, positive and negative, 40 00:01:37,440 --> 00:01:39,723 因此 num_labels 等于二。 and thus num_labels equals two. 41 00:01:40,830 --> 00:01:43,230 但是这个编译行呢? But what about this compile line? 42 00:01:43,230 --> 00:01:44,970 好吧,如果你熟悉 Keras, Well, if you're familiar with Keras, 43 00:01:44,970 --> 00:01:46,920 你可能已经看过了。 you've probably seen this already. 44 00:01:46,920 --> 00:01:49,800 但如果不是,这是 Keras 中的核心方法之一 But if not, this is one of the core methods in Keras 45 00:01:49,800 --> 00:01:51,450 你会一次又一次地看到。 that you're gonna see again, and again. 46 00:01:51,450 --> 00:01:54,900 你总是需要在训练之前编译你的模型。 You always need to compile your model before you train it. 47 00:01:54,900 --> 00:01:57,870 编译需要知道两件事。 And compile needs to know two things. 48 00:01:57,870 --> 00:02:00,090 首先,它需要知道损失函数, Firstly, it needs to know the loss function, 49 00:02:00,090 --> 00:02:02,340 这就是你要优化的内容。 which is what you're trying to optimize. 50 00:02:02,340 --> 00:02:05,910 所以在这里,我们导入 SparseCategoricalCrossentropy So here, we import the SparseCategoricalCrossentropy 51 00:02:05,910 --> 00:02:07,260 损失函数。 loss function. 52 00:02:07,260 --> 00:02:09,930 所以这是一点点,但它是标准的损失函数 So that's a mouthful, but it's the standard loss function 53 00:02:09,930 --> 00:02:13,260 对于任何正在执行分类任务的神经网络。 for any neural network that's doing a classification task. 54 00:02:13,260 --> 00:02:14,970 它基本上鼓励网络 It basically encourages the network 55 00:02:14,970 --> 00:02:17,730 为正确的类输出大的值, to output large values for the right class, 56 00:02:17,730 --> 00:02:20,910 以及为错误的类别输出低值。 and low values for the wrong classes. 57 00:02:20,910 --> 00:02:24,150 请注意,你可以将损失函数指定为字符串, Note that you can specify the loss function as a string, 58 00:02:24,150 --> 00:02:26,010 就像我们对优化器所做的那样。 like we did with the optimizer. 59 00:02:26,010 --> 00:02:27,600 但这里有一个问题, But there's a risk there, 60 00:02:27,600 --> 00:02:30,090 人们容易掉入一个非常普遍的陷阱, there's a very common trap people fall into, 61 00:02:30,090 --> 00:02:32,580 就是默认情况下,这种损失假设 which is that by default, this loss assumes 62 00:02:32,580 --> 00:02:36,510 输出是 softmax 层之后的概率。 the output is probabilities after a softmax layer. 63 00:02:36,510 --> 00:02:38,310 但是我们的模型实际输出了什么 But what our model has actually output 64 00:02:38,310 --> 00:02:40,770 是 softmax 之前的值, is the values before the softmax, 65 00:02:40,770 --> 00:02:43,800 通常称为 logits,有时称为 logits。 often called the logits, sometimes logits. 66 00:02:43,800 --> 00:02:46,110 没有人十分确定如何发音。 No one's quite sure how to pronounce that one. 67 00:02:46,110 --> 00:02:47,790 但你可能以前见过这些 But you probably seen these before 68 00:02:47,790 --> 00:02:49,950 在关于 pipeline 的视频中。 in the video about pipelines. 69 00:02:49,950 --> 00:02:52,320 所以如果你弄错了,你的模型就不会训练 So if you get this wrong, your model won't train 70 00:02:52,320 --> 00:02:54,723 弄清楚原因会很烦人。 and it'll be very annoying to figure out why. 71 00:02:55,590 --> 00:02:57,540 在以后的视频中,我们会看到 In future videos, we're gonna see 72 00:02:57,540 --> 00:03:00,540 如何使用模型的内部损失计算, how to use the model's internal loss computations, 73 00:03:00,540 --> 00:03:02,910 这样你就不必自己指定损失 so that you don't have to specify the loss yourself 74 00:03:02,910 --> 00:03:05,340 而且你不必担心这些细节。 and you don't have to worry about these details. 75 00:03:05,340 --> 00:03:09,480 但是现在,请记住将 from_logits 设置为 true。 But for now, remember to set from_logits to true. 76 00:03:09,480 --> 00:03:11,430 编译需要知道的第二件事 The second thing compile needs to know 77 00:03:11,430 --> 00:03:13,230 是你想要的优化器。 is the optimizer you want. 78 00:03:13,230 --> 00:03:15,120 在我们的例子中,我们使用 adam , In our case, we use adam, 79 00:03:15,120 --> 00:03:16,830 这是一种标准的优化器 which is sort of the standard optimizer 80 00:03:16,830 --> 00:03:18,720 用于现代深度学习。 for deep learning these days. 81 00:03:18,720 --> 00:03:20,520 你可能想要改变的一件事 The one thing you might want to change 82 00:03:20,520 --> 00:03:21,780 是学习率。 is the learning rate. 83 00:03:21,780 --> 00:03:24,630 为此,我们需要导入实际的优化器 And to do that, we'll need to import the actual optimizer 84 00:03:24,630 --> 00:03:26,910 而不仅仅是通过字符串调用它。 rather than just calling it by string. 85 00:03:26,910 --> 00:03:28,680 但我们会在另一个视频中讨论这个, But we'll talk about that in another video, 86 00:03:28,680 --> 00:03:30,090 我将在下面链接。 which I'll link below. 87 00:03:30,090 --> 00:03:33,360 现在,让我们尝试训练模型。 For now, let's just try training the model. 88 00:03:33,360 --> 00:03:35,580 那么,你如何训练模型呢? Well, so how do you train the model? 89 00:03:35,580 --> 00:03:37,950 同样,如果你以前使用过 Keras, Again, if you've used Keras before, 90 00:03:37,950 --> 00:03:40,350 这一切对你来说都很熟悉。 this is all going to be very familiar to you. 91 00:03:40,350 --> 00:03:42,210 但如果没有,让我们快速看看 But if not, let's very quickly look 92 00:03:42,210 --> 00:03:43,710 我们在这里做什么。 at what we're doing here. 93 00:03:43,710 --> 00:03:47,010 fit 几乎是 Keras 模型的核心方法。 fit is pretty much the central method for Keras models. 94 00:03:47,010 --> 00:03:49,983 它告诉模型根据我们传入的数据进行训练。 It tells the model to train on the data we're passing in. 95 00:03:50,820 --> 00:03:52,920 所以这里我们传递我们制作的数据集 So here we pass the datasets we made 96 00:03:52,920 --> 00:03:54,510 在上一节中, in the previous section, 97 00:03:54,510 --> 00:03:57,990 数据集包含我们的输入和标签。 the dataset contains both our inputs and our labels. 98 00:03:57,990 --> 00:04:00,420 所以我们不需要指定单独的标签, So we don't need to specify separate labels, 99 00:04:00,420 --> 00:04:01,570 当我们调用 fit 时。 when we're calling fit. 100 00:04:02,490 --> 00:04:05,340 然后我们对 validation_data 做同样的事情。 Then we do the same thing with the validation_data. 101 00:04:05,340 --> 00:04:08,190 然后我们可以,如果我们愿意,我们可以指定细节, And then we can if we want, we can specify details, 102 00:04:08,190 --> 00:04:09,900 比如训练的次数 like the number of epochs for training 103 00:04:09,900 --> 00:04:12,420 你可以传递一些其他参数给 fit 。 where there's some other arguments you can pass to fit. 104 00:04:12,420 --> 00:04:15,240 但最后,你只需将所有这些传递给 model.fit But in the end, you just pass all of this to model.fit 105 00:04:15,240 --> 00:04:16,440 然后你让它运行。 and you let it run. 106 00:04:16,440 --> 00:04:17,520 如果一切顺利, If everything works out, 107 00:04:17,520 --> 00:04:19,320 你应该看到一个小训练栏 you should see a little training bar 108 00:04:19,320 --> 00:04:21,300 随着损失的减少而进步。 progressing along as your loss goes down. 109 00:04:21,300 --> 00:04:22,290 就是这样。 And that's it. 110 00:04:22,290 --> 00:04:23,123 在运行时, While that's running, 111 00:04:23,123 --> 00:04:25,380 你知道,你可以打电话给你的老板并告诉他们 you know, you can call your boss and tell them 112 00:04:25,380 --> 00:04:27,810 你现在是高级 NLP 机器学习工程师 you're a senior NLP machine learning engineer now 113 00:04:27,810 --> 00:04:30,900 你会想要下个季度的薪水审查。 and you're gonna want a salary review next quarter. 114 00:04:30,900 --> 00:04:32,880 这几行代码真的够用了 These few lines of code are really all it takes 115 00:04:32,880 --> 00:04:34,500 应用大量的力量 to apply the power of a massive 116 00:04:34,500 --> 00:04:36,510 预训练语言问题, pre-trained language problem, 117 00:04:36,510 --> 00:04:38,250 大规模的预训练语言模型,不好意思, massive pre-trained language model, excuse me, 118 00:04:38,250 --> 00:04:40,080 到你的 NLP 问题。 to your NLP problem. 119 00:04:40,080 --> 00:04:42,150 但我们能做得更好吗? But could we do better than this? 120 00:04:42,150 --> 00:04:43,920 我的意思是,我们当然可以。 I mean, we certainly could. 121 00:04:43,920 --> 00:04:45,720 具有一些更高级的 Keras 功能 With a few more advanced Keras features 122 00:04:45,720 --> 00:04:47,730 就像一个经过调整的、预定的学习率, like a tuned, scheduled learning rate, 123 00:04:47,730 --> 00:04:49,290 我们可以获得更低的损失 we can get an even lower loss 124 00:04:49,290 --> 00:04:51,990 以及一个更准确、更有用的模型。 and an even more accurate, more useful model. 125 00:04:51,990 --> 00:04:54,120 训练模型后,我们如何处理模型? And what do we do with our model after we train it? 126 00:04:54,120 --> 00:04:55,950 所以所有这些都将包含在视频中 So all of this is going to be covered in the videos 127 00:04:55,950 --> 00:04:57,963 即将推出,敬请期待。 that are coming up, so stay tuned. 128 00:04:59,220 --> 00:05:01,887 (空气呼啸) (air whooshing)