subtitles/zh-CN/07_transformer-models-encoder-decoders.srt

1
00:00:00,520 --> 00:00:02,603
(嗖嗖)
(swoosh)

2
00:00:04,230 --> 00:00:05,063
- 在这个视频中，
- In this video,

3
00:00:05,063 --> 00:00:07,638
我们将研究编码器-解码器架构。
we'll study the encoder-decoder architecture.

4
00:00:07,638 --> 00:00:12,243
流行的编码器-解码器模型的一个示例是 T5。
An example of a popular encoder-decoder model is T5.

5
00:00:13,770 --> 00:00:16,980
为了理解编码器-解码器是如何工作的，
In order to understand how the encoder-decoder works,

6
00:00:16,980 --> 00:00:18,630
我们建议您查看
we recommend you check out the videos

7
00:00:18,630 --> 00:00:22,590
关于编码器和解码器作为独立模型的那些视频。
on encoders and decoders as standalone models.

8
00:00:22,590 --> 00:00:24,990
了解它们各自如何工作，
Understanding how they work individually

9
00:00:24,990 --> 00:00:28,323
将有助于理解编码器-解码器的工作原理。
will help understanding how an encoder-decoder works.

10
00:00:30,510 --> 00:00:33,390
让我们从我们已了解的编码器开始。
Let's start from what we've seen about the encoder.

11
00:00:33,390 --> 00:00:36,240
编码器将单词作为输入，
The encoder takes words as inputs,

12
00:00:36,240 --> 00:00:38,520
将它们送入编码器，
casts them through the encoder,

13
00:00:38,520 --> 00:00:40,800
并为每个送入的单词
and retrieves a numerical representation

14
00:00:40,800 --> 00:00:42,663
检索出一个数值表示。
for each word cast through it.

15
00:00:43,560 --> 00:00:46,470
我们现在知道，这个数值表示
We now know that this numerical representation

16
00:00:46,470 --> 00:00:49,473
包含关于序列含义的信息。
holds information about the meaning of the sequence.

17
00:00:51,090 --> 00:00:54,243
让我们把这个放在一边，把解码器添加到图中。
Let's put this aside and add the decoder to the diagram.

18
00:00:56,610 --> 00:00:57,510
在这个场景中，
In this scenario,

19
00:00:57,510 --> 00:00:59,190
我们以一种以前没有见过的方式
we're using the decoder in a manner

20
00:00:59,190 --> 00:01:00,960
使用解码器。
that we haven't seen before.

21
00:01:00,960 --> 00:01:04,173
我们将编码器的输出直接传递给它。
We're passing the outputs of the encoder directly to it.

22
00:01:05,356 --> 00:01:07,770
除了编码器的输出之外，
Additionally to the encoder outputs,

23
00:01:07,770 --> 00:01:10,800
我们还会给解码器一个序列。
we also give the decoder a sequence.

24
00:01:10,800 --> 00:01:12,840
在不给定初始序列的情况下
When prompting the decoder for an output

25
00:01:12,840 --> 00:01:14,190
向解码器请求输出时，
with no initial sequence,

26
00:01:14,190 --> 00:01:16,140
我们可以给它一个
we can give it the value that indicates

27
00:01:16,140 --> 00:01:18,060
表示序列开始的值。
the start of a sequence.

28
00:01:18,060 --> 00:01:20,919
而这正是编码器-解码器的神奇之处。
And that's where the encoder-decoder magic happens.

29
00:01:20,919 --> 00:01:24,082
编码器接受一个序列作为输入。
The encoder accepts a sequence as input.

30
00:01:24,082 --> 00:01:25,980
它计算一个预测，
It computes a prediction,

31
00:01:25,980 --> 00:01:28,858
并输出一个数值表示。
and outputs a numerical representation.

32
00:01:28,858 --> 00:01:33,120
然后，它将其发送给解码器。
Then, it sends that over to the decoder.

33
00:01:33,120 --> 00:01:36,300
从某种意义上说，它编码了那个序列。
It has, in a sense, encoded that sequence.

34
00:01:36,300 --> 00:01:38,130
而解码器则反过来，
And the decoder, in turn,

35
00:01:38,130 --> 00:01:40,847
将此输入与其通常的序列输入一起使用，
using this input alongside its usual sequence input,

36
00:01:40,847 --> 00:01:43,906
尝试解码这个序列。
will take a stab at decoding the sequence.

37
00:01:43,906 --> 00:01:46,530
解码器解码该序列，
The decoder decodes the sequence,

38
00:01:46,530 --> 00:01:48,360
并输出一个词。
and outputs a word.

39
00:01:48,360 --> 00:01:51,300
到目前为止，我们还不需要理解这个词的含义，
As of now, we don't need to make sense of that word,

40
00:01:51,300 --> 00:01:53,100
但我们可以理解，解码器
but we can understand that the decoder

41
00:01:53,100 --> 00:01:56,103
本质上是在解码编码器输出的内容。
is essentially decoding what the encoder has output.
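
The hand-off just described can be sketched in code. Below is a minimal illustration, assuming the transformers library with a PyTorch backend and the t5-small checkpoint (one concrete encoder-decoder model; any similar checkpoint would do): the encoder retrieves one numerical representation per input token, and the decoder combines that representation with a start-of-sequence token to predict its first word.

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("Welcome to NYC", return_tensors="pt")

# The encoder casts the words through and retrieves one numerical
# representation (hidden state) per token.
encoder_outputs = model.encoder(**inputs)
print(encoder_outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)

# The decoder receives the encoder output together with a
# start-of-sequence token, and predicts the first output word.
start = torch.tensor([[model.config.decoder_start_token_id]])
outputs = model(encoder_outputs=encoder_outputs, decoder_input_ids=start)
first_id = outputs.logits[0, -1].argmax().item()
print(tokenizer.decode([first_id]))
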
42
00:01:57,008 --> 00:02:00,000
这里的序列开始词
The start of sequence word here

43
00:02:00,000 --> 00:02:02,871
表示它应该开始解码序列。
indicates that it should start decoding the sequence.

44
00:02:02,871 --> 00:02:06,870
现在我们既有编码器的数值表示，
Now that we have both the encoder numerical representation

45
00:02:06,870 --> 00:02:09,570
又有一个初始生成的词，
and an initial generated word,

46
00:02:09,570 --> 00:02:11,343
我们就不再需要编码器了。
we don't need the encoder anymore.

47
00:02:12,269 --> 00:02:15,540
正如我们之前在解码器中看到的那样，
As we have seen before with the decoder,

48
00:02:15,540 --> 00:02:18,720
它可以以自回归的方式工作。
it can act in an auto-regressive manner.

49
00:02:18,720 --> 00:02:22,933
它刚刚输出的单词现在可以用作输入。
The word it has just output can now be used as an input.

50
00:02:22,933 --> 00:02:26,188
这个词与编码器输出的
This, in combination with the numerical representation

51
00:02:26,188 --> 00:02:28,560
数值表示相结合，
output by the encoder,

52
00:02:28,560 --> 00:02:31,203
现在可以用来生成第二个单词。
can now be used to generate a second word.

53
00:02:33,040 --> 00:02:35,910
请注意，第一个词仍然在这里，
Please note that the first word is still here,

54
00:02:35,910 --> 00:02:37,770
因为模型仍然会输出它。
as the model still outputs it.

55
00:02:37,770 --> 00:02:39,240
但是，我们已将其变灰，
However, we have grayed it out

56
00:02:39,240 --> 00:02:40,940
因为我们不再需要它了。
as we have no need for it anymore.

57
00:02:41,880 --> 00:02:44,070
我们可以一直继续下去，例如，
We can continue on and on, for example,

58
00:02:44,070 --> 00:02:46,320
直到解码器输出一个
until the decoder outputs a value

59
00:02:46,320 --> 00:02:48,540
我们认为是停止值的数值，
that we consider a stopping value,

60
00:02:48,540 --> 00:02:51,093
比如表示序列结束的句号。
like a dot meaning the end of a sequence.

61
00:02:53,580 --> 00:02:55,926
至此，我们已经看到了编码器-解码器 transformer
Here, we've seen the full mechanism

62
00:02:55,926 --> 00:02:57,540
的完整机制。
of the encoder-decoder transformer.

63
00:02:57,540 --> 00:02:59,280
让我们再过一遍。
Let's go over it one more time.

64
00:02:59,280 --> 00:03:02,773
我们有一个初始序列被发送到编码器。
We have an initial sequence that is sent to the encoder.

65
00:03:02,773 --> 00:03:06,450
然后，编码器的输出被发送到解码器，
That encoder output is then sent to the decoder

66
00:03:06,450 --> 00:03:07,563
以便对其进行解码。
for it to be decoded.

67
00:03:08,760 --> 00:03:12,450
虽然编码器在使用一次后就可以丢弃，
While it can now discard the encoder after a single use,

68
00:03:12,450 --> 00:03:14,427
但解码器将被多次使用，
the decoder will be used several times

69
00:03:14,427 --> 00:03:17,763
直到我们生成了所需的每一个词。
until we have generated every word that we need.
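
That auto-regressive loop can also be written out explicitly. Here is a hedged sketch under the same assumptions as above (transformers, PyTorch, t5-small): the encoder representation is computed once, and every word the decoder outputs is appended to its input until the stopping value, the end-of-sequence token, appears. In practice model.generate() wraps this kind of loop for you.

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 was trained with task prefixes; this one requests translation.
inputs = tokenizer("translate English to French: Welcome to NYC",
                   return_tensors="pt")
encoder_outputs = model.encoder(**inputs)  # encoded once, then reused

decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
for _ in range(20):  # safety cap on the generated length
    logits = model(encoder_outputs=encoder_outputs,
                   decoder_input_ids=decoder_input_ids).logits
    next_id = logits[0, -1].argmax().item()  # greedy pick of the next word
    if next_id == model.config.eos_token_id:  # the stopping value
        break
    decoder_input_ids = torch.cat(
        [decoder_input_ids, torch.tensor([[next_id]])], dim=-1)

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
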
70
00:03:19,288 --> 00:03:21,510
那么，让我们结合翻译语言建模
So let's see a concrete example

71
00:03:21,510 --> 00:03:23,460
来看一个具体的例子。
with Translation Language Modeling.

72
00:03:23,460 --> 00:03:24,930
它也称为转导（transduction），
Also called transduction,

73
00:03:24,930 --> 00:03:28,200
即翻译一个序列的行为。
which is the act of translating a sequence.

74
00:03:28,200 --> 00:03:30,577
在这里，我们想把这个英文序列
Here, we would like to translate this English sequence

75
00:03:30,577 --> 00:03:33,067
“Welcome to NYC”翻译成法语。
"Welcome to NYC" in French.

76
00:03:33,067 --> 00:03:35,460
我们使用的是一个 transformer 模型，
We're using a transformer model

77
00:03:35,460 --> 00:03:38,070
它是明确针对该任务训练的。
that is trained for that task explicitly.

78
00:03:38,070 --> 00:03:40,560
我们使用编码器来创建
We use the encoder to create a representation

79
00:03:40,560 --> 00:03:42,240
英语句子的表示。
of the English sentence.

80
00:03:42,240 --> 00:03:44,730
我们将它传递给解码器，
We cast this to the decoder,

81
00:03:44,730 --> 00:03:46,620
并借助序列开始词，
with the use of the start of sequence word,

82
00:03:46,620 --> 00:03:49,173
请求它输出第一个单词。
we ask it to output the first word.

83
00:03:50,029 --> 00:03:53,607
它输出 bienvenue，意思是欢迎。
It outputs bienvenue, which means welcome.

84
00:03:53,607 --> 00:03:56,640
然后我们使用 bienvenue
And we then use bienvenue

85
00:03:56,640 --> 00:03:59,283
作为解码器的输入序列。
as the input sequence for the decoder.

86
00:04:00,188 --> 00:04:04,470
它与编码器的数值表示一起，
This, alongside the encoder numerical representation,

87
00:04:04,470 --> 00:04:07,440
使解码器能够预测第二个词 à，
allows the decoder to predict the second word, à,

88
00:04:07,440 --> 00:04:09,240
也就是英语中的 to。
which is to in English.

89
00:04:09,240 --> 00:04:13,590
最后，我们要求解码器预测第三个词。
Finally, we ask the decoder to predict a third word.

90
00:04:13,590 --> 00:04:15,330
它预测出 NYC，这是正确的。
It predicts NYC, which is correct.

91
00:04:15,330 --> 00:04:18,288
我们已经翻译了这句话。
We've translated the sentence.

92
00:04:18,288 --> 00:04:20,760
编码器-解码器真正的亮点在于，
Where the encoder-decoder really shines,

93
00:04:20,760 --> 00:04:23,550
我们拥有一个编码器和一个解码器，
is that we have an encoder and a decoder,

94
00:04:23,550 --> 00:04:25,323
而它们通常不共享权重。
which often do not share weights.

95
00:04:26,256 --> 00:04:29,460
因此，我们有一个完整的模块，即编码器，
Therefore, we have an entire block, the encoder,

96
00:04:29,460 --> 00:04:31,650
它可以被训练来理解序列
that can be trained to understand the sequence

97
00:04:31,650 --> 00:04:34,290
并提取相关信息。
and extract the relevant information.

98
00:04:34,290 --> 00:04:36,450
以我们之前看到的翻译场景为例，
For the translation scenario we've seen earlier,

99
00:04:36,450 --> 00:04:38,760
这意味着解析
for example, this would mean parsing

100
00:04:38,760 --> 00:04:42,003
并理解用英语所说的内容。
and understanding what was said in the English language.

101
00:04:42,900 --> 00:04:45,960
这意味着从该语言中提取信息，
It would mean extracting information from that language,

102
00:04:45,960 --> 00:04:49,413
并将所有这些信息放入一个信息密集的向量中。
and putting all of that in a vector dense in information.

103
00:04:50,361 --> 00:04:53,370
另一方面，我们有解码器，
On the other hand, we have the decoder,

104
00:04:53,370 --> 00:04:56,850
它的唯一目的是解码
whose sole purpose is to decode the numerical representation

105
00:04:56,850 --> 00:04:58,203
编码器输出的数值表示。
output by the encoder.

106
00:04:59,460 --> 00:05:01,170
这个解码器可以专门用于
This decoder can be specialized

107
00:05:01,170 --> 00:05:02,970
一种完全不同的语言，
in a completely different language,

108
00:05:02,970 --> 00:05:05,403
甚至是图像或语音这样的模态。
or even modality like images or speech.

109
00:05:07,170 --> 00:05:10,473
编码器-解码器之所以特殊，有几个原因。
Encoders-decoders are special for several reasons.

110
00:05:11,310 --> 00:05:15,570
首先，它们能够处理序列到序列的任务，
Firstly, they're able to manage sequence to sequence tasks,

111
00:05:15,570 --> 00:05:18,358
就像我们刚刚看到的翻译一样。
like translation that we have just seen.

112
00:05:18,358 --> 00:05:20,940
其次，编码器和解码器部分之间的权重
Secondly, the weights between the encoder

113
00:05:20,940 --> 00:05:24,540
并不一定共享。
and the decoder parts are not necessarily shared.

114
00:05:24,540 --> 00:05:27,172
让我们再举一个翻译的例子。
Let's take another example of translation.

115
00:05:27,172 --> 00:05:30,810
这里我们要把“Transformers are powerful”
Here we're translating "Transformers are powerful"

116
00:05:30,810 --> 00:05:32,048
翻译成法语。
in French.

117
00:05:32,048 --> 00:05:35,258
首先，这意味着从一个三个单词的序列中，
Firstly, this means that from a sequence of three words,

118
00:05:35,258 --> 00:05:39,030
我们能够生成一个四个单词的序列。
we're able to generate a sequence of four words.

119
00:05:39,030 --> 00:05:42,480
有人可能会争辩说，这可以用一个解码器来处理，
One could argue that this could be handled with a decoder

120
00:05:42,480 --> 00:05:44,160
它会以自回归的方式
that would generate the translation

121
00:05:44,160 --> 00:05:46,260
生成翻译结果，
in an auto-regressive manner,

122
00:05:46,260 --> 00:05:47,460
而他们是对的。
and they would be right.
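
For completeness, here is the same sequence-to-sequence translation driven through the high-level pipeline API, which runs the encode-once, decode-many-times loop internally. Again, t5-small is an illustrative checkpoint choice, and the printed translation is indicative rather than guaranteed.

from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Transformers are powerful"))
# e.g. [{'translation_text': 'Les transformers sont puissants'}]
# Three English words in, four French words out.
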
123
00:05:49,980 --> 00:05:51,930
序列到序列 transformer 模型
Another example of where sequence to sequence

124
00:05:51,930 --> 00:05:54,810
大放异彩的另一个例子是文本摘要。
transformers shine is in summarization.

125
00:05:54,810 --> 00:05:58,379
这里我们有一个很长的序列，通常是一篇完整的文本，
Here we have a very long sequence, generally a full text,

126
00:05:58,379 --> 00:06:01,020
而我们想对它做摘要。
and we want to summarize it.

127
00:06:01,020 --> 00:06:04,020
由于编码器和解码器是分开的，
Since the encoder and decoders are separated,

128
00:06:04,020 --> 00:06:06,300
我们可以使用不同的上下文长度。
we can have different context lengths.

129
00:06:06,300 --> 00:06:08,910
例如，编码器使用一个非常长的上下文
For example, a very long context for the encoder,

130
00:06:08,910 --> 00:06:10,230
来处理文本，
which handles the text,

131
00:06:10,230 --> 00:06:12,210
而解码器使用一个较小的上下文
and a smaller context for the decoder

132
00:06:12,210 --> 00:06:14,223
来处理摘要后的序列。
which handles the summarized sequence.

133
00:06:16,470 --> 00:06:18,840
序列到序列的模型有很多。
There are a lot of sequence to sequence models.

134
00:06:18,840 --> 00:06:20,310
这里列出了 transformers 库中
This contains a few examples

135
00:06:20,310 --> 00:06:22,500
几个受欢迎的
of popular encoder-decoder models

136
00:06:22,500 --> 00:06:24,400
编码器-解码器模型的示例。
available in the transformers library.

137
00:06:25,829 --> 00:06:29,940
此外，您还可以在一个编码器-解码器模型中
Additionally, you can load an encoder and a decoder

138
00:06:29,940 --> 00:06:32,130
分别加载编码器和解码器。
inside an encoder-decoder model.

139
00:06:32,130 --> 00:06:35,190
因此，根据您要解决的具体任务，
Therefore, according to the specific task you are targeting,

140
00:06:35,190 --> 00:06:38,700
您可以选择使用特定的编码器和解码器，
you may choose to use specific encoders and decoders,

141
00:06:38,700 --> 00:06:42,613
它们已经在这些特定任务上证明了自己的价值。
which have proven their worth on these specific tasks.

142
00:06:42,613 --> 00:06:44,696
(嗖嗖)
(swoosh)
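
As a sketch of that last point, the transformers library provides an EncoderDecoderModel class that can warm-start a sequence-to-sequence model from two independently pretrained checkpoints. The checkpoint names below are illustrative choices, not recommendations from the video.

from transformers import EncoderDecoderModel

# Combine a pretrained encoder with a pretrained decoder.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased",  # encoder: pretrained to understand sequences
    "gpt2",               # decoder: pretrained to generate text
)
# The cross-attention weights connecting the two are newly created and
# randomly initialized, so the combined model still needs fine-tuning
# on the target sequence-to-sequence task.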