subtitles/zh-CN/04_the-transformer-architecture.srt

1
00:00:00,000 --> 00:00:02,750
(徽标呼啸而过) (logo whooshing)

2
00:00:05,010 --> 00:00:07,323
- 让我们来研究 Transformer 的架构。 - Let's study the transformer architecture.

3
00:00:09,150 --> 00:00:12,030
该视频是关于编码器、 This video is the introductory video to the encoders,

4
00:00:12,030 --> 00:00:15,510
解码器和编码器-解码器这一系列视频的介绍视频。 decoders, and encoder-decoder series of videos.

5
00:00:15,510 --> 00:00:16,343
在这个系列中, In this series,

6
00:00:16,343 --> 00:00:18,900
我们将尝试了解是什么构成了 transformer 网络, we'll try to understand what makes a transformer network,

7
00:00:18,900 --> 00:00:22,770
并尝试用简单、高层次的方式来解释它。 and we'll try to explain it in simple, high-level terms.

8
00:00:22,770 --> 00:00:25,800
你无需对神经网络有深入的了解, No advanced understanding of neural networks is necessary,

9
00:00:25,800 --> 00:00:29,343
但了解基本的向量和张量可能会有所帮助。 but an understanding of basic vectors and tensors may help.

10
00:00:32,250 --> 00:00:33,270
首先, To get started,

11
00:00:33,270 --> 00:00:34,530
我们会使用这张图, we'll take up this diagram

12
00:00:34,530 --> 00:00:36,630
它来自原始的 Transformer 论文, from the original transformer paper,

13
00:00:36,630 --> 00:00:40,140
即 Vaswani 等人撰写的《Attention Is All You Need》。 entitled "Attention Is All You Need", by Vaswani et al.

14
00:00:40,140 --> 00:00:41,010
正如我们将在这里看到的, As we'll see here,

15
00:00:41,010 --> 00:00:42,780
我们可以只使用它的某些部分, we can leverage only some parts of it,

16
00:00:42,780 --> 00:00:44,630
这取决于我们想要做的事情。 according to what we're trying to do.

17
00:00:45,480 --> 00:00:47,610
我们不会深入研究那些具体的层, We won't dive into the specific layers

18
00:00:47,610 --> 00:00:48,990
即构成该架构的各个层, building up that architecture,

19
00:00:48,990 --> 00:00:51,390
但我们会尝试理解 but we'll try to understand the different ways

20
00:00:51,390 --> 00:00:52,893
使用这个架构的不同方式。 this architecture can be used.

21
00:00:55,170 --> 00:00:56,003
让我们先 Let's first start

22
00:00:56,003 --> 00:00:58,260
把这个架构拆分成两部分。 by splitting that architecture into two parts.

23
00:00:58,260 --> 00:00:59,910
左边是编码器, On the left we have the encoder,

24
00:00:59,910 --> 00:01:01,980
右边是解码器。 and on the right, the decoder.

25
00:01:01,980 --> 00:01:03,330
这两部分可以一起使用, These two can be used together,

26
00:01:03,330 --> 00:01:05,330
但它们也可以独立使用。 but they can also be used independently.

27
00:01:06,180 --> 00:01:08,610
让我们来了解它们是如何工作的。 Let's understand how these work.

28
00:01:08,610 --> 00:01:11,460
编码器接受表示文本的输入。 The encoder accepts inputs that represent text.

29
00:01:11,460 --> 00:01:13,620
它将这些文本、这些词, It converts this text, these words,

30
00:01:13,620 --> 00:01:15,675
转换为数值表示。 into numerical representations.

31
00:01:15,675 --> 00:01:17,400
这些数值表示 These numerical representations

32
00:01:17,400 --> 00:01:20,460
也可以称为嵌入或特征。 can also be called embeddings, or features.

33
00:01:20,460 --> 00:01:23,100
我们会看到它使用自注意力 (self-attention) 机制 We'll see that it uses the self-attention mechanism

34
00:01:23,100 --> 00:01:24,483
作为其主要组成部分。 as its main component.

35
00:01:25,500 --> 00:01:27,120
我们建议你专门查看 We recommend you check out the video

36
00:01:27,120 --> 00:01:29,700
关于编码器的那期视频,以了解 on encoders specifically to understand

37
00:01:29,700 --> 00:01:31,680
这种数值表示是什么, what is this numerical representation,

38
00:01:31,680 --> 00:01:33,690
以及它是如何工作的。 as well as how it works.

39
00:01:33,690 --> 00:01:36,660
我们将更详细地研究自注意力机制, We'll study the self-attention mechanism in more detail,

40
00:01:36,660 --> 00:01:38,913
以及它的双向特性。 as well as its bi-directional properties.

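下面是一个简短的 Python 示意,假设使用 Hugging Face transformers 库和 bert-base-uncased 检查点(两者均为此处的假设,视频中并未指定),用来展示上面提到的编码器如何把文本转换为逐词元的数值表示(嵌入 / 特征)。 A minimal Python sketch, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both are assumptions, not specified in the video), showing how an encoder turns text into the per-token numerical representations (embeddings / features) mentioned above.

# Minimal sketch (illustrative only): getting an encoder's numerical representations.
# Assumes the Hugging Face transformers library and the bert-base-uncased checkpoint.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # encoder-only model
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Welcome to NYC", return_tensors="pt")  # words -> input IDs
outputs = model(**inputs)

# One vector per input token: these are the embeddings / features the video mentions.
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
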
41
00:01:40,650 --> 00:01:42,780
解码器与编码器类似。 The decoder is similar to the encoder.

42
00:01:42,780 --> 00:01:45,630
它也可以接受文本输入。 It can also accept text inputs.

43
00:01:45,630 --> 00:01:48,210
它使用与编码器类似的机制, It uses a similar mechanism as the encoder,

44
00:01:48,210 --> 00:01:51,150
也就是掩码自注意力 (masked self-attention)。 which is the masked self-attention as well.

45
00:01:51,150 --> 00:01:52,590
它与编码器的不同之处 It differs from the encoder

46
00:01:52,590 --> 00:01:54,990
在于其单向特性, due to its uni-directional feature

47
00:01:54,990 --> 00:01:58,590
并且传统上以自回归的方式使用。 and is traditionally used in an auto-regressive manner.

48
00:01:58,590 --> 00:02:01,650
在这里,我们也建议你查看有关解码器的视频, Here too, we recommend you check out the video on decoders

49
00:02:01,650 --> 00:02:04,000
特别是要了解这一切是如何工作的。 especially to understand how all of this works.

50
00:02:06,810 --> 00:02:07,890
将这两部分结合起来, Combining the two parts

51
00:02:07,890 --> 00:02:10,200
就得到了所谓的编码器-解码器, results in what is known as an encoder-decoder,

52
00:02:10,200 --> 00:02:12,720
或者说序列到序列的 Transformer。 or a sequence-to-sequence transformer.

53
00:02:12,720 --> 00:02:14,280
编码器接受输入 The encoder accepts inputs

54
00:02:14,280 --> 00:02:17,850
并计算出这些输入的高层次表示。 and computes a high-level representation of those inputs.

55
00:02:17,850 --> 00:02:20,252
然后将这些输出传递给解码器。 These outputs are then passed to the decoder.

56
00:02:20,252 --> 00:02:22,860
解码器使用编码器的输出, The decoder uses the encoder's output,

57
00:02:22,860 --> 00:02:26,370
连同其他输入一起来生成预测。 alongside other inputs to generate a prediction.

58
00:02:26,370 --> 00:02:27,900
然后它会预测一个输出, It then predicts an output,

59
00:02:27,900 --> 00:02:30,248
并在之后的迭代中重复使用这个输出, which it will re-use in future iterations,

60
00:02:30,248 --> 00:02:32,662
这就是 "自回归" 一词的由来。 hence the term, auto-regressive.

61
00:02:32,662 --> 00:02:34,740
最后,为了理解 Finally, to get an understanding

62
00:02:34,740 --> 00:02:36,690
编码器-解码器的整体工作方式, of the encoder-decoders as a whole,

63
00:02:36,690 --> 00:02:39,670
我们建议你查看有关编码器-解码器的视频。 we recommend you check out the video on encoder-decoders.

64
00:02:39,670 --> 00:02:42,420
(徽标呼啸而过) (logo whooshing)

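下面是一个简短的示意,假设使用 Hugging Face transformers 库和 t5-small 检查点(同样只是假设),用来展示上面描述的编码器-解码器(序列到序列)流程:编码器计算输入的表示,generate() 以自回归方式运行解码器,把每一步预测出的词元作为下一步的输入。 A minimal sketch, assuming the Hugging Face transformers library and the t5-small checkpoint (again an assumption), of the encoder-decoder (sequence-to-sequence) flow described above: the encoder computes a representation of the input, and generate() runs the decoder auto-regressively, feeding each predicted token back in as input for the next step.

# Minimal sketch (illustrative only) of the encoder-decoder / sequence-to-sequence flow.
# Assumes the Hugging Face transformers library and the t5-small checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder reads the input text and computes a high-level representation of it.
inputs = tokenizer("translate English to French: Welcome to NYC", return_tensors="pt")

# generate() runs the decoder auto-regressively on top of the encoder's output:
# each predicted token is fed back as input for the next step until generation stops.
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))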