subtitles/zh-CN/17_batching-inputs-together-(pytorch).srt
1
00:00:00,373 --> 00:00:02,956
(微妙的爆炸)
(subtle blast)
2
00:00:05,400 --> 00:00:07,590
- 如何将多个输入组成一个批次。
- How to batch inputs together.
3
00:00:07,590 --> 00:00:09,240
在本视频中,我们将看到如何
In this video, we will see how
4
00:00:09,240 --> 00:00:11,073
将输入序列一起批处理。
to batch input sequences together.
5
00:00:12,137 --> 00:00:15,420
一般来说,我们想要输入模型的句子
In general, the sentences we want to pass through our model
6
00:00:15,420 --> 00:00:17,670
不会都有相同的长度。
won't all have the same lengths.
7
00:00:17,670 --> 00:00:19,740
在这里,我们使用的是
Here, we are using the model we saw
8
00:00:19,740 --> 00:00:22,080
在情感分析 pipeline 中见过的模型
in the sentiment analysis pipeline
9
00:00:22,080 --> 00:00:24,063
并想对两个句子进行分类。
and want to classify two sentences.
10
00:00:24,900 --> 00:00:27,360
在对它们进行分词并将每个分词映射
[译者注: token、tokenization、tokenizer 等词均译为“分词”, 实则不翻译最佳]
When tokenizing them and mapping each token
11
00:00:27,360 --> 00:00:29,610
到其对应的输入 ID 时,
to its corresponding input IDs,
12
00:00:29,610 --> 00:00:31,593
我们得到两个不同长度的列表。
we get two lists of different lengths.
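A minimal sketch of this step in Python, assuming the sentiment-analysis checkpoint distilbert-base-uncased-finetuned-sst-2-english and two made-up example sentences (neither is quoted from the video):

from transformers import AutoTokenizer

# Assumed checkpoint: the default model behind the sentiment-analysis pipeline.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Two illustrative sentences of different lengths.
sentences = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

# Tokenize each sentence and map each token to its input ID.
ids = [tokenizer.convert_tokens_to_ids(tokenizer.tokenize(s)) for s in sentences]
print(len(ids[0]), len(ids[1]))  # two lists of different lengths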
13
00:00:33,240 --> 00:00:35,340
尝试从这两个列表
Trying to create a tensor or a NumPy array
14
00:00:35,340 --> 00:00:38,220
创建 tensor 或 NumPy 数组将导致错误,
from those two lists will result in an error,
15
00:00:38,220 --> 00:00:41,043
因为所有数组和张量都应该是矩形的。
because all arrays and tensors should be rectangular.
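Continuing the sketch above, building a tensor straight from the two ragged lists fails:

import torch

# `ids` holds two lists of different lengths, so the result is not rectangular.
try:
    torch.tensor(ids)
except ValueError as err:
    print("Cannot build a rectangular tensor:", err)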
16
00:00:42,240 --> 00:00:44,160
突破此限制的一种方法
One way to overcome this limit
17
00:00:44,160 --> 00:00:45,690
是让第二句
is to make the second sentence
18
00:00:45,690 --> 00:00:47,640
与第一个句子的长度相同,
the same length as the first
19
00:00:47,640 --> 00:00:50,463
具体做法是根据需要多次添加某个特殊分词。
by adding a special token as many times as necessary.
20
00:00:51,600 --> 00:00:53,970
另一种方法是将第一个序列截断
Another way would be to truncate the first sequence
21
00:00:53,970 --> 00:00:55,710
到第二个序列的长度,
to the length of the second,
22
00:00:55,710 --> 00:00:58,140
但我们会失去很多信息
but we would then lose a lot of information
23
00:00:58,140 --> 00:01:01,083
而这可能是正确分类句子所必需的。
that might be necessary to properly classify the sentence.
24
00:01:02,190 --> 00:01:04,830
一般来说,只有当句子的长度
In general, we only truncate sentences
25
00:01:04,830 --> 00:01:06,840
超过模型所能处理的最大长度时
when they are longer than the maximum length
26
00:01:06,840 --> 00:01:08,073
我们才会对其进行截断。
the model can handle.
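As a hedged illustration, this kind of truncation can be requested from the tokenizer itself; with truncation=True and no explicit max_length, it truncates to the tokenizer's model_max_length:

# Truncate only the sentences that exceed what the model can handle.
encoded = tokenizer(sentences, truncation=True)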
27
00:01:09,720 --> 00:01:11,850
用于填充第二句的值
The value used to pad the second sentence
28
00:01:11,850 --> 00:01:13,740
不应被随意挑选;
should not be picked randomly;
29
00:01:13,740 --> 00:01:16,680
该模型已经用特定的填充 ID 进行了预训练,
the model has been pretrained with a certain padding ID,
30
00:01:16,680 --> 00:01:19,533
你可以在 tokenizer.pad_token_id 中找到它。
which you can find in tokenizer.pad_token_id.
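A sketch of padding the shorter list by hand, reusing `ids` and `tokenizer` from the earlier sketch:

import torch

# Pad every sequence to the length of the longest one, using the model's padding ID.
pad_id = tokenizer.pad_token_id
max_len = max(len(seq) for seq in ids)
padded_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in ids]

batch_ids = torch.tensor(padded_ids)  # now rectangular, so this works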
31
00:01:21,090 --> 00:01:22,800
现在我们已经填充了句子,
Now that we have padded our sentences,
32
00:01:22,800 --> 00:01:24,303
我们就可以把它们组成一个批次。
we can make a batch with them.
33
00:01:25,380 --> 00:01:28,320
然而,如果我们将这两个句子分别传给模型
If we pass the two sentences to the model separately
34
00:01:28,320 --> 00:01:30,120
再组成批次一起传入,
and batched together however,
35
00:01:30,120 --> 00:01:32,100
我们会注意到得到的结果并不相同
we notice that we don't get the same results
36
00:01:32,100 --> 00:01:34,060
对于被填充的句子,
for the sentence that is padded,
37
00:01:34,060 --> 00:01:35,403
也就是这里的第二个句子。
here, the second one.
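A sketch of that comparison, with the same assumed checkpoint as before:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Each sentence on its own.
print(model(torch.tensor([ids[0]])).logits)
print(model(torch.tensor([ids[1]])).logits)

# The padded batch: the logits for the second (padded) sentence come out different.
print(model(batch_ids).logits)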
38
00:01:36,390 --> 00:01:39,420
是 Transformers 库出了问题吗?并不是。
Is it a problem in the Transformers library? No.
39
00:01:39,420 --> 00:01:40,770
如果你还记得 Transformer 模型
If you remember that Transformer models
40
00:01:40,770 --> 00:01:42,810
大量使用注意力层,
make heavy use of attention layers,
41
00:01:42,810 --> 00:01:45,210
这应该不足为奇;
this should not come as a total surprise;
42
00:01:45,210 --> 00:01:48,277
在计算每个分词的上下文表示时,
when computing the contextual representation of each token,
43
00:01:48,277 --> 00:01:50,910
注意力层会查看
the attention layers look at all the other words
44
00:01:50,910 --> 00:01:52,410
句子中所有其他的词。
in the sentence.
45
00:01:52,410 --> 00:01:53,850
如果我们只有这句话
If we have just the sentence
46
00:01:53,850 --> 00:01:56,970
或者添加了几个填充 token 的句子,
or the sentence with several padding tokens added,
47
00:01:56,970 --> 00:01:59,073
我们没有得到相同的值是合乎逻辑的。
it's logical we don't get the same values.
48
00:02:00,270 --> 00:02:03,030
要在有或没有填充的情况下获得相同的结果,
To get the same results with or without padding,
49
00:02:03,030 --> 00:02:05,340
我们需要向注意力层表明
we need to indicate to the attention layers
50
00:02:05,340 --> 00:02:08,070
它们应该忽略那些填充 token。
that they should ignore those padding tokens.
51
00:02:08,070 --> 00:02:10,620
这是通过创建一个注意力掩码来完成的,
This is done by creating an attention mask,
52
00:02:10,620 --> 00:02:13,320
与输入 ID 具有相同形状的张量,
a tensor with the same shape as the input IDs,
53
00:02:13,320 --> 00:02:14,733
其中的值为 0 和 1。
with zeros and ones.
54
00:02:15,780 --> 00:02:18,120
值为 1 的位置表示注意力层
Ones indicate the tokens the attention layers
55
00:02:18,120 --> 00:02:20,100
应该在上下文中考虑的分词,
should consider in the context
56
00:02:20,100 --> 00:02:22,100
值为 0 的位置则表示它们应该忽略的分词。
and zeros the tokens they should ignore.
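A sketch of such a mask for the padded batch built earlier: 1 for real tokens, 0 for the added padding.

# 1 where there is a real token, 0 where padding was added.
attention_mask = torch.tensor(
    [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in ids]
)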
57
00:02:23,520 --> 00:02:26,760
现在,将这个注意力掩码与输入 ID 一起传入
Now, passing this attention mask along with the input ID
58
00:02:26,760 --> 00:02:28,170
会给我们相同的结果
will give us the same results
59
00:02:28,170 --> 00:02:31,170
就像我们将两个句子单独发送给模型一样。
as when we sent the two sentences individually to the model.
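Continuing the sketch, the mask is passed alongside the padded input IDs:

# With the attention mask, the padded sentence now gets the same logits
# as when it was sent to the model on its own.
outputs = model(batch_ids, attention_mask=attention_mask)
print(outputs.logits)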
60
00:02:32,400 --> 00:02:34,950
这一切都是由分词器在幕后完成的
This is all done behind the scenes by the tokenizer
61
00:02:34,950 --> 00:02:36,900
当你将它应用于多个句子
when you apply it to several sentences
62
00:02:36,900 --> 00:02:38,613
并设置参数 padding=True 时。
with the flag padding=True.
63
00:02:39,599 --> 00:02:41,490
它会用正确的填充值
It will apply the padding with the proper value
64
00:02:41,490 --> 00:02:43,140
对较短的句子进行填充
to the smaller sentences
65
00:02:43,140 --> 00:02:45,423
并创建相应的注意力掩码。
and create the appropriate attention mask.
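A sketch of letting the tokenizer handle all of this; return_tensors="pt" is an extra argument (not mentioned in the video) so the output is already a PyTorch batch:

# padding=True pads the shorter sentences and builds the attention mask.
batch = tokenizer(sentences, padding=True, return_tensors="pt")
outputs = model(**batch)
print(outputs.logits)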
66
00:02:46,993 --> 00:02:49,576
(微妙的爆炸)
(subtle blast)