1
00:00:00,459 --> 00:00:03,542
（风拂过的声音）
(wind swiping sound)

2
00:00:05,880 --> 00:00:08,910
- [讲师] 在终端中使用 Python 调试器。
- [Instructor] Using the Python debugger in a terminal.

3
00:00:08,910 --> 00:00:11,580
在本视频中，我们将学习如何使用 Python 调试器
In this video, we'll learn how to use a Python debugger

4
00:00:11,580 --> 00:00:13,140
在一个终端。
in a terminal.

5
00:00:13,140 --> 00:00:15,390
对于这个例子，我们正在运行代码
For this example, we're running code

6
00:00:15,390 --> 00:00:17,760
从令牌分类部分，
from the token classification section,

7
00:00:17,760 --> 00:00:19,950
下载 CoNLL 数据集
downloading the CoNLL dataset

8
00:00:19,950 --> 00:00:23,340
在加载标记器以对其进行预处理之前。
before loading a tokenizer to pre-process it.

9
00:00:23,340 --> 00:00:25,140
查看下面的课程链接部分
Check out the section of the course linked below

10
00:00:25,140 --> 00:00:26,223
了解更多信息。
for more information.

11
00:00:27,600 --> 00:00:28,500
一旦完成，
Once this is done,

12
00:00:28,500 --> 00:00:30,630
我们尝试将一些特征组合成一个批次
we try to batch together some features

13
00:00:30,630 --> 00:00:33,180
通过填充训练数据集
of the training dataset by padding them

14
00:00:33,180 --> 00:00:34,330
并返回一个张量。
and returning a tensor.

15
00:00:36,810 --> 00:00:39,510
如果我们尝试在终端中执行我们的脚本
If we try to execute our script in a terminal,

16
00:00:39,510 --> 00:00:40,413
我们得到一个错误。
we get an error.

17
00:00:42,630 --> 00:00:44,260
请注意，我们在这里使用 PyTorch
Note that we use PyTorch here

18
00:00:44,260 --> 00:00:45,600
即设置 return_tensors="pt"。
with return_tensors="pt".

19
00:00:45,600 --> 00:00:47,753
但是你会在 TensorFlow 中遇到同样的错误。
But you would get the same error with TensorFlow.

20
00:00:49,500 --> 00:00:51,990
正如我们在“如何调试错误？”视频中看到的那样，
As we have seen in the "How to debug an error?" video,

21
00:00:51,990 --> 00:00:54,780
错误消息在最后，它表明我们
the error message is at the end, and it indicates we

22
00:00:54,780 --> 00:00:58,260
应该使用填充，而我们实际上正在尝试这样做。
should use padding, which we're actually trying to do.

23
00:00:58,260 --> 00:01:00,630
所以这没有用，我们需要更深入一点
So this is not useful, and we need to go a little deeper

24
00:01:00,630 --> 00:01:02,310
调试问题。
to debug the problem.

25
00:01:02,310 --> 00:01:04,830
幸运的是，你可以很容易地使用 Python 调试器
Fortunately, you can use the Python debugger quite easily

26
00:01:04,830 --> 00:01:09,830
在终端中通过使用 python -m pdb 启动脚本
in a terminal by launching your script with python -m pdb

27
00:01:09,930 --> 00:01:11,980
然后是训练脚本的名称。
and then the name of the training script.

28
00:01:13,410 --> 00:01:15,030
执行该命令时，你将被带到
When executing that command, you are sent

29
00:01:15,030 --> 00:01:17,340
到脚本的第一条指令。
to the first instruction of your script.

30
00:01:17,340 --> 00:01:20,733
你可以通过键入 n 并回车来只运行下一条指令。
You can run just the next instruction by typing n and Enter.

31
00:01:22,530 --> 00:01:27,423
或者你可以通过键入 c 并回车直接继续运行到出错的位置。
Or you can continue directly to the error by typing c and Enter.

32
00:01:29,850 --> 00:01:31,560
一旦到了那里，你就会走到最底层
Once there, you go to the very bottom

33
00:01:31,560 --> 00:01:34,050
回溯，你可以键入命令。
of the traceback and you can type commands.

34
00:01:34,050 --> 00:01:36,360
你应该学习的前两个命令是 u 和 d，
The first two commands you should learn are u and d,

35
00:01:36,360 --> 00:01:38,160
对于上下。
for up and down.

36
00:01:38,160 --> 00:01:41,223
这允许你在回溯中上下移动。
This allows you to get up and down in the traceback.

37
00:01:42,990 --> 00:01:46,623
上升两次，我们到达了错误发生的地步。
Going up twice, we get to the point where the error was reached.

38
00:01:47,910 --> 00:01:50,190
要学习的第三个命令是 p，表示打印。
The third command to learn is p, for print.

39
00:01:50,190 --> 00:01:52,830
它允许你打印任何你想要的值。
It allows you to print any value you want.

40
00:01:52,830 --> 00:01:56,280
例如，这里我们可以看到 return_tensors 的值
For instance, here we can see the value of return_tensors

41
00:01:56,280 --> 00:02:00,210
或 batch_outputs 的值，以了解是什么触发了错误。
or batch_outputs to try to understand what triggered the error.

42
00:02:00,210 --> 00:02:03,000
batch_outputs 字典有点难以阅读，
The batch_outputs dictionary is a bit hard to read,

43
00:02:03,000 --> 00:02:05,520
因此，让我们深入研究它的较小部分。
so let's dive into smaller pieces of it.

44
00:02:05,520 --> 00:02:08,460
在调试器内部，你不仅可以打印任何变量
Inside the debugger, you can not only print any variable

45
00:02:08,460 --> 00:02:10,740
还要评估任何表达式，
but also evaluate any expression,

46
00:02:10,740 --> 00:02:13,713
所以我们可以独立地查看输入。
so we can look independently at the inputs.

47
00:02:15,060 --> 00:02:15,993
还有标签。
As well as the labels.

48
00:02:22,350 --> 00:02:24,300
这些标签肯定很奇怪。
Those labels are definitely weird.

49
00:02:24,300 --> 00:02:26,880
它们的长度各不相同，我们可以确认这一点
They are of various sizes, which we can confirm

50
00:02:26,880 --> 00:02:29,553
方法是使用列表推导式打印它们的长度。
by printing the sizes using a list comprehension.
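As a minimal sketch of that check, assuming a toy `labels` list that stands in for the labels held in `batch_outputs` (the values below are made up for illustration), a list comprehension reports the length of each label sequence:

```python
# Hypothetical stand-in for the labels inside batch_outputs; values are made up.
labels = [
    [3, 0, 7, 0, 0, 0, 7, 0, 0],
    [1, 2],
    [5, 0],
    [0, 3, 0, 0],
]

# The same kind of expression can be typed after `p` at the (Pdb) prompt.
lengths = [len(seq) for seq in labels]
print(lengths)

# More than one distinct length means the sequences cannot be stacked
# into a single rectangular tensor without padding.
print(len(set(lengths)) > 1)
```

If the lengths differ, as they do here, the batch needs padding before it can become a tensor.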

51
00:02:35,880 --> 00:02:37,800
难怪分词器无法创建
No wonder the tokenizer wasn't able to create

52
00:02:37,800 --> 00:02:39,270
与他们的张量。
a tensor with them.

53
00:02:39,270 --> 00:02:41,460
这是因为 pad 方法只关心
This is because the pad method only takes care

54
00:02:41,460 --> 00:02:44,850
分词器的输出、输入 ID、注意掩码
of the tokenizer outputs, the input IDs, the attention mask

55
00:02:44,850 --> 00:02:46,560
和令牌类型 ID。
and the token type IDs.

56
00:02:46,560 --> 00:02:48,390
所以我们必须自己填充标签，
So we have to pad the labels ourselves

57
00:02:48,390 --> 00:02:51,300
然后才能尝试用它们创建张量。
before trying to create a tensor with them.

58
00:02:51,300 --> 00:02:54,030
一旦你准备好退出 Python 调试器，
Once you're ready to exit the Python debugger,

59
00:02:54,030 --> 00:02:56,640
你可以按 q（表示退出）再按回车。
you can press q, for quit, and Enter.

60
00:02:56,640 --> 00:02:59,790
我们可以通过另一种方式访问 Python 调试器，
Another way we can access the Python debugger,

61
00:02:59,790 --> 00:03:02,310
是在我们的脚本中设置一个断点。
is to put a breakpoint in our script.

62
00:03:02,310 --> 00:03:05,913
我们可以使用 pdb.set_trace 方法来做到这一点，
We can do this using the pdb.set_trace method,

63
00:03:07,920 --> 00:03:09,870
只要我们导入 pdb 模块
as long as we import the pdb module

64
00:03:09,870 --> 00:03:11,420
在我们脚本的开头。
at the beginning of our script.
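A minimal sketch of such a breakpoint, assuming a hypothetical helper `label_lengths` and made-up `features` (neither comes from the course script). The `debug` flag is only there so the example also runs without stopping:

```python
import pdb


def label_lengths(features, debug=False):
    """Return the label length of each feature, optionally pausing in pdb first."""
    if debug:
        # Opens the (Pdb) prompt; execution is paused before the next line runs.
        pdb.set_trace()
    return [len(feature["labels"]) for feature in features]


# Hypothetical features; real ones would come from the tokenized dataset.
features = [{"labels": [0, 1, 2]}, {"labels": [0, 1]}]

# Pass debug=True to drop into the debugger, then try p features, n, c or q.
print(label_lengths(features))
```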

65
00:03:12,510 --> 00:03:17,283
保存并仅使用 python 重新启动我们的脚本，
Saving and then relaunching our script with just python

66
00:03:19,710 --> 00:03:23,310
将在我们设置的断点处停止执行。
will stop the execution at the breakpoint we set.

67
00:03:23,310 --> 00:03:24,660
我们可以检查所有变量
We can inspect all the variables

68
00:03:24,660 --> 00:03:27,030
在再次执行下一条指令之前。
before the next instruction is executed again.

69
00:03:27,030 --> 00:03:29,253
例如，这里的特征。
For instance, here, the features.

70
00:03:30,270 --> 00:03:33,090
输入 n 并回车会执行下一条指令，
Typing n and Enter executes the next instruction,

71
00:03:33,090 --> 00:03:35,700
这将我们带回到回溯中。
which takes us back inside the traceback.

72
00:03:35,700 --> 00:03:37,530
手动修复该错误的一种方法是
One way to fix the error manually is to

73
00:03:37,530 --> 00:03:39,873
将所有标签填充到最长。
pad all the labels to the longest.
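A minimal sketch of that manual fix, assuming a hypothetical helper `pad_labels` and a padding value of -100. That value is an assumption, chosen because PyTorch's cross-entropy loss ignores index -100 by default:

```python
def pad_labels(labels, pad_value=-100):
    """Right-pad every label sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in labels)
    # Build new lists so the original sequences are left untouched.
    return [seq + [pad_value] * (max_len - len(seq)) for seq in labels]


# Made-up label sequences of different lengths.
labels = [[3, 0, 7], [1], [5, 0]]
padded = pad_labels(labels)
print(padded)
```

Once every sequence has the same length, the padded lists can be turned into a single tensor.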

74
00:03:42,000 --> 00:03:45,120
另一种方法是使用适合的数据整理器（data collator）
Another way is to use the data collator suitable

75
00:03:45,120 --> 00:03:46,443
用于令牌分类。
for token classification.

76
00:03:48,330 --> 00:03:50,340
如果你想学习如何使用 Python
If you want to learn how to use the Python

77
00:03:50,340 --> 00:03:53,273
笔记本中的调试器，查看下面链接中的视频。
debugger in a notebook, check out the video linked below.

78
00:03:54,698 --> 00:03:57,781
（风拂过的声音）
(wind swiping sound)