1
00:00:00,459 --> 00:00:03,542
(wind swiping sound)

2
00:00:05,880 --> 00:00:08,910
- [Instructor] Using the Python debugger in a terminal.

3
00:00:08,910 --> 00:00:11,580
In this video, we'll learn how to use the Python debugger

4
00:00:11,580 --> 00:00:13,140
in a terminal.

5
00:00:13,140 --> 00:00:15,390
For this example, we're running code

6
00:00:15,390 --> 00:00:17,760
from the token classification section,

7
00:00:17,760 --> 00:00:19,950
downloading the CoNLL dataset

8
00:00:19,950 --> 00:00:23,340
before loading a tokenizer to preprocess it.

9
00:00:23,340 --> 00:00:25,140
Check out the section of the course linked below

10
00:00:25,140 --> 00:00:26,223
for more information.

11
00:00:27,600 --> 00:00:28,500
Once this is done,

12
00:00:28,500 --> 00:00:30,630
we try to batch together some features

13
00:00:30,630 --> 00:00:33,180
of the training dataset by padding them

14
00:00:33,180 --> 00:00:34,330
and returning a tensor.

15
00:00:36,810 --> 00:00:39,510
If we try to execute our script in a terminal,

16
00:00:39,510 --> 00:00:40,413
we get an error.

17
00:00:42,630 --> 00:00:44,260
Note that we use PyTorch here:

18
00:00:44,260 --> 00:00:45,600
we set return_tensors="pt".

19
00:00:45,600 --> 00:00:47,753
But you would get the same error with TensorFlow.

20
00:00:49,500 --> 00:00:51,990
As we have seen in the "How to debug an error?" video,

21
00:00:51,990 --> 00:00:54,780
the error message is at the end, and it indicates we

22
00:00:54,780 --> 00:00:58,260
should use padding, which we're actually trying to do.

23
00:00:58,260 --> 00:01:00,630
So this is not useful, and we need to go a little deeper

24
00:01:00,630 --> 00:01:02,310
to debug the problem.

25
00:01:02,310 --> 00:01:04,830
Fortunately, you can use the Python debugger quite easily

26
00:01:04,830 --> 00:01:09,830
in a terminal by launching your script with python -m pdb

27
00:01:09,930 --> 00:01:11,980
and then the name of the training script.

28
00:01:13,410 --> 00:01:15,030
When executing that command, you are sent

29
00:01:15,030 --> 00:01:17,340
to the first instruction of your script.

30
00:01:17,340 --> 00:01:20,733
You can run just the next instruction by typing n and Enter.

31
00:01:22,530 --> 00:01:27,423
Or you can continue directly to the error by typing c and Enter.

32
00:01:29,850 --> 00:01:31,560
Once there, you are at the very bottom

33
00:01:31,560 --> 00:01:34,050
of the traceback and you can type commands.

34
00:01:34,050 --> 00:01:36,360
The first two commands you should learn are u and d,

35
00:01:36,360 --> 00:01:38,160
for up and down.

36
00:01:38,160 --> 00:01:41,223
They allow you to move up and down in the traceback.

37
00:01:42,990 --> 00:01:46,623
Going up twice, we get to the point where the error was raised.

38
00:01:47,910 --> 00:01:50,190
The next command to learn is p, for print.

39
00:01:50,190 --> 00:01:52,830
It allows you to print any value you want.

40
00:01:52,830 --> 00:01:56,280
For instance, here we can see the value of return_tensors

41
00:01:56,280 --> 00:02:00,210
or batch_outputs to try to understand what triggered the error.

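Editor's sketch: what u, d, and p reveal in the debugger can be approximated in plain Python by walking the frames of a traceback and reading their local variables. This is only an analogy for the interactive session, not what the video runs. The functions build_tensor, pad_batch, and frames_with_locals are hypothetical stand-ins invented for this illustration and are not part of any library.

```python
import traceback


def build_tensor(rows):
    # Toy stand-in for tensor creation: it refuses rows of unequal length.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have different lengths")
    return rows


def pad_batch(features, return_tensors="pt"):
    # Toy stand-in for the batching step; this is NOT the real tokenizer.pad.
    batch_outputs = {"labels": [f["labels"] for f in features]}
    return build_tensor(batch_outputs["labels"])


def frames_with_locals(exc):
    """List (function name, local variables) per traceback frame, outermost first."""
    return [
        (frame.f_code.co_name, dict(frame.f_locals))
        for frame, _lineno in traceback.walk_tb(exc.__traceback__)
    ]


features = [{"labels": [0, 1]}, {"labels": [0, 1, 2, 3]}]
try:
    pad_batch(features)
except ValueError as exc:
    frames = frames_with_locals(exc)

# The last entry is the frame that raised: the bottom of the traceback,
# which is where the debugger lands after typing c.
raising_name, raising_locals = frames[-1]
# One step "up" (u) is the caller, where batch_outputs lives.
caller_name, caller_locals = frames[-2]
# Roughly what "p return_tensors" and "p batch_outputs" would show there.
print(caller_name, caller_locals["return_tensors"], caller_locals["batch_outputs"])
```

In an interactive session the same information comes from typing u or d to change frames and p followed by a name to print it.
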
42
00:02:00,210 --> 00:02:03,000
The batch_outputs dictionary is a bit hard to read,

43
00:02:03,000 --> 00:02:05,520
so let's dive into smaller pieces of it.

44
00:02:05,520 --> 00:02:08,460
Inside the debugger, you can not only print any variable

45
00:02:08,460 --> 00:02:10,740
but also evaluate any expression,

46
00:02:10,740 --> 00:02:13,713
so we can look independently at the inputs.

47
00:02:15,060 --> 00:02:15,993
And also the labels.

48
00:02:22,350 --> 00:02:24,300
Those labels are definitely weird.

49
00:02:24,300 --> 00:02:26,880
They are of various sizes, which we can confirm

50
00:02:26,880 --> 00:02:29,553
by printing the sizes using a list comprehension.

51
00:02:35,880 --> 00:02:37,800
No wonder the tokenizer wasn't able to create

52
00:02:37,800 --> 00:02:39,270
a tensor with them.

53
00:02:39,270 --> 00:02:41,460
This is because the pad method only takes care

54
00:02:41,460 --> 00:02:44,850
of the tokenizer outputs: the input IDs, the attention mask,

55
00:02:44,850 --> 00:02:46,560
and the token type IDs.

56
00:02:46,560 --> 00:02:48,390
So we have to pad the labels ourselves

57
00:02:48,390 --> 00:02:51,300
before trying to create a tensor with them.

58
00:02:51,300 --> 00:02:54,030
Once you're ready to exit the Python debugger,

59
00:02:54,030 --> 00:02:56,640
you can press q for quit and Enter.

60
00:02:56,640 --> 00:02:59,790
Another way we can access the Python debugger

61
00:02:59,790 --> 00:03:02,310
is to put a breakpoint in our script.

62
00:03:02,310 --> 00:03:05,913
We can do this using the pdb.set_trace method.

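Editor's sketch: a breakpoint set with pdb.set_trace might look like the following. The function collect_labels and its debug flag are hypothetical, added only so the example runs to completion unless the breakpoint is explicitly requested; in a real script the call to pdb.set_trace() would simply sit on the line where execution should pause.

```python
import pdb


def collect_labels(features, debug=False):
    """Gather the labels of each feature into one list (toy stand-in for batching)."""
    if debug:
        # Execution pauses here and a (Pdb) prompt opens in the terminal.
        pdb.set_trace()
    return [f["labels"] for f in features]


features = [{"labels": [0, 1]}, {"labels": [0, 1, 2, 3]}]
# Pass debug=True to stop at the breakpoint and inspect `features` with p.
labels = collect_labels(features)
```

With debug=True, the script can be launched with plain python and will stop at the breakpoint, where n, p, and q behave as described earlier.
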
63
00:03:07,920 --> 00:03:09,870
We just need to import the pdb module

64
00:03:09,870 --> 00:03:11,420
at the beginning of our script.

65
00:03:12,510 --> 00:03:17,283
Saving and then relaunching our script with just python,

66
00:03:19,710 --> 00:03:23,310
we stop the execution at the breakpoint we set.

67
00:03:23,310 --> 00:03:24,660
We can inspect all the variables

68
00:03:24,660 --> 00:03:27,030
before the next instruction is executed.

69
00:03:27,030 --> 00:03:29,253
For instance, here, the features.

70
00:03:30,270 --> 00:03:33,090
Typing n and Enter executes the next instruction,

71
00:03:33,090 --> 00:03:35,700
which takes us back inside the traceback.

72
00:03:35,700 --> 00:03:37,530
One way to fix the error manually is to

73
00:03:37,530 --> 00:03:39,873
pad all the labels to the longest.

74
00:03:42,000 --> 00:03:45,120
Another way is to use the data collator suitable

75
00:03:45,120 --> 00:03:46,443
for token classification.

76
00:03:48,330 --> 00:03:50,340
If you want to learn how to use the Python

77
00:03:50,340 --> 00:03:53,273
debugger in a notebook, check out the video linked below.

78
00:03:54,698 --> 00:03:57,781
(wind swiping sound)
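Editor's sketch: the manual fix mentioned in the transcript, padding all the labels to the longest, could be written roughly as follows. The helper pad_labels is hypothetical. The padding value -100 and padding on the right are assumptions the transcript does not state; -100 is the value PyTorch's cross-entropy loss ignores by default.

```python
LABEL_PAD_ID = -100  # assumption: a label value the loss function ignores


def pad_labels(features, pad_id=LABEL_PAD_ID):
    """Right-pad every label list to the length of the longest one."""
    max_len = max(len(f["labels"]) for f in features)
    return [
        f["labels"] + [pad_id] * (max_len - len(f["labels"]))
        for f in features
    ]


features = [{"labels": [0, 1]}, {"labels": [0, 1, 2, 3]}, {"labels": [2]}]

# The same list comprehension used in the debugger to confirm the ragged sizes.
sizes_before = [len(f["labels"]) for f in features]

padded = pad_labels(features)
sizes_after = [len(row) for row in padded]
print(sizes_before, sizes_after)
```

Once every row has the same length, the labels can be turned into a tensor alongside the padded tokenizer outputs.
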