subtitles/en/71_using-a-debugger-in-a-terminal.srt
1
00:00:00,459 --> 00:00:03,542
(wind swiping sound)
2
00:00:05,880 --> 00:00:08,910
- [Instructor] Using the
Python debugger in a terminal.
3
00:00:08,910 --> 00:00:11,580
In this video, we'll learn
how to use a Python debugger
4
00:00:11,580 --> 00:00:13,140
in a terminal.
5
00:00:13,140 --> 00:00:15,390
For this example, we're running code
6
00:00:15,390 --> 00:00:17,760
from the token classification section,
7
00:00:17,760 --> 00:00:19,950
downloading the CoNLL dataset
8
00:00:19,950 --> 00:00:23,340
before loading a tokenizer
to pre-process it.
9
00:00:23,340 --> 00:00:25,140
Check out the section
of the course linked below
10
00:00:25,140 --> 00:00:26,223
for more information.
11
00:00:27,600 --> 00:00:28,500
Once this is done,
12
00:00:28,500 --> 00:00:30,630
we try to batch together some features
13
00:00:30,630 --> 00:00:33,180
of the training dataset by padding them
14
00:00:33,180 --> 00:00:34,330
and returning a tensor.
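(For reference, a minimal sketch of the batching step described above; the variable names and the number of examples are illustrative, not the exact script.)

    # Pad a few tokenized training examples together and ask for PyTorch tensors.
    features = [tokenized_datasets["train"][i] for i in range(4)]
    batch = tokenizer.pad(features, padding=True, return_tensors="pt")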
15
00:00:36,810 --> 00:00:39,510
If we try to execute our
scripts in a terminal
16
00:00:39,510 --> 00:00:40,413
we get an error.
17
00:00:42,630 --> 00:00:44,260
Note that we use PyTorch here
18
00:00:44,260 --> 00:00:45,600
with return_tensors equal to "pt",
19
00:00:45,600 --> 00:00:47,753
But you would get the same
error with TensorFlow.
20
00:00:49,500 --> 00:00:51,990
As we have seen in the 'How
to debug an error?' video,
21
00:00:51,990 --> 00:00:54,780
the error message is at the
end and it indicates we
22
00:00:54,780 --> 00:00:58,260
should use padding, which
we're actually trying to do.
23
00:00:58,260 --> 00:01:00,630
So this is not useful and
we need to go a little deeper
24
00:01:00,630 --> 00:01:02,310
to debug the problem.
25
00:01:02,310 --> 00:01:04,830
Fortunately, you can use the
Python debugger quite easily
26
00:01:04,830 --> 00:01:09,830
in a terminal by launching
your script with python -m pdb
27
00:01:09,930 --> 00:01:11,980
and then the name of the training script.
28
00:01:13,410 --> 00:01:15,030
When executing that command, you are sent
29
00:01:15,030 --> 00:01:17,340
to the first instruction of your script.
30
00:01:17,340 --> 00:01:20,733
You can run just the next
instruction by typing n and Enter.
31
00:01:22,530 --> 00:01:27,423
Or you can continue directly
to the error by typing c and Enter.
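(A sketch of the terminal session just described; the script name train.py is hypothetical.)

    $ python -m pdb train.py
    (Pdb) n    # execute just the next instruction
    (Pdb) c    # continue until the end of the script or until an error is raised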
32
00:01:29,850 --> 00:01:31,560
Once there, you go to the very bottom
33
00:01:31,560 --> 00:01:34,050
of the traceback and
you can type commands.
34
00:01:34,050 --> 00:01:36,360
The first two commands you
should learn are u and d,
35
00:01:36,360 --> 00:01:38,160
for up and down.
36
00:01:38,160 --> 00:01:41,223
This allows you to get up
and down in the traceback.
37
00:01:42,990 --> 00:01:46,623
Going up twice, we get to the
point the error was reached.
38
00:01:47,910 --> 00:01:50,190
The next command to learn is p, for print.
39
00:01:50,190 --> 00:01:52,830
It allows you to print any value you want.
40
00:01:52,830 --> 00:01:56,280
For instance, here we can see
the value of return_tensors
41
00:01:56,280 --> 00:02:00,210
or batch_outputs to try to
understand what triggered the error.
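(Illustrative pdb commands for the navigation and printing just described, assuming the variable names of this example.)

    (Pdb) u                   # move up one frame in the traceback
    (Pdb) u                   # up once more, to where the error was raised
    (Pdb) p return_tensors    # print any variable in the current frame
    (Pdb) p batch_outputs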
42
00:02:00,210 --> 00:02:03,000
The batch_outputs dictionary
is a bit hard to see
43
00:02:03,000 --> 00:02:05,520
so let's dive into smaller pieces of it.
44
00:02:05,520 --> 00:02:08,460
Inside the debugger, you can
not only print any variable
45
00:02:08,460 --> 00:02:10,740
but also evaluate any expression,
46
00:02:10,740 --> 00:02:13,713
so we can look
independently at the inputs.
47
00:02:15,060 --> 00:02:15,993
And also the labels.
48
00:02:22,350 --> 00:02:24,300
Those labels are definitely weird.
49
00:02:24,300 --> 00:02:26,880
They are of various sizes,
which we can confirm
50
00:02:26,880 --> 00:02:29,553
by printing the sizes using
a list comprehension.
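(A sketch of the expressions evaluated here; the key names assume the tokenizer outputs of this example.)

    (Pdb) p batch_outputs["input_ids"]
    (Pdb) p batch_outputs["labels"]
    (Pdb) p [len(l) for l in batch_outputs["labels"]]    # the lengths differ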
51
00:02:35,880 --> 00:02:37,800
No wonder the tokenizer
wasn't able to create
52
00:02:37,800 --> 00:02:39,270
a tensor with them.
53
00:02:39,270 --> 00:02:41,460
This is because the pad
method only takes care
54
00:02:41,460 --> 00:02:44,850
of the tokenizer outputs, the
input IDs, the attention mask
55
00:02:44,850 --> 00:02:46,560
and the token type IDs.
56
00:02:46,560 --> 00:02:48,390
So we have to pad the labels ourselves
57
00:02:48,390 --> 00:02:51,300
before trying to create
a new tensor with them.
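(One possible sketch of padding the labels by hand to the longest length; -100 is the usual index ignored by the loss in token classification, and the names are illustrative.)

    max_length = max(len(feature["labels"]) for feature in features)
    for feature in features:
        padding = [-100] * (max_length - len(feature["labels"]))
        feature["labels"] = feature["labels"] + padding
    batch = tokenizer.pad(features, padding=True, return_tensors="pt")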
58
00:02:51,300 --> 00:02:54,030
Once you're ready to
exit the Python debugger,
59
00:02:54,030 --> 00:02:56,640
you can press q, for quit, and Enter.
60
00:02:56,640 --> 00:02:59,790
Another way we can access
the Python debugger
61
00:02:59,790 --> 00:03:02,310
is to put a breakpoint in our script.
62
00:03:02,310 --> 00:03:05,913
We can do this using
the pdb.set_trace method,
63
00:03:07,920 --> 00:03:09,870
as long as we import the pdb module
64
00:03:09,870 --> 00:03:11,420
at the beginning of our script.
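(A sketch of that breakpoint, placed just before the failing line; on Python 3.7+ the built-in breakpoint() call does the same thing.)

    import pdb  # at the top of the script

    pdb.set_trace()  # execution pauses here when the script is run with plain python
    batch = tokenizer.pad(features, padding=True, return_tensors="pt")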
65
00:03:12,510 --> 00:03:17,283
Saving and then relaunching
our script with just python
66
00:03:19,710 --> 00:03:23,310
will stop the execution at
the breakpoint we set.
67
00:03:23,310 --> 00:03:24,660
We can inspect all the variables
68
00:03:24,660 --> 00:03:27,030
before the next instruction
is executed again.
69
00:03:27,030 --> 00:03:29,253
For instance, here, the features.
70
00:03:30,270 --> 00:03:33,090
Typing n and Enter executes
the next instruction,
71
00:03:33,090 --> 00:03:35,700
which takes us back inside the traceback.
72
00:03:35,700 --> 00:03:37,530
One way to fix the error manually is to
73
00:03:37,530 --> 00:03:39,873
pad all the labels to the longest.
74
00:03:42,000 --> 00:03:45,120
Another way is to use
the data collator suitable
75
00:03:45,120 --> 00:03:46,443
for token classification.
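(The collator alluded to here is DataCollatorForTokenClassification from transformers; a minimal sketch of using it instead of padding the labels by hand.)

    from transformers import DataCollatorForTokenClassification

    data_collator = DataCollatorForTokenClassification(tokenizer)
    batch = data_collator(features)  # pads input_ids, attention_mask and labels together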
76
00:03:48,330 --> 00:03:50,340
If you want to learn how to use the Python
77
00:03:50,340 --> 00:03:53,273
debugger in a notebook, check
out the video linked below.
78
00:03:54,698 --> 00:03:57,781
(wind swiping sound)