subtitles/en/75_writing-a-good-issue.srt (257 lines of code) (raw):
1
00:00:05,610 --> 00:00:08,557
- How to write a good issue on GitHub?
2
00:00:08,557 --> 00:00:10,080
GitHub is the main place
3
00:00:10,080 --> 00:00:12,000
for the Hugging Face
open source libraries,
4
00:00:12,000 --> 00:00:14,010
and you should always
go there to report a bug
5
00:00:14,010 --> 00:00:16,020
or ask for a new feature.
6
00:00:16,020 --> 00:00:18,660
For more general questions
or to debug your own code
7
00:00:18,660 --> 00:00:21,707
use the forums, see
the video linked below.
8
00:00:21,707 --> 00:00:23,677
It's very important to write good issues
9
00:00:23,677 --> 00:00:27,232
as it will help the bug you
uncovered be fixed in no time.
10
00:00:27,232 --> 00:00:29,750
For this video, we have created
a version of Transformers
11
00:00:29,750 --> 00:00:31,066
with a bug.
12
00:00:31,066 --> 00:00:33,783
You can install it by executing
this command in a notebook,
13
00:00:33,783 --> 00:00:37,239
remove the exclamation mark
to execute it in a terminal.
14
00:00:37,239 --> 00:00:41,016
In this version, the
following example fails.
15
00:00:41,016 --> 00:00:42,472
The error is rather cryptic
16
00:00:42,472 --> 00:00:45,184
and does not seem to come
from anything in our code,
17
00:00:45,184 --> 00:00:48,157
so it seems we have a bug to report.
18
00:00:48,157 --> 00:00:49,858
The first thing to do in this case
19
00:00:49,858 --> 00:00:52,053
is to try to find the smallest
amount of code possible
20
00:00:52,053 --> 00:00:54,059
that reproduces the bug.
21
00:00:54,059 --> 00:00:56,802
In our case, inspecting the traceback,
22
00:00:56,802 --> 00:00:59,645
we see the failure happens
inside the pipeline function
23
00:00:59,645 --> 00:01:03,158
when it calls
AutoTokenizer.from_pretrained.
24
00:01:03,158 --> 00:01:06,609
Using the debugger, we find the
values passed to that method
25
00:01:06,609 --> 00:01:08,849
and can thus create a small sample of code
26
00:01:08,849 --> 00:01:12,802
that hopefully generates the same error.
27
00:01:12,802 --> 00:01:14,726
It's very important to go though this step
28
00:01:14,726 --> 00:01:16,770
as you may realize the
error was on your side
29
00:01:16,770 --> 00:01:18,360
and not a bug in the library,
30
00:01:18,360 --> 00:01:20,610
but it also will make it
easier for the maintainers
31
00:01:20,610 --> 00:01:22,320
to fix your problem.
32
00:01:22,320 --> 00:01:24,030
Here we can play around
a bit more with this code
33
00:01:24,030 --> 00:01:26,460
and notice the error happens
for different checkpoints
34
00:01:26,460 --> 00:01:28,050
and not just this one,
35
00:01:28,050 --> 00:01:31,262
and that it disappears
when we use use_fast=False
36
00:01:31,262 --> 00:01:33,240
inside our tokenizer call.
37
00:01:33,240 --> 00:01:35,190
The important part is to have something
38
00:01:35,190 --> 00:01:38,640
that does not depend on
any external files or data.
39
00:01:38,640 --> 00:01:40,350
Try to replace your data by fake values
40
00:01:40,350 --> 00:01:41,450
if you can't share it.
41
00:01:42,750 --> 00:01:43,620
With all of this done,
42
00:01:43,620 --> 00:01:46,260
we are ready to start writing our issue.
43
00:01:46,260 --> 00:01:48,600
Click on the button next to Bug Report
44
00:01:48,600 --> 00:01:51,300
and you will discover that
there is a template to fill.
45
00:01:51,300 --> 00:01:53,940
It will only take you a couple of minutes.
46
00:01:53,940 --> 00:01:56,460
The first thing is to
properly name your issue.
47
00:01:56,460 --> 00:01:59,100
Don't pick a title that is too vague.
48
00:01:59,100 --> 00:02:02,160
Then you have to fill your
environment information.
49
00:02:02,160 --> 00:02:03,330
There is a command provided
50
00:02:03,330 --> 00:02:05,700
by the Transformers library to do this.
51
00:02:05,700 --> 00:02:08,550
Just execute it in your
notebook or in a terminal
52
00:02:08,550 --> 00:02:10,203
and copy paste the results.
53
00:02:11,070 --> 00:02:13,530
There are two last
questions to fill manually,
54
00:02:13,530 --> 00:02:16,023
to which the answers are
no and no in our case.
55
00:02:17,550 --> 00:02:20,460
Next, we need to determine who to tag.
56
00:02:20,460 --> 00:02:23,010
There is a full list of
usernames in the template.
57
00:02:23,010 --> 00:02:25,043
Since our issue has to do with tokenizers,
58
00:02:25,043 --> 00:02:28,170
we pick the maintainer
associated with them.
59
00:02:28,170 --> 00:02:30,210
There is no point tagging
more than 3 people,
60
00:02:30,210 --> 00:02:32,010
they will redirect you to the right person
61
00:02:32,010 --> 00:02:33,110
if you made a mistake.
62
00:02:34,410 --> 00:02:36,600
Next, we have to give
the information necessary
63
00:02:36,600 --> 00:02:38,220
to reproduce the bug.
64
00:02:38,220 --> 00:02:41,010
We paste our sample, and
put it between two lines
65
00:02:41,010 --> 00:02:44,073
with three backticks so that
it's formatted properly.
66
00:02:45,210 --> 00:02:47,010
We also paste the full traceback,
67
00:02:47,010 --> 00:02:49,740
still between two lines
of three backticks.
68
00:02:49,740 --> 00:02:52,650
Lastly, we can add any
additional information
69
00:02:52,650 --> 00:02:55,200
about what we tried to
debug the issue at hand.
70
00:02:55,200 --> 00:02:57,030
With all of this, you
should expect an answer
71
00:02:57,030 --> 00:02:59,880
to your issue pretty fast
and hopefully a quick fix.
72
00:02:59,880 --> 00:03:01,770
Note that all the advise in this video
73
00:03:01,770 --> 00:03:04,203
applies for almost every
open-source project.