
1
00:00:05,610 --> 00:00:08,557
- How to write a good issue on GitHub?

2
00:00:08,557 --> 00:00:10,080
GitHub is the main place

3
00:00:10,080 --> 00:00:12,000
for the Hugging Face open source libraries,

4
00:00:12,000 --> 00:00:14,010
and you should always go there to report a bug

5
00:00:14,010 --> 00:00:16,020
or ask for a new feature.

6
00:00:16,020 --> 00:00:18,660
For more general questions or to debug your own code,

7
00:00:18,660 --> 00:00:21,707
use the forums; see the video linked below.

8
00:00:21,707 --> 00:00:23,677
It's very important to write good issues

9
00:00:23,677 --> 00:00:27,232
as it will help the bug you uncovered be fixed in no time.

10
00:00:27,232 --> 00:00:29,750
For this video, we have created a version of Transformers

11
00:00:29,750 --> 00:00:31,066
with a bug.

12
00:00:31,066 --> 00:00:33,783
You can install it by executing this command in a notebook;

13
00:00:33,783 --> 00:00:37,239
remove the exclamation mark to execute it in a terminal.

14
00:00:37,239 --> 00:00:41,016
In this version, the following example fails.

15
00:00:41,016 --> 00:00:42,472
The error is rather cryptic

16
00:00:42,472 --> 00:00:45,184
and does not seem to come from anything in our code,

17
00:00:45,184 --> 00:00:48,157
so it seems we have a bug to report.

18
00:00:48,157 --> 00:00:49,858
The first thing to do in this case

19
00:00:49,858 --> 00:00:52,053
is to try to find the smallest amount of code possible

20
00:00:52,053 --> 00:00:54,059
that reproduces the bug.

21
00:00:54,059 --> 00:00:56,802
In our case, inspecting the traceback,

22
00:00:56,802 --> 00:00:59,645
we see the failure happens inside the pipeline function

23
00:00:59,645 --> 00:01:03,158
when it calls AutoTokenizer.from_pretrained.

24
00:01:03,158 --> 00:01:06,609
Using the debugger, we find the values passed to that method

25
00:01:06,609 --> 00:01:08,849
and can thus create a small sample of code

26
00:01:08,849 --> 00:01:12,802
that hopefully generates the same error.

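A minimal sketch of what such a small sample might look like in Python. The checkpoint name is a placeholder assumption (the transcript does not name the one used in the video), and `capture_traceback` is a hypothetical helper written for this illustration, not part of Transformers. Only `AutoTokenizer.from_pretrained` comes from the library.

```python
import traceback

# Placeholder assumption: substitute the checkpoint your failing code actually loads.
CHECKPOINT = "bert-base-uncased"


def capture_traceback(fn):
    """Run fn() and return the full traceback as text, or None if nothing was raised."""
    try:
        fn()
    except Exception:
        return traceback.format_exc()
    return None


def reproducer():
    """The smallest call that should trigger the bug: no pipeline, no external data."""
    # Imported here so the helper above can be used without Transformers installed.
    from transformers import AutoTokenizer

    AutoTokenizer.from_pretrained(CHECKPOINT)
```

Calling `print(capture_traceback(reproducer))` would then show either the full traceback, ready to paste into an issue, or `None` if the call unexpectedly succeeds.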
27
00:01:12,802 --> 00:01:14,726
It's very important to go through this step

28
00:01:14,726 --> 00:01:16,770
as you may realize the error was on your side

29
00:01:16,770 --> 00:01:18,360
and not a bug in the library,

30
00:01:18,360 --> 00:01:20,610
but it also will make it easier for the maintainers

31
00:01:20,610 --> 00:01:22,320
to fix your problem.

32
00:01:22,320 --> 00:01:24,030
Here we can play around a bit more with this code

33
00:01:24,030 --> 00:01:26,460
and notice the error happens for different checkpoints

34
00:01:26,460 --> 00:01:28,050
and not just this one,

35
00:01:28,050 --> 00:01:31,262
and that it disappears when we use use_fast=False

36
00:01:31,262 --> 00:01:33,240
inside our tokenizer call.

37
00:01:33,240 --> 00:01:35,190
The important part is to have something

38
00:01:35,190 --> 00:01:38,640
that does not depend on any external files or data.

39
00:01:38,640 --> 00:01:40,350
Try to replace your data with fake values

40
00:01:40,350 --> 00:01:41,450
if you can't share it.

41
00:01:42,750 --> 00:01:43,620
With all of this done,

42
00:01:43,620 --> 00:01:46,260
we are ready to start writing our issue.

43
00:01:46,260 --> 00:01:48,600
Click on the button next to Bug Report

44
00:01:48,600 --> 00:01:51,300
and you will discover that there is a template to fill in.

45
00:01:51,300 --> 00:01:53,940
It will only take you a couple of minutes.

46
00:01:53,940 --> 00:01:56,460
The first thing is to properly name your issue.

47
00:01:56,460 --> 00:01:59,100
Don't pick a title that is too vague.

48
00:01:59,100 --> 00:02:02,160
Then you have to fill in your environment information.

49
00:02:02,160 --> 00:02:03,330
There is a command provided

50
00:02:03,330 --> 00:02:05,700
by the Transformers library to do this.

51
00:02:05,700 --> 00:02:08,550
Just execute it in your notebook or in a terminal

52
00:02:08,550 --> 00:02:10,203
and copy and paste the results.

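The transcript does not spell out the command the library provides, so it is not reproduced here. As a rough illustration of the kind of information an environment section carries, the sketch below gathers similar details by hand using only the Python standard library. The function name `environment_report` and the default package list are assumptions made for this example, and the output format is not the one the library's own command prints.

```python
import platform
from importlib import metadata


def environment_report(packages=("transformers", "tokenizers", "torch", "tensorflow")):
    """Return a bullet list with the platform, Python version and package versions."""
    lines = [
        f"- Platform: {platform.platform()}",
        f"- Python version: {platform.python_version()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages explicitly instead of failing.
            version = "not installed"
        lines.append(f"- {name} version: {version}")
    return "\n".join(lines)
```

Printing `environment_report()` gives text that can be pasted under the environment heading of an issue when the dedicated command is unavailable; when it is available, prefer its output, since maintainers expect that format.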
53
00:02:11,070 --> 00:02:13,530
There are two last questions to fill in manually,

54
00:02:13,530 --> 00:02:16,023
to which the answers are no and no in our case.

55
00:02:17,550 --> 00:02:20,460
Next, we need to determine who to tag.

56
00:02:20,460 --> 00:02:23,010
There is a full list of usernames in the template.

57
00:02:23,010 --> 00:02:25,043
Since our issue has to do with tokenizers,

58
00:02:25,043 --> 00:02:28,170
we pick the maintainer associated with them.

59
00:02:28,170 --> 00:02:30,210
There is no point tagging more than 3 people;

60
00:02:30,210 --> 00:02:32,010
they will redirect you to the right person

61
00:02:32,010 --> 00:02:33,110
if you made a mistake.

62
00:02:34,410 --> 00:02:36,600
Next, we have to give the information necessary

63
00:02:36,600 --> 00:02:38,220
to reproduce the bug.

64
00:02:38,220 --> 00:02:41,010
We paste our sample, and put it between two lines

65
00:02:41,010 --> 00:02:44,073
with three backticks so that it's formatted properly.

66
00:02:45,210 --> 00:02:47,010
We also paste the full traceback,

67
00:02:47,010 --> 00:02:49,740
still between two lines of three backticks.

68
00:02:49,740 --> 00:02:52,650
Lastly, we can add any additional information

69
00:02:52,650 --> 00:02:55,200
about what we tried to debug the issue at hand.

70
00:02:55,200 --> 00:02:57,030
With all of this, you should expect an answer

71
00:02:57,030 --> 00:02:59,880
to your issue pretty fast and hopefully a quick fix.

72
00:02:59,880 --> 00:03:01,770
Note that all the advice in this video

73
00:03:01,770 --> 00:03:04,203
applies to almost every open-source project.
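The formatting steps described above (code sample and traceback each placed between two lines of three backticks) can be sketched as a small Python helper. The section headings and the function names `fenced` and `issue_body` are assumptions for illustration; the real issue template defines its own headings, which should be kept as they are.

```python
# Built programmatically so the three-backtick marker never appears literally here.
FENCE = "`" * 3


def fenced(text, lang=""):
    """Wrap text between two lines of three backticks, optionally tagged with a language."""
    return f"{FENCE}{lang}\n{text.strip()}\n{FENCE}"


def issue_body(env_info, sample, traceback_text, notes=""):
    """Assemble the issue text: environment, reproducer, full traceback, optional notes."""
    parts = [
        "## Environment info",  # hypothetical heading
        env_info.strip(),
        "## To reproduce",  # hypothetical heading
        fenced(sample, "python"),
        "## Full traceback",  # hypothetical heading
        fenced(traceback_text),
    ]
    if notes:
        parts += ["## Additional information", notes.strip()]
    return "\n\n".join(parts)
```

Passing the observation from earlier in the video, that the error disappears with `use_fast=False`, as `notes` puts it where maintainers will look for debugging hints.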