1 00:00:04,200 --> 00:00:06,210 - [Instructor] In this video, we're going to understand how 2 00:00:06,210 --> 00:00:08,280 to manage a model repository 3 00:00:08,280 --> 00:00:10,053 on the Hugging Face Model Hub. 4 00:00:10,920 --> 00:00:13,020 In order to handle a repository, 5 00:00:13,020 --> 00:00:15,450 you should first have a Hugging Face account. 6 00:00:15,450 --> 00:00:17,610 A link to create a new account is available 7 00:00:17,610 --> 00:00:18,573 in the description. 8 00:00:20,130 --> 00:00:22,980 Once you are logged in, you can create a new repository 9 00:00:22,980 --> 00:00:25,890 by clicking on the new model option. 10 00:00:25,890 --> 00:00:29,400 You should see a modal similar to the following. 11 00:00:29,400 --> 00:00:33,240 In the owner input, you can put either your own namespace 12 00:00:33,240 --> 00:00:35,703 or any of your organizations' namespaces. 13 00:00:36,660 --> 00:00:39,330 The model name is the model identifier 14 00:00:39,330 --> 00:00:40,320 that will then be used 15 00:00:40,320 --> 00:00:43,143 to identify your model in the chosen namespace. 16 00:00:44,130 --> 00:00:47,700 The final choice is between public and private. 17 00:00:47,700 --> 00:00:49,950 Public models are accessible by anyone. 18 00:00:49,950 --> 00:00:51,840 This is the recommended free option, 19 00:00:51,840 --> 00:00:54,960 as this makes your model easily accessible and shareable. 20 00:00:54,960 --> 00:00:57,630 The owners of your namespace are the only ones 21 00:00:57,630 --> 00:00:59,523 who can update and change your model. 22 00:01:00,450 --> 00:01:03,660 A more advanced option is the private option. 23 00:01:03,660 --> 00:01:04,560 In this case, 24 00:01:04,560 --> 00:01:06,000 only the owners of your namespace 25 00:01:06,000 --> 00:01:08,280 will have visibility over your model. 26 00:01:08,280 --> 00:01:10,260 Other users won't know it exists 27 00:01:10,260 --> 00:01:11,810 and will not be able to use it.
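The video creates the repository through the web interface. As a rough programmatic equivalent, here is a minimal sketch that assumes the `huggingface_hub` package is installed and that you are already logged in. The namespace `my-namespace`, the model name `dummy-model`, and the helper names are hypothetical and chosen only for illustration.

```python
def build_repo_id(owner: str, model_name: str) -> str:
    """Join the owner namespace and the model name into a repository ID."""
    return f"{owner}/{model_name}"


def create_model_repo(owner: str, model_name: str, private: bool = False):
    """Create a model repository on the Hub (needs network access and a login)."""
    # Imported lazily so build_repo_id stays usable without the library installed.
    from huggingface_hub import create_repo

    return create_repo(
        repo_id=build_repo_id(owner, model_name),
        private=private,  # False: visible to anyone; True: visible to owners only
        exist_ok=True,    # do not raise if the repository already exists
    )


# Hypothetical usage, mirroring the three choices in the modal:
# create_model_repo("my-namespace", "dummy-model", private=False)
```

The three arguments map onto the owner, model name, and visibility fields described above.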
28 00:01:15,030 --> 00:01:17,030 Let's create a dummy model to play with. 29 00:01:18,180 --> 00:01:19,710 Once your model is created, 30 00:01:19,710 --> 00:01:22,230 the next step is managing that model. 31 00:01:22,230 --> 00:01:24,360 Three tabs are available to you. 32 00:01:24,360 --> 00:01:27,960 You're facing the first one, which is the model card page. 33 00:01:27,960 --> 00:01:29,970 This is the page you use to showcase your model 34 00:01:29,970 --> 00:01:31,110 to the world. 35 00:01:31,110 --> 00:01:33,260 We'll see how it can be completed in a bit. 36 00:01:34,500 --> 00:01:37,503 The second one is the files and versions tab. 37 00:01:38,340 --> 00:01:40,920 Your model itself is a Git repository. 38 00:01:40,920 --> 00:01:43,230 If you're unaware of what a Git repository is, 39 00:01:43,230 --> 00:01:46,320 you can think of it as a folder containing files. 40 00:01:46,320 --> 00:01:48,120 If you have never used Git before, 41 00:01:48,120 --> 00:01:50,100 we recommend looking at an introduction 42 00:01:50,100 --> 00:01:52,600 like the one provided in this video's description. 43 00:01:53,850 --> 00:01:56,910 The Git repository allows you to see the changes happening 44 00:01:56,910 --> 00:02:00,900 over time in this folder, hence the term versions. 45 00:02:00,900 --> 00:02:03,453 We'll see how to add files and versions in a bit. 46 00:02:07,020 --> 00:02:09,570 The final tab is the settings tab, 47 00:02:09,570 --> 00:02:12,120 which allows you to manage your model's visibility 48 00:02:12,120 --> 00:02:13,203 and availability. 49 00:02:14,790 --> 00:02:17,673 Let's start by adding files to the repository. 50 00:02:18,540 --> 00:02:19,560 Files can be added 51 00:02:19,560 --> 00:02:23,340 through the web interface thanks to the add file button. 52 00:02:23,340 --> 00:02:27,060 The added files can be of any type: Python, JSON, text, 53 00:02:27,060 --> 00:02:27,893 you name it.
54 00:02:28,740 --> 00:02:31,170 Alongside your added file and its content, 55 00:02:31,170 --> 00:02:33,363 you should name your change or commit. 56 00:02:36,330 --> 00:02:38,400 Generally, adding files is simpler 57 00:02:38,400 --> 00:02:40,770 by using the Hugging Face Hub Python library 58 00:02:40,770 --> 00:02:43,050 or by using the command line. 59 00:02:43,050 --> 00:02:44,310 We'll showcase how to do this 60 00:02:44,310 --> 00:02:46,290 using the Hugging Face Hub Python library, 61 00:02:46,290 --> 00:02:48,060 and there is a link in the description 62 00:02:48,060 --> 00:02:49,800 to the previous version of this video, 63 00:02:49,800 --> 00:02:52,743 showcasing how to do this using Git and the command line. 64 00:02:53,610 --> 00:02:54,840 First, make sure you're logged 65 00:02:54,840 --> 00:02:56,460 into your Hugging Face account, 66 00:02:56,460 --> 00:02:59,523 either through the command line or in a Python runtime. 67 00:03:04,634 --> 00:03:06,390 The first approach we'll take a look at 68 00:03:06,390 --> 00:03:08,880 is using the upload_file method. 69 00:03:08,880 --> 00:03:10,770 This offers an extremely simple API 70 00:03:10,770 --> 00:03:12,630 to upload files to the Hub. 71 00:03:12,630 --> 00:03:14,190 The three required parameters 72 00:03:14,190 --> 00:03:16,083 are the current location of the file, 73 00:03:18,570 --> 00:03:21,300 the path of that file in the repository, 74 00:03:21,300 --> 00:03:24,050 and the ID of the repository to which you're pushing. 75 00:03:25,650 --> 00:03:27,930 There are a few additional parameters: 76 00:03:27,930 --> 00:03:29,100 the token parameter, 77 00:03:29,100 --> 00:03:31,200 if you would like to specify a different token 78 00:03:31,200 --> 00:03:33,650 than the one saved in your cache with your login, 79 00:03:34,830 --> 00:03:36,750 and the repo_type parameter, 80 00:03:36,750 --> 00:03:40,503 if you would like to push to a dataset or a space.
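The parameters just described can be sketched as follows. This is a minimal illustration, assuming the `huggingface_hub` package is installed and a token is already cached from a previous login (for example via `huggingface-cli login`). The helper names and the repository ID `my-namespace/dummy-model` are hypothetical.

```python
def build_upload_kwargs(local_path, path_in_repo, repo_id, token=None, repo_type=None):
    """Collect the three required parameters plus the optional ones that were set."""
    kwargs = {
        "path_or_fileobj": local_path,  # current location of the file
        "path_in_repo": path_in_repo,   # path of that file in the repository
        "repo_id": repo_id,             # ID of the repository being pushed to
    }
    if token is not None:
        kwargs["token"] = token          # overrides the token cached at login
    if repo_type is not None:
        kwargs["repo_type"] = repo_type  # "dataset" or "space" instead of a model
    return kwargs


def upload_to_hub(local_path, path_in_repo, repo_id, token=None, repo_type=None):
    """Upload one file to the Hub over HTTP (needs network access and a login)."""
    # Imported lazily so build_upload_kwargs works without the library installed.
    from huggingface_hub import upload_file

    return upload_file(
        **build_upload_kwargs(local_path, path_in_repo, repo_id, token, repo_type)
    )


# Hypothetical usage:
# upload_to_hub("README.md", "README.md", "my-namespace/dummy-model")
```

Leaving `token` and `repo_type` unset falls back to the cached login and to a model repository, which matches the default case in the video.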
81 00:03:42,300 --> 00:03:45,690 We'll upload a file called README.md to the repository 82 00:03:45,690 --> 00:03:47,190 using this method. 83 00:03:47,190 --> 00:03:49,710 We start by saving a file with that name, 84 00:03:49,710 --> 00:03:51,210 which contains some information 85 00:03:51,210 --> 00:03:52,920 about the repository itself. 86 00:03:52,920 --> 00:03:54,243 Here, a title. 87 00:03:55,950 --> 00:03:57,420 Now that the file is saved, 88 00:03:57,420 --> 00:04:00,513 let's use the upload_file method to upload it to the Hub. 89 00:04:01,560 --> 00:04:03,540 If we switch to the web interface for a second 90 00:04:03,540 --> 00:04:07,080 and refresh the page, we'll see that the README is shown. 91 00:04:07,080 --> 00:04:08,883 The file upload was a success. 92 00:04:10,170 --> 00:04:13,500 Alongside this method, there is a delete_file method 93 00:04:13,500 --> 00:04:16,170 so that you may manage your repository fully. 94 00:04:16,170 --> 00:04:18,820 We'll use it to delete the file we have just created. 95 00:04:22,860 --> 00:04:25,320 If we refresh the page once again, good, 96 00:04:25,320 --> 00:04:26,973 the file was indeed deleted. 97 00:04:29,070 --> 00:04:32,730 This approach using only these two methods is super simple. 98 00:04:32,730 --> 00:04:35,400 It doesn't need Git or Git LFS installed, 99 00:04:35,400 --> 00:04:37,650 but it does come with a limitation. 100 00:04:37,650 --> 00:04:39,600 The maximum file size one can upload 101 00:04:39,600 --> 00:04:41,313 is limited to five gigabytes. 102 00:04:42,360 --> 00:04:43,890 To overcome this limit, 103 00:04:43,890 --> 00:04:45,540 let's take a look at the second method 104 00:04:45,540 --> 00:04:47,643 which is the Repository utility.
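Before moving on to that utility, the README round trip above can be sketched like this. It is a minimal illustration, assuming `huggingface_hub` is installed and you are logged in. The helper names are hypothetical, and the byte value used for the five gigabyte limit is an assumption, since the video does not say whether the limit is decimal or binary.

```python
import os

# Assumption: "five gigabytes" is interpreted as 5 * 1024**3 bytes.
MAX_HTTP_UPLOAD_BYTES = 5 * 1024**3


def write_readme(folder: str, title: str) -> str:
    """Save a README.md that contains only a title and return its path."""
    path = os.path.join(folder, "README.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"# {title}\n")
    return path


def fits_http_upload(path: str) -> bool:
    """Check whether the file is small enough for the upload_file approach."""
    return os.path.getsize(path) <= MAX_HTTP_UPLOAD_BYTES


def readme_round_trip(folder: str, repo_id: str, title: str) -> None:
    """Upload README.md, then delete it again (needs network access and a login)."""
    # Imported lazily so the helpers above work without the library installed.
    from huggingface_hub import delete_file, upload_file

    path = write_readme(folder, title)
    if not fits_http_upload(path):
        raise ValueError("File exceeds the size limit of the HTTP upload approach")
    upload_file(path_or_fileobj=path, path_in_repo="README.md", repo_id=repo_id)
    delete_file(path_in_repo="README.md", repo_id=repo_id)


# Hypothetical usage:
# readme_round_trip(".", "my-namespace/dummy-model", "My dummy model")
```

Files that fail the size check are the ones the Git-based approach is meant for.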
105 00:04:48,600 --> 00:04:51,840 This class is a wrapper over Git and Git LFS methods, 106 00:04:51,840 --> 00:04:53,850 which abstracts most of the complexity 107 00:04:53,850 --> 00:04:55,500 and offers a flexible API 108 00:04:55,500 --> 00:04:57,990 to manage your online repositories. 109 00:04:57,990 --> 00:04:59,690 Let's take a look at how it works. 110 00:05:03,870 --> 00:05:08,369 We start by instantiating the Repository utility. 111 00:05:08,369 --> 00:05:10,380 We provide the clone_from parameter 112 00:05:10,380 --> 00:05:13,383 in order to clone the repository we just created. 113 00:05:14,400 --> 00:05:18,750 The repository is now cloned in the local folder. 114 00:05:18,750 --> 00:05:22,200 The repo object that we have just initialized 115 00:05:22,200 --> 00:05:24,873 offers quite a few methods which are useful for us. 116 00:05:25,920 --> 00:05:28,800 We're interested in pushing a model to the Hub. 117 00:05:28,800 --> 00:05:31,170 I'll start by loading a model and tokenizer 118 00:05:31,170 --> 00:05:32,643 I trained a few hours ago. 119 00:05:34,380 --> 00:05:36,810 We'll now follow the traditional Git approach 120 00:05:36,810 --> 00:05:38,670 by first pulling the latest changes 121 00:05:38,670 --> 00:05:40,053 using the git_pull method. 122 00:05:40,980 --> 00:05:43,170 We just cloned the repository, 123 00:05:43,170 --> 00:05:45,780 so unless this is a super active repository, 124 00:05:45,780 --> 00:05:48,660 it's unlikely that new changes are available. 125 00:05:48,660 --> 00:05:51,000 But it's always a good idea to pull the latest changes 126 00:05:51,000 --> 00:05:52,300 before doing anything new. 127 00:05:53,220 --> 00:05:55,200 Now that we have pulled the repository, 128 00:05:55,200 --> 00:05:58,500 I'll save the model and tokenizer inside that folder. 129 00:05:58,500 --> 00:06:01,200 This includes the model weights, configuration file, 130 00:06:01,200 --> 00:06:02,673 and tokenizer files.
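The clone, pull, and save steps so far can be sketched as below. This is a minimal illustration, assuming a `huggingface_hub` release that still ships the `Repository` class (recent releases deprecate it), with Git and Git LFS installed locally. The helper names, the folder, and the repository ID are hypothetical, and `model` and `tokenizer` stand for any objects exposing `save_pretrained`, such as those from the `transformers` library.

```python
def save_artifacts(model, tokenizer, local_dir: str) -> None:
    """Write the model weights, configuration, and tokenizer files into local_dir."""
    model.save_pretrained(local_dir)
    tokenizer.save_pretrained(local_dir)


def clone_pull_and_save(local_dir: str, repo_id: str, model, tokenizer):
    """Clone the Hub repository, pull the latest changes, then save into the clone."""
    # Imported lazily so save_artifacts works without the library installed.
    from huggingface_hub import Repository

    # clone_from clones the online repository into local_dir.
    repo = Repository(local_dir=local_dir, clone_from=repo_id)
    # Pull first, even right after cloning, in case the remote has moved on.
    repo.git_pull()
    save_artifacts(model, tokenizer, local_dir)
    return repo


# Hypothetical usage:
# repo = clone_pull_and_save("dummy-model", "my-namespace/dummy-model", model, tokenizer)
```

Returning the `repo` object keeps it available for the staging, commit, and push steps that come next.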
131 00:06:04,440 --> 00:06:05,820 Now that the model is saved, 132 00:06:05,820 --> 00:06:07,890 we'll continue with the traditional Git approach 133 00:06:07,890 --> 00:06:10,620 and push it to the remote repository. 134 00:06:10,620 --> 00:06:12,150 If we were using the command line, 135 00:06:12,150 --> 00:06:14,250 there are a few Git LFS-specific commands 136 00:06:14,250 --> 00:06:15,600 we would have to invoke. 137 00:06:15,600 --> 00:06:17,940 But here, the huggingface_hub package 138 00:06:17,940 --> 00:06:20,070 takes care of all of that. 139 00:06:20,070 --> 00:06:24,420 We'll start by staging the files using the git_add method. 140 00:06:24,420 --> 00:06:27,600 We'll then commit these changes using the git_commit method, 141 00:06:27,600 --> 00:06:30,690 providing a helpful commit message. 142 00:06:30,690 --> 00:06:33,210 Finally, we'll push the changes to the remote, 143 00:06:33,210 --> 00:06:34,953 using the git_push method. 144 00:06:45,090 --> 00:06:47,430 If we go back to the files and versions tab, 145 00:06:47,430 --> 00:06:49,950 we can now see the newly committed files. 146 00:06:49,950 --> 00:06:52,600 We can even play with the model in the inference API. 147 00:06:53,790 --> 00:06:55,770 Unfortunately, the front page of our model 148 00:06:55,770 --> 00:06:57,540 is still very empty. 149 00:06:57,540 --> 00:06:59,280 Let's add a README markdown file 150 00:06:59,280 --> 00:07:00,753 to complete it a little bit. 151 00:07:01,710 --> 00:07:04,200 This README is known as the model card 152 00:07:04,200 --> 00:07:06,030 and it's arguably as important 153 00:07:06,030 --> 00:07:09,330 as the model and tokenizer files in the model repository. 154 00:07:09,330 --> 00:07:11,280 It is the central definition 155 00:07:11,280 --> 00:07:13,200 and documentation of your model, 156 00:07:13,200 --> 00:07:16,440 ensuring reusability by fellow community members 157 00:07:16,440 --> 00:07:18,480 and reproducibility of results.
158 00:07:18,480 --> 00:07:20,760 It provides a platform on which other members 159 00:07:20,760 --> 00:07:22,293 may build their artifacts. 160 00:07:23,220 --> 00:07:25,590 We'll only add a title and a small description here 161 00:07:25,590 --> 00:07:27,060 for simplicity's sake, 162 00:07:27,060 --> 00:07:29,370 but we encourage you to add information relevant 163 00:07:29,370 --> 00:07:30,990 to how the model was trained, 164 00:07:30,990 --> 00:07:33,120 its intended use and limitations, 165 00:07:33,120 --> 00:07:36,180 as well as its identified potential biases, 166 00:07:36,180 --> 00:07:37,440 evaluation results, 167 00:07:37,440 --> 00:07:39,843 and code samples on how to use your model. 168 00:07:41,460 --> 00:07:44,130 Great work contributing a model to the Model Hub. 169 00:07:44,130 --> 00:07:46,440 This model can now be used in downstream libraries 170 00:07:46,440 --> 00:07:48,783 simply by specifying your model identifier.
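As a recap, the stage, commit, and push sequence described earlier and the final step of loading by identifier can be sketched as follows. This is a minimal illustration: `repo` is assumed to be a `huggingface_hub` `Repository` object, the `transformers` library is assumed as the downstream library, and the choice of `AutoModel` is an assumption, since the right class depends on the task the model was trained for. The helper names and the identifier `my-namespace/dummy-model` are hypothetical.

```python
def push_changes(repo, commit_message: str) -> None:
    """Stage every change, commit it with a message, and push to the remote."""
    repo.git_add()                   # stage the files saved in the local clone
    repo.git_commit(commit_message)  # record them with a helpful message
    repo.git_push()                  # send the commit to the Hub


def load_from_hub(repo_id: str):
    """Load the model and tokenizer by identifier (needs network access)."""
    # Imported lazily so push_changes works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return model, tokenizer


# Hypothetical usage:
# push_changes(repo, "Add model and tokenizer files")
# model, tokenizer = load_from_hub("my-namespace/dummy-model")
```

Once the push has gone through, the same identifier is all another user needs to load the model.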