1
00:00:04,200 --> 00:00:06,210
- [Instructor] In this video, we're going to understand how

2
00:00:06,210 --> 00:00:08,280
to manage a model repository

3
00:00:08,280 --> 00:00:10,053
on the Hugging Face Model Hub.

4
00:00:10,920 --> 00:00:13,020
In order to handle a repository,

5
00:00:13,020 --> 00:00:15,450
you should first have a Hugging Face account.

6
00:00:15,450 --> 00:00:17,610
A link to create a new account is available

7
00:00:17,610 --> 00:00:18,573
in the description.

8
00:00:20,130 --> 00:00:22,980
Once you are logged in, you can create a new repository

9
00:00:22,980 --> 00:00:25,890
by clicking on the New Model option.

10
00:00:25,890 --> 00:00:29,400
You should see a modal similar to the following.

11
00:00:29,400 --> 00:00:33,240
In the Owner input, you can put either your own namespace

12
00:00:33,240 --> 00:00:35,703
or any of your organizations' namespaces.

13
00:00:36,660 --> 00:00:39,330
The Model name is the model identifier

14
00:00:39,330 --> 00:00:40,320
that will then be used

15
00:00:40,320 --> 00:00:43,143
to identify your model in the chosen namespace.

16
00:00:44,130 --> 00:00:47,700
The final choice is between public and private.

17
00:00:47,700 --> 00:00:49,950
Public models are accessible by anyone.

18
00:00:49,950 --> 00:00:51,840
This is the recommended free option,

19
00:00:51,840 --> 00:00:54,960
as this makes your model easily accessible and shareable.

20
00:00:54,960 --> 00:00:57,630
The owners of your namespace are the only ones

21
00:00:57,630 --> 00:00:59,523
who can update and change your model.
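The repository created through the web form above can also be created from Python. The sketch below assumes huggingface_hub is installed and that you are already logged in. `make_repo_id` and `create_model_repo` are hypothetical helper names. `create_repo` and its `private` and `exist_ok` parameters come from huggingface_hub. The repo id combines the Owner namespace with the Model name, and `private` mirrors the public/private choice.

```python
def make_repo_id(namespace: str, model_name: str) -> str:
    """Join an owner namespace (user or organization) and a model name into a Hub repo id."""
    if not namespace or not model_name:
        raise ValueError("both namespace and model_name are required")
    if "/" in namespace or "/" in model_name:
        raise ValueError("namespace and model_name must not contain '/'")
    return f"{namespace}/{model_name}"


def create_model_repo(namespace: str, model_name: str, private: bool = False):
    """Create the model repository on the Hub (network call; requires a prior login)."""
    # Imported lazily so make_repo_id works even without huggingface_hub installed.
    from huggingface_hub import create_repo

    repo_id = make_repo_id(namespace, model_name)
    # private=False: anyone can see and use the model.
    # private=True: only the owners of the namespace can see it.
    # exist_ok=True: re-running does not fail if the repo already exists.
    return create_repo(repo_id, private=private, exist_ok=True)
```

For example, `create_model_repo("my-username", "dummy-model")` would create a public repo. Both names are placeholders.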

22
00:01:00,450 --> 00:01:03,660
A more advanced option is the private option.

23
00:01:03,660 --> 00:01:04,560
In this case,

24
00:01:04,560 --> 00:01:06,000
only the owners of your namespace

25
00:01:06,000 --> 00:01:08,280
will have visibility over your model.

26
00:01:08,280 --> 00:01:10,260
Other users won't know it exists

27
00:01:10,260 --> 00:01:11,810
and will not be able to use it.

28
00:01:15,030 --> 00:01:17,030
Let's create a dummy model to play with.

29
00:01:18,180 --> 00:01:19,710
Once your model is created,

30
00:01:19,710 --> 00:01:22,230
the next step is managing it.

31
00:01:22,230 --> 00:01:24,360
Three tabs are available to you.

32
00:01:24,360 --> 00:01:27,960
You're looking at the first one, which is the Model card page.

33
00:01:27,960 --> 00:01:29,970
This is the page you use to showcase your model

34
00:01:29,970 --> 00:01:31,110
to the world.

35
00:01:31,110 --> 00:01:33,260
We'll see how to complete it in a bit.

36
00:01:34,500 --> 00:01:37,503
The second one is the Files and versions tab.

37
00:01:38,340 --> 00:01:40,920
Your model itself is a Git repository.

38
00:01:40,920 --> 00:01:43,230
If you're unaware of what a Git repository is,

39
00:01:43,230 --> 00:01:46,320
you can think of it as a folder containing files.

40
00:01:46,320 --> 00:01:48,120
If you have never used Git before,

41
00:01:48,120 --> 00:01:50,100
we recommend looking at an introduction

42
00:01:50,100 --> 00:01:52,600
like the one linked in this video's description.

43
00:01:53,850 --> 00:01:56,910
The Git repository allows you to see the changes happening

44
00:01:56,910 --> 00:02:00,900
over time in this folder, hence the term versions.

45
00:02:00,900 --> 00:02:03,453
We'll see how to add files and versions in a bit.

46
00:02:07,020 --> 00:02:09,570
The final tab is the Settings tab,

47
00:02:09,570 --> 00:02:12,120
which allows you to manage your model's visibility

48
00:02:12,120 --> 00:02:13,203
and availability.

49
00:02:14,790 --> 00:02:17,673
Let's start by adding files to the repository.

50
00:02:18,540 --> 00:02:19,560
Files can be added

51
00:02:19,560 --> 00:02:23,340
through the web interface thanks to the Add file button.

52
00:02:23,340 --> 00:02:27,060
The added files can be of any type: Python, JSON, text,

53
00:02:27,060 --> 00:02:27,893
you name it.

54
00:02:28,740 --> 00:02:31,170
Alongside your added file and its content,

55
00:02:31,170 --> 00:02:33,363
you should name your change, or commit.

56
00:02:36,330 --> 00:02:38,400
Generally, adding files is simpler

57
00:02:38,400 --> 00:02:40,770
using the huggingface_hub Python library

58
00:02:40,770 --> 00:02:43,050
or the command line.

59
00:02:43,050 --> 00:02:44,310
We'll show how to do this

60
00:02:44,310 --> 00:02:46,290
using the huggingface_hub Python library,

61
00:02:46,290 --> 00:02:48,060
and there is a link in the description

62
00:02:48,060 --> 00:02:49,800
to the previous version of this video,

63
00:02:49,800 --> 00:02:52,743
showing how to do this using Git and the command line.
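The Files and versions tab shows the repository's files and its commit history, and both can also be read from Python. This is a minimal sketch that assumes huggingface_hub is installed. `HfApi.list_repo_files` and `HfApi.list_repo_commits` are real library methods. `format_history` and `show_repo` are hypothetical helpers that print a short, git-log-style summary.

```python
def format_history(commits) -> list:
    """Render (commit_id, title) pairs as short 'sha title' lines, like a condensed git log."""
    return [f"{commit_id[:7]} {title}" for commit_id, title in commits]


def show_repo(repo_id: str) -> None:
    """Print the same information as the Files and versions tab (network call)."""
    from huggingface_hub import HfApi

    api = HfApi()
    # Every file currently in the repository (the "files").
    for path in api.list_repo_files(repo_id):
        print(path)
    # The commit history (the "versions"); each entry has a commit_id and a title.
    commits = api.list_repo_commits(repo_id)
    for line in format_history((c.commit_id, c.title) for c in commits):
        print(line)
```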

64
00:02:53,610 --> 00:02:54,840
First, make sure you're logged

65
00:02:54,840 --> 00:02:56,460
into your Hugging Face account,

66
00:02:56,460 --> 00:02:59,523
either through the command line or in a Python runtime.

67
00:03:04,634 --> 00:03:06,390
The first approach we'll take a look at

68
00:03:06,390 --> 00:03:08,880
is using the upload_file method.

69
00:03:08,880 --> 00:03:10,770
This offers an extremely simple API

70
00:03:10,770 --> 00:03:12,630
to upload files to the Hub.

71
00:03:12,630 --> 00:03:14,190
The three required parameters

72
00:03:14,190 --> 00:03:16,083
are the current location of the file,

73
00:03:18,570 --> 00:03:21,300
the path of that file in the repository,

74
00:03:21,300 --> 00:03:24,050
and the ID of the repository to which you're pushing.

75
00:03:25,650 --> 00:03:27,930
There are a few additional parameters:

76
00:03:27,930 --> 00:03:29,100
the token parameter,

77
00:03:29,100 --> 00:03:31,200
if you would like to specify a different token

78
00:03:31,200 --> 00:03:33,650
than the one saved in your cache when you logged in,

79
00:03:34,830 --> 00:03:36,750
and the repo_type parameter,

80
00:03:36,750 --> 00:03:40,503
if you would like to push to a dataset or a Space.

81
00:03:42,300 --> 00:03:45,690
We'll upload a file called README.md to the repository

82
00:03:45,690 --> 00:03:47,190
using this method.

83
00:03:47,190 --> 00:03:49,710
We start by saving a file with that name,

84
00:03:49,710 --> 00:03:51,210
which contains some information

85
00:03:51,210 --> 00:03:52,920
about the repository itself.

86
00:03:52,920 --> 00:03:54,243
Here, a title.
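The steps above (save a README locally, then push it with `upload_file`) can be sketched as follows. The sketch assumes you have already logged in, for instance with `huggingface-cli login` in a terminal or `login()` from huggingface_hub in Python. The keyword names `path_or_fileobj`, `path_in_repo`, `repo_id`, `token` and `repo_type` are upload_file's real parameters. `write_readme`, `upload_readme` and the repo id are illustrative placeholders.

```python
from pathlib import Path


def write_readme(path: str = "README.md", title: str = "Dummy model") -> Path:
    """Save a README whose only content is a Markdown title, as in the video."""
    readme = Path(path)
    readme.write_text(f"# {title}\n", encoding="utf-8")
    return readme


def upload_readme(repo_id: str, local_path: str = "README.md"):
    """Upload the local README to the Hub with upload_file (network call)."""
    from huggingface_hub import upload_file

    return upload_file(
        path_or_fileobj=local_path,  # current location of the file
        path_in_repo="README.md",    # path of the file inside the repository
        repo_id=repo_id,             # e.g. "my-username/dummy-model" (placeholder)
        # Optional: token="hf_..." overrides the cached login token;
        # repo_type="dataset" or "space" targets another kind of repo.
    )
```

Calling `write_readme()` and then `upload_readme("my-username/dummy-model")` reproduces the upload shown next in the video.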

87
00:03:55,950 --> 00:03:57,420
Now that the file is saved,

88
00:03:57,420 --> 00:04:00,513
let's use the upload_file method to upload it to the Hub.

89
00:04:01,560 --> 00:04:03,540
If we switch to the web interface for a second

90
00:04:03,540 --> 00:04:07,080
and refresh the page, we'll see that the README is shown.

91
00:04:07,080 --> 00:04:08,883
The file upload was a success.

92
00:04:10,170 --> 00:04:13,500
Alongside this method is a delete_file method,

93
00:04:13,500 --> 00:04:16,170
so that you can manage your repository fully.

94
00:04:16,170 --> 00:04:18,820
We'll use it to delete the file we have just created.

95
00:04:22,860 --> 00:04:25,320
If we refresh the page once again, good,

96
00:04:25,320 --> 00:04:26,973
the file was indeed deleted.

97
00:04:29,070 --> 00:04:32,730
This approach using only these two methods is super simple.

98
00:04:32,730 --> 00:04:35,400
It doesn't need Git or Git LFS installed,

99
00:04:35,400 --> 00:04:37,650
but it does come with a limitation.

100
00:04:37,650 --> 00:04:39,600
The maximum file size one can upload

101
00:04:39,600 --> 00:04:41,313
is limited to five gigabytes.

102
00:04:42,360 --> 00:04:43,890
To overcome this limit,

103
00:04:43,890 --> 00:04:45,540
let's take a look at the second method,

104
00:04:45,540 --> 00:04:47,643
which is the Repository utility.

105
00:04:48,600 --> 00:04:51,840
This class is a wrapper over Git and Git LFS methods,

106
00:04:51,840 --> 00:04:53,850
which abstracts most of the complexity

107
00:04:53,850 --> 00:04:55,500
and offers a flexible API

108
00:04:55,500 --> 00:04:57,990
to manage your online repositories.
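Below is a sketch of the cleanup step and of the size trade-off just described. `delete_file` and its `path_in_repo`/`repo_id` parameters are real huggingface_hub API. `choose_upload_method` is a hypothetical helper that encodes the five-gigabyte limit quoted in the video. That limit is taken as 5 * 10**9 bytes, which is an assumption, and the current Hub limits may differ from what the video states.

```python
import os

# Limit quoted in the video for the upload_file approach (assumption: decimal GB).
UPLOAD_FILE_LIMIT = 5 * 10**9


def choose_upload_method(size_in_bytes: int, limit: int = UPLOAD_FILE_LIMIT) -> str:
    """Pick the approach from the video that suits a file of this size."""
    if size_in_bytes < 0:
        raise ValueError("size must be non-negative")
    return "upload_file" if size_in_bytes <= limit else "Repository"


def method_for_path(path: str) -> str:
    """Same decision, based on the size of a local file."""
    return choose_upload_method(os.path.getsize(path))


def delete_readme(repo_id: str):
    """Remove README.md from the Hub repository with delete_file (network call)."""
    from huggingface_hub import delete_file

    return delete_file(path_in_repo="README.md", repo_id=repo_id)
```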

109
00:04:57,990 --> 00:04:59,690
Let's take a look at how it works.

110
00:05:03,870 --> 00:05:08,369
We start by instantiating the Repository utility.

111
00:05:08,369 --> 00:05:10,380
We provide the clone_from parameter

112
00:05:10,380 --> 00:05:13,383
in order to clone the repository we just created.

113
00:05:14,400 --> 00:05:18,750
The repository is now cloned in the local folder.

114
00:05:18,750 --> 00:05:22,200
The repo object that we have just initialized

115
00:05:22,200 --> 00:05:24,873
offers quite a few methods which are useful for us.

116
00:05:25,920 --> 00:05:28,800
We're interested in pushing a model to the Hub.

117
00:05:28,800 --> 00:05:31,170
I'll start by loading a model and tokenizer

118
00:05:31,170 --> 00:05:32,643
I trained a few hours ago.

119
00:05:34,380 --> 00:05:36,810
We'll now follow the traditional Git approach

120
00:05:36,810 --> 00:05:38,670
by first pulling the latest changes

121
00:05:38,670 --> 00:05:40,053
using the git_pull method.

122
00:05:40,980 --> 00:05:43,170
We just cloned the repository,

123
00:05:43,170 --> 00:05:45,780
so unless this is a super active repository,

124
00:05:45,780 --> 00:05:48,660
it's unlikely that new changes are available.

125
00:05:48,660 --> 00:05:51,000
But it's always a good idea to pull the latest changes

126
00:05:51,000 --> 00:05:52,300
before doing anything new.

127
00:05:53,220 --> 00:05:55,200
Now that we have pulled the repository,

128
00:05:55,200 --> 00:05:58,500
I'll save the model and tokenizer inside that folder.
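Here is a sketch of the clone, pull and save steps. It assumes `model` and `tokenizer` are already-loaded 🤗 Transformers objects, which provide `save_pretrained`. `Repository(local_dir, clone_from=...)` and `git_pull()` are the huggingface_hub calls named in the video. `Repository` is deprecated in recent huggingface_hub releases and may be missing from the newest ones. `local_dir_for` is a hypothetical naming convention for the clone folder.

```python
def local_dir_for(repo_id: str) -> str:
    """Use the model-name part of 'namespace/model_name' as the local clone folder."""
    return repo_id.rstrip("/").split("/")[-1]


def clone_pull_and_save(repo_id: str, model, tokenizer):
    """Clone the Hub repo, pull the latest changes, then save model and tokenizer into it."""
    # Git-based utility from the video (deprecated in recent huggingface_hub releases).
    from huggingface_hub import Repository

    local_dir = local_dir_for(repo_id)
    repo = Repository(local_dir, clone_from=repo_id)  # clones into local_dir
    repo.git_pull()  # little to fetch right after cloning, but a good habit
    # Writes the weights, config.json and tokenizer files into the working copy.
    model.save_pretrained(local_dir)
    tokenizer.save_pretrained(local_dir)
    return repo
```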

129
00:05:58,500 --> 00:06:01,200
This includes the model weights, configuration file,

130
00:06:01,200 --> 00:06:02,673
and tokenizer files.

131
00:06:04,440 --> 00:06:05,820
Now that the model is saved,

132
00:06:05,820 --> 00:06:07,890
we'll continue with the traditional Git approach

133
00:06:07,890 --> 00:06:10,620
and push it to the remote repository.

134
00:06:10,620 --> 00:06:12,150
If we were using the command line,

135
00:06:12,150 --> 00:06:14,250
there are a few Git LFS-specific commands

136
00:06:14,250 --> 00:06:15,600
we would have to invoke.

137
00:06:15,600 --> 00:06:17,940
But here, the huggingface_hub package

138
00:06:17,940 --> 00:06:20,070
takes care of all of that.

139
00:06:20,070 --> 00:06:24,420
We'll start by staging the files using the git_add method.

140
00:06:24,420 --> 00:06:27,600
We'll then commit these changes using the git_commit method,

141
00:06:27,600 --> 00:06:30,690
providing a helpful commit message.

142
00:06:30,690 --> 00:06:33,210
Finally, we'll push the changes to the remote

143
00:06:33,210 --> 00:06:34,953
using the git_push method.

144
00:06:45,090 --> 00:06:47,430
If we go back to the Files and versions tab,

145
00:06:47,430 --> 00:06:49,950
we can now see the newly committed files.

146
00:06:49,950 --> 00:06:52,600
We can even play with the model in the Inference API.

147
00:06:53,790 --> 00:06:55,770
Unfortunately, the front page of our model

148
00:06:55,770 --> 00:06:57,540
is still very empty.

149
00:06:57,540 --> 00:06:59,280
Let's add a README markdown file

150
00:06:59,280 --> 00:07:00,753
to complete it a little bit.
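The stage, commit and push sequence can be sketched with the Repository methods named in the video. `git_add(auto_lfs_track=...)`, `git_commit(commit_message=...)` and `git_push()` are real methods. Passing `auto_lfs_track=True` asks the library to track large or binary files with Git LFS. `missing_files` is a hypothetical pre-push sanity check. Its expected file names are an assumption, since weight file names such as `model.safetensors` or `pytorch_model.bin` vary by version.

```python
import os

# Files expected after save_pretrained (assumption; weight file names vary, so they are omitted).
EXPECTED_FILES = ("config.json", "tokenizer_config.json")


def missing_files(local_dir: str, expected=EXPECTED_FILES) -> list:
    """List expected files that are absent from the local clone."""
    present = set(os.listdir(local_dir))
    return [name for name in expected if name not in present]


def commit_and_push(repo, local_dir: str, message: str = "Add model and tokenizer") -> None:
    """Stage, commit and push using the Repository helpers (network call for the push)."""
    missing = missing_files(local_dir)
    if missing:
        raise FileNotFoundError(f"not pushing, missing: {missing}")
    repo.git_add(auto_lfs_track=True)        # stage everything, tracking large files with LFS
    repo.git_commit(commit_message=message)  # a helpful commit message
    repo.git_push()                          # send the commit to the Hub
```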

151
00:07:01,710 --> 00:07:04,200
This README is known as the model card,

152
00:07:04,200 --> 00:07:06,030
and it's arguably as important

153
00:07:06,030 --> 00:07:09,330
as the model and tokenizer files in the model repository.

154
00:07:09,330 --> 00:07:11,280
It is the central definition

155
00:07:11,280 --> 00:07:13,200
and documentation of your model,

156
00:07:13,200 --> 00:07:16,440
ensuring reusability by fellow community members

157
00:07:16,440 --> 00:07:18,480
and reproducibility of results,

158
00:07:18,480 --> 00:07:20,760
and providing a platform on which other members

159
00:07:20,760 --> 00:07:22,293
may build their artifacts.

160
00:07:23,220 --> 00:07:25,590
We'll only add a title and a small description here

161
00:07:25,590 --> 00:07:27,060
for simplicity's sake,

162
00:07:27,060 --> 00:07:29,370
but we encourage you to add information relevant

163
00:07:29,370 --> 00:07:30,990
to how the model was trained,

164
00:07:30,990 --> 00:07:33,120
its intended use and limitations,

165
00:07:33,120 --> 00:07:36,180
as well as its identified potential biases,

166
00:07:36,180 --> 00:07:37,440
evaluation results,

167
00:07:37,440 --> 00:07:39,843
and code samples on how to use your model.

168
00:07:41,460 --> 00:07:44,130
Great work contributing a model to the Model Hub.

169
00:07:44,130 --> 00:07:46,440
This model can now be used in downstream libraries

170
00:07:46,440 --> 00:07:48,783
simply by specifying your model identifier.
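Below is a sketch of a model card that follows the recommendations above, plus the downstream loading step. `build_model_card` is a hypothetical helper, and the section headings passed to it are only suggestions drawn from the list in the video. `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` are real 🤗 Transformers calls that accept a Hub model identifier. The repo id remains a placeholder.

```python
def build_model_card(title: str, description: str, sections=None) -> str:
    """Assemble a README.md model card: a title, a short description, then optional sections."""
    lines = [f"# {title}", "", description]
    for heading, body in (sections or {}).items():
        lines += ["", f"## {heading}", "", body]
    return "\n".join(lines) + "\n"


def load_from_hub(repo_id: str):
    """Downstream use: load the tokenizer and model by their Hub identifier (network call)."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModel.from_pretrained(repo_id)
    return tokenizer, model
```

For a fuller card, pass sections such as "Training procedure", "Intended uses & limitations", "Potential biases", "Evaluation results" and "How to use". The result can then be written to README.md in the clone and pushed like any other file.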