subtitles/zh-CN/32_managing-a-repo-on-the-model-hub.srt (680 lines of code) (raw):
1
00:00:04,200 --> 00:00:06,210
- [Instructor] 在这段视频中,我们将了解如何
- [Instructor] In this video, we're going to understand how
2
00:00:06,210 --> 00:00:08,280
管理模型仓库
to manage a model repository
3
00:00:08,280 --> 00:00:10,053
在 Hugging Face Hub 模型中心。
on the Hugging Face Hub Model Hub.
4
00:00:10,920 --> 00:00:13,020
为了处理仓库
In order to handle a repository
5
00:00:13,020 --> 00:00:15,450
你应该首先拥有一个 Hugging Face 帐户。
you should first have a Hugging Face account.
6
00:00:15,450 --> 00:00:17,610
在描述中有创建新帐户
A link to create a new account is available
7
00:00:17,610 --> 00:00:18,573
的链接。
in the description.
8
00:00:20,130 --> 00:00:22,980
登录后,你可以创建一个新的仓库
Once you are logged in, you can create a new repository
9
00:00:22,980 --> 00:00:25,890
通过单击 New Model 选项。
by clicking on the New Model option.
10
00:00:25,890 --> 00:00:29,400
你会看到类似下面的模型。
You should be facing a similar model to the following.
11
00:00:29,400 --> 00:00:33,240
在 Owner 输入框中,你可以放置自己的命名空间
In the owner input, you can put either your own namespace
12
00:00:33,240 --> 00:00:35,703
或你组织的任何命名空间。
or any of your organization's namespaces.
13
00:00:36,660 --> 00:00:39,330
Model name 是模型标识符
The Model name is the model identifier
14
00:00:39,330 --> 00:00:40,320
它将被用于
that will then be used
15
00:00:40,320 --> 00:00:43,143
在所选命名空间上识别你的模型。
to identify your model on the chosen namespace.
16
00:00:44,130 --> 00:00:47,700
最后可以在 Public(公共) 和 Private(私有) 之间选择。
The final choice is between public and private.
17
00:00:47,700 --> 00:00:49,950
任何人都可以访问公共模型。
Public models are accessible by anyone.
18
00:00:49,950 --> 00:00:51,840
这是推荐的免费选项,
This is the recommended free option,
19
00:00:51,840 --> 00:00:54,960
因为这使你的模型易于访问和共享。
as this makes your model easily accessible and shareable.
20
00:00:54,960 --> 00:00:57,630
你的命名空间的所有者
The owners of your namespace are the only ones
21
00:00:57,630 --> 00:00:59,523
是唯一可以更新和更改你的模型。
who can update and change your model.
22
00:01:00,450 --> 00:01:03,660
一个更高级的选项是私有选项。
A more advanced option is the private option.
23
00:01:03,660 --> 00:01:04,560
在这种情况下,
In this case,
24
00:01:04,560 --> 00:01:06,000
只有你的命名空间的所有者
only the owners of your namespace
25
00:01:06,000 --> 00:01:08,280
对你的模型有可见性。
will have visibility over your model.
26
00:01:08,280 --> 00:01:10,260
其他用户不会知道它的存在
Other users won't know it exists
27
00:01:10,260 --> 00:01:11,810
并且将无法使用它。
and will not be able to use it.
28
00:01:15,030 --> 00:01:17,030
让我们创建一个虚拟模型来试试看。
Let's create a dummy model to play with.
29
00:01:18,180 --> 00:01:19,710
创建模型后,
Once your model is created,
30
00:01:19,710 --> 00:01:22,230
来自该模型的管理。
comes the management of that model.
31
00:01:22,230 --> 00:01:24,360
你可以使用三个选项卡。
Three tabs are available to you.
32
00:01:24,360 --> 00:01:27,960
你面对的是第一个,这是 Model card 页面。
You're facing the first one, which is the Model card page.
33
00:01:27,960 --> 00:01:29,970
这是你用来向全世界展示模型
This is the page you use to showcase your model
34
00:01:29,970 --> 00:01:31,110
的页面。
to the world.
35
00:01:31,110 --> 00:01:33,260
我们稍后会看到它是如何完成的。
We'll see how it can be completed in a bit.
36
00:01:34,500 --> 00:01:37,503
第二个是 Files and Versions 选项卡。
The second one is the Files and Versions tab.
37
00:01:38,340 --> 00:01:40,920
你的模型本身就是一个 Git 仓库。
Your model itself is a Git repository.
38
00:01:40,920 --> 00:01:43,230
如果你不知道什么是 Git 仓库,
If you're unaware of what is a Git repository,
39
00:01:43,230 --> 00:01:46,320
你可以将其视为包含文件的文件夹。
you can think of it as a folder containing files.
40
00:01:46,320 --> 00:01:48,120
如果你以前从未使用过 Git,
If you have never used Git before,
41
00:01:48,120 --> 00:01:50,100
我们建议观看视频描述中
we recommend looking at an introduction
42
00:01:50,100 --> 00:01:52,600
提供的介绍内容。
like the one provided in this video's description.
43
00:01:53,850 --> 00:01:56,910
Git 仓库支持按照时间推移
The Git repository allows you to see the changes happening
44
00:01:56,910 --> 00:02:00,900
查看本文件夹中的变化,也就是版本。
over time in this folder, hence the term versions.
45
00:02:00,900 --> 00:02:03,453
我们稍后会看到如何添加文件和版本。
We'll see how to add files and versions in a bit.
46
00:02:07,020 --> 00:02:09,570
最后一个选项卡是 Settings 选项卡,
The final tab is the settings tab,
47
00:02:09,570 --> 00:02:12,120
可以管理模型的可见性
which allows you to manage your model's visibility
48
00:02:12,120 --> 00:02:13,203
和可用性。
and availability.
49
00:02:14,790 --> 00:02:17,673
让我们首先从将文件添加到仓库开始。
Let's first start by adding files to the repository.
50
00:02:18,540 --> 00:02:19,560
还好有 add file 按钮
Files can be added
51
00:02:19,560 --> 00:02:23,340
通过网页操作即可添加文件。
through the web interface thanks to the add file button.
52
00:02:23,340 --> 00:02:27,060
添加的文件可以是任何类型,python,JSON,纯文本,
The added files can be of any type, python, JSON, text,
53
00:02:27,060 --> 00:02:27,893
任君选择。
you name it.
54
00:02:28,740 --> 00:02:31,170
除了你添加的文件及其内容,
Alongside your added file and its content,
55
00:02:31,170 --> 00:02:33,363
你还应该命名你的 change 或 commit。
you should name your change or commit.
56
00:02:36,330 --> 00:02:38,400
通常,使用 Hugging Face Hub Python 库
Generally, adding files is simpler
57
00:02:38,400 --> 00:02:40,770
或使用命令行添加文件
by using the Hugging Face Hub Python library
58
00:02:40,770 --> 00:02:43,050
比较简单。
or by using the command-line.
59
00:02:43,050 --> 00:02:44,310
我们将展示如何使用
We'll showcase how to do this
60
00:02:44,310 --> 00:02:46,290
Hugging Face Hub Python 库做到这一点
using the Hugging Face Hub Python library,
61
00:02:46,290 --> 00:02:48,060
并且在描述中有一个链接
and there is a link in the description
62
00:02:48,060 --> 00:02:49,800
可以指向这个视频的前一个版本,
to the previous version of this video,
63
00:02:49,800 --> 00:02:52,743
展示如何使用 Git 和命令行执行此操作。
showcasing how to do this using Git and the command-line.
64
00:02:53,610 --> 00:02:54,840
首先,确保你已登录
First, make sure you're logged
65
00:02:54,840 --> 00:02:56,460
进入你的 Hugging Face 帐户,
into your Hugging Face account,
66
00:02:56,460 --> 00:02:59,523
可以通过命令行或者 Python 运行时中操作。
either through the command-line or in a Python runtime.
67
00:03:04,634 --> 00:03:06,390
我们要看的第一种方法
The first approach we'll take a look at
68
00:03:06,390 --> 00:03:08,880
正在使用 upload_file 方法。
is using the upload file method.
69
00:03:08,880 --> 00:03:10,770
这提供了一个极其简单的 API
This offers an extremely simple API
70
00:03:10,770 --> 00:03:12,630
通过 hub 上传文件。
to upload files through the hub.
71
00:03:12,630 --> 00:03:14,190
其中三个必需的参数
The three required parameters
72
00:03:14,190 --> 00:03:16,083
是文件的当前位置,
are the current location of the file,
73
00:03:18,570 --> 00:03:21,300
该文件在仓库中的路径,
the path of that file in the repository,
74
00:03:21,300 --> 00:03:24,050
以及你要推送到的仓库的标识符。
and the id of the repository to which you're pushing.
75
00:03:25,650 --> 00:03:27,930
还有一些额外的参数。
There are a few additional parameters.
76
00:03:27,930 --> 00:03:29,100
token 参数,
The token parameter,
77
00:03:29,100 --> 00:03:31,200
如果你想指定一个和登录时
if you would like to specify a different token
78
00:03:31,200 --> 00:03:33,650
所保存的不同的 token,
than the one saved in your cache with your login,
79
00:03:34,830 --> 00:03:36,750
repo_type 参数,
the repo type parameter,
80
00:03:36,750 --> 00:03:40,503
如果你想推送到 dataset 或 space。
if you would like to push to a dataset or a space.
81
00:03:42,300 --> 00:03:45,690
我们将使用这种方法上传一个名为 readme.md 的文件
We'll upload a file called readme.md to the repository
82
00:03:45,690 --> 00:03:47,190
到仓库。
using this method.
83
00:03:47,190 --> 00:03:49,710
我们首先用那个名字保存一个文件,
We first start by saving a file with that name,
84
00:03:49,710 --> 00:03:51,210
其中包含一些关于
which contains some information
85
00:03:51,210 --> 00:03:52,920
仓库本身的信息。
about the repository itself.
86
00:03:52,920 --> 00:03:54,243
在这里,一个标题。
Here, a title.
87
00:03:55,950 --> 00:03:57,420
现在文件已保存,
Now that the file is saved,
88
00:03:57,420 --> 00:04:00,513
让我们使用 upload_file 方法将其上传到 hub。
let's use the upload file method to upload it to the hub.
89
00:04:01,560 --> 00:04:03,540
如果我们切换到 Web 界面一秒钟
If we switch to the web interface for a second
90
00:04:03,540 --> 00:04:07,080
并刷新页面,我们会看到显示了 README。
and refresh the page, we'll see that the README is shown.
91
00:04:07,080 --> 00:04:08,883
文件上传成功。
The file upload was a success.
92
00:04:10,170 --> 00:04:13,500
除了这个方法之外还有一个 delete_file 方法
Alongside this method exists a delete file method
93
00:04:13,500 --> 00:04:16,170
这样你就可以完全管理你的仓库。
so that you may manage your repository fully.
94
00:04:16,170 --> 00:04:18,820
我们将使用它来删除我们刚刚创建的文件。
We'll use it to delete the file we have just created.
95
00:04:22,860 --> 00:04:25,320
如果我们再次刷新页面,很好,
If we refresh the page once again, good,
96
00:04:25,320 --> 00:04:26,973
该文件确实被删除了。
the file was indeed deleted.
97
00:04:29,070 --> 00:04:32,730
这两种方法操作起来非常简单。
This approach using only these two methods is super simple.
98
00:04:32,730 --> 00:04:35,400
它不需要安装 Git 或 Git LFS,
It doesn't need Git or Git LFS installed,
99
00:04:35,400 --> 00:04:37,650
但它确实有一个限制。
but it does come with a limitation.
100
00:04:37,650 --> 00:04:39,600
一个人可以上传的最大文件大小
The maximum file size one can upload
101
00:04:39,600 --> 00:04:41,313
限制为 5 GB。
is limited to five gigabytes.
102
00:04:42,360 --> 00:04:43,890
为了克服这个限制,
To overcome this limit,
103
00:04:43,890 --> 00:04:45,540
我们来看看第二种方法
let's take a look at the second method
104
00:04:45,540 --> 00:04:47,643
这是仓库实用程序。
which is the repository utility.
105
00:04:48,600 --> 00:04:51,840
该类封装了 Git 和 Git LFS 方法,
This class is a wrapper over Git and Git LFS methods,
106
00:04:51,840 --> 00:04:53,850
它抽象了大部分的复杂性
which abstracts most of the complexity
107
00:04:53,850 --> 00:04:55,500
并提供灵活的 API
and offers a flexible API
108
00:04:55,500 --> 00:04:57,990
管理你的在线仓库。
to manage your online repositories.
109
00:04:57,990 --> 00:04:59,690
让我们来看看它是如何工作的。
Let's take a look at how it works.
110
00:05:03,870 --> 00:05:08,369
我们首先从实例化仓库实用程序开始。
We first start by instantiating the repository utility.
111
00:05:08,369 --> 00:05:10,380
为了克隆我们刚刚创建的仓库
We provide the clone from parameter,
112
00:05:10,380 --> 00:05:13,383
我们可以通过传递参数进行克隆。
in order to clone the repository we just created.
113
00:05:14,400 --> 00:05:18,750
仓库现已克隆到本地文件夹中。
The repository is now cloned in the local folder.
114
00:05:18,750 --> 00:05:22,200
我们刚刚初始化的 repo 对象
The repo object that we have just initialized
115
00:05:22,200 --> 00:05:24,873
提供了很多对我们有用的方法。
offers quite a few methods which are useful for us.
116
00:05:25,920 --> 00:05:28,800
我们有兴趣将模型推送到 hub。
We're interested in pushing a model to the hub.
117
00:05:28,800 --> 00:05:31,170
我将从加载模型和分词器开始
I'll start by loading a model and tokenizer
118
00:05:31,170 --> 00:05:32,643
这是几个小时前训练过的。
I trained a few hours ago.
119
00:05:34,380 --> 00:05:36,810
我们现在将遵循传统的 Git 方法
We'll now follow the traditional Git approach
120
00:05:36,810 --> 00:05:38,670
首先 pull 最新的更改内容
by first pulling latest changes
121
00:05:38,670 --> 00:05:40,053
使用 git_pull 方法。
using the git_pull method.
122
00:05:40,980 --> 00:05:43,170
我们刚刚克隆了仓库,
We just cloned the repository,
123
00:05:43,170 --> 00:05:45,780
所以除非这是一个超级活跃的仓库,
so unless this is a super active repository,
124
00:05:45,780 --> 00:05:48,660
否则不太可能内容的变化。
it's unlikely that new changes are available.
125
00:05:48,660 --> 00:05:51,000
但在做任何新的事情之前养成 pull 最新内容
But it's always a good idea to pull the latest changes
126
00:05:51,000 --> 00:05:52,300
的好习惯也是不错的。
before doing anything new.
127
00:05:53,220 --> 00:05:55,200
现在我们已经 pull 了仓库,
Now that we have pulled the repository,
128
00:05:55,200 --> 00:05:58,500
我会将模型和分词器保存在该文件夹中。
I'll save the model and tokenizer inside that folder.
129
00:05:58,500 --> 00:06:01,200
这包括模型权重、配置文件、
This includes the model weights, configuration file,
130
00:06:01,200 --> 00:06:02,673
和分词器文件。
and tokenizer files.
131
00:06:04,440 --> 00:06:05,820
现在模型已保存,
Now that the model is saved,
132
00:06:05,820 --> 00:06:07,890
我们将继续使用传统的 Git 方法
we'll continue with the traditional Git approach
133
00:06:07,890 --> 00:06:10,620
并将其推送到远程仓库。
and push it to the remote repository.
134
00:06:10,620 --> 00:06:12,150
如果我们使用命令行,
If we were using the command-line,
135
00:06:12,150 --> 00:06:14,250
我们将不得不调用一些
there are a few Git LFS specific commands
136
00:06:14,250 --> 00:06:15,600
特定的 Git LFS 命令。
we would have to invoke.
137
00:06:15,600 --> 00:06:17,940
但是在这里,huggingface_hub 包
But here, the Hugging Face hub package
138
00:06:17,940 --> 00:06:20,070
会处理所有这些。
takes care of all of that.
139
00:06:20,070 --> 00:06:24,420
我们将从使用 git_add 方法暂存文件开始。
We'll start by staging the files using the git_add method.
140
00:06:24,420 --> 00:06:27,600
然后我们将使用 git_commit 方法提交这些更改,
We'll then commit these changes using Git commit method,
141
00:06:27,600 --> 00:06:30,690
并提供有用的 commit 信息。
and providing a helpful commit message.
142
00:06:30,690 --> 00:06:33,210
最后,我们将更改推送到远端,
Finally, we'll push the changes to the remote,
143
00:06:33,210 --> 00:06:34,953
使用 git_push 方法。
using the Git push method.
144
00:06:45,090 --> 00:06:47,430
如果我们回到 Files and Versions 选项卡,
If we go back to the Files and Versions tab,
145
00:06:47,430 --> 00:06:49,950
我们现在可以看到新提交的文件。
we can now see the newly committed files.
146
00:06:49,950 --> 00:06:52,600
我们甚至可以在 inference API 中使用模型。
We can even play with the model in the inference API.
147
00:06:53,790 --> 00:06:55,770
不幸的是,我们模型的首页
Unfortunately, the front page of our model
148
00:06:55,770 --> 00:06:57,540
还是显得非常空。
is still very empty.
149
00:06:57,540 --> 00:06:59,280
让我们添加一个 README markdown 文件
Let's add a README markdown file
150
00:06:59,280 --> 00:07:00,753
让整体显得完整一点点。
to complete it a little bit.
151
00:07:01,710 --> 00:07:04,200
这个 README 被称为 Model card
This README is known as the Model card
152
00:07:04,200 --> 00:07:06,030
可以说它同样重要
and it's arguably as important
153
00:07:06,030 --> 00:07:09,330
作为模型仓库中的模型和分词器文件。
as the model and tokenizer files in the model repository.
154
00:07:09,330 --> 00:07:11,280
这是你的模型的综合定义
It is the central definition
155
00:07:11,280 --> 00:07:13,200
和模型文档,
and documentation of your model,
156
00:07:13,200 --> 00:07:16,440
确保社区成员的可重用性
ensuring reusability by fellow community members
157
00:07:16,440 --> 00:07:18,480
和结果的可重复性。
and reproducibility of results.
158
00:07:18,480 --> 00:07:20,760
提供一个平台,让其他成员
Providing a platform on which other members
159
00:07:20,760 --> 00:07:22,293
可以构建他们的工件。
may build their artifacts.
160
00:07:23,220 --> 00:07:25,590
为了简单起见我们只会在此处添加标题
We'll only add a title and a small description here
161
00:07:25,590 --> 00:07:27,060
和简短描述,
for simplicity's sake,
162
00:07:27,060 --> 00:07:29,370
但我们鼓励你添加相关信息
but we encourage you to add information relevant
163
00:07:29,370 --> 00:07:30,990
说明模型是如何训练的,
to how was the model trained,
164
00:07:30,990 --> 00:07:33,120
它的预期用途和限制,
it's intended use and limitations,
165
00:07:33,120 --> 00:07:36,180
以及目前一直的潜在偏差,
as well as it's identified potential biases,
166
00:07:36,180 --> 00:07:37,440
评估结果,
evaluation results,
167
00:07:37,440 --> 00:07:39,843
以及有关如何使用你的模型的代码示例。
and code samples on how to use your model.
168
00:07:41,460 --> 00:07:44,130
为 Model Hub 贡献出色的模型。
Great work contributing a model to the Model Hub.
169
00:07:44,130 --> 00:07:46,440
该模型现在可以在下游库中使用
This model can now be used in downstream libraries
170
00:07:46,440 --> 00:07:48,783
只需指定你的模型标识符。
simply by specifying your model identifier.