subtitles/zh-CN/01_the-pipeline-function.srt (400 lines of code) (raw):

1 00:00:00,069 --> 00:00:01,341 (屏幕嗖嗖声) (screen whooshes) 2 00:00:01,341 --> 00:00:02,449 (面部标志呼啸而过) (face logo whooshes) 3 00:00:02,449 --> 00:00:05,880 (屏幕嗖嗖声) (screen whooshes) 4 00:00:05,880 --> 00:00:07,080 - 本节课内容是: pipeline 函数 - The pipeline function. 5 00:00:09,540 --> 00:00:12,020 pipeline 函数是 Transformers 库中的 The pipeline function is the most high level API 6 00:00:12,020 --> 00:00:14,010 最顶层的 API of the Transformers library. 7 00:00:14,010 --> 00:00:16,050 它将所有步骤重新组合在一起 It regroups together all the steps 8 00:00:16,050 --> 00:00:18,873 从而实现从原始文本到可用预测的转换。 to go from raw texts to usable predictions. 9 00:00:20,228 --> 00:00:22,980 使用的模型是 pipeline 的核心, The model used is at the core of a pipeline, 10 00:00:22,980 --> 00:00:24,390 但 pipeline 还包括 but the pipeline also include 11 00:00:24,390 --> 00:00:26,610 所有必要的预处理, all the necessary pre-processing, 12 00:00:26,610 --> 00:00:30,240 因为模型不期望得到文本,而是数字, since the model does not expect texts, but number, 13 00:00:30,240 --> 00:00:32,040 以及一些后期处理, as well as some post-processing, 14 00:00:32,040 --> 00:00:34,533 使模型的输出可读。 to make the output of the model human-readable. 15 00:00:35,910 --> 00:00:37,593 让我们通过一个情绪分析 pipeline Let's look at a first example 16 00:00:37,593 --> 00:00:39,693 的例子来解释一下。 with the sentiment analysis pipeline. 17 00:00:40,740 --> 00:00:44,670 此 pipeline 对给定的输入执行文本分类 This pipeline performs text classification on a given input 18 00:00:44,670 --> 00:00:46,953 并确定它的情绪是正面的还是负面的。 and determines if it's positive or negative. 19 00:00:47,910 --> 00:00:51,750 在这里,它为给定文本标记正面情绪的标签, Here, it attributed the positive label on the given text, 20 00:00:51,750 --> 00:00:54,413 置信度为 95%。 with a confidence of 95%. 21 00:00:55,650 --> 00:00:58,470 您可以将多个文本传递到同一个 pipeline, You can pass multiple texts to the same pipeline, 22 00:00:58,470 --> 00:01:00,270 它们将被处理并 which will be processed and passed 23 00:01:00,270 --> 00:01:02,673 作为一个批次传递给模型。 through the model together as a batch. 24 00:01:03,570 --> 00:01:05,970 输出是包含单个结果的列表 The output is a list of individual results 25 00:01:05,970 --> 00:01:07,923 与输入文本的顺序相同。 in the same order as the input texts. 26 00:01:08,790 --> 00:01:12,270 在这里,我们为第一个文本找到了相同的标签和分数, Here we find the same label and score for the first text, 27 00:01:12,270 --> 00:01:14,443 第二个文本被判断为否定情绪 and the second text is judged negative 28 00:01:14,443 --> 00:01:17,243 置信度为 99.9%。 with a confidence of 99.9%. 29 00:01:18,720 --> 00:01:20,700 零样本分类 pipeline The zero-shot classification pipeline 30 00:01:20,700 --> 00:01:23,610 是一个更通用的文本分类 pipeline, is a more general text-classification pipeline, 31 00:01:23,610 --> 00:01:26,370 它允许您提供所需的标签。 it allows you to provide the labels you want. 32 00:01:26,370 --> 00:01:29,850 在这里,我们想根据标签对输入文本进行分类, Here we want to classify our input text along the labels, 33 00:01:29,850 --> 00:01:32,643 标签有教育、政治和商业。 education, politics, and business. 34 00:01:33,540 --> 00:01:35,580 pipeline 都能够成功识别 The pipeline successfully recognizes 35 00:01:35,580 --> 00:01:38,280 与其他标签相比,它更多地是关于教育, it's more about education than the other labels, 36 00:01:38,280 --> 00:01:40,643 置信度为 84%。 with a confidence of 84%. 37 00:01:41,670 --> 00:01:43,110 继续执行其他任务, Moving on to other tasks, 38 00:01:43,110 --> 00:01:45,030 文本生成 pipeline 将 the text generation pipeline will 39 00:01:45,030 --> 00:01:46,533 自动完成给定的提示。 auto-complete a given prompt. 40 00:01:47,460 --> 00:01:49,980 输出带有一点随机性, The output is generated with a bit of randomness, 41 00:01:49,980 --> 00:01:52,800 所以每次在给定的提示上调用生成器对象时 so it changes each time you call the generator object 42 00:01:52,800 --> 00:01:53,763 这个结果都会改变。 on a given prompt. 43 00:01:54,990 --> 00:01:57,123 到目前为止,我们已经使用了 pipeline API Up until now, we've used the the pipeline API 44 00:01:57,123 --> 00:02:00,360 结合与每个任务关联的默认模型来演示, with the default model associated to each task, 45 00:02:00,360 --> 00:02:02,880 但您可以将它与任何经过预训练的模型一起使用 but you can use it with any model that has been pretrained 46 00:02:02,880 --> 00:02:04,263 或根据此任务进行微调。 or fine-tuned on this task. 47 00:02:06,540 --> 00:02:10,350 进入模型中心 huggingface.co/models Going on the model hub, huggingface.co/models 48 00:02:10,350 --> 00:02:13,350 您可以按任务过滤可用模型。 you can filter the available models by task. 49 00:02:13,350 --> 00:02:17,190 我们之前示例中使用的默认模型是 gpt2, The default model used in our previous example was gpt2, 50 00:02:17,190 --> 00:02:19,290 但还有更多模型可供选择, but there are many more models available, 51 00:02:19,290 --> 00:02:20,523 且不仅仅是英语。 and not just in English. 52 00:02:21,450 --> 00:02:23,670 让我们回到文本生成 pipeline Let's go back to the text generation pipeline 53 00:02:23,670 --> 00:02:26,193 并用另一个模型 distilgpt2 加载它。 and load it with another model, distilgpt2. 54 00:02:27,060 --> 00:02:28,950 这是 gpt2 的轻量级版本 This is a lighter version of gpt2 55 00:02:28,950 --> 00:02:30,603 由 Hugging Face 团队创建。 created by the Hugging Face team. 56 00:02:31,740 --> 00:02:34,110 将 pipeline 应用于给定提示时, When applying the pipeline to a given prompt, 57 00:02:34,110 --> 00:02:36,360 我们可以指定几个参数 we can specify several arguments 58 00:02:36,360 --> 00:02:39,240 例如生成文本的最大长度, such as the maximum length of the generated texts, 59 00:02:39,240 --> 00:02:41,700 或者我们想要返回的句子数量, or the number of sentences we want to return, 60 00:02:41,700 --> 00:02:44,150 因为这一代有一些随机性。 since there is some randomness in the generation. 61 00:02:46,080 --> 00:02:48,750 通过猜测句子中的下一个单词来生成文本 Generating texts by guessing the next word in a sentence 62 00:02:48,750 --> 00:02:51,450 是 GPT-2 的预训练目标。 was the pretraining objective of GPT-2. 63 00:02:51,450 --> 00:02:55,140 掩码填充 pipeline 是 BERT 的预训练目标, The fill mask pipeline is the pretraining objective of BERT, 64 00:02:55,140 --> 00:02:57,363 这是猜测掩码词的值。 which is to guess the value of masked word. 65 00:02:58,260 --> 00:03:01,020 在这种情况下,我们询问两个最可能的值 In this case, we ask the two most likely values 66 00:03:01,020 --> 00:03:03,660 对于缺失的词,根据模型, for the missing words, according to the model, 67 00:03:03,660 --> 00:03:07,053 并通过数学计算推测可能的答案。 and get mathematical or computational as possible answers. 68 00:03:08,280 --> 00:03:10,170 Transformers 模型可以执行的另一项任务 Another task Transformers model can perform 69 00:03:10,170 --> 00:03:12,660 就是对句子中的每一个词进行分类 is to classify each word in the sentence 70 00:03:12,660 --> 00:03:14,970 而不是整个句子。 instead of the sentence as a whole. 71 00:03:14,970 --> 00:03:18,390 其中一个例子是命名实体识别, One example of this is Named Entity Recognition, 72 00:03:18,390 --> 00:03:20,820 这是识别实体的任务, which is the task of identifying entities, 73 00:03:20,820 --> 00:03:25,323 例如句子中的人、组织或地点。 such as persons, organizations or locations in a sentence. 74 00:03:26,400 --> 00:03:30,570 在这里,模型正确地找到了人 Sylvain, Here, the model correctly finds the person, Sylvain, 75 00:03:30,570 --> 00:03:32,453 组织,是 Hugging Face the organization, Hugging Face, 76 00:03:32,453 --> 00:03:35,010 以及位置,布鲁克林, as well as the location, Brooklyn, 77 00:03:35,010 --> 00:03:36,303 在输入文本中。 inside the input text. 78 00:03:37,661 --> 00:03:40,230 使用的 grouped_entities=True 参数 The grouped_entities=True argument used 79 00:03:40,230 --> 00:03:42,330 就是把 pipeline 组合在一起 is to make the pipeline group together 80 00:03:42,330 --> 00:03:44,790 链接到同一实体的不同词, the different words linked to the same entity, 81 00:03:44,790 --> 00:03:46,353 比如这里的 Hugging 和 Face。 such as Hugging and Face here. 82 00:03:48,270 --> 00:03:50,670 pipeline API 可用的另一个任务 Another task available with the pipeline API 83 00:03:50,670 --> 00:03:52,920 是抽取式问答。 is extractive question answering. 84 00:03:52,920 --> 00:03:55,380 提供上下文和问题, Providing a context and a question, 85 00:03:55,380 --> 00:03:58,290 该模型将识别上下文中的文本范围 the model will identify the span of text in the context 86 00:03:58,290 --> 00:04:00,190 包含问题的答案。 containing the answer to the question. 87 00:04:01,650 --> 00:04:03,960 获取长文的简短摘要 Getting short summaries of very long articles 88 00:04:03,960 --> 00:04:06,540 也是 Transformers 库可以提供的功能, is also something the Transformers library can help with, 89 00:04:06,540 --> 00:04:08,140 这是摘要 pipeline 提供的功能。 with the summarization pipeline. 90 00:04:09,480 --> 00:04:12,570 pipeline API 支持的最后一个任务 Finally, the last task supported by the pipeline API 91 00:04:12,570 --> 00:04:14,130 是翻译。 is translation. 92 00:04:14,130 --> 00:04:16,170 这里我们使用在模型中心 (Model Hub) 提供的 Here we use a French/English model 93 00:04:16,170 --> 00:04:17,460 法语/英语模型 found on the model hub 94 00:04:17,460 --> 00:04:19,893 获取我们输入文本的英文版本。 to get the English version of our input text. 95 00:04:21,600 --> 00:04:23,490 这是所有任务的简要总结 Here is a brief summary of all the tasks 96 00:04:23,490 --> 00:04:25,500 我们在这段视频中进行了调查。 we've looked into in this video. 97 00:04:25,500 --> 00:04:27,390 然后通过模型中心中提供的 Try then out through the inference widgets 98 00:04:27,390 --> 00:04:28,327 推理小部件进行尝试。 in the model hub. 99 00:04:30,459 --> 00:04:33,475 (屏幕嗖嗖声) (screen whooshes) 100 00:04:33,475 --> 00:04:35,175 (徽标嗖嗖声) (logo whooshes)