vi/6_synthetic_datasets/notebooks/preference_dpo

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tạo tập dữ liệu để điều chỉnh ưu tiên (preference alignment)\n", "\n", "Notebook này sẽ hướng dẫn bạn qua quá trình tạo tập dữ liệu để điều chỉnh ưu tiên. Chúng ta sẽ sử dụng thư viện `distilabel` để tạo tập dữ liệu cho việc điều chỉnh ưu tiên.\n", "\n", "Vì vậy, hãy cùng tìm hiểu sâu hơn về một số tập dữ liệu điều chỉnh ưu tiên.\n", "\n", " <div style='background-color: lightblue; padding: 10px; border-radius: 5px; margin-bottom: 20px; color:black'>\n", " <h2 style='margin: 0;color:blue'>Bài tập: Tạo tập dữ liệu để điều chỉnh sở thích</h2>\n", " <p>Bây giờ bạn đã biết cách tạo tập dữ liệu để điều chỉnh sở thích, hãy thử tạo tập dữ liệu để điều chỉnh sở thích.</p>\n", " <p><b>Các mức độ khó</b></p>\n", " <p>🐢 Tạo tập dữ liệu để điều chỉnh sở thích</p>\n", " <p>🐕 Tạo tập dữ liệu để điều chỉnh sở thích với sự tiến hóa phản hồi (response evolution)</p>\n", " <p>🦁 Tạo tập dữ liệu để điều chỉnh sở thích với sự tiến hóa phản hồi và tổng hợp mô hình (model pooling)</p>\n", " </div>" ] }, { "cell_type": "markdown", "metadata": { "vscode": { "languageId": "plaintext" } }, "source": [ "## Cài đặt các thư viện\n", "\n", "Thay vì `transformers`, bạn cũng có thể cài đặt `vllm` hoặc `hf-inference-endpoints`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install \"distilabel[hf-transformers,outlines,instructor]\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bắt đầu tổng hợp\n", "\n", "Như chúng ta đã thấy trong notebook trước, chúng ta có thể tạo một quy trình (pipeline) distilabel để tạo tập dữ liệu sở thích. Quy trình tối thiểu đã được cung cấp. Bạn có thể tiếp tục làm việc trên quy trình này để tạo tập dữ liệu lớn cho việc điều chỉnh sở thích. Thay đổi các mô hình, nhà cung cấp mô hình và các đối số sinh (generation arguments) để xem chúng ảnh hưởng như thế nào đến chất lượng của tập dữ liệu. Thử nghiệm nhỏ, mở rộng quy mô sau.\n", "\n", "Hãy xem [thư viện các thành phần distilabel](https://distilabel.argilla.io/latest/components-gallery/) để biết thông tin về các lớp xử lý và cách sử dụng chúng.\n", "\n", "Một ví dụ về việc tải dữ liệu từ Hub thay vì các dictionary được cung cấp bên dưới.\n", "\n", "```python\n", "from datasets import load_dataset\n", "\n", "with Pipeline(...) as pipeline:\n", " ...\n", "\n", "if __name__ == \"__main__\":\n", " dataset = load_dataset(\"my-dataset\", split=\"train\")\n", " distiset = pipeline.run(dataset=dataset)\n", "```\n", "\n", "Đừng quên đẩy tập dữ liệu của bạn lên Hub sau khi chạy quy trình!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from distilabel.llms import TransformersLLM\n", "from distilabel.pipeline import Pipeline\n", "from distilabel.steps import GroupColumns, LoadDataFromDicts\n", "from distilabel.steps.tasks import TextGeneration\n", "\n", "with Pipeline() as pipeline:\n", " data = LoadDataFromDicts(data=[{\"instruction\": \"Dữ liệu giả lập (synthetic data) là gì?\"}])\n", " llm_a = TransformersLLM(model=\"HuggingFaceTB/SmolLM2-1.7B-Instruct\")\n", " gen_a = TextGeneration(llm=llm_a)\n", " llm_b = TransformersLLM(model=\"Qwen/Qwen2.5-1.5B-Instruct\")\n", " gen_b = TextGeneration(llm=llm_b)\n", " group = GroupColumns(columns=[\"generation\"])\n", " data >> [gen_a, gen_b] >> group\n", "\n", "if __name__ == \"__main__\":\n", " distiset = pipeline.run()\n", " distiset.push_to_hub(\"huggingface-smol-course-preference-tuning-dataset\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🌯 Kết thúc\n", "\n", "Bây giờ bạn đã biết cách tạo tập dữ liệu để điều chỉnh ưu tiên. Bạn có thể sử dụng điều này để:\n", "\n", "- Tạo tập dữ liệu để điều chỉnh ưu tiên.\n", "- Tạo tập dữ liệu đánh giá để điều chỉnh ưu tiên.\n", "\n", "Tiếp theo\n", "\n", "🏋️‍♂️ Tinh chỉnh một mô hình với điều chỉnh ưu tiên với tập dữ liệu giả lập dựa trên [chương tinh chỉnh ưu tiên](../../2_preference_alignment/README.md)" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }

vi/6_synthetic_datasets/notebooks/preference_dpo_dataset.ipynb (120 lines of code) (raw):