transformers_doc/ja/semantic_segmentation.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Semantic segmentation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "hide_input": true }, "outputs": [ { "data": { "text/html": [ "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/dKE8SIt9C-w?rel=0&controls=0&showinfo=0\" frameborder=\"0\" allowfullscreen></iframe>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#@title\n", "from IPython.display import HTML\n", "\n", "HTML('<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/dKE8SIt9C-w?rel=0&controls=0&showinfo=0\" frameborder=\"0\" allowfullscreen></iframe>')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "セマンティックセグメンテーションでは、画像の個々のピクセルにラベルまたはクラスを割り当てます。セグメンテーションにはいくつかのタイプがありますが、セマンティックセグメンテーションの場合、同じオブジェクトの一意のインスタンス間の区別は行われません。両方のオブジェクトに同じラベルが付けられます (たとえば、`car-1`と`car-2`の代わりに`car`)。セマンティックセグメンテーションの一般的な現実世界のアプリケーションには、歩行者や重要な交通情報を識別するための自動運転車のトレーニング、医療画像内の細胞と異常の識別、衛星画像からの環境変化の監視などが含まれます。\n", "\n", "このガイドでは、次の方法を説明します。\n", "\n", "1. [SceneParse150](https://huggingface.co/datasets/scene_parse_150) データセットの [SegFormer](https://huggingface.co/docs/transformers/main/en/model_doc/segformer#segformer) を微調整します。\n", "2. 微調整したモデルを推論に使用します。\n", "\n", "<Tip>\n", "\n", "このタスクと互換性のあるすべてのアーキテクチャとチェックポイントを確認するには、[タスクページ](https://huggingface.co/tasks/image-segmentation) を確認することをお勧めします。\n", "\n", "</Tip>\n", "\n", "始める前に、必要なライブラリがすべてインストールされていることを確認してください。\n", "\n", "```bash\n", "pip install -q datasets transformers evaluate\n", "```\n", "\n", "モデルをアップロードしてコミュニティと共有できるように、Hugging Face アカウントにログインすることをお勧めします。プロンプトが表示されたら、トークンを入力してログインします。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from huggingface_hub import notebook_login\n", "\n", "notebook_login()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load SceneParse150 dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "まず、SceneParse150 データセットの小さいサブセットを 🤗 データセットライブラリから読み込みます。これにより、完全なデータセットのトレーニングにさらに時間を費やす前に、実験してすべてが機能することを確認する機会が得られます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset\n", "\n", "ds = load_dataset(\"scene_parse_150\", split=\"train[:50]\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`train_test_split` メソッドを使用して、データセットの `train` 分割をトレインセットとテストセットに分割します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds = ds.train_test_split(test_size=0.2)\n", "train_ds = ds[\"train\"]\n", "test_ds = ds[\"test\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次に、例を見てみましょう。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x683 at 0x7F9B0C201F90>,\n", " 'annotation': <PIL.PngImagePlugin.PngImageFile image mode=L size=512x683 at 0x7F9B0C201DD0>,\n", " 'scene_category': 368}" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_ds[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `image`: シーンの PIL イメージ。\n", "- `annotation`: セグメンテーションマップの PIL イメージ。モデルのターゲットでもあります。\n", "- `scene_category`: \"kitchen\"や\"office\"などの画像シーンを説明するカテゴリ ID。このガイドでは、`image`と`annotation`のみが必要になります。どちらも PIL イメージです。\n", "\n", "また、ラベル ID をラベルクラスにマップする辞書を作成することもできます。これは、後でモデルを設定するときに役立ちます。ハブからマッピングをダウンロードし、`id2label` および `label2id` ディクショナリを作成します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "from pathlib import Path\n", "from huggingface_hub import hf_hub_download\n", "\n", "repo_id = \"huggingface/label-files\"\n", "filename = \"ade20k-id2label.json\"\n", "id2label = json.loads(Path(hf_hub_download(repo_id, filename, repo_type=\"dataset\")).read_text())\n", "id2label = {int(k): v for k, v in id2label.items()}\n", "label2id = {v: k for k, v in id2label.items()}\n", "num_labels = len(id2label)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preprocess" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次のステップでは、SegFormer 画像プロセッサをロードして、モデルの画像と注釈を準備します。このデータセットのような一部のデータセットは、バックグラウンドクラスとしてゼロインデックスを使用します。ただし、実際には背景クラスは 150 個のクラスに含まれていないため、`do_reduce_labels=True`を設定してすべてのラベルから 1 つを引く必要があります。ゼロインデックスは `255` に置き換えられるため、SegFormer の損失関数によって無視されます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoImageProcessor\n", "\n", "checkpoint = \"nvidia/mit-b0\"\n", "image_processor = AutoImageProcessor.from_pretrained(checkpoint, do_reduce_labels=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "モデルを過学習に対してより堅牢にするために、画像データセットにいくつかのデータ拡張を適用するのが一般的です。このガイドでは、[torchvision](https://pytorch.org/vision/stable/index.html) の [`ColorJitter`](https://pytorch.org/vision/stable/generated/torchvision.transforms.ColorJitter.html) 関数を使用します。 ) を使用して画像の色のプロパティをランダムに変更しますが、任意の画像ライブラリを使用することもできます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from torchvision.transforms import ColorJitter\n", "\n", "jitter = ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次に、モデルの画像と注釈を準備するための 2 つの前処理関数を作成します。これらの関数は、画像を`pixel_values`に変換し、注釈を`labels`に変換します。トレーニングセットの場合、画像を画像プロセッサに提供する前に `jitter` が適用されます。テストセットの場合、テスト中にデータ拡張が適用されないため、画像プロセッサは`images`を切り取って正規化し、`ラベル`のみを切り取ります。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_transforms(example_batch):\n", " images = [jitter(x) for x in example_batch[\"image\"]]\n", " labels = [x for x in example_batch[\"annotation\"]]\n", " inputs = image_processor(images, labels)\n", " return inputs\n", "\n", "\n", "def val_transforms(example_batch):\n", " images = [x for x in example_batch[\"image\"]]\n", " labels = [x for x in example_batch[\"annotation\"]]\n", " inputs = image_processor(images, labels)\n", " return inputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "データセット全体に`jitter`を適用するには、🤗 Datasets `set_transform` 関数を使用します。変換はオンザフライで適用されるため、高速で消費するディスク容量が少なくなります。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_ds.set_transform(train_transforms)\n", "test_ds.set_transform(val_transforms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "モデルを過学習に対してより堅牢にするために、画像データセットにいくつかのデータ拡張を適用するのが一般的です。\n", "このガイドでは、[`tf.image`](https://www.tensorflow.org/api_docs/python/tf/image) を使用して画像の色のプロパティをランダムに変更しますが、任意のプロパティを使用することもできます。画像\n", "好きな図書館。\n", "2 つの別々の変換関数を定義します。\n", "- 画像拡張を含むトレーニングデータ変換\n", "- 🤗 Transformers のコンピュータービジョンモデルはチャネル優先のレイアウトを想定しているため、画像を転置するだけの検証データ変換" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf\n", "\n", "\n", "def aug_transforms(image):\n", " image = tf.keras.utils.img_to_array(image)\n", " image = tf.image.random_brightness(image, 0.25)\n", " image = tf.image.random_contrast(image, 0.5, 2.0)\n", " image = tf.image.random_saturation(image, 0.75, 1.25)\n", " image = tf.image.random_hue(image, 0.1)\n", " image = tf.transpose(image, (2, 0, 1))\n", " return image\n", "\n", "\n", "def transforms(image):\n", " image = tf.keras.utils.img_to_array(image)\n", " image = tf.transpose(image, (2, 0, 1))\n", " return image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次に、モデルの画像と注釈のバッチを準備する 2 つの前処理関数を作成します。これらの機能が適用されます\n", "画像変換を行い、以前にロードされた `image_processor` を使用して画像を `pixel_values` に変換し、\n", "`labels`への注釈。 `ImageProcessor` は、画像のサイズ変更と正規化も処理します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_transforms(example_batch):\n", " images = [aug_transforms(x.convert(\"RGB\")) for x in example_batch[\"image\"]]\n", " labels = [x for x in example_batch[\"annotation\"]]\n", " inputs = image_processor(images, labels)\n", " return inputs\n", "\n", "\n", "def val_transforms(example_batch):\n", " images = [transforms(x.convert(\"RGB\")) for x in example_batch[\"image\"]]\n", " labels = [x for x in example_batch[\"annotation\"]]\n", " inputs = image_processor(images, labels)\n", " return inputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "データセット全体に前処理変換を適用するには、🤗 Datasets `set_transform` 関数を使用します。\n", "変換はオンザフライで適用されるため、高速で消費するディスク容量が少なくなります。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_ds.set_transform(train_transforms)\n", "test_ds.set_transform(val_transforms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "トレーニング中にメトリクスを含めると、多くの場合、モデルのパフォーマンスを評価するのに役立ちます。 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) ライブラリを使用して、評価メソッドをすばやくロードできます。このタスクでは、[Mean Intersection over Union](https://huggingface.co/spaces/evaluate-metric/accuracy) (IoU) メトリックをロードします (🤗 Evaluate [クイックツアー](https://huggingface.co/docs/evaluate/a_quick_tour) を参照して、メトリクスをロードして計算する方法の詳細を確認してください)。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import evaluate\n", "\n", "metric = evaluate.load(\"mean_iou\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次に、メトリクスを `compute` する関数を作成します。予測を次のように変換する必要があります\n", "最初にロジットを作成し、次に `compute` を呼び出す前にラベルのサイズに一致するように再形成します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import torch\n", "from torch import nn\n", "\n", "def compute_metrics(eval_pred):\n", " with torch.no_grad():\n", " logits, labels = eval_pred\n", " logits_tensor = torch.from_numpy(logits)\n", " logits_tensor = nn.functional.interpolate(\n", " logits_tensor,\n", " size=labels.shape[-2:],\n", " mode=\"bilinear\",\n", " align_corners=False,\n", " ).argmax(dim=1)\n", "\n", " pred_labels = logits_tensor.detach().cpu().numpy()\n", " metrics = metric.compute(\n", " predictions=pred_labels,\n", " references=labels,\n", " num_labels=num_labels,\n", " ignore_index=255,\n", " reduce_labels=False,\n", " )\n", " for key, value in metrics.items():\n", " if type(value) is np.ndarray:\n", " metrics[key] = value.tolist()\n", " return metrics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def compute_metrics(eval_pred):\n", " logits, labels = eval_pred\n", " logits = tf.transpose(logits, perm=[0, 2, 3, 1])\n", " logits_resized = tf.image.resize(\n", " logits,\n", " size=tf.shape(labels)[1:],\n", " method=\"bilinear\",\n", " )\n", "\n", " pred_labels = tf.argmax(logits_resized, axis=-1)\n", " metrics = metric.compute(\n", " predictions=pred_labels,\n", " references=labels,\n", " num_labels=num_labels,\n", " ignore_index=-1,\n", " reduce_labels=image_processor.do_reduce_labels,\n", " )\n", "\n", " per_category_accuracy = metrics.pop(\"per_category_accuracy\").tolist()\n", " per_category_iou = metrics.pop(\"per_category_iou\").tolist()\n", "\n", " metrics.update({f\"accuracy_{id2label[i]}\": v for i, v in enumerate(per_category_accuracy)})\n", " metrics.update({f\"iou_{id2label[i]}\": v for i, v in enumerate(per_category_iou)})\n", " return {\"val_\" + k: v for k, v in metrics.items()}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "これで`compute_metrics`関数の準備が整いました。トレーニングをセットアップするときにこの関数に戻ります。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<Tip>\n", "\n", "[Trainer](https://huggingface.co/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer) を使用したモデルの微調整に慣れていない場合は、[ここ](https://huggingface.co/docs/transformers/main/ja/tasks/../training#finetune-with-trainer) の基本的なチュートリアルをご覧ください。\n", "\n", "</Tip>\n", "\n", "これでモデルのトレーニングを開始する準備が整いました。 [AutoModelForSemanticSegmentation](https://huggingface.co/docs/transformers/main/ja/model_doc/auto#transformers.AutoModelForSemanticSegmentation) を使用して SegFormer をロードし、ラベル ID とラベルクラス間のマッピングをモデルに渡します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoModelForSemanticSegmentation, TrainingArguments, Trainer\n", "\n", "model = AutoModelForSemanticSegmentation.from_pretrained(checkpoint, id2label=id2label, label2id=label2id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "この時点で残っている手順は次の 3 つだけです。\n", "\n", "1. [TrainingArguments](https://huggingface.co/docs/transformers/main/ja/main_classes/trainer#transformers.TrainingArguments) でトレーニングハイパーパラメータを定義します。 `image` 列が削除されるため、未使用の列を削除しないことが重要です。 `image` 列がないと、`pixel_values` を作成できません。この動作を防ぐには、`remove_unused_columns=False`を設定してください。他に必要なパラメータは、モデルの保存場所を指定する `output_dir` だけです。 `push_to_hub=True`を設定して、このモデルをハブにプッシュします (モデルをアップロードするには、Hugging Face にサインインする必要があります)。各エポックの終了時に、[Trainer](https://huggingface.co/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer) は IoU メトリックを評価し、トレーニングチェックポイントを保存します。\n", "2. トレーニング引数を、モデル、データセット、トークナイザー、データ照合器、および `compute_metrics` 関数とともに [Trainer](https://huggingface.co/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer) に渡します。\n", "3. [train()](https://huggingface.co/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer.train) を呼び出してモデルを微調整します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "training_args = TrainingArguments(\n", " output_dir=\"segformer-b0-scene-parse-150\",\n", " learning_rate=6e-5,\n", " num_train_epochs=50,\n", " per_device_train_batch_size=2,\n", " per_device_eval_batch_size=2,\n", " save_total_limit=3,\n", " eval_strategy=\"steps\",\n", " save_strategy=\"steps\",\n", " save_steps=20,\n", " eval_steps=20,\n", " logging_steps=1,\n", " eval_accumulation_steps=5,\n", " remove_unused_columns=False,\n", " push_to_hub=True,\n", ")\n", "\n", "trainer = Trainer(\n", " model=model,\n", " args=training_args,\n", " train_dataset=train_ds,\n", " eval_dataset=test_ds,\n", " compute_metrics=compute_metrics,\n", ")\n", "\n", "trainer.train()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "トレーニングが完了したら、 [push_to_hub()](https://huggingface.co/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer.push_to_hub) メソッドを使用してモデルをハブに共有し、誰もがモデルを使用できるようにします。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trainer.push_to_hub()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<Tip>\n", "\n", "Keras を使用したモデルの微調整に慣れていない場合は、まず [基本チュートリアル](https://huggingface.co/docs/transformers/main/ja/tasks/./training#train-a-tensorflow-model-with-keras) を確認してください。\n", "\n", "</Tip>\n", "\n", "TensorFlow でモデルを微調整するには、次の手順に従います。\n", "1. トレーニングのハイパーパラメータを定義し、オプティマイザーと学習率スケジュールを設定します。\n", "2. 事前トレーニングされたモデルをインスタンス化します。\n", "3. 🤗 データセットを `tf.data.Dataset` に変換します。\n", "4. モデルをコンパイルします。\n", "5. コールバックを追加してメトリクスを計算し、モデルを 🤗 Hub にアップロードします\n", "6. `fit()` メソッドを使用してトレーニングを実行します。\n", "\n", "まず、ハイパーパラメーター、オプティマイザー、学習率スケジュールを定義します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import create_optimizer\n", "\n", "batch_size = 2\n", "num_epochs = 50\n", "num_train_steps = len(train_ds) * num_epochs\n", "learning_rate = 6e-5\n", "weight_decay_rate = 0.01\n", "\n", "optimizer, lr_schedule = create_optimizer(\n", " init_lr=learning_rate,\n", " num_train_steps=num_train_steps,\n", " weight_decay_rate=weight_decay_rate,\n", " num_warmup_steps=0,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次に、ラベルマッピングとともに [TFAutoModelForSemanticSegmentation](https://huggingface.co/docs/transformers/main/ja/model_doc/auto#transformers.TFAutoModelForSemanticSegmentation) を使用して SegFormer をロードし、それをコンパイルします。\n", "オプティマイザ。 Transformers モデルにはすべてデフォルトのタスク関連の損失関数があるため、次の場合を除き、損失関数を指定する必要はないことに注意してください。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import TFAutoModelForSemanticSegmentation\n", "\n", "model = TFAutoModelForSemanticSegmentation.from_pretrained(\n", " checkpoint,\n", " id2label=id2label,\n", " label2id=label2id,\n", ")\n", "model.compile(optimizer=optimizer) # No loss argument!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`to_tf_dataset` と `DefaultDataCollator` を使用して、データセットを `tf.data.Dataset` 形式に変換します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import DefaultDataCollator\n", "\n", "data_collator = DefaultDataCollator(return_tensors=\"tf\")\n", "\n", "tf_train_dataset = train_ds.to_tf_dataset(\n", " columns=[\"pixel_values\", \"label\"],\n", " shuffle=True,\n", " batch_size=batch_size,\n", " collate_fn=data_collator,\n", ")\n", "\n", "tf_eval_dataset = test_ds.to_tf_dataset(\n", " columns=[\"pixel_values\", \"label\"],\n", " shuffle=True,\n", " batch_size=batch_size,\n", " collate_fn=data_collator,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "予測から精度を計算し、モデルを 🤗 ハブにプッシュするには、[Keras callbacks](https://huggingface.co/docs/transformers/main/ja/tasks/../main_classes/keras_callbacks) を使用します。\n", "`compute_metrics` 関数を [KerasMetricCallback](https://huggingface.co/docs/transformers/main/ja/main_classes/keras_callbacks#transformers.KerasMetricCallback) に渡します。\n", "そして [PushToHubCallback](https://huggingface.co/docs/transformers/main/ja/main_classes/keras_callbacks#transformers.PushToHubCallback) を使用してモデルをアップロードします。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers.keras_callbacks import KerasMetricCallback, PushToHubCallback\n", "\n", "metric_callback = KerasMetricCallback(\n", " metric_fn=compute_metrics, eval_dataset=tf_eval_dataset, batch_size=batch_size, label_cols=[\"labels\"]\n", ")\n", "\n", "push_to_hub_callback = PushToHubCallback(output_dir=\"scene_segmentation\", tokenizer=image_processor)\n", "\n", "callbacks = [metric_callback, push_to_hub_callback]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ついに、モデルをトレーニングする準備が整いました。トレーニングおよび検証データセット、エポック数、\n", "モデルを微調整するためのコールバック:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.fit(\n", " tf_train_dataset,\n", " validation_data=tf_eval_dataset,\n", " callbacks=callbacks,\n", " epochs=num_epochs,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "おめでとう！モデルを微調整し、🤗 Hub で共有しました。これで推論に使用できるようになりました。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "モデルを微調整したので、それを推論に使用できるようになりました。\n", "\n", "推論のために画像をロードします。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image = ds[0][\"image\"]\n", "image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<div class=\"flex justify-center\">\n", " <img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/semantic-seg-image.png\" alt=\"Image of bedroom\"/>\n", "</div>\n", "\n", "\n", "推論用に微調整されたモデルを試す最も簡単な方法は、それを [pipeline()](https://huggingface.co/docs/transformers/main/ja/main_classes/pipelines#transformers.pipeline) で使用することです。モデルを使用して画像セグメンテーション用の `pipeline`をインスタンス化し、それに画像を渡します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'score': None,\n", " 'label': 'wall',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062690>},\n", " {'score': None,\n", " 'label': 'sky',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062A50>},\n", " {'score': None,\n", " 'label': 'floor',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062B50>},\n", " {'score': None,\n", " 'label': 'ceiling',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062A10>},\n", " {'score': None,\n", " 'label': 'bed ',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062E90>},\n", " {'score': None,\n", " 'label': 'windowpane',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062390>},\n", " {'score': None,\n", " 'label': 'cabinet',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062550>},\n", " {'score': None,\n", " 'label': 'chair',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062D90>},\n", " {'score': None,\n", " 'label': 'armchair',\n", " 'mask': <PIL.Image.Image image mode=L size=640x427 at 0x7FD5B2062E10>}]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from transformers import pipeline\n", "\n", "segmenter = pipeline(\"image-segmentation\", model=\"my_awesome_seg_model\")\n", "segmenter(image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "必要に応じて、`pipeline`の結果を手動で複製することもできます。画像を画像プロセッサで処理し、`pixel_values` を GPU に配置します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\") # use GPU if available, otherwise use a CPU\n", "encoding = image_processor(image, return_tensors=\"pt\")\n", "pixel_values = encoding.pixel_values.to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "入力をモデルに渡し、`logits`を返します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "outputs = model(pixel_values=pixel_values)\n", "logits = outputs.logits.cpu()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次に、ロジットを元の画像サイズに再スケールします。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "upsampled_logits = nn.functional.interpolate(\n", " logits,\n", " size=image.size[::-1],\n", " mode=\"bilinear\",\n", " align_corners=False,\n", ")\n", "\n", "pred_seg = upsampled_logits.argmax(dim=1)[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "画像プロセッサをロードして画像を前処理し、入力を TensorFlow テンソルとして返します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoImageProcessor\n", "\n", "image_processor = AutoImageProcessor.from_pretrained(\"MariaK/scene_segmentation\")\n", "inputs = image_processor(image, return_tensors=\"tf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "入力をモデルに渡し、`logits`を返します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import TFAutoModelForSemanticSegmentation\n", "\n", "model = TFAutoModelForSemanticSegmentation.from_pretrained(\"MariaK/scene_segmentation\")\n", "logits = model(**inputs).logits" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次に、ロジットを元の画像サイズに再スケールし、クラス次元に argmax を適用します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "logits = tf.transpose(logits, [0, 2, 3, 1])\n", "\n", "upsampled_logits = tf.image.resize(\n", " logits,\n", " # We reverse the shape of `image` because `image.size` returns width and height.\n", " image.size[::-1],\n", ")\n", "\n", "pred_seg = tf.math.argmax(upsampled_logits, axis=-1)[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "結果を視覚化するには、[データセットカラーパレット](https://github.com/tensorflow/models/blob/3f1ca33afe3c1631b733ea7e40c294273b9e406d/research/deeplab/utils/get_dataset_colormap.py#L51) を、それぞれをマップする `ade_palette()` としてロードします。クラスを RGB 値に変換します。次に、画像と予測されたセグメンテーションマップを組み合わせてプロットできます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "color_seg = np.zeros((pred_seg.shape[0], pred_seg.shape[1], 3), dtype=np.uint8)\n", "palette = np.array(ade_palette())\n", "for label, color in enumerate(palette):\n", " color_seg[pred_seg == label, :] = color\n", "color_seg = color_seg[..., ::-1] # convert to BGR\n", "\n", "img = np.array(image) * 0.5 + color_seg * 0.5 # plot the image with the segmentation map\n", "img = img.astype(np.uint8)\n", "\n", "plt.figure(figsize=(15, 10))\n", "plt.imshow(img)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<div class=\"flex justify-center\">\n", " <img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/semantic-seg-preds.png\" alt=\"Image of bedroom overlaid with segmentation map\"/>\n", "</div>" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 4 }

transformers_doc/ja/semantic_segmentation.ipynb (952 lines of code) (raw):