diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb (1,067 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "<a href=\"https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" ] },
{ "cell_type": "markdown", "metadata": { "id": "TcdayjMYqWIS" }, "source": [ "# CLIP Guided Stable Diffusion using [d🧨ffusers](https://github.com/huggingface/diffusers)" ] },
{ "cell_type": "markdown", "metadata": { "id": "j_nBMAWyqw1Q" }, "source": [ "This notebook shows how to do CLIP guidance with Stable Diffusion using the diffusers library. This lets you use the newly released [CLIP models by LAION AI](https://huggingface.co/laion).\n", "\n", "This notebook is based on the following amazing repos; all credit goes to the original authors!\n", "\n", "- https://github.com/Jack000/glid-3-xl\n", "- https://github.dev/crowsonkb/k-diffusion\n" ] },
{ "cell_type": "markdown", "metadata": { "id": "QpddPEUbqK_0" }, "source": [ "### Initial Setup" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "id": "SF2J9AC6h8RW", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "5960a0b0-0f6e-4c39-ac9f-a8b45a3f630a" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ " Building wheel for ffmpy (setup.py) ... done\n", " Building wheel for python-multipart (setup.py) ... done\n" ] } ], "source": [ "#@title Install dependencies\n", "!pip install -qqq diffusers==0.11.1 transformers ftfy gradio accelerate" ] },
{ "cell_type": "markdown", "metadata": { "id": "YWFOIN0qiT0O" }, "source": [ "#### Authenticate with Hugging Face Hub\n", "\n", "To use private and gated models on 🤗 Hugging Face Hub, login is required. If you are only using a public checkpoint (such as `CompVis/stable-diffusion-v1-4` in this notebook), you can skip this step."
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "cellView": "form", "id": "vdkeoxWviL9o", "colab": { "base_uri": "https://localhost:8080/", "height": 299, "referenced_widgets": [ "2daa67ed0f8349e884458f323e0fe7c4", "1274282597bd42de88af452218757e0b", "0af52b759c404cc5b87331a12822f0b5", "e6b4e0d00e1f4df782745c4fa1371063", "921e3f7a3e8341479c9c8e90876f513f", "8cecd10427a944478ee1afa50ebb031f", "08576df5cdb14c479ea86d33fa31f811", "e7816a219e474125b1a52cef31ceb4c3", "d7a4538b87f74dfcbda1674ce20b5ef0", "1ff8c99ac8454e71b0ff404e5942a0e6", "65660ab2d07347efb093765743d019a9", "4c55e4c896eb43e79cf13e1ddcff8ef4", "4064fa7d7feb4e1fa704fe5585add6a6", "0cebee34b01e481497416bbde47b173a" ] }, "outputId": "2a69ea70-fb10-4818-b9bd-1225320dc861" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Login successful\n", "Your token has been saved to /root/.huggingface/token\n" ] } ], "source": [ "#@title Login\n", "from huggingface_hub import notebook_login\n", "\n", "notebook_login()" ] },
{ "cell_type": "markdown", "metadata": { "id": "C8Y5zTuhprVo" }, "source": [ "### CLIP Guided Stable Diffusion" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "z9Kglma6hjki", "cellView": "form" }, "outputs": [], "source": [
"#@title Load the pipeline\n",
"import torch\n",
"from PIL import Image\n",
"\n",
"from diffusers import LMSDiscreteScheduler, DiffusionPipeline, PNDMScheduler\n",
"from transformers import CLIPFeatureExtractor, CLIPModel\n",
"\n",
"model_id = \"CompVis/stable-diffusion-v1-4\" #@param {type: \"string\"}\n",
"clip_model_id = \"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\" #@param [\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\", \"laion/CLIP-ViT-L-14-laion2B-s32B-b82K\", \"laion/CLIP-ViT-H-14-laion2B-s32B-b79K\", \"laion/CLIP-ViT-g-14-laion2B-s12B-b42K\", \"openai/clip-vit-base-patch32\", \"openai/clip-vit-base-patch16\", \"openai/clip-vit-large-patch14\"] {allow-input: true}\n",
"scheduler = \"plms\" #@param ['plms', 'lms']\n",
"\n",
"\n",
"def image_grid(imgs, rows, cols):\n",
"    # Paste the images row by row into a single rows x cols grid.\n",
"    assert len(imgs) == rows*cols\n",
"\n",
"    w, h = imgs[0].size\n",
"    grid = Image.new('RGB', size=(cols*w, rows*h))\n",
"\n",
"    for i, img in enumerate(imgs):\n",
"        grid.paste(img, box=(i%cols*w, i//cols*h))\n",
"    return grid\n",
"\n",
"if scheduler == \"lms\":\n",
"    scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule=\"scaled_linear\")\n",
"else:\n",
"    scheduler = PNDMScheduler.from_pretrained(model_id, subfolder=\"scheduler\")\n",
"\n",
"\n",
"feature_extractor = CLIPFeatureExtractor.from_pretrained(clip_model_id)\n",
"clip_model = CLIPModel.from_pretrained(clip_model_id, torch_dtype=torch.float16)\n",
"\n",
"\n",
"guided_pipeline = DiffusionPipeline.from_pretrained(\n",
"    model_id,\n",
"    custom_pipeline=\"clip_guided_stable_diffusion\",\n",
"    custom_revision=\"main\", # TODO: remove if diffusers>=0.12.0\n",
"    clip_model=clip_model,\n",
"    feature_extractor=feature_extractor,\n",
"    scheduler=scheduler,\n",
"    torch_dtype=torch.float16,\n",
")\n",
"guided_pipeline = guided_pipeline.to(\"cuda\")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "4-_1Ld_eWnsX", "cellView": "form" }, "outputs": [], "source": [
"#@title Generate with Gradio Demo\n",
"\n",
"import gradio as gr\n",
"import torch\n",
"\n",
"\n",
"# CLIP model currently loaded in `guided_pipeline`.\n",
"last_model = clip_model_id\n",
"\n",
"\n",
"def create_clip_guided_pipeline(model_id, clip_model_id):\n",
"    # Assumed helper (a sketch mirroring the \"Load the pipeline\" cell): rebuilds the\n",
"    # pipeline around a different CLIP model, reusing the scheduler created there.\n",
"    feature_extractor = CLIPFeatureExtractor.from_pretrained(clip_model_id)\n",
"    clip_model = CLIPModel.from_pretrained(clip_model_id, torch_dtype=torch.float16)\n",
"    return DiffusionPipeline.from_pretrained(\n",
"        model_id,\n",
"        custom_pipeline=\"clip_guided_stable_diffusion\",\n",
"        custom_revision=\"main\", # TODO: remove if diffusers>=0.12.0\n",
"        clip_model=clip_model,\n",
"        feature_extractor=feature_extractor,\n",
"        scheduler=scheduler,\n",
"        torch_dtype=torch.float16,\n",
"    )\n",
"\n",
"\n",
"# Parameter order matches the `inputs` lists passed to text.submit and btn.click.\n",
"def infer(prompt, clip_prompt, samples, steps, scale, clip_scale, seed, clip_model, use_cutouts, num_cutouts):\n",
"    global last_model, guided_pipeline\n",
"    # Rebuild the pipeline only when a different CLIP model is selected.\n",
"    if last_model != clip_model:\n",
"        guided_pipeline = create_clip_guided_pipeline(model_id, clip_model)\n",
"        guided_pipeline = guided_pipeline.to(\"cuda\")\n",
"        last_model = clip_model\n",
"\n",
"    # The demo always runs with the UNet and VAE unfrozen.\n",
"    guided_pipeline.unfreeze_unet()\n",
"    guided_pipeline.unfreeze_vae()\n",
"\n",
"    generator = torch.Generator(device=\"cuda\").manual_seed(int(seed))\n",
"\n",
"    images = []\n",
"    for _ in range(int(samples)):\n",
"        image = guided_pipeline(\n",
"            prompt,\n",
"            clip_prompt=clip_prompt if clip_prompt.strip() != \"\" else None,\n",
"            num_inference_steps=int(steps),\n",
"            guidance_scale=scale,\n",
"            clip_guidance_scale=clip_scale,\n",
"            num_cutouts=int(num_cutouts),\n",
"            use_cutouts=bool(use_cutouts),\n",
"            generator=generator,\n",
"        ).images[0]\n",
"        images.append(image)\n",
"\n",
"    return images\n",
"\n",
"css = \"\"\"\n",
" .gradio-container {\n",
" font-family: 'IBM Plex Sans', sans-serif;\n",
" }\n",
" .gr-button {\n",
" color: white;\n",
" border-color: black;\n",
" background: black;\n",
" }\n",
" input[type='range'] {\n",
" accent-color: black;\n",
" }\n",
" .dark input[type='range'] {\n",
" accent-color: #dfdfdf;\n",
" }\n",
" .container {\n",
" max-width: 730px;\n",
" margin: auto;\n",
" padding-top: 1.5rem;\n",
" }\n",
" #gallery {\n",
" min-height: 22rem;\n",
" margin-bottom: 15px;\n",
" margin-left: auto;\n",
" margin-right: auto;\n",
" border-bottom-right-radius: .5rem !important;\n",
" border-bottom-left-radius: .5rem !important;\n",
" }\n",
" #gallery>div>.h-full {\n",
" min-height: 20rem;\n",
" }\n",
" .details:hover {\n",
" text-decoration: underline;\n",
" }\n",
" .gr-button {\n",
" white-space: nowrap;\n",
" }\n",
" .gr-button:focus
{\n", " border-color: rgb(147 197 253 / var(--tw-border-opacity));\n", " outline: none;\n", " box-shadow: var(--tw-ring-offset-shadow), var(--tw-ring-shadow), var(--tw-shadow, 0 0 #0000);\n", " --tw-border-opacity: 1;\n", " --tw-ring-offset-shadow: var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);\n", " --tw-ring-shadow: var(--tw-ring-inset) 0 0 0 calc(3px var(--tw-ring-offset-width)) var(--tw-ring-color);\n", " --tw-ring-color: rgb(191 219 254 / var(--tw-ring-opacity));\n", " --tw-ring-opacity: .5;\n", " }\n", " #advanced-btn {\n", " font-size: .7rem !important;\n", " line-height: 19px;\n", " margin-top: 12px;\n", " margin-bottom: 12px;\n", " padding: 2px 8px;\n", " border-radius: 14px !important;\n", " }\n", " #advanced-options {\n", " display: none;\n", " margin-bottom: 20px;\n", " }\n", " .footer {\n", " margin-bottom: 45px;\n", " margin-top: 35px;\n", " text-align: center;\n", " border-bottom: 1px solid #e5e5e5;\n", " }\n", " .footer>p {\n", " font-size: .8rem;\n", " display: inline-block;\n", " padding: 0 10px;\n", " transform: translateY(10px);\n", " background: white;\n", " }\n", " .dark .footer {\n", " border-color: #303030;\n", " }\n", " .dark .footer>p {\n", " background: #0b0f19;\n", " }\n", " .acknowledgments h4{\n", " margin: 1.25em 0 .25em 0;\n", " font-weight: bold;\n", " font-size: 115%;\n", " }\n", "\"\"\"\n", "\n", "block = gr.Blocks(css=css)\n", "\n", "examples = [\n", " [\n", " 'A high tech solarpunk utopia in the Amazon rainforest',\n", " 2,\n", " 45,\n", " 7.5,\n", " 1024,\n", " ],\n", " [\n", " 'A pikachu fine dining with a view to the Eiffel Tower',\n", " 2,\n", " 45,\n", " 7,\n", " 1024,\n", " ],\n", " [\n", " 'A mecha robot in a favela in expressionist style',\n", " 2,\n", " 45,\n", " 7,\n", " 1024,\n", " ],\n", " [\n", " 'an insect robot preparing a delicious meal',\n", " 2,\n", " 45,\n", " 7,\n", " 1024,\n", " ],\n", " [\n", " \"A small cabin on top of a snowy mountain in the style of Disney, 
artstation\",\n", " 2,\n", " 45,\n", " 7,\n", " 1024,\n", " ],\n", "]\n", "\n", "with block:\n", " gr.HTML(\n", " \"\"\"\n", " <div style=\"text-align: center; max-width: 650px; margin: 0 auto;\">\n", " <div\n", " style=\"\n", " display: inline-flex;\n", " align-items: center;\n", " gap: 0.8rem;\n", " font-size: 1.75rem;\n", " \"\n", " >\n", " <svg\n", " width=\"0.65em\"\n", " height=\"0.65em\"\n", " viewBox=\"0 0 115 115\"\n", " fill=\"none\"\n", " xmlns=\"http://www.w3.org/2000/svg\"\n", " >\n", " <rect width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect y=\"69\" width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect x=\"23\" width=\"23\" height=\"23\" fill=\"#AEAEAE\"></rect>\n", " <rect x=\"23\" y=\"69\" width=\"23\" height=\"23\" fill=\"#AEAEAE\"></rect>\n", " <rect x=\"46\" width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect x=\"46\" y=\"69\" width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect x=\"69\" width=\"23\" height=\"23\" fill=\"black\"></rect>\n", " <rect x=\"69\" y=\"69\" width=\"23\" height=\"23\" fill=\"black\"></rect>\n", " <rect x=\"92\" width=\"23\" height=\"23\" fill=\"#D9D9D9\"></rect>\n", " <rect x=\"92\" y=\"69\" width=\"23\" height=\"23\" fill=\"#AEAEAE\"></rect>\n", " <rect x=\"115\" y=\"46\" width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect x=\"115\" y=\"115\" width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect x=\"115\" y=\"69\" width=\"23\" height=\"23\" fill=\"#D9D9D9\"></rect>\n", " <rect x=\"92\" y=\"46\" width=\"23\" height=\"23\" fill=\"#AEAEAE\"></rect>\n", " <rect x=\"92\" y=\"115\" width=\"23\" height=\"23\" fill=\"#AEAEAE\"></rect>\n", " <rect x=\"92\" y=\"69\" width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect x=\"69\" y=\"46\" width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect x=\"69\" y=\"115\" width=\"23\" height=\"23\" fill=\"white\"></rect>\n", " <rect x=\"69\" y=\"69\" width=\"23\" height=\"23\" fill=\"#D9D9D9\"></rect>\n", " <rect x=\"46\" 
y=\"46\" width=\"23\" height=\"23\" fill=\"black\"></rect>\n", " <rect x=\"46\" y=\"115\" width=\"23\" height=\"23\" fill=\"black\"></rect>\n", " <rect x=\"46\" y=\"69\" width=\"23\" height=\"23\" fill=\"black\"></rect>\n", " <rect x=\"23\" y=\"46\" width=\"23\" height=\"23\" fill=\"#D9D9D9\"></rect>\n", " <rect x=\"23\" y=\"115\" width=\"23\" height=\"23\" fill=\"#AEAEAE\"></rect>\n", " <rect x=\"23\" y=\"69\" width=\"23\" height=\"23\" fill=\"black\"></rect>\n", " </svg>\n", " <h1 style=\"font-weight: 900; margin-bottom: 7px;\">\n", " CLIP Guided Stable Diffusion Demo\n", " </h1>\n", " </div>\n", " <p style=\"margin-bottom: 10px; font-size: 94%\">\n", " Demo allows you to use newly released <a href=\"https://huggingface.co/laion\" style=\"text-decoration: underline\">CLIP models by LAION AI</a> with Stable Diffusion\n", " </p>\n", " </div>\n", " \"\"\"\n", " )\n", " with gr.Group():\n", " with gr.Box():\n", " with gr.Row().style(mobile_collapse=False, equal_height=True):\n", " text = gr.Textbox(\n", " label=\"Enter your prompt\",\n", " show_label=False,\n", " max_lines=1,\n", " placeholder=\"Enter your prompt\",\n", " ).style(\n", " border=(True, False, True, True),\n", " rounded=(True, False, False, True),\n", " container=False,\n", " )\n", " btn = gr.Button(\"Generate image\").style(\n", " margin=False,\n", " rounded=(False, True, True, False),\n", " )\n", "\n", " gallery = gr.Gallery(\n", " label=\"Generated images\", show_label=False, elem_id=\"gallery\"\n", " ).style(grid=[2], height=\"auto\")\n", "\n", " advanced_button = gr.Button(\"Advanced options\", elem_id=\"advanced-btn\")\n", "\n", " with gr.Row(elem_id=\"advanced-options\"):\n", " with gr.Column():\n", " clip_prompt = gr.Textbox(\n", " label=\"Enter a CLIP prompt if you want it to differ\",\n", " show_label=False,\n", " max_lines=1,\n", " placeholder=\"Enter a CLIP prompt if you want it to differ\",\n", " )\n", " with gr.Row():\n", " samples = gr.Slider(label=\"Images\", minimum=1, maximum=2, 
value=1, step=1)\n", " steps = gr.Slider(label=\"Steps\", minimum=1, maximum=50, value=45, step=1)\n", " with gr.Row():\n", " use_cutouts = gr.Checkbox(label=\"Use cutouts?\")\n", " num_cutouts = gr.Slider(label=\"Cutouts\", minimum=1, maximum=16, value=4, step=1)\n", " with gr.Row():\n", " with gr.Column():\n", " clip_model = gr.Dropdown([\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\", \"laion/CLIP-ViT-L-14-laion2B-s32B-b82K\", \"laion/CLIP-ViT-H-14-laion2B-s32B-b79K\", \"laion/CLIP-ViT-g-14-laion2B-s12B-b42K\", \"openai/clip-vit-base-patch32\", \"openai/clip-vit-base-patch16\", \"openai/clip-vit-large-patch14\"], value=\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\", show_label=False)\n", " with gr.Row():\n", " scale = gr.Slider(\n", " label=\"Guidance Scale\", minimum=0, maximum=50, value=7.5, step=0.1\n", " )\n", " seed = gr.Slider(\n", " label=\"Seed\",\n", " minimum=0,\n", " maximum=2147483647,\n", " step=1,\n", " randomize=True,\n", " )\n", " clip_scale = gr.Slider(\n", " label=\"CLIP Guidance Scale\", minimum=0, maximum=5000, value=100, step=1\n", " )\n", "\n", " # Each example row holds five values: prompt, images, steps, guidance scale, seed.\n", " ex = gr.Examples(examples=examples, fn=infer, inputs=[text, samples, steps, scale, seed], outputs=gallery, cache_examples=False)\n", " ex.dataset.headers = [\"\"]\n", "\n", "\n", " text.submit(infer, inputs=[text, clip_prompt, samples, steps, scale, clip_scale, seed, clip_model, use_cutouts, num_cutouts], outputs=gallery)\n", " btn.click(infer, inputs=[text, clip_prompt, samples, steps, scale, clip_scale, seed, clip_model, use_cutouts, num_cutouts], outputs=gallery)\n", " advanced_button.click(\n", " None,\n", " [],\n", " text,\n", " _js=\"\"\"\n", " () => {\n", " const options = document.querySelector(\"body > gradio-app\").querySelector(\"#advanced-options\");\n", " options.style.display = [\"none\", \"\"].includes(options.style.display) ? \"flex\" : \"none\";\n", " }\"\"\",\n", " )\n", " gr.HTML(\n", " \"\"\"\n", " <div class=\"footer\">\n", " <p>Model by <a href=\"https://huggingface.co/CompVis\" style=\"text-decoration: underline;\" target=\"_blank\">CompVis</a> and <a href=\"https://huggingface.co/stabilityai\" style=\"text-decoration: underline;\" target=\"_blank\">Stability AI</a> - Gradio Demo by 🤗 Hugging Face\n", " </p>\n", " </div>\n", " <div class=\"acknowledgments\">\n", " <p><h4>LICENSE</h4>\n", "The model is licensed with a <a href=\"https://huggingface.co/spaces/CompVis/stable-diffusion-license\" style=\"text-decoration: underline;\" target=\"_blank\">CreativeML Open RAIL-M</a> license. The authors claim no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in this license. The license forbids you from sharing any content that violates any laws, produces any harm to a person, disseminates any personal information that would be meant for harm, spreads misinformation, or targets vulnerable groups. For the full list of restrictions, please <a href=\"https://huggingface.co/spaces/CompVis/stable-diffusion-license\" style=\"text-decoration: underline;\" target=\"_blank\">read the license</a>.</p>\n", " <p><h4>Biases and content acknowledgment</h4>\n", "Impressive as turning text into images is, be aware that this model may output content that reinforces or exacerbates societal biases, as well as realistic faces, pornography and violence. The model was trained on the <a href=\"https://laion.ai/blog/laion-5b/\" style=\"text-decoration: underline;\" target=\"_blank\">LAION-5B dataset</a>, which scraped non-curated image-text pairs from the internet (the exception being the removal of illegal content) and is meant for research purposes.
You can read more in the <a href=\"https://huggingface.co/CompVis/stable-diffusion-v1-4\" style=\"text-decoration: underline;\" target=\"_blank\">model card</a>.</p>\n", " </div>\n", " \"\"\"\n", " )\n", "\n", "block.launch(debug=True)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "AJK9zQDQlXCj" }, "outputs": [], "source": [
"#@title Generate on Colab\n",
"\n",
"prompt = \"fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece\" #@param {type: \"string\"}\n",
"#@markdown `clip_prompt` is optional; if you leave it blank, the same prompt is sent to both Stable Diffusion and CLIP.\n",
"clip_prompt = \"\" #@param {type: \"string\"}\n",
"num_samples = 1 #@param {type: \"number\"}\n",
"num_inference_steps = 50 #@param {type: \"number\"}\n",
"guidance_scale = 7.5 #@param {type: \"number\"}\n",
"clip_guidance_scale = 100 #@param {type: \"number\"}\n",
"num_cutouts = 4 #@param {type: \"number\"}\n",
"use_cutouts = \"False\" #@param [\"False\", \"True\"]\n",
"unfreeze_unet = \"True\" #@param [\"False\", \"True\"]\n",
"unfreeze_vae = \"True\" #@param [\"False\", \"True\"]\n",
"seed = 3788086447 #@param {type: \"number\"}\n",
"\n",
"if unfreeze_unet == \"True\":\n",
"    guided_pipeline.unfreeze_unet()\n",
"else:\n",
"    guided_pipeline.freeze_unet()\n",
"\n",
"if unfreeze_vae == \"True\":\n",
"    guided_pipeline.unfreeze_vae()\n",
"else:\n",
"    guided_pipeline.freeze_vae()\n",
"\n",
"generator = torch.Generator(device=\"cuda\").manual_seed(seed)\n",
"\n",
"images = []\n",
"for i in range(num_samples):\n",
"    image = guided_pipeline(\n",
"        prompt,\n",
"        clip_prompt=clip_prompt if clip_prompt.strip() != \"\" else None,\n",
"        num_inference_steps=num_inference_steps,\n",
"        guidance_scale=guidance_scale,\n",
"        clip_guidance_scale=clip_guidance_scale,\n",
"        num_cutouts=num_cutouts,\n",
"        use_cutouts=use_cutouts == \"True\",\n",
"        generator=generator,\n",
"    ).images[0]\n",
"    images.append(image)\n",
"\n",
"image_grid(images, 1, num_samples)" ] },
{ "cell_type": "code", "source": [], "metadata": { "id": "VWGacPN7NIfx" }, "execution_count": null, "outputs": [] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "machine_shape": "hm", "provenance": [], "include_colab_link": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "2daa67ed0f8349e884458f323e0fe7c4": { "model_module": "@jupyter-widgets/controls", "model_name": "VBoxModel", "model_module_version": "1.5.0", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "VBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "VBoxView", "box_style": "", "children": [ "IPY_MODEL_1274282597bd42de88af452218757e0b", "IPY_MODEL_0af52b759c404cc5b87331a12822f0b5", "IPY_MODEL_e6b4e0d00e1f4df782745c4fa1371063", "IPY_MODEL_921e3f7a3e8341479c9c8e90876f513f" ], "layout": "IPY_MODEL_8cecd10427a944478ee1afa50ebb031f" } }, "1274282597bd42de88af452218757e0b": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_08576df5cdb14c479ea86d33fa31f811", "placeholder": "\u200b", "style": "IPY_MODEL_e7816a219e474125b1a52cef31ceb4c3", "value": "<center>
<img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.svg\nalt='Hugging Face'> <br> Copy a token from <a\nhref=\"https://huggingface.co/settings/tokens\" target=\"_blank\">your Hugging Face\ntokens page</a> and paste it below. <br> Immediately click login after copying\nyour token or it might be stored in plain text in this notebook file. </center>" } }, "0af52b759c404cc5b87331a12822f0b5": { "model_module": "@jupyter-widgets/controls", "model_name": "PasswordModel", "model_module_version": "1.5.0", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "PasswordModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "PasswordView", "continuous_update": true, "description": "Token:", "description_tooltip": null, "disabled": false, "layout": "IPY_MODEL_d7a4538b87f74dfcbda1674ce20b5ef0", "placeholder": "​", "style": "IPY_MODEL_1ff8c99ac8454e71b0ff404e5942a0e6", "value": "" } }, "e6b4e0d00e1f4df782745c4fa1371063": { "model_module": "@jupyter-widgets/controls", "model_name": "ButtonModel", "model_module_version": "1.5.0", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ButtonModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ButtonView", "button_style": "", "description": "Login", "disabled": false, "icon": "", "layout": "IPY_MODEL_65660ab2d07347efb093765743d019a9", "style": "IPY_MODEL_4c55e4c896eb43e79cf13e1ddcff8ef4", "tooltip": "" } }, "921e3f7a3e8341479c9c8e90876f513f": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "model_module_version": "1.5.0", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": 
"@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_4064fa7d7feb4e1fa704fe5585add6a6", "placeholder": "​", "style": "IPY_MODEL_0cebee34b01e481497416bbde47b173a", "value": "\n<b>Pro Tip:</b> If you don't already have one, you can create a dedicated\n'notebooks' token with 'write' access, that you can then easily reuse for all\nnotebooks. </center>" } }, "8cecd10427a944478ee1afa50ebb031f": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": "center", "align_self": null, "border": null, "bottom": null, "display": "flex", "flex": null, "flex_flow": "column", "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": "50%" } }, "08576df5cdb14c479ea86d33fa31f811": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", 
"align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "e7816a219e474125b1a52cef31ceb4c3": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "d7a4538b87f74dfcbda1674ce20b5ef0": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": 
null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "1ff8c99ac8454e71b0ff404e5942a0e6": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "65660ab2d07347efb093765743d019a9": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "4c55e4c896eb43e79cf13e1ddcff8ef4": { "model_module": "@jupyter-widgets/controls", 
"model_name": "ButtonStyleModel", "model_module_version": "1.5.0", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ButtonStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "button_color": null, "font_weight": "" } }, "4064fa7d7feb4e1fa704fe5585add6a6": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "model_module_version": "1.2.0", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "0cebee34b01e481497416bbde47b173a": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "model_module_version": "1.5.0", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } } } } }, "nbformat": 4, "nbformat_minor": 0 }