Since the advent of diffusion models, Computer Vision has seen tremendous growth in image generation capabilities. This spans image generation models, techniques, datasets, pipelines, and libraries. Amid all the innovations, the Hugging Face Diffusers library remains at the forefront. In this article, we will explore the Hugging Face Diffusers library for different image generation techniques. By the end of this article, you will be well-equipped to use the Hugging Face Diffusers library for generating images with different techniques. In addition to that, you will also get access to a notebook with all the experiments that we will discuss in this article.
- What is Hugging Face Diffusers?
- Setting Up Hugging Face Diffusers
- Text-to-Image with Stable Diffusion Pipeline
- Stable Diffusion Image-to-Image Pipeline
- Hugging Face Diffusers AutoPipeline
- AutoPipeline for Image-to-Image
- AutoPipeline for Image InPainting
- Dynamic Masking for Inpainting
- Key Takeaways
- Summary and Conclusion
Here, we will take a comprehensive look at the capabilities of Diffusers and several of its pipelines.
What is Hugging Face Diffusers?
The Diffusers library by Hugging Face is one of the best libraries for pretrained diffusion models and for image, video, and audio generation pipelines. It is easy to use and customizable, with a myriad of options to choose from. We can start generating or modifying images with Stable Diffusion models with just a few lines of code.
The following are some of the highlights of the library:
- Availability of all the official Stable Diffusion models along with hundreds of fine-tuned models for task- and image-specific generation.
- Option to swap noise schedulers with the same pipeline with just a single line of code change.
- Easy integration with Gradio to create seamless UI and shareable applications.
The above are only some of the benefits of the Diffusers library. We will cover a lot more about image generation and dive deep into the primary components of the library.
Setting Up Hugging Face Diffusers
Before we jump into its image generation capabilities, we need to install the library along with some additional dependencies.
Install the Hugging Face Diffusers library using the pip command.
pip install diffusers
Along with that, we also need the Transformers and Accelerate libraries.
pip install transformers
pip install accelerate
That’s all that is needed as part of setting up Diffusers for image generation. Let’s move into the code for generating images using Diffusers.
Text-to-Image with Stable Diffusion Pipeline
The first step before generating images is setting the seed. This ensures that the same prompt and number of inference steps always produce the same image, which makes it easier to compare images generated from different models, schedulers, and step counts.
seed = 42
Here, we will use a seed value of 42.
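As an aside, one common pattern is to create a dedicated torch.Generator object and pass it to the pipeline through its generator argument. The following is just an optional sketch; the rest of this article simply passes torch.manual_seed(seed), which works equally well.
import torch

seed = 42

# Optional: a dedicated generator instead of the global RNG. Passing this
# object as `generator=` to a pipeline keeps results reproducible.
generator = torch.Generator(device='cuda' if torch.cuda.is_available() else 'cpu')
generator = generator.manual_seed(seed)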
The diffusers library provides several pipelines for different tasks that use Stable Diffusion models. The most common use case is text-to-image generation, for which the library provides the StableDiffusionPipeline.
Let’s start by creating an image using this pipeline with the Stable Diffusion 1.5 model.
Stable Diffusion 1.5
First, import torch and StableDiffusionPipeline, and set the computation device.
import torch
from diffusers import StableDiffusionPipeline
It is advised to run diffusion models on the GPU, which makes the processing much faster compared to the CPU. We define the GPU device in the following code block.
# Set computation device.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
Before generating images, initialize the StableDiffusionPipeline using the from_pretrained method. This accepts the model name and a few other parameters, and then we load the pipeline onto the GPU device.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to(device)
In the above code block, we load the Stable Diffusion 1.5 model from the runwayml/stable-diffusion-v1-5 tag and also use Safetensors and the Float 16 data type.
safetensors is a new and secure file format for storing and loading the tensors of saved models. This format makes it difficult to hide malicious code in saved models, which is becoming increasingly important with the evolution of Generative AI. In the code block above, we load safetensors by passing use_safetensors=True.
Typically, model weights are loaded in the Float 32 (FP32) format. However, the Float 16 (FP16) format is now widely adopted as more GPUs support it. The primary benefit of FP16 over FP32 is lower GPU memory usage, which can be vital when running diffusion models on resource-constrained devices.
Executing the above code block will download the model and configuration files for the first time if not already present.
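If you want to gauge the memory savings from FP16, a rough check like the one below can help. This is only a sketch; the exact figure depends on your GPU, and you would reload the pipeline with torch_dtype=torch.float32 to get the FP32 number for comparison.
# Rough check of the GPU memory currently allocated by the loaded pipeline.
# Reload the pipeline with torch_dtype=torch.float32 to compare against FP32.
print(f"Allocated GPU memory: {torch.cuda.memory_allocated(device) / 1024**3:.2f} GB")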
The next code block defines a prompt and passes it through the pipeline to generate the image. We run the inference for 150 steps, defined by the num_inference_steps parameter. Additionally, we pass the seed to the pipeline so that we always get the same image.
# Prompting to generate image.
prompt = "A photo of a large pirate ship sailing in the middle of a storm and lightning, highly detailed, unreal engine effect"
image = pipe(
    prompt, num_inference_steps=150, generator=torch.manual_seed(seed)
).images[0]
image
The following is the generated image.
Stable Diffusion 2.1
We can load any Stable Diffusion model with the StableDiffusionPipeline. The following example uses Stable Diffusion 2.1 to generate an image.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe = pipe.to(device)
While creating the above pipeline, we only need to change the model tag; the other parameters remain the same.
Then, use the same prompt to generate an image.
prompt = "A photo of a large pirate ship sailing in the middle of a storm and lightning, highly detailed, unreal engine effect"
image = pipe(
    prompt, num_inference_steps=150, generator=torch.manual_seed(seed)
).images[0]
image
In most cases, Stable Diffusion 2.1 generates higher fidelity images compared to Stable Diffusion 1.5. It is apparent from the above image as well. Stable Diffusion 2.1 generates images at 768×768 resolution whereas Stable Diffusion 1.5 generates images at 512×512 resolution.
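If you need a specific output resolution, the pipeline call also accepts height and width arguments (they should be multiples of 8 and close to the model's training resolution). The snippet below is just a sketch of the idea using the Stable Diffusion 2.1 pipeline we already loaded.
# Request an explicit output resolution via the height and width arguments.
image = pipe(
    prompt,
    height=768,
    width=768,
    num_inference_steps=150,
    generator=torch.manual_seed(seed)
).images[0]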
Using Negative Prompt with StableDiffusionPipeline
When generating images, we want to minimize distorted, blurry, and unattractive results as much as possible. For this, the Diffusers pipelines provide a negative_prompt argument that accepts a prompt string describing what we do not want in the image.
Let’s try that out with Stable Diffusion 2.1.
prompt = "A photo of a large pirate ship sailing in the middle of a storm and lightning, highly detailed, unreal engine effect"
image = pipe(
    prompt,
    num_inference_steps=150,
    generator=torch.manual_seed(seed),
    negative_prompt='low resolution, distorted, ugly, deformed, disfigured, poor details'
).images[0]
image
This time, the model generates a much sharper image with more intricate details on the ship and better reflections on the water as well. However, we can also observe loss of detail in the raindrops. This is one of the adverse effects of using negative prompts. Sometimes, they may cause the loss of certain details while enhancing others.
Swapping Schedulers
Stable Diffusion models use a scheduling technique at each time step to generate the images. You can learn more about schedulers in our Denoising Diffusion Probabilistic Models article.
We can also swap these schedulers to generate different images using the same prompt. In general, some schedulers work better than others. However, in most cases, we can safely leave it at the default setting, which is DDIMScheduler.
from diffusers import EulerAncestralDiscreteScheduler
The pipe object has a scheduler attribute to which we can assign a new scheduler.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
image = pipe(
    prompt,
    num_inference_steps=150,
    generator=torch.manual_seed(seed),
    negative_prompt='low resolution, distorted, ugly, deformed, disfigured, poor details'
).images[0]
image
Although it is difficult to say that one scheduler will always be better than another, EulerAncestralDiscreteScheduler generates very high quality images. This is apparent from the above image, which shows fine details in the water and lightning.
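If you are unsure which schedulers can be paired with a given pipeline, the loaded scheduler exposes a compatibles attribute that lists them. A quick way to inspect it:
# List the scheduler classes compatible with the current pipeline.
print(pipe.scheduler.compatibles)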
Stable Diffusion Image-to-Image Pipeline
There are times when we want to transfer the style of one image to another generated image. This is more commonly known as style transfer. We can do so with the Hugging Face Diffusers library using the StableDiffusionImg2ImgPipeline.
First, let’s import the required additional modules.
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline
Next, initialize a new pipeline and read a style image.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    'stabilityai/stable-diffusion-2-1',
    torch_dtype=torch.float16
).to(device)
image_path = 'images/abstract_art_1.jpg'
init_image = Image.open(image_path).convert("RGB")
init_image = init_image.resize((768, 768))
init_image
The above is an abstract art image. We resize it to 768×768 resolution, as the Stable Diffusion 2.1 model will be used for image generation.
While executing the pipeline, we need to pass the init_image and a prompt, along with any additional arguments needed for image generation.
prompt = "A fantasy landscape, with starry night sky, moonlit river, trending on artstation"
image = pipe(
    prompt=prompt,
    image=init_image,
    num_inference_steps=150,
    strength=1,
    generator=torch.manual_seed(seed)
).images[0]
image
The model outputs a beautiful image preserving the color of the initialization image while following the prompt aptly.
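The strength argument used above is worth a note: it takes a value between 0 and 1 and controls how strongly the initialization image is transformed. A value of 1 gives the model maximum freedom, while lower values preserve more of the original structure. The following is only an illustrative sketch with an arbitrary lower value.
# Lower strength keeps more of the initialization image's structure intact.
image = pipe(
    prompt=prompt,
    image=init_image,
    num_inference_steps=150,
    strength=0.6,  # illustrative value; 1.0 was used above
    generator=torch.manual_seed(seed)
).images[0]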
Hugging Face Diffusers AutoPipeline
Remembering which pipeline to use can be confusing, as both individuals and organizations can contribute models. To tackle this, diffusers provides the AutoPipeline classes, which pick the correct underlying pipeline based on the model and the arguments provided.
We can pass any model path from the Hugging Face Hub depending on the task, regardless of which organization contributed it or which Stable Diffusion variant it is.
Let’s start with a text-to-image example.
from diffusers import AutoPipelineForText2Image
pipe = AutoPipelineForText2Image.from_pretrained(
    'stabilityai/stable-diffusion-2-1',
    torch_dtype=torch.float16,
    use_safetensors=True
).to(device)
The rest of the code for generating images remains the same as in our previous approach.
prompt = "A white tiger, anime style, realism, detailed line art, fine details, solid lines"
image = pipe(
    prompt,
    num_inference_steps=150,
    generator=torch.manual_seed(seed)
).images[0]
image
AutoPipeline for Image-to-Image
We can also use the AutoPipeline for image-to-image generation to transfer the style of one image to the generated image.
from diffusers import AutoPipelineForImage2Image
pipe = AutoPipelineForImage2Image.from_pretrained(
    'stabilityai/stable-diffusion-2-1',
    torch_dtype=torch.float16,
    use_safetensors=True,
).to(device)
image_path = 'images/abstract_art_2.jpg'
init_image = Image.open(image_path).convert("RGB")
init_image
Following is our initialization image.
prompt = "a painting of sand dunes in a desert, with sun rising on the horizon, sand particle style, pastel drawing, on canvas"
image = pipe(
    prompt,
    image=init_image,
    num_inference_steps=150,
    strength=1.0,
    generator=torch.manual_seed(seed)
).images[0]
image
AutoPipeline for Image InPainting
One of the most interesting use cases of generative models is inpainting: we create a mask on an image, provide a prompt, and expect the model to generate an object in place of the mask based on the prompt. This may sound simple; however, the training strategy differs from that of a vanilla image generation model and requires additional fine-tuning and data. We will delve deeper into the training techniques while fine-tuning our own Stable Diffusion inpainting model in later parts of the series.
For now, let’s focus on using AutoPipeline for image inpainting with Stable Diffusion and Hugging Face Diffusers.
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
For inpainting, we load the same Stable Diffusion 1.5 model.
pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True
).to(device)
It’s interesting to note that the same image generation model can be used for inpainting. The rest of the initialization for inpainting is handled by the AutoPipelineForInpainting class.
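That said, Hugging Face also hosts checkpoints fine-tuned specifically for inpainting, such as runwayml/stable-diffusion-inpainting, which often produce cleaner fills. Swapping one in is a one-line change; the following is only a sketch if you want to try it.
# Optional: a checkpoint fine-tuned specifically for inpainting.
pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to(device)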
For image inpainting, we need two components:
- An input image in RGB format.
- A binary mask.
The following code block downloads the image and mask from Dropbox URLs.
img_url = "https://www.dropbox.com/scl/fi/oymummr3q4i86fbx79qv4/car.jpg?rlkey=8aogj5fwb0lrzd5u36rb3ifih&dl=1"
mask_url = "https://www.dropbox.com/scl/fi/le25tqopqapligom78myp/car_mask.png?rlkey=8t556dfxi1mr543j5fdwcg5mc&dl=1"
init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")
Following are the input image and mask.
The mask contains white pixels (255, 255, 255) in the region where the car is present and black pixels (0, 0, 0) everywhere else. That is how the Stable Diffusion model knows where to generate content based on the prompt.
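If you ever need to build such a mask programmatically instead of downloading or drawing one, a minimal sketch like the one below works; the rectangle coordinates are purely hypothetical and would need to match your own image.
import numpy as np
from PIL import Image

# Build a binary mask the same size as the input image: white (255) marks
# the region to inpaint, black (0) is left untouched.
custom_mask = np.zeros((init_image.height, init_image.width), dtype=np.uint8)
custom_mask[300:600, 200:700] = 255  # hypothetical rectangle; adjust to your image
custom_mask_image = Image.fromarray(custom_mask).convert("RGB")
With the downloaded mask in place, we can now run the inpainting pipeline.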
prompt = "A glass house in the middle of a ground"
image = pipeline(
    prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=150,
    generator=torch.manual_seed(seed)
).images[0]
image
When executing the pipeline, we provide an additional mask_image argument that accepts the binary mask.
In the final image, the glass house is generated within the masked region, without extra texture to cover the rest of the mask. In general, the model tries to fit the main subject to the mask and fills the remaining masked area with plausible content.
There is a big limitation to the above approach: we need to have the mask ready (generated separately) and feed it to the model. Generating and loading a separate mask for each image can be cumbersome. However, there is a better way: we can generate the mask directly in a Gradio UI.
Dynamic Masking for Inpainting
For this approach, we need to install a specific version of Gradio.
pip install gradio==3.50.2
Next, import the required libraries, define a helper function to process images and masks, and initialize the Gradio interface.
import gradio as gr
import numpy as np
from PIL import Image
def process_and_save_image(uploaded_image):
    # Save the final mask image to disk
    final_mask = Image.fromarray(uploaded_image['mask'])
    final_image = Image.fromarray(uploaded_image['image'])
    save_path_mask = './final_mask_image.png'
    save_path_image = './final_image.png'
    final_mask.save(save_path_mask)
    final_image.save(save_path_image)
    # Return the original and the final mask image
    return final_image, final_mask

# Create the Gradio interface
iface = gr.Interface(
    fn=process_and_save_image,
    inputs=[
        gr.Image(
            tool="sketch",
            image_mode='RGBA',
            shape=(512, 512),
            brush_color="#ffffff",
            mask_opacity=1.0
        ),
    ],
    outputs=[
        gr.outputs.Image(label="Original Image", type='pil'),
        gr.outputs.Image(label="Final Mask Image", type='pil')
    ]
)
# Launch the interface
iface.launch(share=True)
For a better understanding, here is what the Gradio UI looks like.
There is one input image component, where we upload the image and sketch a mask over the region of choice. Because the input image (inputs in the above code block) is in RGBA format, the uploaded value contains both a 'mask' and an 'image' component, which we extract in the first two lines of the process_and_save_image function. The function saves both the final image and the mask to disk; we load them in the next code block and pass them through the inpainting pipeline.
init_image = load_image('final_image.png').convert("RGB")
mask_image = load_image('final_mask_image.png').convert("RGB")
prompt = "A glass house in the middle of the ground, highly detailed, 4k, trending on artstation"
image = pipeline(
    prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=150,
    generator=torch.manual_seed(seed)
).images[0]
image
The rest of the code remains the same. The benefit of the above approach is that the Gradio UI can launch directly in a Jupyter Notebook and we can create any mask of our choice as per requirements.
Of course, generating random objects or images of our choice with inpainting may not always work well. We may need to fine-tune a Stable Diffusion inpainting model to generate specific objects like furniture. Although not straightforward, it is certainly possible and we will explore exactly that in one of the future posts of the series.
Key Takeaways
- Setting Up Hugging Face Diffusers: Setting up the Diffusers library is simple and straightforward; it can be done with a single pip command.
- Access to Diffusion Models: With Diffusers, users get access to hundreds of diffusion-based models for image generation, image-to-image style transfer, image inpainting, and more.
- Running Models and Schedulers: Pipelines make it extremely simple to run models and to swap their schedulers. In fact, with the AutoPipeline classes, the overhead of choosing and initializing the right pipeline is handled by the library, leaving the user with the creative task of generating amazing content.
Summary and Conclusion
In this article, we explored the Diffusers library for image generation tasks, starting with an introduction to Diffusers and moving on to text-to-image, image-to-image, inpainting, and AutoPipeline. There are many more components available in the library, and the official Diffusers documentation is the best place to learn more.
What are you going to build with the knowledge of Generative AI? Let us know in the comments.