
ControlNet v1.1 Definitive Guide [Stable Diffusion]

Author: The brain in the new tank

ControlNet is a neural network that adds extra conditioning to Stable Diffusion models, letting you copy a composition or a body pose from a reference image.

Experienced Stable Diffusion users know how hard it is to generate exactly the composition you want. Image generation is somewhat random; all you can do is play the numbers game: generate tons of images and choose the ones you like.

With ControlNet, stable diffusion users finally have a way to precisely control the position and appearance of their subjects!

In this post, we will cover everything you need to know about ControlNet.


1. What is ControlNet?

ControlNet is a neural network model for controlling stable diffusion models. You can use ControlNet with any stable diffusion model.

The most basic form of using a stable diffusion model is text-to-image. It uses a text prompt as a condition to guide image generation in order to produce an image that matches the text prompt.

ControlNet adds one more condition on top of the text prompt. That extra conditioning can take many forms in ControlNet.

Let me show you two examples of what ControlNet can do: using (1) edge detection and (2) human pose detection to control image generation.

1.1 ControlNet edge detection example

As shown in the following figure, ControlNet takes an additional input image and detects its outlines using the Canny edge detector. An image containing the detected edges is then saved as a control map and fed into the ControlNet model as an extra condition alongside the text prompt.


Stable Diffusion ControlNet workflow with edge detection

The process of extracting specific information (in this case, edges) from the input image is called annotation (in the research paper) or preprocessing (in the ControlNet extension).
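To make the preprocessing step concrete, here is a minimal sketch of producing a Canny control map with OpenCV. The file names are hypothetical, and the extension does all of this for you when you pick the canny preprocessor; this only illustrates what the annotator computes.

import cv2

# Load the reference image and run the Canny edge detector.
# The two thresholds decide how many weak edges are kept; the ControlNet
# extension exposes them as sliders.
image = cv2.imread("input.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)

# The white-on-black edge image is the control map that is passed to the
# ControlNet model alongside the text prompt.
cv2.imwrite("canny_control_map.png", edges)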

1.2 ControlNet human posture detection example

Edge detection is not the only way to preprocess images. Openpose is a fast human keypoint detection model that extracts human postures such as the position of hands, legs, and head. See the example below.


Annotated input image for human pose detection using Openpose

Below is the ControlNet workflow using OpenPose. OpenPose extracts keypoints from the input image, which are saved as a control map containing the keypoint positions. This map is then fed to Stable Diffusion along with the text prompt as an additional condition. The image is generated based on these two conditions.

What is the difference between using Canny edge detection and Openpose? The Canny Edge Detector extracts the edges of the subject and background. It tends to translate scenes more faithfully. You can see the dancing man turned into a woman, but the silhouette and hairstyle were preserved.

OpenPose only detects human keypoints, such as the positions of the head, arms, and so on. Image generation has more freedom, but still follows the original pose.

The example above generates a woman jumping up with her left foot pointing sideways, unlike the original image and the image in the Canny Edge example. The reason is that OpenPose's key point detection does not specify the orientation of the foot.
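If you want to see what the OpenPose annotator produces outside the web UI, the snippet below is a minimal sketch using the controlnet_aux package (my assumption; it is a common standalone collection of ControlNet annotators, and the file names are placeholders).

from controlnet_aux import OpenposeDetector
from PIL import Image

# Download and load the pretrained OpenPose annotator.
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

# Run keypoint detection on the reference image; the result is a black image
# with the detected skeleton drawn on it -- the control map.
reference = Image.open("input.png")
pose_map = openpose(reference)
pose_map.save("pose_control_map.png")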

2. Install stable diffusion ControlNet

Let's take a look at how to install ControlNet in AUTOMATIC1111, a popular, full-featured (and free!) Stable Diffusion GUI. We will use the sd-webui-controlnet extension (the de facto standard) to run ControlNet.

If you already have ControlNet installed, you can skip to the next section to learn how to use it.

2.1 Install ControlNet in Colab

In our quick start guide, it's easy to use ControlNet with a one-click stable diffusion Colab notebook.

In the Extensions section of the Colab notebook, check ControlNet.


Press the Play button to launch AUTOMATIC1111. That's it!

2.2 Installing the ControlNet extension (Windows/Mac)

You can also use ControlNet with AUTOMATIC1111 on a Windows PC or Mac. Follow the installation instructions for AUTOMATIC1111 if you have not already done so.

If you already have AUTOMATIC1111 installed, make sure your copy is up to date.

  • Navigate to the Extensions page.
  • Select the Install from URL tab.
  • Place the following URL in the URL of the extension's repository field:
https://github.com/Mikubill/sd-webui-controlnet           
  • Click the Install button.
  • Wait for a confirmation message stating that the extension is installed.
  • Restart AUTOMATIC1111.
  • Visit the ControlNet model page.
  • Download all model files with file names ending in .pth. If you don't want to download them all, you can download the most commonly used OpenPose and Canny models now.
  • Place the model files in the ControlNet extension's model directory:
stable-diffusion-webui\extensions\sd-webui-controlnet\models
  • Restart the AUTOMATIC1111 webui.

If the extension is successfully installed, you'll see a new collapsible section called ControlNet in the txt2img tab. It should be just above the script drop-down menu.


This indicates that the extension installation was successful.

3. Install the T2I adapter

T2I adapters are neural network models that provide extra control over image generation for diffusion models. They are similar in concept to ControlNet, but with a different design.


The A1111 ControlNet extension can use T2I adapters. You'll need to download the models here. Get the ones whose file names look like t2iadapter_XXXXX.pth.

Many T2I adapters overlap in functionality with ControlNet models. I will cover only the following two.

  • t2iadapter_color_sd14v1.pth
  • t2iadapter_style_sd14v1.pth

Place them in ControlNet's model folder.

stable-diffusion-webui\extensions\sd-webui-controlnet\models           

4. Update the ControlNet extension

The ControlNet extension is under rapid development, so it's not uncommon to find your copy outdated.

The update is only required if you are running AUTOMATIC1111 locally on Windows or Mac. The site's Colab notebooks always run the latest ControlNet extensions.

To determine if your version of ControlNet is the latest version, compare the version number in the ControlNet section on the txt2img page with the latest version number.

4.1 Option 1: Update from the web UI

The easiest way to update the ControlNet extension is to use the AUTOMATIC1111 GUI.

  • Go to the Extensions page.
  • In the Installed tab, click Check for updates.
  • Wait for the confirmation message.
  • Completely shut down and restart the AUTOMATIC1111 Web UI.

4.2 Option 2: Command line

If you're familiar with the command line, you can use this option to update ControlNet and be confident that the web UI won't change anything else.

Step 1: Open the Terminal app (Mac) or PowerShell app (Windows).

Step 2: Navigate to the folder of the ControlNet extension. (If installed elsewhere, please adjust accordingly)

cd stable-diffusion-webui/extensions/sd-webui-controlnet           

Step 3: Update the extension by running the following command.

git pull           

5. Simple example of using ControlNet

Now that you have ControlNet installed, let's walk through a simple example of using it! You'll see a detailed description of each setting later.

You need the ControlNet extension installed to follow this section. You can verify that it is installed by checking for the ControlNet section shown below.

Press the caret on the right to expand the ControlNet panel. It reveals the full set of controls and the image upload canvas.

I'll use the following image to demonstrate ControlNet. You can download it to follow along with the tutorial.

5.1 Text to Image Settings

ControlNet needs to be used with a stable diffusion model. In the Stable Diffusion Checkpoint drop-down menu, select the model you want to use with ControlNet. Select v1-5-pruned-emaonly.ckpt to use the v1.5 base model.

In the txt2img tab, write a prompt and (optionally) a negative prompt to be used with ControlNet. I'll use the following.

Prompt:

full-body, a young female, highlights in hair, dancing outside a restaurant, brown eyes, wearing jeans

Negative prompt:

disfigured, ugly, bad, immature

Set the size of the image to be generated. I will use width 512 and height 776 for my demo image. Note that the image size is set in the txt2img section, not in the ControlNet section.

The GUI should look like the following.


5.2 ControlNet settings

Now let's go to the "ControlNet" panel.

First, upload the image to the image canvas.

Select the Enable check box.

You need to select the preprocessor and model. The preprocessor is just a different name for the annotator mentioned earlier, such as the OpenPose key point detector. Let's choose Openpose as the preprocessor.

The selected ControlNet model must be consistent with the preprocessor. For OpenPose, control_openpose-fp16 should be chosen as the model.

The ControlNet panel should look like the following.

That's it. Now press Generate to start generating images with ControlNet.

You should see the generated image follow the pose of the input image. The last image comes directly from the preprocessing step; in this case, it is the detected keypoint map.

When you're done, uncheck the Enable checkbox to disable the ControlNet extension.

Here are the basics of using ControlNet!

All that remains is to understand:

  • What preprocessors are available (there are many!)
  • The ControlNet settings
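Everything above was done through the GUI, but the same settings can also be submitted through AUTOMATIC1111's API (started with the --api flag). The sketch below is a rough, version-dependent illustration: the ControlNet unit field names have changed between extension releases (older builds use input_image instead of image), so treat the payload keys as assumptions and check the extension's API documentation.

import base64
import requests

# Encode the reference image; the extension expects base64 strings.
with open("input.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "full-body, a young female, highlights in hair, "
              "dancing outside a restaurant, brown eyes, wearing jeans",
    "negative_prompt": "disfigured, ugly, bad, immature",
    "width": 512,
    "height": 776,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    "enabled": True,
                    "image": image_b64,       # "input_image" on older builds
                    "module": "openpose",     # the preprocessor
                    "model": "control_openpose-fp16",
                    "weight": 1.0,
                }
            ]
        }
    },
}

# Assumes the web UI is running locally with the API enabled.
response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
print(response.json()["images"][0][:80])  # base64-encoded result image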

6. Preprocessor and model

The first step in using ControlNet is to select a preprocessor. It's helpful to turn on the preview so you know what the preprocessor is doing. After preprocessing is complete, the original image is discarded and only the preprocessed image is used for ControlNet.

To open the preview:

  • Select Allow Preview.
  • Optionally, select Pixel Perfect. ControlNet will use the image height and width you specified in txt2img to generate the preprocessed image.
  • Click the explosion icon next to the Preprocessor drop-down menu.

Some ControlNet models can have too strong an effect on the image. If you see color issues or other artifacts, reduce the Control Weight.

6.1 Choose the right model

After you select a preprocessor, you must select the correct model.

It's easy to tell which model is the right model to use in v1.1. All you need to do is choose a model that has the same starting keyword as the preprocessor.

For example:

Preprocessor        Model
depth_xxxx          control_xxxx_depth
lineart_xxxx        control_xxxx_lineart
openpose_xxxx       control_xxxx_openpose

6.2 OpenPose preprocessor

There are multiple OpenPose preprocessors.

OpenPose detects key points in humans, such as the position of the head, shoulders, hands, etc. It can be used to replicate human poses without having to replicate other details such as clothing, hairstyles, and backgrounds.

All openpose preprocessors need to be used with the openpose model in ControlNet's Model drop-down menu.

The OpenPose preprocessor includes:

  • OpenPose: eyes, nose, neck, shoulders, elbows, wrists, knees, and ankles.
  • OpenPose_face: OpenPose + facial details
  • OpenPose_hand: OpenPose + hands and fingers
  • OpenPose_faceonly: facial details only
  • OpenPose_full: all of the above
  • dw_openpose_full: an enhanced version of OpenPose_full

OpenPose is the basic preprocessor that detects the positions of the eyes, nose, neck, shoulders, elbows, wrists, knees, and ankles.

OpenPose_face does everything the OpenPose preprocessor does and additionally detects facial details.

It is useful for replicating facial expressions.

OpenPose_faceonly detects only the face, not the other keypoints. This is useful for copying the face alone.

OpenPose_hand detects the OpenPose keypoints plus the hands and fingers.

OpenPose_full detects everything that OpenPose_face and OpenPose_hand do.

DWPose is a newer pose detection algorithm for efficient whole-body pose estimation, based on two-stage distillation (see the research paper). It accomplishes the same task as OpenPose_full but does it better. Use dw_openpose_full instead of openpose_full.

If you don't see dw_openpose_full in the preprocessor menu, update ControlNet.

DW OpenPose does a better job of detecting hands and fingers.

6.3 Tile resample

The Tile resample model is used to add details to an image. It is often used together with an upscaler to enlarge the image at the same time.

See ControlNet Tile upscaling methods.

6.4 Reference preprocessors

Reference is a new set of preprocessors that lets you generate images similar to a reference image. The generated image is still influenced by the Stable Diffusion model and the prompt.

Reference preprocessors do NOT use a control model. You only need to select the preprocessor, not a model. In fact, the Model drop-down menu is hidden when a reference preprocessor is selected.

There are 3 reference preprocessors.

  • Reference adain: Style transfer via Adaptive Instance Normalization (AdaIN).
  • Reference only: Links the reference image directly into the attention layers.
  • Reference adain+attn: A combination of the above.

Select one of the preprocessors to use.

Here's an example.

Reference image

I used the CLIP interrogator to guess the prompt:

a woman with pink hair and a robot suit on, with a sci-fi, Artgerm, cyberpunk style, cyberpunk art, retrofuturism

Negative prompt:

disfigured, ugly, bad, immature

Model: Protogen v2.2

Reference adain:


Reference only:


Reference adain + attn:


I would say that Reference only works best.

The above images were all generated in Balanced mode. I don't think changing the Style Fidelity makes much difference.

6.5 Canny Edge Detector

The Canny edge detector is a versatile, old-school edge detector. It extracts the outlines of the image and is useful for preserving the composition of the original image.

Select Canny in the Preprocessor and Model drop-down menus.

The resulting image will follow the outlines.

6.6 Depth preprocessors

The depth preprocessors guess the depth information from the reference image (a sketch of running a depth estimator yourself follows the list).

  • Depth Midas: A classic depth estimator, also used in the official v2 depth-to-image model.
  • Depth Leres: More detail, but also tends to render the background.
  • Depth Leres++: Even more detail.
  • Depth Zoe: The level of detail sits somewhere between Midas and Leres.
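If you are curious what a depth preprocessor computes, here is a minimal sketch of estimating a depth map with MiDaS through torch.hub. The model name MiDaS_small and the file names are assumptions for illustration; the extension's annotators may use different weights.

import cv2
import torch

# Load a small MiDaS depth-estimation model and its matching input transform.
model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
model.eval()

# Read the reference image as RGB and predict (inverse) depth.
img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    depth = model(transform(img)).squeeze().cpu().numpy()

# Normalize to 0-255 and save as a grayscale depth control map
# (the map is at the model's working resolution; resize it back if needed).
depth = (255 * (depth - depth.min()) / (depth.max() - depth.min())).astype("uint8")
cv2.imwrite("depth_control_map.png", depth)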

Ref image:


Depth map:


Prompt:

a woman retrofuturism

Negative prompts:

disfigured, ugly, bad, immature

You can see that the resulting image follows a depth map (Zoe).


Compare with the more detailed Leres++:


6.7 Line art

Line Art renders the outline of an image. It tries to convert it into a simple drawing.

There are a few Line Art preprocessors.

  • Line Art Anime: Anime-style lines
  • Line Art Anime Denoise: Anime-style lines with less detail.
  • Line Art Realistic: Realistic-style lines.
  • Line Art Coarse: Realistic-style lines with heavier weight.

Use them with a Line Art control model.

The images below were generated with the Control Weight set to 0.7.

Line Art Anime:

Line Art Anime Denoise:

Line Art Realistic:

Line Art Coarse:

6.8 MLSD

M-LSD (Mobile Line Segment Detection) is a straight-line detector. It is useful for extracting outlines with straight edges, such as interior designs, buildings, street scenes, picture frames, and paper edges.

Curves are ignored.

6.9 Normal maps

A normal map specifies the orientation of surfaces. For ControlNet, it is an image in which each pixel encodes the direction the surface at that point is facing, rather than a color value.

The use of normal maps is similar to depth maps. They are used to transfer the 3D composition of the reference image.

Normal Map Preprocessor:

  • Normal Midas: Estimate the normal map based on the Midas depth map.
  • Normal Bae: Estimate the normal map using the normal uncertainty method proposed by Bae et al. (see the sketch after this list for the basic idea of deriving normals from depth).
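To build intuition for what a normal map encodes, here is a rough sketch (my own illustration, not the extension's actual code) that derives a normal map from a depth map by taking image gradients; depth_control_map.png is assumed to be a grayscale depth image like the one produced earlier.

import cv2
import numpy as np

# Load a grayscale depth map and compute its horizontal/vertical gradients.
depth = cv2.imread("depth_control_map.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
dzdx = cv2.Sobel(depth, cv2.CV_32F, 1, 0, ksize=3)
dzdy = cv2.Sobel(depth, cv2.CV_32F, 0, 1, ksize=3)

# A surface normal points against the depth gradient; the constant z component
# controls how "flat" the estimate looks.
normal = np.dstack((-dzdx, -dzdy, np.full_like(depth, 50.0)))
normal /= np.linalg.norm(normal, axis=2, keepdims=True)

# Map the xyz directions from [-1, 1] to [0, 255] so they can be stored as RGB.
cv2.imwrite("normal_control_map.png", ((normal + 1.0) * 127.5).astype(np.uint8))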

Like Midas depth maps, Midas normal maps are great for isolating the subject from the background.


Bae normal maps tend to render details in the background and foreground.


6.10 Scribble

The Scribble preprocessors turn a picture into scribbles, like a hand-drawn sketch.

  • Scribble HED: Holistically-Nested Edge Detection (HED) is an edge detector that is good at producing outlines the way a real person would. According to ControlNet's authors, HED is suitable for recoloring and restyling images.
  • Scribble Pidinet: Pixel Difference Network (PiDiNet) detects curves and straight edges. Its results are similar to HED, but often with sharper lines and less detail.
  • Scribble XDoG: eXtended Difference of Gaussians (XDoG) is an edge detection technique. It is important to adjust the XDoG threshold and observe the preprocessor output.

All of these preprocessors should be used with the Scribble control model.

Scribble HED produces coarse scribble lines.

Scribble Pidinet tends to produce thick lines with little detail. It is good for copying the broad outline without fine details.


The level of detail can be controlled with the Scribble XDoG threshold, making XDoG a versatile preprocessor for creating scribbles; a rough sketch of the idea follows.
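As an illustration of the idea behind XDoG (not the extension's exact implementation), the sketch below builds a simple difference-of-Gaussians edge image and thresholds it; raising or lowering the threshold changes how much detail survives, which is the effect the XDoG threshold slider has.

import cv2
import numpy as np

# Difference of Gaussians: subtract a strongly blurred copy from a lightly
# blurred one so that only edges (fast intensity changes) remain.
gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
dog = cv2.GaussianBlur(gray, (0, 0), 1.0) - cv2.GaussianBlur(gray, (0, 0), 1.6)

# The threshold decides how strong an edge must be to appear in the scribble:
# a higher value keeps less detail, a lower value keeps more.
threshold = 2.0
scribble = np.where(np.abs(dog) > threshold, 255, 0).astype(np.uint8)
cv2.imwrite("scribble_control_map.png", scribble)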


6.11 Segmentation preprocessors

Segmentation preprocessors label the kind of object in each region of the reference image.

Below is a segmentation preprocessor in action.

Buildings, sky, trees, people, and sidewalks are all marked with different predefined colors.

Object categories and colors can be found in the color maps of ufade20k and ofade20k here.

There are several segmentation options:

  • ufade20k: UniFormer (uf) segmentation trained on the ADE20K dataset.
  • ofade20k: OneFormer (of) segmentation trained on the ADE20K dataset.
  • ofcoco: OneFormer segmentation trained on the COCO dataset.

Note that the color maps for the ADE20K and COCO segmentations are different.

You can use segmentation preprocessors to transfer the location and shape of objects.

The images below use these preprocessors with the same prompt and seed.

Futuristic city, tree, buildings, cyberpunk

UniFormer ADE20k (ufade20k) in this example accurately labels everything.

OneFormer ADE20k (ofade20k) is a bit noisier in this case, but that doesn't affect the final image.

OneFormer COCO (ofcoco) performs similarly, but with some labeling errors.

Segmentation is a powerful technique. You can edit the segmentation map to place objects at precise locations, using the ADE20K color map.
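Here is a rough sketch of that kind of edit with PIL. The RGB value below is a placeholder, not a real ADE20K class color; look up the color for the class you want in the ADE20K color map, paint a region with it, and feed the edited map back to ControlNet with the preprocessor set to None.

from PIL import Image, ImageDraw

# Load a segmentation map produced by the ufade20k/ofade20k preprocessor.
seg_map = Image.open("segmentation_map.png").convert("RGB")

# Paint a rectangle with the class color of the object you want to place.
# (120, 120, 120) is a placeholder -- replace it with the real ADE20K color
# for your target class.
draw = ImageDraw.Draw(seg_map)
draw.rectangle([(100, 300), (220, 480)], fill=(120, 120, 120))

# Upload the edited map to ControlNet with Preprocessor: none and the
# segmentation model selected.
seg_map.save("edited_segmentation_map.png")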

6.12 Shuffle

The Shuffle preprocessor shuffles the input image. Let's see it in action.

Together with the shuffle control model, the shuffle preprocessor can be used to transfer the color scheme of the reference image.

Input image:


Shuffle preprocessor:


Unlike other preprocessors, the Shuffle preprocessor is randomized; it is affected by your seed value.

Use the Shuffle preprocessor with the Shuffle control model. The Shuffle control model can be used with or without the Shuffle preprocessor.

The image below was made with the ControlNet Shuffle preprocessor and Shuffle model (with the same prompt as in the previous section). The color scheme roughly follows the reference image.

The image below uses only the ControlNet Shuffle model (preprocessor: None). The composition is closer to the original image, and the color scheme is similar to the shuffled one.

The image below uses the same prompt without ControlNet. The color scheme is very different.

6.13 Color grid T2I adapter

The Color Grid T2I adapter preprocessor shrinks the reference image by a factor of 64 and then expands it back to its original size. The net effect is a grid-like patchwork of local average colors.
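The operation itself is simple enough to sketch in a few lines with PIL (my own illustration of the idea; the extension's preprocessor may differ in details such as the resampling filters).

from PIL import Image

# Shrink the reference image by a factor of 64, then blow it back up with
# nearest-neighbour resampling to get blocky patches of local average color.
image = Image.open("input.png")
small = image.resize((image.width // 64, image.height // 64), Image.BILINEAR)
color_grid = small.resize(image.size, Image.NEAREST)
color_grid.save("color_grid_control_map.png")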

Original reference image:


Preprocessing with t2ia_color_grid:


You can then use the preprocessed image with a T2I color adapter (t2iadapter_color) control model.

The generated image will loosely follow the color scheme spatially.

A modern living room

Increase the ControlNet weight to make it track more closely.

You can also use the preprocessor None for this T2I color model.

In my opinion, the result is then very similar to image-to-image.

6.14 Clip vision style T2I adapter

t2ia_style_clipvision converts the reference image into a CLIP vision embedding. This embedding contains rich information about the image's content and style.
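For intuition, here is a minimal sketch of computing a CLIP vision embedding with the Hugging Face transformers library. It illustrates the concept only; the checkpoint name is an assumption and the extension's own encoder may differ.

from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Load a CLIP vision encoder and its image preprocessor.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
model = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14")

# Encode the reference image into a single embedding vector that captures its
# content and style; the T2I style adapter conditions generation on it.
inputs = processor(images=Image.open("input.png"), return_tensors="pt")
embedding = model(**inputs).image_embeds
print(embedding.shape)  # (1, 768) for this checkpoint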

You will need to use the control model t2iadapter_style_XXXX.

Check out the effect of this amazing style transfer:

Ref image:


T2I adapter – CLIP vision:

sci-fi girl

Here's what the same prompt generates with ControlNet turned off.


The functionality is very similar to the Reference preprocessors, but I would say T2IA CLIP Vision does a better job.

6.15 ControlNet InPainting

ControlNet Inpainting lets you use a high denoising strength in inpainting to generate large variations without sacrificing consistency with the picture as a whole.

For example, I used the following prompt for a realistic person.

Model: HenmixReal v4

photo of young woman, highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD, highly detailed glossy eyes, high detailed skin, skin pores

Negative prompts:

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

I have this image and want to regenerate the face with InPainting.


If I inpaint the face with a high denoising strength (> 0.4), the result can be globally inconsistent. Below is an inpainted image with denoising strength 1:


ControlNet Inpainting is the solution.

To inpaint using ControlNet:

1. It is best to use the same model that generated the image. After generating the image on the txt2img page, click Send to Inpaint to send the image to the Inpaint tab on the Img2img page.

2. Use the brush tool to create a mask on the area you want to regenerate. If you're not familiar, see the beginner's tutorial on Inpainting.


3. Set "Colored Area" to "Mask Only". (The whole picture also works)

4. Set the noise reduction intensity to 1. Without ControlNet, you usually wouldn't set it that high.

5. Set the following parameters in the "ControlNet" section. There is no need to upload reference images.

  • Enabled: Yes
  • Preprocessor: Inpaint_global_harmonious
  • Model: ControlNet
ControlNet v1.1 Definitive Guide [Stable Proliferation]

6. Press Generate to start inpainting.

Now, even at the maximum denoising strength (1), I get a new face that is consistent with the overall image!


Currently, there are 3 inpainting preprocessors:

  • Inpaint_global_harmonious: Improves global consistency and lets you use a high denoising strength.
  • Inpaint_only: Does not change unmasked areas. In AUTOMATIC1111 it works the same as Inpaint_global_harmonious.
  • Inpaint_only+lama: Processes the image with the LaMa model. It tends to produce cleaner results and is well suited for object removal.

7. Complete instructions for ControlNet settings

You've seen a lot of settings in the ControlNet extension! They can be a little intimidating when you first use it, but let's go through them one by one.

It will be an in-depth dive. Take a break and go to the bathroom if needed...

7.1 Input Control

  • Image canvas: Drag and drop the input image here. You can also click the canvas and select a file with the file browser. The input image is processed by the preprocessor selected in the Preprocessor drop-down menu, which creates a control map.
  • Write icon: Creates a new canvas with a white image instead of uploading a reference image. It is used for creating scribbles directly.
  • Camera icon: Take a picture with your device's camera and use it as an input image. You need to give your browser permission to access the camera.

7.2 Model Selection

  • Enable: Whether to enable ControlNet.
  • Low VRAM: For GPUs with less than 8 GB of VRAM. It is an experimental feature. Turn it on if your GPU runs out of memory or if you want to increase the number of images processed.
  • Allow Preview: Check this to enable a preview window next to the reference image. I recommend selecting it. Use the explosion icon next to the Preprocessor drop-down menu to preview the preprocessor's effect.
  • Preprocessor: The preprocessor (called an annotator in the research paper) that processes the input image, for example by detecting edges, depth, or normal maps. When a preprocessor is selected, its output, not the raw input image, is used as the control map.
  • Model: The ControlNet model to use. Pick the model that matches the selected preprocessor. The ControlNet model is used together with the Stable Diffusion model selected at the top of the AUTOMATIC1111 GUI.

7.3 Control Weights

Below the Preprocessor and Model drop-down menus, you'll see three sliders that control the effect: Control Weight, Starting Control Step, and Ending Control Step.

I'll use the following image to illustrate the effect of the Control Weight. It is an image of a girl sitting down.

But in the prompt, I will ask for a woman standing up.

full body, a young female, highlights in hair, standing outside restaurant, blue eyes, wearing a dress, side light

Control Weight: How much emphasis to give the control map relative to the prompt. It is similar to keyword weighting in prompts, but applies to the control map.

The following images were generated using the ControlNet OpenPose preprocessor and the OpenPose model.


As you can see, the Control Weight determines how closely the control map is followed relative to the prompt. The lower the weight, the less ControlNet demands that the image follow the control map.

Starting Control Step: The step at which ControlNet first applies. 0 means the very first step.

Ending Control Step: The step at which ControlNet stops applying. 1 means the last step.

Let's fix the starting step at 0 and vary the ending control step to see what happens.


Because the early steps set the global composition (the sampler removes the most noise in the early steps, starting from a random tensor in latent space), the pose is locked in even if ControlNet is applied only to the first 20% of the sampling steps.

Conversely, changing the ending control step has less effect, because the global composition is already set by the early steps.
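Here is a conceptual sketch (my own simplification, not the extension's code) of what the starting and ending control steps do: the ControlNet residuals are simply skipped outside the chosen window of sampling steps.

def controlnet_active(step: int, total_steps: int,
                      control_start: float, control_end: float) -> bool:
    """Return True when ControlNet residuals should be added at this step."""
    progress = step / max(total_steps - 1, 1)
    return control_start <= progress <= control_end

# With a starting step of 0.0 and an ending step of 0.2, only the first 20%
# of steps are controlled -- enough to fix the pose, because the early steps
# decide the global composition.
active = [controlnet_active(s, 20, 0.0, 0.2) for s in range(20)]
print(active)  # True for the first few steps, then False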

7.4 Control Mode

  • Balanced: ControlNet is applied on both the conditioned and unconditioned sides of the sampling steps. This is the standard mode of operation.
  • My prompt is more important: ControlNet's effect is gradually reduced across the U-Net injection points (there are 13 of them in one sampling step). The net effect is that your prompt has more influence than ControlNet.
  • ControlNet is more important: ControlNet is turned off on the unconditioned (no-prompt) side. In effect, the CFG scale also acts as a multiplier on ControlNet's strength.

Don't worry if you don't fully understand how these actually work; the option labels describe the effects accurately.

7.5 Resize Mode

Resize mode controls what happens when the size of the input image or control map differs from the size of the image to be generated. You don't need to worry about these options if they have the same aspect ratio.

I'll demonstrate the effect of each resize mode by setting txt2img to generate a landscape image while the input image/control map is a portrait image (a code sketch of the three behaviors follows the list below).

  • Just resize: Scales the width and height of the control map independently to fit the image canvas. This changes the aspect ratio of the control map.

The girl now has to lean forward to stay inside the canvas. You can use this mode to create some interesting effects.

  • Crop and resize: Fits the image canvas within the control map. The control map is cropped so that it is the same size as the canvas.

Because the control map is cropped at the top and bottom, so is our girl.

  • Resize and fill: Fits the entire control map inside the image canvas. The control map is padded with empty values so that it is the same size as the image canvas.

There is more empty space on the sides compared with the original input image.
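Here is a minimal sketch of the three behaviors using PIL's ImageOps helpers. It is my approximation of what the modes do to the control map, not the extension's actual code, and the canvas size and file name are assumptions.

from PIL import Image, ImageOps

canvas = (768, 512)                                          # landscape txt2img size
control = Image.open("pose_control_map.png").convert("RGB")  # portrait control map

# Just resize: stretch to the canvas, changing the aspect ratio.
just_resize = control.resize(canvas)

# Crop and resize: fill the canvas and crop whatever overflows.
crop_and_resize = ImageOps.fit(control, canvas)

# Resize and fill: fit the whole map inside the canvas and pad the rest.
resize_and_fill = ImageOps.pad(control, canvas, color=(0, 0, 0))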

Now (hopefully) you know all the settings. Let's explore some ideas for using ControlNet.

8. Multiple ControlNets

You can use more than one ControlNet to generate a single image. Let's look at an example.

Model: Protogen v2.2

Prompt:

An astronaut sitting, alien planet

Negative prompt:

disfigured, deformed, ugly

This prompt produces images with a wide variety of compositions.

Let's say I want to control the composition of the astronaut and the background independently. For this, we can use multiple (in this case, two) ControlNet units.

I will use this reference image to fix the astronaut's pose.

Settings for ControlNet 0:

  • Enabled: Yes
  • Preprocessor: OpenPose
  • Model: control_xxxx_openpose
  • Resize mode: Resize and fill (since my reference image is a portrait)

I will use the following reference image for the background.

The depth model is perfect for this purpose.

Settings for ControlNet 1:

  • Enabled: Yes
  • Control Weight: 0.45
  • Preprocessor: depth_zoe
  • Model: control_xxxx_depth
  • Resize mode: Crop and resize

Now I can control the composition of the subject and the background independently:

Tips:

  • Adjust the ControlNet weights if one of the units doesn't do its job.
  • Pay attention to the resize mode if your reference images have a different size from the final image.
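The web UI handles this through its multiple ControlNet units. If you work in code instead, the diffusers library supports the same idea by passing a list of ControlNets to the pipeline. The sketch below is only an illustration under assumptions: the model repository IDs and file names are examples, not taken from this article.

from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Two ControlNets: one for the pose, one for the background depth.
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_openpose"),
    ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth"),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets
)

# One control image per ControlNet, plus per-unit weights (0.45 for depth,
# mirroring the settings above).
images = [load_image("pose_control_map.png"), load_image("depth_control_map.png")]
result = pipe(
    "An astronaut sitting, alien planet",
    negative_prompt="disfigured, deformed, ugly",
    image=images,
    controlnet_conditioning_scale=[1.0, 0.45],
).images[0]
result.save("astronaut.png")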

9. Copying human poses

Perhaps the most common application of ControlNet is copying a human pose, because poses have traditionally been difficult to control... until now! The input image can be an image generated by Stable Diffusion or one taken with a real camera.

To transfer a human pose with ControlNet, follow the instructions for enabling ControlNet in AUTOMATIC1111 and use the following settings.

  • Preprocessor: OpenPose
  • Model: control_...._openpose
  • Make sure Enable is selected.

Here are some examples.

9.1 Example 1: Copy a pose from an image

As a basic example, let's copy the pose from the image below of a woman admiring leaves.

Using different models and prompts, you can change the subject drastically while keeping the pose.

9.2 Example 2: Remix a movie scene

You can recast the iconic dance scene from Pulp Fiction as some yoga exercises in the park.

This uses ControlNet with the DreamShaper model.

This is the same prompt with the Inkpunk Diffusion model. You need to add the activation keyword nvinkpunk to the prompt:

10. Use ControlNet to stylize images

Below, the v1.5 model is used with prompts in various styles. ControlNet with various preprocessors was used; it's best to experiment and see which one works best.

You can also use different checkpoint models to stylize the image. Below are images generated with the prompt "Beethoven's paintings" using the Anything v3, DreamShaper, and OpenJourney models.

11. Control poses with Magic Poser

Sometimes you may not be able to find an image with the exact pose you need. You can create a custom pose with a software tool such as Magic Poser.

Step 1: Go to the Magic Poser website.

Step 2: Move the keypoints of the model to customize the pose.

Step 3: Press Preview and take a screenshot of the model. You should get an image like the one below.

Step 4: Use the OpenPose ControlNet model. Select the model and prompt of your choice and generate the image.

Here are some images generated with the v1.5 model and the DreamShaper model. The pose is copied well in all cases.

12. Interior design ideas

You can use the straight-line detector MLSD model with Stable Diffusion ControlNet to generate interior design ideas. Here are the ControlNet settings.

  • Preprocessor: MLSD
  • Model: MLSD

Start with any interior design photo. Let's take the following one as an example.

Prompt:

award winning living room

Model: Stable Diffusion v1.5

Here are some of the design ideas that came out.

Alternatively, you can use a depth model. It will emphasize preserving depth information rather than straight lines.

  • Preprocessor: Depth Midas
  • Model: Depth

Generated image:

13. Differences between the Stable Diffusion depth model and ControlNet

Stability AI, the creator of Stable Diffusion, released a depth-to-image model. It shares many similarities with ControlNet, but there are important differences.

Let's talk about the similarities first.

  • They are both Stable Diffusion models.
  • They both use two conditions (a preprocessed image and a text prompt).
  • They both use MiDaS to estimate the depth map.

The differences are:

  • The depth-to-image model is a v2 model, while ControlNet can be used with any v1 or v2 model. This is important because the v2 models are notoriously harder to use; people struggle to generate good images with them. The fact that ControlNet can use any v1 model opens up depth conditioning not only for the v1.5 base model, but also for the thousands of special models released by the community.
  • ControlNet is more versatile. In addition to depth, it can condition on edge detection, pose detection, and more.
  • ControlNet's depth map has a higher resolution than depth-to-image's.

14. How does ControlNet work?

This tutorial wouldn't be complete without explaining how ControlNet works under the hood.

ControlNet works by attaching trainable network modules to various parts of the U-Net (the noise predictor) of the Stable Diffusion model. The weights of the Stable Diffusion model are locked so that they stay unchanged during training; only the attached modules are modified.

The model diagram from the research paper sums it up well. Initially, the weights of the attached network modules are all zero, which lets the new model take advantage of the trained and locked model.
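Here is a tiny PyTorch sketch of that core trick (my own simplified illustration, not the paper's actual code): a trainable copy of a block is attached to a frozen block through a zero-initialized convolution, so at the start of training the combined model behaves exactly like the original.

import torch
import torch.nn as nn

class ControlledBlock(nn.Module):
    """Frozen Stable Diffusion block plus a trainable copy joined by a zero conv."""

    def __init__(self, locked_block: nn.Module, trainable_copy: nn.Module, channels: int):
        super().__init__()
        self.locked = locked_block
        for p in self.locked.parameters():
            p.requires_grad = False            # Stable Diffusion weights stay locked

        self.trainable = trainable_copy        # learns from the control map
        self.zero_conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.zero_conv.weight)  # zero weights at the start, so the
        nn.init.zeros_(self.zero_conv.bias)    # added branch is initially a no-op

    def forward(self, x: torch.Tensor, control: torch.Tensor) -> torch.Tensor:
        # The control signal is injected as a residual on top of the frozen path.
        return self.locked(x) + self.zero_conv(self.trainable(x + control))

# Usage sketch: the two Conv2d layers stand in for real U-Net blocks.
block = ControlledBlock(nn.Conv2d(4, 4, 3, padding=1), nn.Conv2d(4, 4, 3, padding=1), 4)
x = torch.randn(1, 4, 64, 64)
control = torch.randn(1, 4, 64, 64)
print(block(x, control).shape)  # torch.Size([1, 4, 64, 64])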


During training, each training image provides two conditions: (1) a text prompt and (2) a control map, such as OpenPose keypoints or Canny edges. The ControlNet model learns to generate images based on these two inputs.

Each control method is trained independently.

Original link: http://www.bimant.com/blog/controlnet-v11-ultimate-guide