Running Stable Diffusion Locally

Contents

 

·      Intro

·      System Requirements

·      Software Requirements

·      General Instructions (Nvidia and Apple M-series GPUs)

·      AMD GPU Installation

·      Basic Tips

·      The Web UI

·      Writing Prompts

Intro

While there are many online AI image generators that use Stable Diffusion, DALL-E, or Midjourney, most limit how many images you can generate per day or require a subscription. Running Stable Diffusion locally is free and puts no limits on what you can generate. It sounds complicated, but it’s easy with AUTOMATIC1111’s Web UI for SD.

This guide is based on the official guide provided by AUTOMATIC1111 on their GitHub page and on u/ImTheArtCurator’s post about running the Web UI on Linux with AMD GPUs.

 

System Requirements

This can run on:

·      Windows 10/11

·      Linux

o   Ubuntu (confirmed on 22.04 LTS)

o   Debian (I have not confirmed yet, but the creator states it will work on Debian 11)

o   Arch Linux

o   Fedora/RHEL/Rocky Linux (I have confirmed it will work on Fedora 39 and RHEL/Rocky Linux 9.3)

o   SUSE Linux Enterprise 15 SP 4/openSUSE Leap 15.4 and up

o   Possibly other distros, if they’re based on one of the distros mentioned above.

·      macOS (confirmed to work on macOS 14 a.k.a Sonoma)

 

Your computer will also need a GPU for the best performance. While it can run on the CPU alone, performance is poor, and it can take several minutes to render a 512 x 512 image. With a GPU, the same 512 x 512 image can be generated in seconds. Note: most gaming PCs and laptops will have either an Nvidia or AMD GPU.
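
If you’re not sure which GPU your machine has, a quick check is below (on Windows, the Performance tab of Task Manager or the dxdiag tool will show the display adapter). These are just example commands:

# Linux: list the GPU(s) the kernel sees
lspci | grep -iE 'vga|3d'

# macOS: show graphics/display information
system_profiler SPDisplaysDataType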

 

The following GPUs can be used:

·      Nvidia RTX 30 or 40 series (Recommended)

o   GPUs with more VRAM (12 GB and up, 16 GB and up recommended) will provide better performance.

o   May still work with RTX 20 series and earlier (may not work well with GTX 16 series and lower since they lack Tensor cores)

·      AMD Radeon RX 6000 and 7000 series

o   As with the Nvidia GPUs, more VRAM is better (16 GB and up recommended)

o   I’ve read that it can work with RX 5000 and earlier, but I cannot guarantee that it will perform well.

o   Since AMD’s ROCm software is not available in Windows, it’s better to use Linux if you have an AMD GPU

o   Best performance will be with RX 7000 series GPUs since they contain AMD’s AI accelerators

·      Intel ARC

o   See here for Intel CPU/GPU instructions

·      Apple Silicon Macs

o   Not recommended if your system has less than 16 GB of unified RAM

o   Despite all Apple Silicon-based Macs having the Neural Engine (Apple’s term for an AI accelerator/NPU), Stable Diffusion still only runs on the GPU currently.

o   I would personally recommend running this on a Mac with M1 or M2 Pro/Max/Ultra CPU or any variant of the M3.

o   It appears this will not work on Intel-based Macs

Back to Contents

 

Software Requirements

Software-wise, you will need Git and Python (specifically Python 3.10)

·      Windows: Download Git from here. Use the standalone version. Download Python from here.

o   Be sure to select the Add to PATH option when installing Python.

·      Linux: if Git and wget are not already installed, run the following in the terminal:

o   Note: run python --version or python -V to check your currently installed version of Python. If it’s anything other than 3.10.x, specify python3.10 when using Python or installing it, unless noted otherwise.

o   Debian-based distros: sudo apt update && sudo apt install wget git python3 python3-venv

o   RHEL-based distros: sudo dnf install wget git python3

§  You will need to build Python 3.10 from source if using any Enterprise Linux-based distro (a rough build sketch follows this list). It can be installed via the package manager in Fedora.

o   Arch-based distros: sudo pacman -S wget git python3

o   SUSE (use python as written): sudo zypper in wget git python310

·      macOS: You will need to install Homebrew if you don’t have it already. You can find it here; using the .pkg installer is preferred.

o   Then run: brew install cmake protobuf rust python@3.10 git wget
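
If you do need to build Python 3.10 from source on an Enterprise Linux-based distro, the usual sequence is below. This is only a rough sketch: the release number (3.10.14) and the -devel package names are examples, so adjust them for your system.

# Build dependencies (package names are an assumption; adjust for your distro)
sudo dnf groupinstall "Development Tools"
sudo dnf install openssl-devel libffi-devel bzip2-devel sqlite-devel zlib-devel

# Download and unpack a 3.10 release from python.org (3.10.14 is only an example)
wget https://www.python.org/ftp/python/3.10.14/Python-3.10.14.tgz
tar xzf Python-3.10.14.tgz
cd Python-3.10.14

# Build and install alongside the system Python; altinstall leaves /usr/bin/python3 untouched
./configure --enable-optimizations
make -j"$(nproc)"
sudo make altinstall    # the new interpreter is then available as python3.10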

Back to Contents

 

General Installation Instructions (Nvidia GPU & Apple Silicon)

·      Ensure the software requirements are met.

·      If you have an AMD GPU, please follow the directions in the AMD section.

·      Run the following command: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui

·      Windows:

o   Go to the folder where Git pulled the files (usually C:\Users\[your_username]\stable-diffusion-webui)

o   Run webui-user.bat

·      macOS and Linux (except Debian-based distros):

o   Run the following command: cd stable-diffusion-webui && ./webui.sh

·      macOS and Linux (Alternative Method: Directly create Python Virtual Environment)

o   cd stable-diffusion-webui

o   python -m venv venv (specify Python 3.10 if necessary)

o   source venv/bin/activate

o   python -m pip install --upgrade pip wheel

o   python launch.py

§  Optional (Nvidia GPUs only): add the --xformers flag when launching for better performance (see the note after this list for setting flags permanently)

·      Now that you’re up and running, view the Basic tips, the layout of the UI, and some tips for writing prompts that helped me.
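
If you launch with webui.sh or webui-user.bat, you can also store flags such as --xformers in the webui-user script so you don’t have to type them every time. A minimal sketch, assuming the default webui-user.sh that ships with the repo (on Windows, webui-user.bat has an equivalent set COMMANDLINE_ARGS= line):

# webui-user.sh (in the stable-diffusion-webui folder): uncomment and edit this line
export COMMANDLINE_ARGS="--xformers"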

Back to Contents

 

AMD Installation Instructions

DirectML performance in Windows is better than using the CPU only, but doesn’t take full advantage of AMD GPUs. Using Linux is recommended for the best performance with an AMD GPU, which will take advantage of AMD’s ROCm API (similar to CUDA on Nvidia GPUs).

·      Windows

1.     Ensure the software requirements are met

2.     Copy and paste the following command into Command Prompt or PowerShell: git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml && cd stable-diffusion-webui-directml && git submodule init && git submodule update

3.     Run .\webui.bat --use-directml

4.     Now that you’re up and running, view the Basic tips, the layout of the UI, and some tips for writing prompts that helped me.

·      Linux

1.     Ensure the software requirements are met

2.     Run the following command and reboot (Debian-based distros only): sudo apt install libnuma-dev libncurses5

3.     Add yourself to the render and video groups: sudo usermod -aG render ${USER} && sudo usermod -aG video ${USER}

4.     Go to AMD’s Linux Driver page and download the appropriate package for your distro

§  There is no official Debian package, but the Ubuntu packages should work just fine (I haven’t personally tested this since I have Ubuntu).

§  If you’re using Ubuntu 22.04, do not use the one that specifies it includes ROCm.

§  There is not an official driver package from AMD for Arch and Arch-based distros.  Follow the steps here to install ROCm for the best performance.

§  The RHEL 8 & 9 packages will not work on Fedora, since Fedora is updated more frequently than RHEL. Use the following instead:

·      sudo dnf install rocminfo rocm-opencl rocm-hip

·      Once installed, skip to Step 7.

·      If you’re using Fedora 38 and dnf cannot find rocm-hip, the testing repo may need to be enabled.

5.     Navigate to the directory where you downloaded the file and use your package manager to install it

§  Debian: sudo apt install ./name_of_file

§  RHEL: sudo dnf install ./name_of_file

§  SUSE: sudo zypper in ./name_of_file

6.     After that completes, run the following command: sudo amdgpu-install --usecase=rocm,hip,mllib,graphics --no-dkms

§  Not including the --no-dkms flag will cause issues if Secure Boot is enabled on your system (and possibly even if it’s not)

7.     Run this command to confirm the installation was successful, then reboot: sudo rocminfo | grep 'Name'

§  This should provide the name of your GPU and CPU (if also using an AMD CPU)

§  If your GPU does not appear here, reboot and try again.

§  If it still does not appear, uninstall the drivers, and reinstall.

·      Fedora: sudo dnf remove rocminfo rocm-opencl rocm-hip

·      All other distros: sudo amdgpu-uninstall or sudo amdgpu-install --uninstall

8.     If you haven’t already, install Python, Git, python3-venv, and pip (see the software requirements)

§  Ubuntu: run python --version to make sure that the default isn’t Python 2. If it is, run sudo apt install python-is-python3

9.     Clone the Git repo with: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

10.  Run the following commands:

§  cd stable-diffusion-webui

§  python3.10 -m venv venv

§  source venv/bin/activate

·      Specifying python3.10 is not needed beyond this point since you will be in the virtual environment (venv) created by Python 3.10. You will need to specify the Python version again once outside of the environment.

§  python -m pip install --upgrade pip wheel

11.  Run the following command: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7

·      This may take a moment to download and install all of the necessary packages.

12.  Run the following command to ensure the ROCm versions of PyTorch are installed: pip list | grep 'torch'

§  If the list does not include rocm in the name, run pip uninstall torch and repeat step 11.

13.  If you have an RX 6800 or lower GPU, run the following command: export HSA_OVERRIDE_GFX_VERSION=10.3.0

§  Not needed for the RX 6800 XT and above

14.  Run this to launch the Web UI: python launch.py --precision full --no-half

§  If your GPU has less than 12 GB of VRAM, add either the --medvram or --lowvram flag after --no-half

15.  Now that you’re up and running, view the Basic tips, the layout of the UI, and some tips for writing prompts that helped me. (A recap of the Linux commands from steps 10-14 follows this list.)
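
For reference, here are the Linux commands from steps 10-14 collected into one sequence (this assumes the repo was cloned into your home directory and the ROCm drivers from the earlier steps are already installed):

cd ~/stable-diffusion-webui
python3.10 -m venv venv                  # create the virtual environment with Python 3.10
source venv/bin/activate
python -m pip install --upgrade pip wheel

# ROCm build of PyTorch (step 11); the listed versions should include "rocm"
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
pip list | grep 'torch'

# Only for GPUs that need the override (step 13)
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Launch; add --medvram or --lowvram after --no-half if your GPU has less than 12 GB of VRAM
python launch.py --precision full --no-half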

Back to Contents

 

 

Basic Tips

·      The first launch may take a while. Subsequent launches will be faster.

·      If the UI doesn’t open automatically in your web browser, go to one of the following:

o   http://127.0.0.1:7860

o   http://localhost:7860

·      When launching again after using it for the first time, you can quickly access it by navigating to the directory where Git pulled all the files and run:

o   ./webui.sh --any-needed-flags (Linux and macOS)

o   .\webui.bat --any-needed-flags via Command Prompt/PowerShell (Windows)

·      There are no Stable Diffusion models (called checkpoints) included in the Web UI installation. You can find checkpoints on HuggingFace and CivitAI.

o   Once you have downloaded a checkpoint, it goes in the models > Stable-diffusion folder inside the folder where the Web UI was installed.

o   These are typically .ckpt or .safetensors files

·      If you run into any errors dealing with running out of VRAM, try using the --medvram or --lowvram flag.

o   Or instead, you can select a Cross attention optimization option under Settings > Optimizations in the Web UI (sub-quadratic has worked well for me)

·      The --skip-torch-cuda-test flag may be needed if you receive an error stating that your GPU is not supported.          

o   This will have the image generation use your CPU instead of your GPU. It will work, but will be significantly slower.

o   If you’re using an AMD GPU and running Windows, make sure you use the --use-directml flag to avoid this issue.

·      When done, close the tab in your web browser, navigate back to your Terminal or PowerShell window, and press CTRL-C to terminate the program.

·      Using the --listen flag allows you to access the web UI on another machine on your local network. Once it’s running, navigate to http://remote_machine_ip:7860 or http://remote_hostname:7860

o   Depending on your machine’s firewall configuration, opening TCP port 7860 may be necessary (example commands follow this list).

o   If you’re using extensions for the web UI, also pass the --enable-insecure-extension-access flag along with --listen.

o   Since HTTPS is not used, it is highly recommended to only use this on a secure local network and not on any public networks or over the internet.
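
If you do need to open the port for --listen, the exact command depends on which firewall your distro uses. Two common examples (assuming ufw on Ubuntu/Debian and firewalld on Fedora/RHEL/SUSE):

# ufw (common on Ubuntu/Debian)
sudo ufw allow 7860/tcp

# firewalld (common on Fedora/RHEL/SUSE)
sudo firewall-cmd --permanent --add-port=7860/tcp
sudo firewall-cmd --reload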

Back to Contents

 

The Web UI

[Screenshot of the Web UI]

These key areas are the base of what you need to get started on generating images:

1.     Checkpoint List: Allows you to choose from the checkpoints found in the models > Stable-diffusion folder

2.     Prompt Box: This is where you type in what you want to generate.

3.     Negative Prompt Box: This is where you specify things you do not want in the generated image.

4.     Sampling Method: There are different algorithms that can be used when generating images. The default is usually fine unless the creator of a checkpoint specifies a different one.

5.     Sampling Steps: This adjusts how many iterations of an image it will use to create the final image. You may need to change this value if you’re not getting quite what you want (in addition to listing particular things in the negative prompt)

6.     Hires. Fix: This section has options for upscaling generated images to a higher resolution than the originally rendered image

a.     [Screenshot of the Hires. Fix options]

b.     Recommended settings when using Hires. Fix

                                               i.     Upscaler: ESRGAN_4x

                                             ii.     Upscale by: 2

                                            iii.     Hires steps: 0

                                            iv.     Denoising strength: 0.2 to 0.3

c.     It’s best not to use an upscale factor greater than 2 (though I’ve been able to use a factor of 2.5 with no issues; it just takes 6-7 minutes to generate instead of 1-2 minutes with a factor of 2)

d.     I would recommend generating images at the default 512 x 512 to see if you like the generated image before trying to upscale, since the 512 x 512 image will render faster.

                                               i.     Be sure to reuse the same Seed value (see below)

7.     Width and Height sliders: adjusts how large the image will be (before upscaling if Hires. Fix is checked).

a.     Stable Diffusion works best with 512 x 512 or 768 x 768. Unexpected results may happen if the image is larger than 768 x 768 before upscaling.

8.     CFG Scale: This is a scale of how strictly the AI will follow the prompt.

a.     Unless a checkpoint creator specifies a different value, the default value of 7 is typically fine.

9.     Seed: A random number that is generated when rendering an image.

a.     Leave as -1 to always generate a new seed value.

b.     The recycle button will reuse the last generated seed value.

c.     The dice button will revert to a randomly generated value (-1)

d.     If trying to replicate an image you found online and they provided the generation data, you can copy and paste the seed value here.

10.  LoRA: Short for Low-Rank Adaptation. You can think of these as mini-Stable Diffusion models trained on a single subject (such as the look of a particular character from a video game).

a.     LoRAs are also available on HuggingFace and CivitAI.

b.     Once downloaded, place them in the Lora folder in the model folder.

                                               i.     These are usually .safetensors files like the checkpoints, but are significantly smaller in size (~100-200 MB instead of ~2-4 GB)

c.     To use a LoRA, go to the Lora tab and click on the LoRA you want to use. It will add it to the prompt and give it a weight of 1 (weights are explained in the next section; an example of the inserted tag follows this list)

d.     Multiple LoRAs can be used at the same time.

11.  Image Pane: This is where your image will appear while it’s rendering and once it’s complete.
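
As an example of the tag a LoRA adds to the prompt, it has the form <lora:filename:weight>, where filename matches the file in the Lora folder and the weight can be lowered from the default of 1. The LoRA name below is made up for illustration:

25 year old man, blonde hair, goatee, wearing pinstripe suit, <lora:pinstripe_suit_style:0.7>, standing on a busy city street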

Back to Contents

 

Writing Prompts

·      It’s typically better to be specific with what you want

·      Different parts of your prompt need to be separated by commas. Small chunks are key.

o   Don’t do: 25 year old man with blonde hair and goatee, wearing a pinstripe suit, standing on a busy city street

o   Do this instead: 25 year old man, blonde hair, goatee, wearing pinstripe suit, standing on a busy city street

·      Weights

o   Good for when the images that are generated don’t quite match what you want

o   Weights tell the model to focus more or less on a specified part of the prompt

o   Syntax: use parentheses around the portion you want to weight, followed by a colon and a numerical value

§  Values greater than 1: focuses more on the specified part

§  Values less than 1: Less focus on the specified part

§  Not specifying a numerical value will use a weight of 1.1

§  If using a LoRA, it will be given a weight of 1 once selected. Usually, you want a lower weight. The creator will usually provide the ideal range for that specific LoRA.

o   Example:

§  25 year old man, blonde hair, goatee, (wearing pinstripe suit:1.2), standing on a busy city street

·      All of this also applies to the negative prompt

o   Example:

§  ugly, malformed, weird, (blurry), low quality, bad hands

·      Emojis can be used in both the prompt and negative prompt (not sure why you would want to, but you can)

Back to Contents