Getting Started With Stable Diffusion 3.5 On Python
So you want to follow the hype and generate some images with Stability AI's shiny new Stable Diffusion 3.5 model (SD 3.5). You find the model on Hugging Face, and hey, there is a code example to try it out. And it works!?
Absolutely not!
Inconveniently, there are a lot more steps to take and considerations to make before your Python script will generate an AI image, especially on consumer hardware with little VRAM. This article shows how to really do it, even on a laptop GPU with only 6GB of VRAM. As such, it is an adapted collection of other material available on the web.
So, let’s get you your:
System Prerequisites
All examples in this article are tested using EndeavourOS on a ThinkPad X1 Extreme laptop with hybrid graphics: an integrated Intel GPU and an Nvidia RTX 3060 Laptop GPU (6GB VRAM).
Make sure your packages are up to date and you have the Nvidia drivers and CUDA installed:
yay -Syu
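If the drivers are not installed yet, something along these lines should get you there. The package names are an assumption based on a standard Arch-based setup like EndeavourOS; check your distro's documentation if they differ:

# Nvidia driver, userspace utilities and the CUDA toolkit (Arch package names)
yay -S nvidia nvidia-utils cuda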
Hugging Face Account and License Considerations
We assume you have a Hugging Face account and can log in via the command line. If not, work through the Hugging Face Hub Quick Start Guide.
In this article we will use the Stability AI models Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Medium. You need to visit both model pages and accept their respective licenses in order to use the models locally.
Python and CUDA
In this section we explain how to set up your Python environment to be able to use the CUDA libraries.
Create the Python Environment and Install Dependencies
Most AI image generation apps use slightly older Python versions, most commonly 3.10 or 3.11. If your distro already provides one of these versions as standard, you should be good. If not, don't worry: all examples in this article work with Python 3.13 on EndeavourOS.
We use a fresh virtualenv:
python3.13 -m venv .venv
source .venv/bin/activate
For the examples in this article we use the following requirements. The list is reconstructed from the libraries used throughout this article; sentencepiece and protobuf are an assumption, as they are commonly needed for the T5 text encoder's tokenizer:

torch
diffusers
transformers
accelerate
sentencepiece
protobuf
bitsandbytes
optimum-quanto
Install them in the virtual environment:
pip install -r requirements.txt
Get Python Info
The following Python script outputs some information about the installed software versions and the available hardware. It is based on an answer from Stack Overflow by texasdave. If it works and your GPU shows up, the image generation scripts should be able to use your GPU through Python.
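A minimal version of such a script, reconstructed from that answer, looks like this:

#!/usr/bin/env python
import sys
import torch

# Report the installed software versions
print(f"Python version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")

# List all GPUs that PyTorch can see, with their device indices
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")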
Generate Images With Stable Diffusion 3.5 and Python
Now that we can use CUDA from Python, we can start generating images. Generating images with Stable Diffusion requires a lot of RAM on the graphics card (VRAM). We will explore different setups for cards with less and less VRAM.
Run the Original Example
This is, in essence, the original image generation snippet from the Hugging Face model page (check the model page for the current authoritative version). The example requires a GPU with more than 24GB of VRAM. If you have such a GPU, congratulations. Otherwise, consider the subsequent sections.
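#!/usr/bin/env python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the full model in bfloat16 and move all of it onto the GPU
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")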
Selecting a GPU
If you have multiple GPUs installed in your system or are using an external GPU case through Thunderbolt, you might have to select the external GPU before running the pipeline. The Python info script above should give you the index of your external GPU. Then add a snippet like the following before the pipeline run.
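A minimal sketch using torch.cuda.set_device; the index 1 is only an example, replace it with the index reported for your GPU:

# Select GPU, use the index from the info script above
torch.cuda.set_device(1)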
Running a Stable Diffusion 3.5 Variant With Lower VRAM Requirements
If your GPU has less than 24GB of VRAM, the easiest way to get it running is to switch to a smaller variant of Stable Diffusion 3.5. There is a medium-sized model available, called "stabilityai/stable-diffusion-3.5-medium". Simply switch the from_pretrained statement in the previous code snippet to the new model:
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16)
Quantize Stable Diffusion 3.5 for Less VRAM Consumption Using Diffusers
If you have even less VRAM available, you can try quantizing the model before using it. Quantization converts the model weights into a lower-precision format to save VRAM and optimize performance. The model page already provides an example that quantizes the transformer with NF4 using diffusers and bitsandbytes; the snippet below follows it in essence, so check the model page for the authoritative version:
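#!/usr/bin/env python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3.5-large"

# Load only the transformer in 4-bit NF4 precision
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# Build the pipeline around the quantized transformer and let diffusers
# move idle components back to system RAM to save VRAM
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara-nf4.png")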
Quantize Stable Diffusion 3.5 with Quanto for Even Less VRAM Consumption
But what if you have even less VRAM available, say a laptop GPU with only 6GB? The above approach will still need too much memory for your card. But we can use quanto to convert the model weights into an even lower precision. Following the guide by Paul and Corvoysier on quantization with quanto, we quantize the transformer and the third text encoder of SD 3.5. The lowest usable quantization is qint4; qint2 only generates incoherent images. The script below is a sketch along the lines of that guide; we additionally enable CPU offloading to stay within the 6GB budget:
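#!/usr/bin/env python
import torch
from diffusers import StableDiffusion3Pipeline
from optimum.quanto import freeze, qint4, quantize

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
)

# Quantize the transformer and the third (T5) text encoder to 4-bit
# integer weights, then freeze them so the quantization sticks
quantize(pipe.transformer, weights=qint4)
freeze(pipe.transformer)
quantize(pipe.text_encoder_3, weights=qint4)
freeze(pipe.text_encoder_3)

# Keep idle pipeline components in system RAM so everything fits into 6GB
pipe.enable_model_cpu_offload()

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara-qint4.png")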
The above approach uses just under 5.9GB of VRAM during inference. Loading the model into VRAM takes a long time, and inference itself is also quite slow at 21 seconds per iteration. Using the medium model, the script uses 2.8GB of VRAM during inference and takes around 5 seconds per iteration. Unfortunately, that pipeline also takes a long time to load.
There we go, Stable Diffusion 3.5 Large on an RTX 3060 Laptop GPU.
Troubleshooting
Most likely you will face some issues setting up your image generation. In the following sections we present some tools you can use to get to the bottom of your problems, as well as a Thunderbolt-specific workaround.
Information From Nvidia Tooling
The Nvidia driver package comes with some tools, most notably nvidia-smi, which can show you information about the GPUs detected by the system.
# List all GPUs using the Nvidia tools
nvidia-smi -L
System Information
If you don't get information or error messages using nvidia-smi, or are unsure whether your GPU is detected by the system at all, consider commands like the following to gather more information about what went wrong:
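# Check whether the GPU shows up on the PCIe bus at all
lspci | grep -i -E "vga|3d|nvidia"

# Check which kernel driver is in use for the GPU
lspci -k | grep -i -A 3 nvidia

# Check the kernel log for driver and firmware messages
sudo dmesg | grep -i nvidia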
Additional Tools
The additional tool nvtop shows GPU processor and memory usage.
yay -S nvtop
Alternatively, use a watch on nvidia-smi:
watch -d -n 0.5 nvidia-smi
Thunderbolt Power Management Issues
When using external GPUs via Thunderbolt, you might experience a behaviour where the external GPU disappears after a few seconds or minutes of working correctly. A reboot fixes the problem, but the GPU will disappear again after a while. The reason could be a problem with the PCIe power management over Thunderbolt. Check the journalctl output for log messages like these:
Jan 26 16:03:09 MrMoon kernel: pcieport 0000:00:1d.0: AER: Correctable error message received from 0000:58:01.0
And dmesg might show something like this:
Jan 26 16:20:08 MrMoon kernel: pcieport 0000:20:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
Adding pcie_aspm=off to the kernel boot parameters fixed the problem for us. This setting might affect power draw negatively, but when the use case is generating images, the difference should be negligible.
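On a GRUB-based system (an assumption; adjust accordingly if you boot via systemd-boot or something else), append the parameter in /etc/default/grub and regenerate the configuration:

# In /etc/default/grub, add pcie_aspm=off to your existing parameters, e.g.:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"

# Then regenerate the GRUB configuration and reboot
sudo grub-mkconfig -o /boot/grub/grub.cfg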
Conclusion
The code snippets on Hugging Face really could use more explanation. We hope you now have a working setup. Have fun generating images!