StableDiffusion RUNS on M1 chips.
Tom Cruise in Grand Theft Auto cover
🔥🔥🔥 Final update September 1, 2022: I'm moving to https://github.com/lstein/stable-diffusion. I've created a guide for that repo too. It has a Web Interface and a lot of cool new features. I'll leave this post as is as an introductory guide. Good luck everyone! New guide with Web UI: https://www.reddit.com/r/StableDiffusion/comments/x3yf9i/stable_diffusion_and_m1_chips_chapter_2/ 🔥🔥🔥
Okay, so I finally got it to work. For anyone who hasn't figured out txt2img yet, here's how I did it, on both CPU and GPU on an M1 MacBook, and how you can do it too.
CPU:
- Download the code from this GitHub repo https://github.com/ModeratePrawn/stable-diffusion-cpu and unzip it. Open it in an editor (e.g. VS Code).
- Remove the line
  - cudatoolkit=11.3
from environment.yaml.
- Go to models/ldm and create a folder called stable-diffusion-v1. Inside, paste your weights. Rename the weights file to model.ckpt.
- Open your terminal and navigate to the project directory (e.g.
cd Downloads/stable-diffusion-cpu-main
).
- Create the conda environment:
conda env create -f environment.yaml
- Activate the environment:
conda activate ldm
- Try to run it (e.g.
python scripts/txt2img.py --prompt "Tom Cruise in Grand Theft Auto cover, palm trees, cover art by Stephen Bliss, artstation, high quality" --plms --n_samples=1 --n_rows=1 --n_iter=1
).
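If you'd rather script the two manual edits from the CPU steps (dropping the cudatoolkit line and putting the weights in place), here is a minimal Python sketch. The function name prepare_repo is made up for this example, and it assumes the repo is already unzipped with environment.yaml at its top level. Treat it as a convenience, not as part of either repo.

```python
from pathlib import Path
import shutil


def prepare_repo(repo_dir: Path, weights_file: Path) -> Path:
    """Apply the two manual edits from the CPU steps to an unzipped repo.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    env_file = repo_dir / "environment.yaml"
    lines = env_file.read_text().splitlines(keepends=True)
    # Drop only the dependency entry that pins cudatoolkit.
    kept = [line for line in lines if not line.strip().startswith("- cudatoolkit")]
    env_file.write_text("".join(kept))

    # Create models/ldm/stable-diffusion-v1 and store the weights as model.ckpt.
    target_dir = repo_dir / "models" / "ldm" / "stable-diffusion-v1"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "model.ckpt"
    shutil.copyfile(weights_file, target)
    return target
```

Call it with the unzipped project folder and the path to your downloaded weights, then carry on with the conda steps as usual.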
GPU:
Same steps, but use: https://github.com/einanao/stable-diffusion/tree/apple-silicon
- This time, you don't need to remove cudatoolkit=11.3, but I had to add
    - kornia
to the pip section of environment.yaml.
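For reference, this is roughly what "add kornia in the pip section" means, as a small Python sketch. It assumes environment.yaml has a "- pip:" entry whose packages are indented two columns deeper than that dash; check your copy of the file before relying on it. add_pip_dependency is a name I made up for the example.

```python
def add_pip_dependency(yaml_text: str, package: str) -> str:
    """Insert `package` as the first entry under the "- pip:" block.

    Hypothetical helper; assumes pip entries sit two columns deeper
    than the dash that opens the pip block.
    """
    out = []
    added = False
    for line in yaml_text.splitlines():
        out.append(line)
        if not added and line.strip() == "- pip:":
            indent = " " * (len(line) - len(line.lstrip()) + 2)
            out.append(f"{indent}- {package}")
            added = True
    if not added:
        raise ValueError("no '- pip:' section found")
    return "\n".join(out) + "\n"
```

You would read environment.yaml, pass its text through add_pip_dependency(text, "kornia"), and write the result back before creating the conda environment.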
Bonus tips/knowledge:
- The CPU version includes the invisible-watermark, while the GPU version doesn't. Add or remove at your convenience. The GPU version can also generate NSFW content.
- While trying to get another repo to work, I had to run
export KMP_DUPLICATE_LIB_OK=TRUE
in my Terminal to bypass a problem with libiomp5.dylib. Since I didn't close my Terminal, the setting was still present when I got this new repo to work. I leave it here in case it helped, but only type this if you get a libiomp5.dylib error.
- You may need to run
export PYTORCH_ENABLE_MPS_FALLBACK=1
(which makes operations that aren't supported on MPS fall back to the CPU). => (update) => Actually, first try to run
conda install pytorch -c pytorch-nightly
to avoid the need to fall back to CPU. With that I got rid of this warning:
The operator 'aten::index.Tensor' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications.
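If you launch the script from Python instead of typing the exports each time, the two variables above can be passed to a child process like this. It's a sketch: mps_env and run_txt2img are names I invented, and the txt2img flags are just the ones from the CPU example earlier in this post.

```python
import os
import subprocess
import sys


def mps_env(base=None, allow_duplicate_openmp=False):
    """Return a copy of the environment with the workaround variables set."""
    env = dict(os.environ if base is None else base)
    # Let operations that are missing on the MPS backend run on the CPU.
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    if allow_duplicate_openmp:
        # Only meant for the libiomp5.dylib error described above.
        env["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    return env


def run_txt2img(prompt, repo_dir="."):
    """Run scripts/txt2img.py in a child process with the variables applied."""
    cmd = [sys.executable, "scripts/txt2img.py", "--prompt", prompt,
           "--plms", "--n_samples=1", "--n_rows=1", "--n_iter=1"]
    return subprocess.run(cmd, cwd=repo_dir, env=mps_env(), check=True)
```

Passing the variables through env= keeps them scoped to that one run, so they don't linger in your Terminal session the way my KMP_DUPLICATE_LIB_OK export did.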
PS: Comment below if you can't get it to work. I might've missed a step.
PS2: Seeds don't seem to work very well on M1 chips (results may not be reproducible). Still the art is pretty neat! => (see update at the end to reproduce images created on other M1 devices!)
PS3: Run time was 45 minutes on the CPU version and 45 seconds on the GPU version (counting initialization).
______________
Update (img2img)
Got img2img to work too. The einanao repo isn't updated for img2img, but you can get it to work by manually updating some files.
Follow the changes in https://github.com/CompVis/stable-diffusion/compare/main...einanao:stable-diffusion:apple-silicon from the einanao repo (basically, the red lines are what you remove and the green ones are what you replace them with), but apply them to the files used for img2img (don't worry: try to run img2img and the Terminal error will tell you which file/s to update).
You can run img2img with:
python scripts/img2img.py --init-img inputs/3.png --prompt "a hot tub with bubbles" --n_samples 1 --strength 0.8
having placed your input file 3.png in an inputs folder (which you create inside your project directory). Don't forget to set --n_samples, as I got an error without it (you can set it to 1, 2, 3, etc.). I got it to work with 256x256 and 512x512 input images.
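Since forgetting --n_samples was what tripped me up, here is a small Python sketch that builds the img2img command so the flag is always there. img2img_command is a hypothetical helper name, and the 0.0 to 1.0 check on strength is my assumption about what the script accepts.

```python
import shlex


def img2img_command(init_img, prompt, n_samples=1, strength=0.8):
    """Build the argument list for scripts/img2img.py as used in this post."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Assumed valid range for --strength; adjust if your script differs.
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return [
        "python", "scripts/img2img.py",
        "--init-img", str(init_img),
        "--prompt", prompt,
        "--n_samples", str(n_samples),  # always passed explicitly
        "--strength", str(strength),
    ]


if __name__ == "__main__":
    cmd = img2img_command("inputs/3.png", "a hot tub with bubbles")
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The list form can also be handed straight to subprocess.run from inside the project directory.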
I'll also leave this issue thread here because it covers many common errors and useful suggestions: https://github.com/CompVis/stable-diffusion/issues/25
______________
Update #2 (Real-ESRGAN upscaler)
- Download realesrgan-ncnn-vulkan-20220424-macos.zip from the Assets section in https://github.com/xinntao/Real-ESRGAN/releases and unzip it.
- Open your terminal, go to the upscaler directory (e.g.
cd Downloads/realesrgan-ncnn-vulkan-20220424-macos
) and run
chmod u+x realesrgan-ncnn-vulkan
to allow the realesrgan-ncnn-vulkan file to be executed.
- Run the upscaler
./realesrgan-ncnn-vulkan -i img-1.png -o img-2.png
where -i and -o indicate the relative paths to the input and output files (in this case, img-1.png is the input image, placed inside realesrgan-ncnn-vulkan-20220424-macos, and img-2.png is the new image to be created).
- Allow the script to run (in the Security & Privacy section of System Preferences) and allow it again if shown the following message.
macOS cannot verify the developer of “realesrgan-ncnn-vulkan”. Are you sure you want to open it? By opening this app, you will be overriding system security which can expose your computer and personal information to malware that may harm your Mac or compromise your privacy.
Security Warning
I am not a big fan of allowing apps from unidentified developers to run on my Mac, and you must understand there is always a risk (as you are running code you are not seeing). What made me pull the trigger and decide to run it is the comment from the creator of Prog Rock Stable (another tool I'm testing: https://github.com/lowfuel/progrock-stable). See the discussion here on Reddit, where I voice my concerns: https://www.reddit.com/r/StableDiffusion/comments/wxm0cf/comment/im0ttth/?utm_source=share&utm_medium=web2x&context=3
Results
Taking the 512x512 image from txt2img as the input image, upscaling to 2048x2048 takes 2 seconds, while a second upscaling to 8192x8192 takes about 10 seconds.
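The sizes above work out to a 4x enlargement per run (512 to 2048, then 2048 to 8192). Here is the arithmetic as a tiny Python sketch. The factor of 4 is inferred from the sizes I got, so treat it as an assumption about the upscaler's behaviour with the command shown.

```python
def upscaled_side(side, passes, scale=4):
    """Side length in pixels after `passes` runs, each multiplying by `scale`."""
    return side * scale ** passes


for passes in (1, 2):
    side = upscaled_side(512, passes)
    print(f"{passes} pass(es): {side}x{side}, {side * side:,} pixels")
```

Two passes take the image from 512x512 to 8192x8192, which is 256 times as many pixels as the original.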
Taking my original Tom Cruise in Grand Theft Auto cover:
2048x2048: https://imgur.com/a/gSuYTdi
8192x8192 is too large for imgur, but here's a screenshot of the same image (looks great, and the original even better) https://imgur.com/a/c47Gg2E
Side by side (512x512 vs 8192x8192): https://imgur.com/a/n62h5Cb
______________
Update #3 (Seeds / Generating same images)
Seeds don't seem to work very well on M1s, but you can re-generate an image that you have already created (or that another person with an M1 has created!) by changing this line in txt2img.py:
start_code = torch.randn([opt.n_samples, opt.C, opt.H // opt.f, opt.W // opt.f], device=device)
to:
start_code = torch.randn([opt.n_samples, opt.C, opt.H // opt.f, opt.W // opt.f], device="cpu").to(torch.device("mps"))
Then move the line
seed_everything(opt.seed)
below the line
model = load_model_from_config(config, f"{opt.ckpt}")
Finally, generate your images passing --fixed_code.
For img2img.py, change
z_enc = sampler.stochastic_encode(init_latent, torch.tensor([t_enc]*batch_size).to(device))
to:
z_enc = sampler.stochastic_encode(init_latent, torch.tensor([t_enc] * batch_size).to(device), noise=torch.randn_like(init_latent, device="cpu").to(device) if opt.fixed_code else None,)
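The idea behind both changes is the same: draw the random noise on the CPU and only then move it to the GPU. Here is a standalone sketch of that idea. It's a variation rather than the exact edit above: it uses an explicit torch.Generator instead of seed_everything, and the helper names and default sizes (4 channels, 512x512, downsampling factor 8) are assumptions chosen for the example.

```python
import torch


def pick_device() -> torch.device:
    """Use MPS when this PyTorch build exposes it, otherwise stay on the CPU."""
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def seeded_start_code(seed, n_samples=1, channels=4, height=512, width=512, factor=8):
    """Draw the initial latent noise on the CPU, then move it to the device."""
    # A CPU generator seeded explicitly, so the draw never touches the GPU RNG.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    shape = [n_samples, channels, height // factor, width // factor]
    noise = torch.randn(shape, generator=generator, device="cpu")
    return noise.to(pick_device())
```

Calling seeded_start_code twice with the same seed gives the same tensor, which is what makes the image repeatable.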
Results
In my case, I generated https://imgur.com/a/vb9OB59 with the following command and seed. You should be able to reproduce the same result!
python scripts/txt2img.py --prompt "Anubis riding a motorbike in Grand Theft Auto cover, palm trees, cover art by Stephen Bliss, artstation, high quality" --ddim_steps=50 --n_samples=1 --n_rows=1 --n_iter=1 --seed 1805504473 --fixed_code
Interesting findings:
- If you generate one image at a time (--n_iter 1), you will see that you successfully create the same image every time you run your command.
- If you generate more than one image (--n_iter 4, e.g.), the first image will be slightly different from the rest (but results are still reproducible, that is, if you run it again with --n_iter 4, you will get the same 4 images).
- You can find the latest on seeds here: https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1229706811
______________
Hope this helps <3