Getting Started: VAE Encode with Hybrid Inference

VAE encode is used for training, image-to-image, and image-to-video: it turns images and videos into latent representations.

Memory

These tables show the VRAM requirements for VAE encode with SD v1.5 and SDXL on different GPUs.

For the majority of these GPUs, the memory usage percentage dictates that other models (text encoders, UNet/Transformer) must be offloaded, or that tiled encoding must be used, which increases the time taken and impacts quality (a local tiled-encoding sketch follows the tables below).

SD v1.5

| GPU | Resolution | Time (seconds) | Memory (%) | Tiled Time (seconds) | Tiled Memory (%) |
|---|---|---|---|---|---|
| NVIDIA GeForce RTX 4090 | 512x512 | 0.015 | 3.51901 | 0.015 | 3.51901 |
| NVIDIA GeForce RTX 4090 | 256x256 | 0.004 | 1.3154 | 0.005 | 1.3154 |
| NVIDIA GeForce RTX 4090 | 2048x2048 | 0.402 | 47.1852 | 0.496 | 3.51901 |
| NVIDIA GeForce RTX 4090 | 1024x1024 | 0.078 | 12.2658 | 0.094 | 3.51901 |
| NVIDIA GeForce RTX 4080 SUPER | 512x512 | 0.023 | 5.30105 | 0.023 | 5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256 | 0.006 | 1.98152 | 0.006 | 1.98152 |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048 | 0.574 | 71.08 | 0.656 | 5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024 | 0.111 | 18.4772 | 0.14 | 5.30105 |
| NVIDIA GeForce RTX 3090 | 512x512 | 0.032 | 3.52782 | 0.032 | 3.52782 |
| NVIDIA GeForce RTX 3090 | 256x256 | 0.01 | 1.31869 | 0.009 | 1.31869 |
| NVIDIA GeForce RTX 3090 | 2048x2048 | 0.742 | 47.3033 | 0.954 | 3.52782 |
| NVIDIA GeForce RTX 3090 | 1024x1024 | 0.136 | 12.2965 | 0.207 | 3.52782 |
| NVIDIA GeForce RTX 3080 | 512x512 | 0.036 | 8.51761 | 0.036 | 8.51761 |
| NVIDIA GeForce RTX 3080 | 256x256 | 0.01 | 3.18387 | 0.01 | 3.18387 |
| NVIDIA GeForce RTX 3080 | 2048x2048 | 0.863 | 86.7424 | 1.191 | 8.51761 |
| NVIDIA GeForce RTX 3080 | 1024x1024 | 0.157 | 29.6888 | 0.227 | 8.51761 |
| NVIDIA GeForce RTX 3070 | 512x512 | 0.051 | 10.6941 | 0.051 | 10.6941 |
| NVIDIA GeForce RTX 3070 | 256x256 | 0.015 | 3.99743 | 0.015 | 3.99743 |
| NVIDIA GeForce RTX 3070 | 2048x2048 | 1.217 | 96.054 | 1.482 | 10.6941 |
| NVIDIA GeForce RTX 3070 | 1024x1024 | 0.223 | 37.2751 | 0.327 | 10.6941 |
SDXL

| GPU | Resolution | Time (seconds) | Memory (%) | Tiled Time (seconds) | Tiled Memory (%) |
|---|---|---|---|---|---|
| NVIDIA GeForce RTX 4090 | 512x512 | 0.029 | 4.95707 | 0.029 | 4.95707 |
| NVIDIA GeForce RTX 4090 | 256x256 | 0.007 | 2.29666 | 0.007 | 2.29666 |
| NVIDIA GeForce RTX 4090 | 2048x2048 | 0.873 | 66.3452 | 0.863 | 15.5649 |
| NVIDIA GeForce RTX 4090 | 1024x1024 | 0.142 | 15.5479 | 0.143 | 15.5479 |
| NVIDIA GeForce RTX 4080 SUPER | 512x512 | 0.044 | 7.46735 | 0.044 | 7.46735 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256 | 0.01 | 3.4597 | 0.01 | 3.4597 |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048 | 1.317 | 87.1615 | 1.291 | 23.447 |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024 | 0.213 | 23.4215 | 0.214 | 23.4215 |
| NVIDIA GeForce RTX 3090 | 512x512 | 0.058 | 5.65638 | 0.058 | 5.65638 |
| NVIDIA GeForce RTX 3090 | 256x256 | 0.016 | 2.45081 | 0.016 | 2.45081 |
| NVIDIA GeForce RTX 3090 | 2048x2048 | 1.755 | 77.8239 | 1.614 | 18.4193 |
| NVIDIA GeForce RTX 3090 | 1024x1024 | 0.265 | 18.4023 | 0.265 | 18.4023 |
| NVIDIA GeForce RTX 3080 | 512x512 | 0.064 | 13.6568 | 0.064 | 13.6568 |
| NVIDIA GeForce RTX 3080 | 256x256 | 0.018 | 5.91728 | 0.018 | 5.91728 |
| NVIDIA GeForce RTX 3080 | 2048x2048 | OOM | OOM | 1.866 | 44.4717 |
| NVIDIA GeForce RTX 3080 | 1024x1024 | 0.302 | 44.4308 | 0.302 | 44.4308 |
| NVIDIA GeForce RTX 3070 | 512x512 | 0.093 | 17.1465 | 0.093 | 17.1465 |
| NVIDIA GeForce RTX 3070 | 256x256 | 0.025 | 7.42931 | 0.026 | 7.42931 |
| NVIDIA GeForce RTX 3070 | 2048x2048 | OOM | OOM | 2.674 | 55.8355 |
| NVIDIA GeForce RTX 3070 | 1024x1024 | 0.443 | 55.7841 | 0.443 | 55.7841 |
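For comparison with the tiled columns above, here is a minimal sketch of encoding locally with tiling enabled, assuming a local GPU. The checkpoint matches the SD v1 entry in the table below; the dtype and preprocessing are illustrative choices, not part of the Hybrid Inference API.

```python
# Minimal sketch: local VAE encode with tiling enabled, for comparison with
# the tiled columns above. Checkpoint, dtype, and preprocessing are
# illustrative choices.
import torch
from diffusers import AutoencoderKL
from diffusers.image_processor import VaeImageProcessor
from diffusers.utils import load_image

vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda")
vae.enable_tiling()  # encode in tiles to cap peak VRAM, trading off speed

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg?download=true"
)
pixels = VaeImageProcessor().preprocess(image).to(device="cuda", dtype=torch.float16)

with torch.no_grad():
    latent = vae.encode(pixels).latent_dist.sample()
```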

Available VAEs

| | Endpoint | Model |
|---|---|---|
| Stable Diffusion v1 | https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud | stabilityai/sd-vae-ft-mse |
| Stable Diffusion XL | https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud | madebyollin/sdxl-vae-fp16-fix |
| Flux | https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud | black-forest-labs/FLUX.1-schnell |

Model support can be requested here.
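When switching between the endpoints above, it can help to keep each endpoint and its model's latent scaling parameters in one place. The lookup below is purely an illustrative convenience (not part of diffusers): the SD v1 and Flux factors come from the examples later in this guide, and SDXL's 0.13025 is the standard scaling_factor from the SDXL VAE config.

```python
# Illustrative helper (not part of diffusers): pair each hosted VAE endpoint
# with the scaling parameters its latents expect.
VAE_ENDPOINTS = {
    "sd-v1": {
        "endpoint": "https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.18215,  # from the SD v1.5 generation example below
    },
    "sdxl": {
        "endpoint": "https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.13025,  # standard SDXL VAE config value
    },
    "flux": {
        "endpoint": "https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.3611,  # from the basic example below
        "shift_factor": 0.1159,
    },
}

# Usage: latent = remote_encode(image=image, **VAE_ENDPOINTS["flux"])
```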

Code

Install diffusers from main to run the code: `pip install git+https://github.com/huggingface/diffusers@main`

A helper method simplifies interacting with Hybrid Inference.

```python
from diffusers.utils.remote_utils import remote_encode
```

Basic example

Let’s encode an image, then decode it to demonstrate.

```python
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg?download=true"
)

latent = remote_encode(
    endpoint="https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
    image=image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

decoded = remote_decode(
    endpoint="https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)
```
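remote_decode returns a PIL image here, so the round trip can be verified by saving the result (the filename is arbitrary):

```python
decoded.save("decoded_astronaut.jpg")
```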

Generation

Now let's look at a generation example: we'll encode the image, generate from the latents, and then remotely decode the result too.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

# Load the pipeline without a VAE; encoding and decoding happen remotely.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=None,
).to("cuda")

init_image = load_image(
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
init_image = init_image.resize((768, 512))

# Encode the init image on the remote endpoint.
init_latent = remote_encode(
    endpoint="https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
    image=init_image,
    scaling_factor=0.18215,
)

prompt = "A fantasy landscape, trending on artstation"
latent = pipe(
    prompt=prompt,
    image=init_latent,
    strength=0.75,
    output_type="latent",
).images

# Decode the generated latents on the remote endpoint.
image = remote_decode(
    endpoint="https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.18215,
)
image.save("fantasy_landscape.jpg")
```
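Because the pipeline is loaded with vae=None and called with output_type="latent", no VAE weights ever occupy local VRAM: the init image is encoded remotely, the UNet denoises locally, and the final latents are decoded remotely.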

Integrations
