Textual inversion

[[open-in-colab]]

The [StableDiffusionPipeline] supports textual inversion, a technique that enables a model like Stable Diffusion to learn a new concept from just a few sample images. This gives you more control over the generated images and lets you tailor the model to specific concepts. You can get started quickly with a collection of community-created concepts in the Stable Diffusion Conceptualizer.

This guide will show you how to run inference with textual inversion using a pre-learned concept from the Stable Diffusion Conceptualizer. If you're interested in teaching a model new concepts with textual inversion, take a look at the Textual Inversion training guide.

Import the necessary libraries:

    import torch
    from diffusers import StableDiffusionPipeline
    from diffusers.utils import make_image_grid

Stable Diffusion 1 and 2

Pick a Stable Diffusion checkpoint and a pre-learned concept from the Stable Diffusion Conceptualizer:

    pretrained_model_name_or_path = "stable-diffusion-v1-5/stable-diffusion-v1-5"
    repo_id_embeds = "sd-concepts-library/cat-toy"

Now you can load a pipeline, and pass the pre-learned concept to it:

    pipeline = StableDiffusionPipeline.from_pretrained(
        pretrained_model_name_or_path, torch_dtype=torch.float16, use_safetensors=True
    ).to("cuda")

    pipeline.load_textual_inversion(repo_id_embeds)
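Conceptually, loading a textual inversion embedding registers a new placeholder token and adds a matching learned vector to the text encoder's embedding table. The following is a toy sketch of that idea in plain Python, not the diffusers implementation: `add_concept`, `vocab`, and `embeddings` are hypothetical names, and the two-dimensional vectors are made up for illustration.

```python
# Toy illustration only: a real pipeline uses a CLIP tokenizer and a torch
# embedding matrix. All names and numbers here are hypothetical.
def add_concept(vocab, embeddings, token, vector):
    """Register `token` and append its learned vector to the embedding table."""
    if token in vocab:
        raise ValueError(f"Token {token!r} already exists in the vocabulary.")
    if embeddings and len(vector) != len(embeddings[0]):
        raise ValueError("Embedding width does not match the embedding table.")
    vocab[token] = len(embeddings)   # the new token id is the next free row
    embeddings.append(list(vector))  # the learned vector becomes that row
    return vocab[token]


vocab = {"a": 0, "cat": 1, "toy": 2}
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

token_id = add_concept(vocab, embeddings, "<cat-toy>", [0.9, -0.7])
print(token_id, embeddings[token_id])  # 3 [0.9, -0.7]
```

Once the token has its own row, any prompt that contains the placeholder is encoded with the learned vector, which is why the concept can be used in ordinary prompts.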

Create a prompt with the pre-learned concept by using the special placeholder token <cat-toy>, and choose the number of samples and rows of images you'd like to generate:

    prompt = "a grafitti in a favela wall with a <cat-toy> on it"
    num_samples_per_row = 2
    num_rows = 2

Then run the pipeline (feel free to adjust parameters such as num_inference_steps and guidance_scale to see how they affect image quality), collect the generated images, and display them in a grid with the make_image_grid helper you imported at the beginning:

    all_images = []
    for _ in range(num_rows):
        images = pipeline(prompt, num_images_per_prompt=num_samples_per_row, num_inference_steps=50, guidance_scale=7.5).images
        all_images.extend(images)

    grid = make_image_grid(all_images, num_rows, num_samples_per_row)
    grid
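To make the grid arguments concrete, here is a small sketch of how images could be placed in a rows × columns grid. It assumes images are filled row by row, left to right, and that the image count equals rows × columns. `grid_positions` is a hypothetical helper, not part of diffusers, and the 512-pixel image size is only an illustrative value.

```python
def grid_positions(num_images, rows, cols, width, height):
    """Return the top-left (x, y) pixel offset of each image in a rows x cols grid."""
    if num_images != rows * cols:
        raise ValueError("rows * cols must equal the number of images.")
    # Image i goes to column i % cols and row i // cols (row-major order).
    return [((i % cols) * width, (i // cols) * height) for i in range(num_images)]


# Two rows of two samples each, as in the example above.
positions = grid_positions(num_images=4, rows=2, cols=2, width=512, height=512)
print(positions)  # [(0, 0), (512, 0), (0, 512), (512, 512)]
```

With num_rows = 2 and num_samples_per_row = 2, the loop above produces four images, which matches the 2 × 2 grid.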

Stable Diffusion XL

Stable Diffusion XL (SDXL) can also use textual inversion vectors for inference. In contrast to Stable Diffusion 1 and 2, SDXL has two text encoders, so you'll need two textual inversion embeddings, one for each text encoder model.

Let's download the SDXL textual inversion embeddings and take a closer look at their structure:

    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    file = hf_hub_download("dn118/unaestheticXL", filename="unaestheticXLv31.safetensors")
    state_dict = load_file(file)
    state_dict

    {'clip_g': tensor([[ 0.0077, -0.0112,  0.0065,  ...,  0.0195,  0.0159,  0.0275],
             ...,
             [-0.0170,  0.0213,  0.0143,  ..., -0.0302, -0.0240, -0.0362]],
     'clip_l': tensor([[ 0.0023,  0.0192,  0.0213,  ..., -0.0385,  0.0048, -0.0011],
             ...,
             [ 0.0475, -0.0508, -0.0145,  ...,  0.0070, -0.0089, -0.0163]],

There are two tensors, "clip_g" and "clip_l". "clip_g" corresponds to the larger text encoder in SDXL, pipe.text_encoder_2, and "clip_l" corresponds to pipe.text_encoder.

Now you can load each tensor separately by passing them along with the correct text encoder and tokenizer to [~loaders.TextualInversionLoaderMixin.load_textual_inversion]:

    from diffusers import AutoPipelineForText2Image
    import torch

    pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", torch_dtype=torch.float16)
    pipe.to("cuda")

    pipe.load_textual_inversion(state_dict["clip_g"], token="unaestheticXLv31", text_encoder=pipe.text_encoder_2, tokenizer=pipe.tokenizer_2)
    pipe.load_textual_inversion(state_dict["clip_l"], token="unaestheticXLv31", text_encoder=pipe.text_encoder, tokenizer=pipe.tokenizer)

    # the embedding should be used as a negative embedding, so we pass it as a negative prompt
    generator = torch.Generator().manual_seed(33)
    image = pipe("a woman standing in front of a mountain", negative_prompt="unaestheticXLv31", generator=generator).images[0]
    image
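The two load_textual_inversion calls differ only in which tensor, text encoder, and tokenizer they use, so the pairing can be written down once and looped over. The sketch below is one possible way to do that. `load_sdxl_textual_inversion` and `SDXL_EMBEDDING_TARGETS` are hypothetical names, not part of diffusers, and the sketch assumes the file uses the "clip_g" and "clip_l" keys shown above.

```python
# Hypothetical helper: maps each state-dict key to the pipeline attributes
# that hold its text encoder and tokenizer (clip_g -> the larger encoder).
SDXL_EMBEDDING_TARGETS = {
    "clip_g": ("text_encoder_2", "tokenizer_2"),
    "clip_l": ("text_encoder", "tokenizer"),
}


def load_sdxl_textual_inversion(pipe, state_dict, token):
    """Load each tensor in `state_dict` into its matching text encoder."""
    missing = [key for key in SDXL_EMBEDDING_TARGETS if key not in state_dict]
    if missing:
        raise KeyError(f"state_dict is missing embeddings for: {missing}")
    for key, (encoder_attr, tokenizer_attr) in SDXL_EMBEDDING_TARGETS.items():
        pipe.load_textual_inversion(
            state_dict[key],
            token=token,
            text_encoder=getattr(pipe, encoder_attr),
            tokenizer=getattr(pipe, tokenizer_attr),
        )


# Usage with the objects from this guide (requires a loaded SDXL pipeline):
# load_sdxl_textual_inversion(pipe, state_dict, token="unaestheticXLv31")
```

Checking for missing keys up front avoids loading only one of the two embeddings and leaving the encoders out of sync.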