Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Rectified flows for image inversion and editing. Our approach efficiently inverts reference style images in (a) and (b) without requiring text descriptions of the images and applies desired edits based on new prompts (e.g. “a girl” or “a dwarf”). For a reference content image (e.g. a cat in (c) or a face in (d)), it performs semantic image editing (e.g. “sleeping cat”) and stylization (e.g. “a photo of a cat in origami style”) based on prompts, without leaking unwanted content from the reference image. Input images have orange borders.

teaser

🔥 Updates

  • [2024.12.23] RF-Inversion gradio demo, thanks Linoy!
  • [2024.12.17] RF-Inversion now supported in diffusers, thanks Linoy!
  • [2024.10.15] Code reimplemented by the open-source ComfyUI community, thanks logtd!
  • [2024.10.14] Paper is published on arXiv!

🤗 Gradio Interface

A Gradio demo is available for a better user experience: Web demonstration🔥

🚀 Diffusers Implementation

Try RF-Inversion using the diffusers implementation! Load the hyper Flux LoRA to enable 8-step inversion and editing🔥

Imports

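import os
from io import BytesIO

import PIL.Image  # import the submodule explicitly so PIL.Image.open is available
import requests
import torch
from diffusers import FluxPipeline

# torch.manual_seed(999)  # uncomment for reproducible results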

Load RF-Inversion pipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    custom_pipeline="pipeline_flux_rf_inversion")
pipe.to("cuda")
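
The 8-step mode mentioned above relies on a hyper Flux LoRA, which this README does not show how to load. The following is a minimal sketch, not a confirmed recipe: the repository `ByteDance/Hyper-SD`, the checkpoint name `Hyper-FLUX.1-dev-8steps-lora.safetensors`, the fuse scale of 0.125, and the helper name `load_hyper_flux_lora` are all assumptions that should be checked against the LoRA's own model card.

```python
def load_hyper_flux_lora(
    pipe,
    repo_id="ByteDance/Hyper-SD",  # assumed repository, not stated in this README
    filename="Hyper-FLUX.1-dev-8steps-lora.safetensors",  # assumed checkpoint name
    lora_scale=0.125,  # assumed fuse scale
):
    """Download a LoRA checkpoint and fuse it into an already loaded pipeline."""
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import hf_hub_download

    # Fetch the checkpoint into the local Hugging Face cache and get its path.
    lora_path = hf_hub_download(repo_id, filename)
    # Attach the LoRA weights, then merge them into the base weights.
    pipe.load_lora_weights(lora_path)
    pipe.fuse_lora(lora_scale=lora_scale)
    return pipe
```

If these assumptions hold, calling `load_hyper_flux_lora(pipe)` after `pipe.to("cuda")` and then setting both `num_inversion_steps` and `num_inference_steps` to 8 would be the natural way to try the 8-step setting.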

Load image

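def download_image(url):
    # Fetch the image and fail early on HTTP errors instead of decoding an error page.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")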

img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

Perform inversion

inverted_latents, image_latents, latent_image_ids = pipe.invert(
    image=image,
    num_inversion_steps=28,
    gamma=0.5,
)
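
To build intuition for `gamma`, here is a scalar toy sketch; it is not the pipeline's code. It assumes that `gamma` blends the model's own velocity with a velocity that points straight at a target sample, in the spirit of the controlled ODE described in the paper. The function `controlled_euler` and the zero-velocity stand-in model are hypothetical names used only for illustration.

```python
def controlled_euler(y0, y1, gamma, model_velocity, num_steps=28):
    """Integrate a scalar toy controlled ODE from t=0 to t=1 with Euler steps."""
    y = y0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        u = model_velocity(y, t)       # stand-in for the network's velocity
        u_cond = (y1 - y) / (1.0 - t)  # velocity pointing straight at the target y1
        # gamma = 0 follows the model only; gamma = 1 follows the target only.
        y = y + (u + gamma * (u_cond - u)) * dt
    return y


def zero_velocity(y, t):
    """Toy model that never moves the state."""
    return 0.0


full = controlled_euler(0.0, 2.0, gamma=1.0, model_velocity=zero_velocity)
none = controlled_euler(0.0, 2.0, gamma=0.0, model_velocity=zero_velocity)
half = controlled_euler(0.0, 2.0, gamma=0.5, model_velocity=zero_velocity)
print(full, none, half)
```

In this toy setting, `gamma=1.0` lands on the target, `gamma=0.0` leaves the state where the model puts it, and intermediate values such as the `0.5` used above end up in between.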

Perform editing

edited_image = pipe(
    prompt="a tomato",
    inverted_latents=inverted_latents,
    image_latents=image_latents,
    latent_image_ids=latent_image_ids,
    start_timestep=0,
    stop_timestep=7/28,
    num_inference_steps=28,
    eta=0.9,
).images[0]
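
The call above writes `stop_timestep` as `7/28`, which suggests the start and stop values are fractions of `num_inference_steps`. Under that assumption, the sketch below converts step indices to fractions and builds a small grid of settings to compare. The helpers `step_fraction` and `edit_grid` are hypothetical and not part of diffusers.

```python
from itertools import product


def step_fraction(step, num_steps):
    """Convert a step index to a fraction of the schedule (e.g. 7 of 28 -> 0.25)."""
    if not 0 <= step <= num_steps:
        raise ValueError(f"step must be in [0, {num_steps}], got {step}")
    return step / num_steps


def edit_grid(etas, stop_steps, num_steps=28, start_step=0):
    """Build keyword-argument dicts for a sweep over eta and the stop step."""
    return [
        {
            "eta": eta,
            "start_timestep": step_fraction(start_step, num_steps),
            "stop_timestep": step_fraction(stop, num_steps),
            "num_inference_steps": num_steps,
        }
        for eta, stop in product(etas, stop_steps)
        if stop > start_step  # skip empty control windows
    ]


grid = edit_grid(etas=[0.7, 0.9], stop_steps=[5, 7])
for kwargs in grid:
    print(kwargs)
```

Each dictionary could then be unpacked into the editing call in place of the hard-coded values, alongside the prompt and the latents returned by `pipe.invert`.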

Save result

save_dir = "./results/"
os.makedirs(save_dir, exist_ok=True)  # create the output directory if needed
image_save_path = os.path.join(save_dir, "rf_inversion.png")
edited_image.save(image_save_path)
print("Results saved here:", image_save_path)
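
When trying several prompts or hyper-parameter values, a fixed file name such as `rf_inversion.png` is overwritten on every run. One possible way around this, sketched below, is to encode the settings in the file name. The helper `result_path` and its naming scheme are illustrative assumptions.

```python
import os


def result_path(save_dir, prompt, eta, stop_timestep, ext="png"):
    """Build a file path that encodes the prompt and key hyper-parameters."""
    # Replace anything that is not a letter or digit so the name is filesystem-safe.
    slug = "".join(c if c.isalnum() else "_" for c in prompt.lower()).strip("_")
    name = f"{slug}_eta{eta:.2f}_stop{stop_timestep:.2f}.{ext}"
    return os.path.join(save_dir, name)


path = result_path("./results", "a tomato", eta=0.9, stop_timestep=7 / 28)
print(os.path.basename(path))
```

The returned path can be passed to `edited_image.save(...)` in place of `image_save_path`.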

🚀 Comfy User Interface

Try ComfyUI for a better experience: ComfyUI Node🔥. Follow the steps below to set it up locally.

Install ComfyUI to run Flux

  1. cd ComfyUI
  2. python main.py

Install ComfyUI-Manager

  1. cd ComfyUI/custom_nodes
  2. git clone https://github.com/ltdrdata/ComfyUI-Manager.git
  3. cd ..
  4. python main.py

Install RF-Inversion ComfyUI Node

  1. Click on "Manager"
  2. Install via Git URL: https://github.com/logtd/ComfyUI-Fluxtapoz
  3. If you see an error, change the security level in ComfyUI/custom_nodes/ComfyUI-Manager/config.ini from "normal" to "weak"
  4. cd ComfyUI
  5. python main.py
  6. Copy the RF-Inversion workflow and paste it into the ComfyUI window
  7. Install missing custom nodes in Manager
  8. Click on "Queue Prompt" to see the result
  9. Tune hyper-parameters (such as eta, start_step, stop_step) to get the desired outcome

Citation

@article{rout2024rfinversion,
  title={Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations},
  author={Litu Rout and Yujia Chen and Nataniel Ruiz and Constantine Caramanis and Sanjay Shakkottai and Wen-Sheng Chu},
  journal={arXiv preprint arXiv:2410.10792},
  year={2024}
}

Licenses

Copyright © 2024, Google LLC. All rights reserved.