🔥 1. Links to all Sana .pth checkpoints and diffusers safetensors are listed below

| Model | Resolution | pth link | diffusers | Precision | Description |
|:---|:---|:---|:---|:---|:---|
| Sana-0.6B | 512px | Sana_600M_512px | Efficient-Large-Model/Sana_600M_512px_diffusers | fp16/fp32 | Multi-Language |
| Sana-0.6B | 1024px | Sana_600M_1024px | Efficient-Large-Model/Sana_600M_1024px_diffusers | fp16/fp32 | Multi-Language |
| Sana-1.6B | 512px | Sana_1600M_512px | Efficient-Large-Model/Sana_1600M_512px_diffusers | fp16/fp32 | - |
| Sana-1.6B | 512px | Sana_1600M_512px_MultiLing | Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | Sana_1600M_1024px | Efficient-Large-Model/Sana_1600M_1024px_diffusers | fp16/fp32 | - |
| Sana-1.6B | 1024px | Sana_1600M_1024px_MultiLing | Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | Sana_1600M_1024px_BF16 | Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers | bf16/fp32 | Multi-Language |
| Sana-1.6B | 2Kpx | Sana_1600M_2Kpx_BF16 | Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers | bf16/fp32 | Multi-Language |
| Sana-1.6B | 4Kpx | Sana_1600M_4Kpx_BF16 | Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers | bf16/fp32 | Multi-Language |
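If you select checkpoints programmatically, the table can be mirrored as a small lookup. This is a minimal sketch: `SANA_DIFFUSERS_REPOS` and `find_repo` are hypothetical names, not part of Sana or diffusers, and only the repo ids themselves come from the table.

```python
# Hypothetical lookup mirroring the diffusers column of the table above.
# Keys are (parameter count, resolution, tag); "default" marks rows with no
# MultiLing/BF16 suffix in the checkpoint name.
SANA_DIFFUSERS_REPOS = {
    ("0.6B", "512px", "default"): "Efficient-Large-Model/Sana_600M_512px_diffusers",
    ("0.6B", "1024px", "default"): "Efficient-Large-Model/Sana_600M_1024px_diffusers",
    ("1.6B", "512px", "default"): "Efficient-Large-Model/Sana_1600M_512px_diffusers",
    ("1.6B", "512px", "MultiLing"): "Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers",
    ("1.6B", "1024px", "default"): "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    ("1.6B", "1024px", "MultiLing"): "Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers",
    ("1.6B", "1024px", "BF16"): "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    ("1.6B", "2Kpx", "BF16"): "Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers",
    ("1.6B", "4Kpx", "BF16"): "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers",
}


def find_repo(params: str, resolution: str, tag: str = "default") -> str:
    """Return the diffusers repo id for one table row, or raise KeyError."""
    key = (params, resolution, tag)
    if key not in SANA_DIFFUSERS_REPOS:
        options = ", ".join(str(k) for k in SANA_DIFFUSERS_REPOS)
        raise KeyError(f"No checkpoint for {key}; available: {options}")
    return SANA_DIFFUSERS_REPOS[key]


print(find_repo("1.6B", "4Kpx", "BF16"))
# → Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers
```

Raising `KeyError` with the list of valid keys makes a typo such as `"4K"` instead of `"4Kpx"` easy to spot.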

❗ 2. Make sure to use the correct precision (fp16/bf16/fp32) for training and inference.

The examples below show how to load the fp16 and bf16 weights, followed by an example for the 4K model.

❗️ Make sure to set both variant and torch_dtype in the diffusers pipeline to the desired precision.
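One way to keep variant and torch_dtype consistent is to derive both from the repo id. The sketch below is a hypothetical helper (`precision_kwargs` is not part of Sana or diffusers). It assumes, based on the Precision column, that repos named `*_BF16_diffusers` ship bf16 weights, the others ship fp16 weights, and the default (non-variant) files are fp32.

```python
# Hypothetical helper: pick matching `variant` and `torch_dtype` values for a
# Sana diffusers repo, following the naming convention in the table above.
# dtypes are returned as strings so the helper does not need torch installed.
def precision_kwargs(repo_id: str, full_precision: bool = False) -> dict:
    if full_precision:
        # Assumption: omitting `variant` loads the default fp32 files.
        return {"torch_dtype": "float32"}
    if "_BF16" in repo_id:
        return {"variant": "bf16", "torch_dtype": "bfloat16"}
    return {"variant": "fp16", "torch_dtype": "float16"}


repo = "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers"
print(precision_kwargs(repo))
# → {'variant': 'bf16', 'torch_dtype': 'bfloat16'}
```

When loading, convert the string with `getattr(torch, kw["torch_dtype"])` and pass `variant=kw.get("variant")` to `from_pretrained`.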

1) For fp16 models

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Run the VAE and text encoder in bf16; the transformer stays in fp16
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]

image[0].save("sana.png")
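Passing variant="fp16" should make diffusers fetch only the half-precision files, which roughly halves the download compared with fp32. The back-of-the-envelope sketch below assumes the nominal parameter counts from the model names (0.6B, 1.6B). Real counts differ slightly, and the text encoder and VAE are not included.

```python
# Rough size of the Sana transformer weights alone, by storage precision.
# Assumption: nominal parameter counts taken from the model names.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_gb(num_params: float, precision: str) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("fp32", "fp16"):
    print(f"Sana-1.6B transformer, {precision}: ~{weight_gb(1.6e9, precision):.1f} GB")
# → Sana-1.6B transformer, fp32: ~6.4 GB
# → Sana-1.6B transformer, fp16: ~3.2 GB
```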

2) For bf16 models (this example also enables Perturbed-Attention Guidance (PAG) through SanaPAGPipeline)

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPAGPipeline

pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
    pag_applied_layers="transformer_blocks.8",  # transformer block whose attention PAG perturbs
)
pipe.to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save("sana.png")
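In the bf16 example above, `guidance_scale` weights classifier-free guidance and `pag_scale` weights the extra PAG term. As far as we understand diffusers' PAG implementation, each step combines three noise predictions (unconditional, text-conditioned, and text-conditioned with perturbed attention) as sketched below. Plain lists stand in for tensors; this illustrates the formula and is not the pipeline's code.

```python
# Sketch of how CFG and PAG guidance terms combine at one denoising step.
# uncond:    prediction without the prompt
# text:      prediction with the prompt
# perturbed: prediction with the prompt, attention perturbed in the PAG layers
def guided_prediction(uncond, text, perturbed, guidance_scale, pag_scale):
    return [
        u + guidance_scale * (t - u) + pag_scale * (t - p)
        for u, t, p in zip(uncond, text, perturbed)
    ]


# Same scales as the example: guidance_scale=5.0, pag_scale=2.0
print(guided_prediction([0.0], [1.0], [0.5], guidance_scale=5.0, pag_scale=2.0))
# → [6.0]
```

Because three predictions are needed per step instead of two, PAG adds compute compared with classifier-free guidance alone.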

3) For 4K models

4K models need patch_conv to avoid out-of-memory (OOM) errors when the VAE decodes 4096x4096 images. An 80GB GPU is recommended.
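As we understand it, patch_conv lowers peak memory by running large convolutions over pieces of the feature map instead of the whole map at once. The 1D sketch below shows why this can be exact. If each chunk carries an overlap of `kernel_size - 1` extra input samples, concatenating the per-chunk outputs reproduces the full result. It is a conceptual illustration in plain Python, not the patch_conv library's implementation, and its `splits` argument is only an analogy to `convert_model(..., splits=32)`.

```python
# Conceptual sketch: a "valid" 1D convolution (cross-correlation, no padding)
# computed chunk by chunk matches the one-pass result when neighbouring chunks
# overlap by len(k) - 1 input samples.
def conv1d_valid(x, k):
    n_out = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n_out)]


def conv1d_patched(x, k, splits):
    n_out = len(x) - len(k) + 1
    step = -(-n_out // splits)  # ceil division: outputs produced per chunk
    out = []
    for start in range(0, n_out, step):
        stop = min(start + step, n_out)
        # Inputs for outputs start..stop-1, plus a halo of len(k) - 1 samples
        chunk = x[start : stop + len(k) - 1]
        out.extend(conv1d_valid(chunk, k))
    return out


x = [float(i % 7) for i in range(50)]
k = [0.25, 0.5, 0.25]
print(conv1d_patched(x, k, splits=4) == conv1d_valid(x, k))
# → True
```

Only one chunk and its output need to be live during each inner convolution, which is the source of the memory savings in the 2D case.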

Run `pip install patch_conv` first, then:

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

# Avoid OOM when decoding 4096x4096 images: split the VAE's convolutions into patches
if pipe.transformer.config.sample_size == 128:
    from patch_conv import convert_model
    pipe.vae = convert_model(pipe.vae, splits=32)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=4096,
    width=4096,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]

image[0].save("sana_4K.png")
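The check `pipe.transformer.config.sample_size == 128` in the 4K example identifies the 4K model by the size of its latent grid. Assuming Sana's DC-AE autoencoder downsamples by 32x in each spatial dimension, the sketch below shows the pixel-to-latent mapping for each resolution in the table. `latent_side` is a hypothetical helper, not a diffusers function.

```python
# Sketch: map pixel resolution to the transformer's latent grid size.
# Assumption: the DC-AE autoencoder downsamples by 32x per spatial dimension.
AE_DOWNSAMPLE = 32


def latent_side(pixels: int, factor: int = AE_DOWNSAMPLE) -> int:
    if pixels % factor:
        raise ValueError(f"{pixels}px is not a multiple of {factor}")
    return pixels // factor


for px in (512, 1024, 2048, 4096):
    print(f"{px}px -> {latent_side(px)}x{latent_side(px)} latent")
# → 512px -> 16x16 latent
# → 1024px -> 32x32 latent
# → 2048px -> 64x64 latent
# → 4096px -> 128x128 latent
```

Under this assumption, height and width should also be multiples of 32 so that they map to a whole number of latent positions.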