Commit

add teacache
SHYuanBest committed Dec 26, 2024
1 parent 3d7d2b8 commit e836727
Showing 10 changed files with 364 additions and 9 deletions.
30 changes: 23 additions & 7 deletions README.md
This repository is the official implementation of ConsisID, a tuning-free DiT-based controllable IPT2V model to keep human-identity consistent in the generated video. The approach draws inspiration from previous studies on frequency analysis of vision/diffusion transformers.
</div>

<br>

<details open><summary>💡 We also have other video generation projects that may interest you ✨. </summary><p>
## 📣 News

* ⏳⏳⏳ Release the full code & datasets & weights.
* `[2024.12.26]` 🚀 We release the [cache inference code](https://github.com/PKU-YuanGroup/ConsisID/tree/main/tools/cache_inference) for ConsisID powered by [TeaCache](https://github.com/LiewFeng/TeaCache). Thanks [@LiewFeng](https://github.com/LiewFeng) for his help.
* `[2024.12.24]` 🚀 We release the [parallel inference code](https://github.com/PKU-YuanGroup/ConsisID/tree/main/tools/parallel_inference) for ConsisID powered by [xDiT](https://github.com/xdit-project/xDiT). Thanks [@feifeibear](https://github.com/feifeibear) for his help.
* `[2024.12.22]` 🤗 ConsisID will be merged into [diffusers](https://github.com/huggingface/diffusers) in the next version. So for now, please use `pip install git+https://github.com/SHYuanBest/ConsisID_diffusers.git` to install diffusers dev version. And we have reorganized the code and weight configs, so it's better to update your local files if you have cloned them previously.
* `[2024.12.09]` 🔥 We release the [test set](https://huggingface.co/datasets/BestWishYsh/ConsisID-preview-Data/tree/main/eval) and [metric calculation code](https://github.com/PKU-YuanGroup/ConsisID/tree/main/eval) used in the paper, so you can now measure the metrics on your own machine. Please refer to [this guide](https://github.com/PKU-YuanGroup/ConsisID/tree/main/eval) for more details.
* `[2024.12.08]` 🔥 The code for <u>data preprocessing</u> is out, which is used to obtain the [training data](https://huggingface.co/datasets/BestWishYsh/ConsisID-preview-Data) required by ConsisID. Please refer to [this guide](https://github.com/PKU-YuanGroup/ConsisID/tree/main/data_preprocess) for more details.
pipe.vae.enable_tiling()
```
Warning: this will increase inference time and may also reduce quality.

## 🚀 Parallel Inference on Multiple GPUs by xDiT
[xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters. It has successfully provided low-latency parallel inference solutions for a variety of DiT models. For example, to generate a video with 6 GPUs, you can use the following command:
```bash
cd tools/parallel_inference
bash run.sh
# or: bash run_usp.sh
```

## 🚀 Cache Inference by TeaCache
[TeaCache](https://github.com/LiewFeng/TeaCache) is a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps, thereby accelerating inference. For example, you can use the following command:
```bash
cd tools/cache_inference
bash run.sh
```
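The caching rule behind this can be sketched in plain Python. This is an illustrative simplification, not the actual TeaCache implementation, and the function and variable names are hypothetical: the expensive model call is skipped while the accumulated relative L1 change of a cheap per-step indicator stays below `rel_l1_thresh`.

```python
def run_with_cache(model_step, signals, rel_l1_thresh=0.1):
    """Illustrative sketch of TeaCache-style caching (names are hypothetical).

    model_step: expensive per-timestep function to call or skip.
    signals: a cheap per-step indicator (in the real method, derived from
             timestep-embedding-modulated inputs), one list of floats per step.
    """
    prev_signal, cached_output = None, None
    accumulated, num_calls, outputs = 0.0, 0, []
    for t, sig in enumerate(signals):
        if prev_signal is None:
            recompute = True  # always compute the first step
        else:
            # relative L1 change of the indicator vs. the previous step
            rel_l1 = sum(abs(a - b) for a, b in zip(sig, prev_signal)) / (
                sum(abs(b) for b in prev_signal) + 1e-8
            )
            accumulated += rel_l1
            recompute = accumulated >= rel_l1_thresh  # threshold crossed
        if recompute:
            cached_output = model_step(t)  # expensive call
            num_calls += 1
            accumulated = 0.0  # reset after recomputing
        outputs.append(cached_output)  # reuse the cache when skipping
        prev_signal = sig
    return outputs, num_calls
```

A larger `rel_l1_thresh` lets more consecutive steps reuse the cache, trading visual quality for speed, which matches the trade-off the flag exposes in `run.sh`.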
## ⚙️ Requirements and Installation
We recommend the requirements as follows.
We found some plugins created by community developers. Thanks for their efforts:
- Windows Docker. [🤗Windows-ConsisID](https://huggingface.co/pkuhexianyi/ConsisID-Windows/tree/main) and [🟣Windows-ConsisID](https://www.wisemodel.cn/models/PkuHexianyi/ConsisID-Windows/file) (by [@shizi](https://www.bilibili.com/video/BV1v3iUY4EeQ/?vd_source=ae3f2652765c02e41cdd698b311989e3)).
- Diffusers. [Diffusers-ConsisID](https://github.com/huggingface/diffusers) (thanks [@arrow](https://github.com/a-r-r-o-w), [@yiyixuxu](https://github.com/yiyixuxu), [@hlky](https://github.com/hlky) and [@stevhliu](https://github.com/stevhliu) for their help).
- xDiT. [xDiT-ConsisID](https://github.com/xdit-project/xDiT) (thanks [@feifeibear](https://github.com/feifeibear) for his help).
- TeaCache. [TeaCache-ConsisID](https://github.com/LiewFeng/TeaCache) (thanks [@LiewFeng](https://github.com/LiewFeng) for his help).
If you find related work, please let us know.
If you find our paper and code useful in your research, please consider giving us a star 🌟.
<a href="https://github.com/PKU-YuanGroup/ConsisID/graphs/contributors">
<img src="https://contrib.rocks/image?repo=PKU-YuanGroup/ConsisID&anon=true" />
</a>
54 changes: 54 additions & 0 deletions tools/cache_inference/README.md
<!-- ## **TeaCache4HunyuanVideo** -->
# TeaCache4ConsisID

[TeaCache](https://github.com/LiewFeng/TeaCache) can speed up [ConsisID](https://github.com/PKU-YuanGroup/ConsisID) by about 2x without much visual quality degradation, in a training-free manner.

## 📈 Inference Latency Comparisons on a Single H100 GPU

| ConsisID | TeaCache (0.1) | TeaCache (0.15) | TeaCache (0.20) |
| :------: | :------------: | :-------------: | :-------------: |
| ~110 s | ~70 s | ~53 s | ~41 s |
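
The table implies the following speedups over the ConsisID baseline, a quick sanity check of the ~2x claim using only the latencies above:

```python
# Speedups implied by the latency table above (~110 s baseline on one H100).
baseline = 110
latencies = {0.10: 70, 0.15: 53, 0.20: 41}
speedups = {t: round(baseline / s, 1) for t, s in latencies.items()}
print(speedups)  # a larger rel_l1_thresh means fewer model calls, so faster
```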


## Usage

Follow [ConsisID](https://github.com/PKU-YuanGroup/ConsisID) to clone the repo and complete the installation. You can then modify `rel_l1_thresh` to obtain your desired trade-off between latency and visual quality, and change `ckpts_path`, `prompt`, and `image` to customize your identity-preserving video.

For single-gpu inference, you can use the following command:

```bash
python teacache_inference_consisid.py \
--rel_l1_thresh 0.1 \
--ckpts_path BestWishYsh/ConsisID-preview \
--image "https://github.com/PKU-YuanGroup/ConsisID/blob/main/asserts/example_images/2.png?raw=true" \
--prompt "The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy\'s path, adding depth to the scene. The lighting highlights the boy\'s subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel." \
--seed 42 \
--num_infer_steps 50 \
--output_path ./teacache_results
```
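
The flags above suggest a CLI surface like the following. This is a hypothetical reconstruction for illustration only; consult `teacache_inference_consisid.py` for the actual argument definitions and defaults.

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the flags shown in the command above.
    p = argparse.ArgumentParser(description="TeaCache inference for ConsisID")
    p.add_argument("--rel_l1_thresh", type=float, default=0.1,
                   help="cache threshold: higher = faster, lower quality")
    p.add_argument("--ckpts_path", type=str, default="BestWishYsh/ConsisID-preview")
    p.add_argument("--image", type=str, help="reference identity image (path or URL)")
    p.add_argument("--prompt", type=str, help="text prompt for the video")
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--num_infer_steps", type=int, default=50)
    p.add_argument("--output_path", type=str, default="./teacache_results")
    return p

args = build_parser().parse_args(["--rel_l1_thresh", "0.15", "--prompt", "a boy walking"])
```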

To generate a video with 8 GPUs instead, you can use the parallel inference tools [here](https://github.com/PKU-YuanGroup/ConsisID/tree/main/tools).

## Resources

Learn more about ConsisID with the following resources.
- A [video](https://www.youtube.com/watch?v=PhlgC-bI5SQ) demonstrating ConsisID's main features.
- The research paper, [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://hf.co/papers/2411.17440) for more details.

## Citation

If you find TeaCache useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

```bibtex
@article{liu2024timestep,
title={Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model},
author={Liu, Feng and Zhang, Shiwei and Wang, Xiaofeng and Wei, Yujie and Qiu, Haonan and Zhao, Yuzhong and Zhang, Yingya and Ye, Qixiang and Wan, Fang},
journal={arXiv preprint arXiv:2411.19108},
year={2024}
}
```


## Acknowledgements

We would like to thank the contributors to [ConsisID](https://github.com/PKU-YuanGroup/ConsisID).
8 changes: 8 additions & 0 deletions tools/cache_inference/run.sh
```bash
python teacache_inference_consisid.py \
    --rel_l1_thresh 0.1 \
    --ckpts_path BestWishYsh/ConsisID-preview \
    --image "https://github.com/PKU-YuanGroup/ConsisID/blob/main/asserts/example_images/2.png?raw=true" \
    --prompt "The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy\'s path, adding depth to the scene. The lighting highlights the boy\'s subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel." \
    --seed 42 \
    --num_infer_steps 50 \
    --output_path ./teacache_results
```