
Update train-optimize-sd-intel.md to improve the results table (#1148)
* Update train-optimize-sd-intel.md to improve the results table

* Update train-optimize-sd-intel.md

Co-authored-by: Bertrand CHEVRIER <[email protected]>

sayakpaul and krampstudio authored May 26, 2023
1 parent c7ce2a5 commit bdeca76
Showing 1 changed file with 21 additions and 7 deletions.
train-optimize-sd-intel.md
@@ -53,12 +53,26 @@ We modified the Token Merging method to be compliant with OpenVINO and stacked it with 8-bit quantization.

The resulting model is highly beneficial when running inference on devices with limited computational resources, such as client or edge CPUs. As mentioned above, stacking Token Merging with quantization leads to an additional reduction in the inference latency.
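To make the stacking idea concrete, here is a minimal sketch that applies Token Merging with the stock [tomesd](https://github.com/dbolya/tomesd) library to a PyTorch pipeline. Treat it as an approximation: the post itself uses a modified, OpenVINO-compliant ToMe implementation combined with 8-bit quantization via NNCF, not the stock package.

```python
# A minimal sketch, assuming the stock tomesd package (the post uses a
# modified, OpenVINO-compliant ToMe variant instead).
import torch
import tomesd
from diffusers import StableDiffusionPipeline

# Load a baseline FP32 pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float32
)

# Merge redundant tokens inside the UNet's attention blocks; a ratio of 0.5
# merges up to half of them, trading a little fidelity for speed.
tomesd.apply_patch(pipe, ratio=0.5)

image = pipe("cartoon bird", num_inference_steps=50).images[0]
```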

| **Image** | **Setting** | **Inference <br>Speed** | **Memory <br>Footprint** |
|:---:|:---:|:---:|:---:|
| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_torch.png) | PyTorch FP32 | 230.5 seconds | 3.44 GB |
| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_fp32.png) | OpenVINO FP32 | 120 seconds <br>**(1.9x)** | 3.44 GB |
| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_quantized.png) | OpenVINO 8-bit | 59 seconds<br>**(3.9x)** | 0.86 GB<br>**(0.25x)** |
| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_tome_quantized.png) | OpenVINO ToMe + 8-bit | 44.6 seconds<br>**(5.1x)** | 0.86 GB<br>**(0.25x)** |
<div class="flex flex-row">
<div class="grid grid-cols-2 gap-4">
<figure>
<img class="max-w-full rounded-xl border-2 border-solid border-gray-600" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_torch.png" alt="Image 1" />
<figcaption class="mt-2 text-center text-sm text-gray-500">PyTorch FP32, Inference Speed: 230.5 seconds, Memory Footprint: 3.44 GB</figcaption>
</figure>
<figure>
<img class="max-w-full rounded-xl border-2 border-solid border-gray-600" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_fp32.png" alt="Image 2" />
<figcaption class="mt-2 text-center text-sm text-gray-500">OpenVINO FP32, Inference Speed: 120 seconds (<b>1.9x</b>), Memory Footprint: 3.44 GB</figcaption>
</figure>
<figure>
<img class="max-w-full rounded-xl border-2 border-solid border-gray-600" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_quantized.png" alt="Image 3" />
<figcaption class="mt-2 text-center text-sm text-gray-500">OpenVINO 8-bit, Inference Speed: 59 seconds (<b>3.9x</b>), Memory Footprint: 0.86 GB (<b>0.25x</b>)</figcaption>
</figure>
<figure>
<img class="max-w-full rounded-xl border-2 border-solid border-gray-600" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_tome_quantized.png" alt="Image 4" />
<figcaption class="mt-2 text-center text-sm text-gray-500">ToMe + OpenVINO 8-bit, Inference Speed: 44.6 seconds (<b>5.1x</b>), Memory Footprint: 0.86 GB (<b>0.25x</b>)</figcaption>
</figure>
</div>
</div>

Results of image generation with the [demo](https://huggingface.co/spaces/AlexKoff88/stable_diffusion) using different optimized models. The input prompt is “cartoon bird” and the seed is 42. The models run with OpenVINO 2022.3 in [Hugging Face Spaces](https://huggingface.co/spaces/AlexKoff88/stable_diffusion) on a “CPU upgrade” instance, which uses 3rd Generation Intel® Xeon® Scalable Processors with Intel® Deep Learning Boost technology.
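For reference, a generation with the demo's settings could look roughly like the sketch below using `optimum-intel`. The model ID is a placeholder for one of the optimized checkpoints, and the NumPy-based `generator` argument reflects how the OpenVINO pipelines are typically seeded (an assumption, not the demo's exact code).

```python
# A hedged sketch of reproducing the demo settings (prompt "cartoon bird",
# seed 42) with an OpenVINO pipeline via optimum-intel.
import time
import numpy as np
from optimum.intel import OVStableDiffusionPipeline

# Placeholder model ID -- substitute one of the optimized checkpoints.
pipe = OVStableDiffusionPipeline.from_pretrained("your-org/sd-pokemon-openvino")

start = time.perf_counter()
image = pipe(
    "cartoon bird",
    num_inference_steps=50,
    generator=np.random.RandomState(42),  # assumed NumPy-based seeding
).images[0]
print(f"Generated in {time.perf_counter() - start:.1f} s")
image.save("cartoon_bird.png")
```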

@@ -92,4 +106,4 @@ You can find the training and quantization [code](https://github.com/huggingface

As we showed with the Pokemon image generation task, it is possible to achieve a high level of optimization of the Stable Diffusion pipeline with a relatively small amount of training resources. At the same time, it is well known that training a general-purpose Stable Diffusion model is an [expensive task](https://www.mosaicml.com/blog/training-stable-diffusion-from-scratch-part-2). However, with enough budget and hardware resources, it is possible to optimize a general-purpose model using the described approach and tune it to produce high-quality images. The only caveat relates to the token merging method, which substantially reduces the model capacity. The rule of thumb here is: the more complicated your training dataset, the lower the merging ratio you should use during optimization.
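In code, that rule of thumb comes down to a single hyperparameter. A short sketch with the stock `tomesd` API (the post's OpenVINO-compliant variant may expose this differently):

```python
# Illustrative only: the merging ratio trades model capacity for speed.
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Hypothetical flag: a narrow, single-style dataset (like the Pokemon set)
# tolerates aggressive merging, while diverse data calls for a lower ratio.
dataset_is_narrow = True
tomesd.apply_patch(pipe, ratio=0.6 if dataset_is_narrow else 0.3)
```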

If you enjoyed reading this post, you might also be interested in checking out [this post](https://huggingface.co/blog/stable-diffusion-inference-intel) that discusses other complementary approaches to optimize the performance of Stable Diffusion on 4th generation Intel Xeon CPUs.
