From bdeca762992c807854a64f042398ca87bd2edd11 Mon Sep 17 00:00:00 2001
From: Sayak Paul
Date: Fri, 26 May 2023 16:44:04 +0530
Subject: [PATCH] Update train-optimize-sd-intel.md to improve the results
 table (#1148)

* Update train-optimize-sd-intel.md to improve the results table

* Update train-optimize-sd-intel.md

Co-authored-by: Bertrand CHEVRIER

---------

Co-authored-by: Bertrand CHEVRIER
---
 train-optimize-sd-intel.md | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/train-optimize-sd-intel.md b/train-optimize-sd-intel.md
index acaca6e067..96c418781c 100644
--- a/train-optimize-sd-intel.md
+++ b/train-optimize-sd-intel.md
@@ -53,12 +53,26 @@ We modified the Token Merging method to be compliant with OpenVINO and stacked i

 The resultant model is highly beneficial when running inference on devices with limited computational resources, such as client or edge CPUs. As it was mentioned, stacking Token Merging with quantization leads to an additional reduction in the inference latency.

-| **Image** | **Setting** | **Inference<br>Speed** | **Memory<br>Footprint** |
-|:---:|:---:|:---:|:---:|
-| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_torch.png) | PyTorch FP32 | 230.5 seconds | 3.44 GB |
-| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_fp32.png) | OpenVINO FP32 | 120 seconds<br>**(1.9x)** | 3.44 GB |
-| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_quantized.png) | OpenVINO 8-bit | 59 seconds<br>**(3.9x)** | 0.86 GB<br>**(0.25x)** |
-| ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_tome_quantized.png) | OpenVINO ToMe + 8-bit | 44.6 seconds<br>**(5.1x)** | 0.86 GB<br>**(0.25x)** |
-
+
+<div>
+<figure>
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_torch.png" alt="Image 1">
+<figcaption>PyTorch FP32, Inference Speed: 230.5 seconds, Memory Footprint: 3.44 GB</figcaption>
+</figure>
+<figure>
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_fp32.png" alt="Image 2">
+<figcaption>OpenVINO FP32, Inference Speed: 120 seconds (1.9x), Memory Footprint: 3.44 GB</figcaption>
+</figure>
+<figure>
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_quantized.png" alt="Image 3">
+<figcaption>OpenVINO 8-bit, Inference Speed: 59 seconds (3.9x), Memory Footprint: 0.86 GB (0.25x)</figcaption>
+</figure>
+<figure>
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/train-optimize-sd-intel/image_tome_quantized.png" alt="Image 4">
+<figcaption>ToMe + OpenVINO 8-bit, Inference Speed: 44.6 seconds (5.1x), Memory Footprint: 0.86 GB (0.25x)</figcaption>
+</figure>
+</div>
+
 Results of image generation [demo](https://huggingface.co/spaces/AlexKoff88/stable_diffusion) using different optimized models. Input prompt is “cartoon bird”, seed is 42. The models are run with OpenVINO 2022.3 in [Hugging Face Spaces](https://huggingface.co/spaces/AlexKoff88/stable_diffusion) using a “CPU upgrade” instance which utilizes 3rd Generation Intel® Xeon® Scalable Processors with Intel® Deep Learning Boost technology.
@@ -92,4 +106,4 @@ You can find the training and quantization [code](https://github.com/huggingface

 As we showed with the Pokemon image generation task, it is possible to achieve a high level of optimization of the Stable Diffusion pipeline when using a relatively small amount of training resources. At the same time, it is well known that training a general-purpose Stable Diffusion model is an [expensive task](https://www.mosaicml.com/blog/training-stable-diffusion-from-scratch-part-2). However, with enough budget and HW resources, it is possible to optimize the general-purpose model using the described approach and tune it to produce high-quality images. The only caveat is that the token merging method reduces the model capacity substantially. The rule of thumb here is: the more complicated the dataset you have for training, the lower the merging ratio you should use during the optimization.

-If you enjoyed reading this post, you might also be interested in checking out [this post](https://huggingface.co/blog/stable-diffusion-inference-intel) that discusses other complementary approaches to optimize the performance of Stable Diffusion on 4th generation Intel Xeon CPUs.
\ No newline at end of file
+If you enjoyed reading this post, you might also be interested in checking out [this post](https://huggingface.co/blog/stable-diffusion-inference-intel) that discusses other complementary approaches to optimize the performance of Stable Diffusion on 4th generation Intel Xeon CPUs.
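
For reference, the optimized pipelines benchmarked in the new table can be run on CPU through `optimum-intel`'s OpenVINO integration. Below is a minimal sketch, assuming a hypothetical model ID in place of the checkpoint produced by the training and quantization code linked in the post:

```python
# Minimal inference sketch for an OpenVINO-optimized Stable Diffusion pipeline.
# Requires: pip install "optimum[openvino]"
from optimum.intel import OVStableDiffusionPipeline

# Hypothetical model ID -- substitute the checkpoint produced by the
# training and quantization code linked in the post.
model_id = "your-namespace/stable-diffusion-pokemons-tome-quantized"

# Load without compiling so the input shapes can be fixed first; static
# shapes let OpenVINO specialize the compiled graph for CPU inference.
pipe = OVStableDiffusionPipeline.from_pretrained(model_id, compile=False)
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

# Generate an image with the same prompt as in the results table above.
image = pipe("cartoon bird", num_inference_steps=50).images[0]
image.save("cartoon_bird.png")
```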