Commit: Merge pull request #79 from alxdsptr/mlperf-inference-results-scc24
MLPerf inference results SCC24 (PKU)
Showing 70 changed files with 81,416 additions and 0 deletions.
@@ -0,0 +1 @@
TBD
3 changes: 3 additions & 0 deletions
...erf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/README.md
@@ -0,0 +1,3 @@
| Model | Scenario | Accuracy | Throughput | Latency (in ms) |
|---------------------|----------|----------------------|------------|-----------------|
| stable-diffusion-xl | offline | (14.02827, 84.33062) | 7.644 | - |
60 changes: 60 additions & 0 deletions
...original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/README.md
@@ -0,0 +1,60 @@
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).

*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-5.14.0-427.33.1.el9_4.x86_64-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53) [GCC 9.4.0]
* MLCommons CM version: 3.4.1

## CM Run Command

See the [CM installation guide](https://docs.mlcommons.org/inference/install/).
```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@cm4mlops --checkout=852b297c18a90edb8a9c975dd7ee7cf731e1e347

cm run script \
--tags=run-mlperf,inference,_r4.1-dev,_scc24-main \
--model=sdxl \
--implementation=nvidia \
--max_query_count=5000 \
--min_query_count=576 \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--max_batchsize=8 \
--quiet \
--rerun
```
*Note that if you want to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts), you should reload mlcommons@cm4mlops without the `--checkout` flag and clean the CM cache:*

```bash
cm rm repo mlcommons@cm4mlops
cm pull repo mlcommons@cm4mlops
cm rm cache -f
```
## Results

Platform: mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main

Model Precision: int8

### Accuracy Results
`CLIP_SCORE`: `14.02827`, required closed-division range `>= 31.68632` and `<= 31.81332`
`FID_SCORE`: `84.33062`, required closed-division range `>= 23.01086` and `<= 23.95008`

### Performance Results
`Samples per second`: `7.64431`
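The accuracy gate and the throughput figure above can be checked with a short sketch. The bounds and scores are taken from this report; the helper name is hypothetical, not part of the CM tooling:

```python
# Sketch of the closed-division accuracy check and a per-sample latency
# estimate, using the scores reported above. Helper name is hypothetical.

def in_closed_division(clip_score: float, fid_score: float) -> bool:
    # Both metrics must fall inside the required closed-division ranges.
    return (31.68632 <= clip_score <= 31.81332) and (23.01086 <= fid_score <= 23.95008)

clip_score, fid_score = 14.02827, 84.33062
samples_per_second = 7.64431

print(in_closed_division(clip_score, fid_score))  # False: this test run is outside the closed-division ranges
print(f"{1.0 / samples_per_second:.3f} s per sample")  # ~0.131 s amortized across the cluster
```

This is consistent with the run being an `execution_mode=test` submission to the SCC24 open track rather than a closed-division result.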
83 changes: 83 additions & 0 deletions
...riginal-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy_console.out
@@ -0,0 +1,83 @@
[2024-11-18 17:23:54,543 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
[2024-11-18 17:23:54,621 main.py:229 INFO] Detected system ID: KnownSystem.sc1
[2024-11-18 17:23:56,362 generate_conf_files.py:107 INFO] Generated measurements/ entries for sc1_TRT/stable-diffusion-xl/Offline
[2024-11-18 17:23:56,363 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/home/lry/CM/repos/local/cache/6c0ba4746fa74e77/test_results/mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=8 --mlperf_conf_path="/home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=true --user_conf_path="/home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/593e2d3b46e640629d0ab5c1e4ff088a.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-11-18 17:23:56,363 __init__.py:53 INFO] Overriding Environment
[2024-11-18 17:23:58,943 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
2024-11-18 17:24:00,676 INFO worker.py:1567 -- Connecting to existing Ray cluster at address: 10.0.0.1:6379...
2024-11-18 17:24:00,683 INFO worker.py:1743 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265
[2024-11-18 17:24:00,875 harness.py:207 INFO] Start Warm Up!
(SDXLCore pid=107737) [2024-11-18 17:24:03,968 backend.py:428 INFO] initialized
(SDXLCore pid=107737) [2024-11-18 17:24:04,150 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=107737) [2024-11-18 17:24:04,323 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=107737) [2024-11-18 17:24:07,326 backend.py:97 INFO] Enabling cuda graphs for unet
(SDXLCore pid=107737) [2024-11-18 17:24:07,767 backend.py:155 INFO] captured graph for BS=1
(SDXLCore pid=107737) [2024-11-18 17:24:08,559 backend.py:155 INFO] captured graph for BS=2
(SDXLCore pid=9090, ip=10.0.0.3) [2024-11-18 17:24:01,717 backend.py:428 INFO] initialized [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(SDXLCore pid=107744) [2024-11-18 17:24:06,876 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 34x across cluster]
(SDXLCore pid=107744) [2024-11-18 17:24:08,645 backend.py:97 INFO] Enabling cuda graphs for unet [repeated 8x across cluster]
(SDXLCore pid=9089, ip=10.0.0.3) [2024-11-18 17:24:09,783 backend.py:155 INFO] captured graph for BS=7 [repeated 51x across cluster]
[2024-11-18 17:24:33,625 harness.py:209 INFO] Warm Up Done!
[2024-11-18 17:24:33,625 harness.py:211 INFO] Start Test!
[2024-11-18 17:24:33,731 backend.py:852 INFO] 500
(SDXLCore pid=107737) [2024-11-18 17:24:33,733 backend.py:630 INFO] generate_images
(SDXLCore pid=107741) [2024-11-18 17:24:14,698 backend.py:155 INFO] captured graph for BS=8 [repeated 19x across cluster]
(SDXLCore pid=9088, ip=10.0.0.3) [2024-11-18 17:24:38,439 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=9088, ip=10.0.0.3) [2024-11-18 17:24:46,259 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=9089, ip=10.0.0.3) [2024-11-18 17:24:54,006 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=107737) [2024-11-18 17:25:02,027 backend.py:630 INFO] generate_images [repeated 8x across cluster]
(SDXLCore pid=107738) [2024-11-18 17:25:10,624 backend.py:630 INFO] generate_images [repeated 5x across cluster]
(SDXLCore pid=107738) [2024-11-18 17:25:19,815 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=107738) [2024-11-18 17:25:29,041 backend.py:630 INFO] generate_images [repeated 9x across cluster]
[2024-11-18 17:25:40,770 backend.py:901 INFO] [Server] Received 500 total samples
[2024-11-18 17:25:40,773 backend.py:911 INFO] [Device 0] Reported 56 samples
[2024-11-18 17:25:40,774 backend.py:911 INFO] [Device 1] Reported 56 samples
[2024-11-18 17:25:40,775 backend.py:911 INFO] [Device 2] Reported 56 samples
[2024-11-18 17:25:40,777 backend.py:911 INFO] [Device 3] Reported 56 samples
[2024-11-18 17:25:40,778 backend.py:911 INFO] [Device 4] Reported 56 samples
[2024-11-18 17:25:40,780 backend.py:911 INFO] [Device 5] Reported 55 samples
[2024-11-18 17:25:40,782 backend.py:911 INFO] [Device 6] Reported 55 samples
[2024-11-18 17:25:40,783 backend.py:911 INFO] [Device 7] Reported 55 samples
[2024-11-18 17:25:40,784 backend.py:911 INFO] [Device 8] Reported 55 samples
[2024-11-18 17:25:40,784 harness.py:214 INFO] Test Done!
[2024-11-18 17:25:40,784 harness.py:216 INFO] Destroying SUT...
[2024-11-18 17:25:40,784 harness.py:219 INFO] Destroying QSL...
(SDXLCore pid=107737) [2024-11-18 17:25:30,624 backend.py:630 INFO] generate_images [repeated 4x across cluster]
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/data
gpu_batch_size : 8
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/lry/CM/repos/local/cache/3443882dd9374096/repo/closed/NVIDIA/build/logs/2024.11.18-17.23.50
mlperf_conf_path : /home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf
model_path : /home/lry/CM/repos/local/cache/d2b9079c1073417b/models/SDXL/
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9684X 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=791.59486, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=791594860000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 80GB HBM3', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=700.0, pci_id='0x233010DE', compute_sm=90): 5})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='sc1')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : True
user_conf_path : /home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/593e2d3b46e640629d0ab5c1e4ff088a.conf
system_id : sc1
config_name : sc1_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
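The SystemConfiguration entry above mixes decimal and binary byte units: host memory is reported in GB (1000^3 bytes) while GPU VRAM uses GiB (1024^3 bytes). Both `_num_bytes` values can be re-derived from the quantities in the dump (a sketch, using only constants copied from the config):

```python
# Re-derive the _num_bytes fields from the quantity/byte_suffix pairs above.
GB = 1000 ** 3   # decimal suffix used for host_memory_capacity
GiB = 1024 ** 3  # binary suffix used for H100 VRAM

host_memory_bytes = round(791.59486 * GB)   # host_memory_capacity: 791.59486 GB
vram_bytes = round(79.6474609375 * GiB)     # per-GPU VRAM: 79.6474609375 GiB

print(host_memory_bytes)  # 791594860000
print(vram_bytes)         # 85520809984
```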
(SDXLCore pid=107737) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=107737) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=107744) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan [repeated 34x across cluster]
[2024-11-18 17:25:42,171 run_harness.py:166 INFO] Result: Accuracy run detected.
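The per-device counts reported in the log above (five devices with 56 samples, four with 55) are a near-even split of the 500 accuracy samples across nine GPUs; a sketch of that split (the helper name is hypothetical, not the harness's actual scheduler):

```python
# Near-even split of N samples over D devices: the first N % D devices
# receive one extra sample. Matches the [Device i] counts in the log above.
def split_samples(total: int, devices: int) -> list:
    base, extra = divmod(total, devices)
    return [base + 1 if i < extra else base for i in range(devices)]

print(split_samples(500, 9))  # [56, 56, 56, 56, 56, 55, 55, 55, 55]
```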
======================== Result summaries: ======================== | ||