- Goal : Quantization Model PTQ & QAT
- Process :
- Pytorch model train with custom dataset
- Pytorch-Quantization model calibration for ptq
- Pytorch-Quantization model fine tuning for qat
- Generation TensorRT int8 model from Pytorch-Quantization model
- Generation TensorRT int8 model using tensorrt calibration class
- Sample Model : Resnet18
- Dataset : imagenet100
- Device
- Windows 10 laptop
- CPU i7-11375H
- GPU RTX-3060
- Dependency
- cuda 12.1
- cudnn 8.9.2
- tensorrt 8.6.1
- pytorch 2.1.0+cu121
Quantization_EX/
├── calibrator.py # calibration class for TensorRT PTQ
├── common.py # utils for TensorRT
├── infer.py # base model infer
├── onnx_export.py # onnx export
├── ptq.py # Post Train Quantization
├── qat.py # Quantization Aware Training
├── quant_utils.py # utils for quantization
├── train.py # base model train
├── trt_infer.py # TensorRT model infer
├── utils.py # utils
├── LICENSE
└── README.md
- Calculation 10000 iteration with one input data [1, 3, 224, 224]
TRT | TRT | TRT PTQ | PT-Q PTQ | PT-Q PTQ w bnf | PT-Q QAT | PT-Q QAT w bnf | |
Precision | FP32 | FP16 | Int8 | Int8 | Int8 | Int8 | Int8 |
Acc Top-1 [%] | 83.08 | 83.04 | 83.12 | 83.18 | 82.64 | 83.42 | 82.80 |
Avg Latency [ms] | 1.188 ms | 0.527 ms | 0.418 ms | 0.566 ms | 0.545 ms | 0.577 ms | 0.534 ms |
Avg FPS [frame/sec] | 841.74 fps | 1896.01 fps | 2388.33 fps | 1764.55 fps | 1834.69 fps | 1730.89 fps | 1870.99 fps |
Gpu Memory [MB] | 179 MB | 135 MB | 123 MB | 129 MB | 129 MB | 129 MB | 129 MB |
- PT : Ptorch Quantization
- TRT : TensorRT
- bnf : bach normalization folding (conv + bn -> conv')
- infer -> train -> ptq -> qat -> onnx_export -> trt_infer -> trt_infer_acc
- pytorch-quantization : https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization
- imagenet100 : https://www.kaggle.com/datasets/ambityga/imagenet100