- Motivation for the course and understanding computational efficiency.
- Factors that affect the efficiency of model execution.
Seminar:
- Measuring execution time and memory usage; how autograd works.
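A minimal sketch of these measurement basics, assuming a CUDA-capable machine (the model and tensors here are illustrative):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda", requires_grad=True)

# Timing: CUDA kernels launch asynchronously, so use events and synchronize
# before reading the elapsed time.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
y = model(x)
end.record()
torch.cuda.synchronize()
print(f"forward: {start.elapsed_time(end):.3f} ms")

# Memory: peak allocation since the last counter reset.
torch.cuda.reset_peak_memory_stats()
y = model(x)
print(f"peak memory: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")

# Autograd: backward() traverses the recorded graph and fills .grad fields.
loss = y.pow(2).mean()
loss.backward()
print(x.grad.shape)  # gradient of the loss w.r.t. the input
```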
- Introduction to computational devices; how CPU and GPU memory works.
Seminar:
- Profiling models with the PyTorch Profiler.
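A minimal profiling sketch using `torch.profiler` (the model is a stand-in):

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU()).cuda()
x = torch.randn(32, 512, device="cuda")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,  # also record allocator events
) as prof:
    with record_function("forward"):  # custom label in the trace
        model(x)

# Per-operator statistics aggregated over the run, sorted by GPU time.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```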
Seminar:
- Working with JIT, converting models to ONNX, and converting models to TensorRT. Speeding up models with torch.jit and torch.compile.
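A sketch of these paths, assuming PyTorch 2.x; file names are placeholders, and the TensorRT step (typically consuming the exported `.onnx` file via `trtexec` or the TensorRT Python API) is omitted:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU()).eval()
example = torch.randn(1, 256)

# 1) TorchScript: trace the model into a serializable, Python-independent graph.
scripted = torch.jit.trace(model, example)
scripted.save("model_scripted.pt")

# 2) ONNX: export to the interchange format that TensorRT can ingest.
torch.onnx.export(model, example, "model.onnx",
                  input_names=["x"], output_names=["y"])

# 3) torch.compile: JIT compilation without leaving Python.
compiled = torch.compile(model)
out = compiled(example)
```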
- Main quantization methods and approaches; an overview of LLM quantization methods.
Seminar:
- Implementing quantization with LSQ (Learned Step Size Quantization). Quantization with PyTorch and quantization with ONNX.
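A minimal LSQ-style sketch (learned step size with a straight-through estimator), plus PyTorch's built-in dynamic quantization for comparison; the `LSQQuantizer` class is illustrative, not a library API:

```python
import torch

class LSQQuantizer(torch.nn.Module):
    """Quantizes a tensor with a learned step size s (illustrative)."""
    def __init__(self, bits: int = 8):
        super().__init__()
        self.qmin = -(2 ** (bits - 1))
        self.qmax = 2 ** (bits - 1) - 1
        self.step = torch.nn.Parameter(torch.tensor(0.1))  # learned step size

    def forward(self, w):
        # Gradient scaling from the LSQ paper: g = 1 / sqrt(numel * qmax).
        g = 1.0 / (w.numel() * self.qmax) ** 0.5
        s = self.step * g + (self.step - self.step * g).detach()
        q = torch.clamp(w / s, self.qmin, self.qmax)
        q = q + (q.round() - q).detach()  # straight-through estimator for round()
        return q * s

# Built-in post-training dynamic quantization of Linear layers to int8:
model = torch.nn.Sequential(torch.nn.Linear(128, 128))
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```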
- Overview of the main model sparsification methods, the motivation for why sparsification works, and the types of sparsification. Sparsification methods for LLMs.
Seminar:
- Structured and unstructured pruning for VGG, iterative pruning, and magnitude-based pruning.
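A minimal sketch with `torch.nn.utils.prune` (assumes `torchvision` is installed; the 30%/25% ratios are arbitrary):

```python
import torch
import torch.nn.utils.prune as prune
import torchvision

model = torchvision.models.vgg11(weights=None)

# Unstructured magnitude pruning: zero the 30% smallest-magnitude weights
# in every convolutional layer.
for module in model.features:
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)

# Structured pruning: zero 25% of the output rows (dim=0) of the first
# fully connected layer, ranked by L2 norm.
prune.ln_structured(model.classifier[0], name="weight", amount=0.25, n=2, dim=0)

# Fold the masks back into the weight tensors to make pruning permanent.
for module in model.features:
    if isinstance(module, torch.nn.Conv2d):
        prune.remove(module, "weight")
```

Iterative pruning repeats a small pruning step followed by fine-tuning, rather than pruning everything at once.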
Seminar:
- Fine-tuning of quantized LLMs.
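A minimal sketch of the core idea (as in LoRA/QLoRA): keep the quantized base weights frozen and train small low-rank adapters. `LoRALinear` is an illustrative class, not a library API; in practice libraries such as `peft` and `bitsandbytes` handle this:

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen base layer plus a trainable low-rank update (illustrative)."""
    def __init__(self, base: torch.nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # frozen; int4/int8 in a real quantized LLM
            p.requires_grad = False
        self.A = torch.nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, rank))  # starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + (alpha/r) * B A x; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(torch.nn.Linear(512, 512))
trainable = [p for p in layer.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-4)
```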
- Main tensor decomposition (TD) methods for language models. What can be achieved with TD, and when is it best applied?
- Introduction to TD: general methods and concepts. Overview of modern model optimization methods based on TD, and of the available libraries.
Seminar:
- Replacing fully connected layers with their compressed representations obtained via SVD.
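A minimal sketch of that replacement (the `svd_compress` helper is hypothetical):

```python
import torch

def svd_compress(fc: torch.nn.Linear, rank: int) -> torch.nn.Sequential:
    """Replace one Linear layer with a rank-`rank` SVD factorization."""
    # W ≈ (U_r Σ_r) V_r^T turns d_in -> d_out into d_in -> r -> d_out.
    U, S, Vh = torch.linalg.svd(fc.weight, full_matrices=False)
    first = torch.nn.Linear(fc.in_features, rank, bias=False)
    second = torch.nn.Linear(rank, fc.out_features, bias=fc.bias is not None)
    first.weight.data = Vh[:rank].clone()                  # r x d_in
    second.weight.data = (U[:, :rank] * S[:rank]).clone()  # d_out x r
    if fc.bias is not None:
        second.bias.data = fc.bias.data.clone()
    return torch.nn.Sequential(first, second)

fc = torch.nn.Linear(1024, 1024)
compressed = svd_compress(fc, rank=64)  # ~8x fewer parameters in this layer
```

The compression pays off whenever `rank * (d_in + d_out) < d_in * d_out`.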
- Methods of automatic architecture search, including search for computationally efficient models.
Seminar:
- Differentiable architecture search and evolutionary search.
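A minimal sketch of the differentiable part (DARTS-style): a mixed operation that softmax-weights candidate ops with learnable architecture parameters `alpha`; the `MixedOp` class is illustrative:

```python
import torch

class MixedOp(torch.nn.Module):
    """Softmax-weighted mixture of candidate operations (illustrative)."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = torch.nn.ModuleList([
            torch.nn.Identity(),                                # skip connection
            torch.nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 convolution
            torch.nn.MaxPool2d(3, stride=1, padding=1),         # pooling
        ])
        self.alpha = torch.nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

op = MixedOp(16)
x = torch.randn(2, 16, 8, 8)
op(x).mean().backward()  # gradients flow into both op weights and alpha

# After the search, each MixedOp is discretized to its highest-alpha operation;
# evolutionary search instead mutates and selects discrete architectures directly.
```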