
PART 1: HARDWARE AND LOW-LEVEL OPTIMIZATION

Session 1

Introduction: How Libraries Work

  • Motivation for the course and an understanding of computational efficiency.
  • Factors that affect model efficiency and performance.

Seminar:

  • Measuring execution time and memory consumption, and how autograd works; a minimal timing sketch follows.
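
A minimal sketch of how a forward pass can be timed and its peak GPU memory measured (the model and batch size are illustrative; `torch.cuda.synchronize` is needed because CUDA kernels launch asynchronously):

```python
import time

import torch
import torchvision

model = torchvision.models.resnet18().cuda().eval()
x = torch.randn(32, 3, 224, 224, device="cuda")

torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()              # drain pending kernels before timing
start = time.perf_counter()
with torch.no_grad():
    model(x)
torch.cuda.synchronize()              # wait for the forward pass to finish
elapsed = time.perf_counter() - start

print(f"forward: {elapsed * 1e3:.1f} ms, "
      f"peak GPU memory: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
```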

Session 2

Hardware and Low-Level Solutions

  • Introduction to computational devices and how CPU and GPU memory work; a transfer-timing sketch follows.
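
As a small illustration of why host-side memory layout matters, the sketch below (an assumption for illustration, not the seminar's actual code) compares a host-to-device copy from pageable versus pinned (page-locked) memory:

```python
import time

import torch

x = torch.randn(64, 3, 224, 224)   # ordinary pageable host memory
x_pinned = x.pin_memory()          # page-locked memory: DMA-friendly, faster copies

def copy_ms(t: torch.Tensor) -> float:
    torch.cuda.synchronize()
    start = time.perf_counter()
    t.to("cuda", non_blocking=True)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1e3

print(f"pageable: {copy_ms(x):.2f} ms, pinned: {copy_ms(x_pinned):.2f} ms")
```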

Seminar:

  • Profiling models with the PyTorch Profiler; a minimal example follows.
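
A minimal example with PyTorch's built-in profiler (`torch.profiler`); the model is illustrative:

```python
import torch
import torchvision
from torch.profiler import ProfilerActivity, profile, record_function

model = torchvision.models.resnet18().eval()
x = torch.randn(8, 3, 224, 224)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("inference"):       # custom label in the trace
        with torch.no_grad():
            model(x)

# Per-operator statistics aggregated over the run, sorted by total CPU time
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```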

Session 3

Automatic Low-Level Optimization

Seminar:

  • Working with TorchScript (JIT), converting models to ONNX, and converting models to TensorRT; speeding up models with JIT and torch.compile. A sketch of the three paths follows.
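
A sketch of the three export/compilation paths (the model and file names are illustrative; building a TensorRT engine from the exported ONNX file is typically done with TensorRT's own tooling, e.g. trtexec):

```python
import torch
import torchvision

model = torchvision.models.resnet18().eval()
example = torch.randn(1, 3, 224, 224)

# 1. TorchScript: trace the model into a static, serializable graph
traced = torch.jit.trace(model, example)
traced.save("resnet18_traced.pt")

# 2. ONNX export; a TensorRT engine can then be built from the .onnx file
torch.onnx.export(model, example, "resnet18.onnx",
                  input_names=["input"], output_names=["output"])

# 3. torch.compile (PyTorch 2.x): the first call triggers compilation
compiled = torch.compile(model)
with torch.no_grad():
    compiled(example)
```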

Session 4

Quantization

  • The main quantization methods and approaches, with an overview of LLM quantization techniques.

Seminar:

  • Implementing quantization with LSQ; quantization with PyTorch and with ONNX. An LSQ-style sketch follows.
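
A minimal sketch of an LSQ-style fake-quantization module in the spirit of Esser et al. (2020); this is a simplified illustration, not the seminar's actual implementation (step-size initialization and per-channel handling are omitted):

```python
import torch
import torch.nn as nn

class LSQFakeQuant(nn.Module):
    """Simplified Learned Step Size Quantization (LSQ) for signed tensors."""

    def __init__(self, bits: int = 8):
        super().__init__()
        self.qn = -(2 ** (bits - 1))          # lower quantization bound
        self.qp = 2 ** (bits - 1) - 1         # upper quantization bound
        self.step = nn.Parameter(torch.tensor(1.0))  # learned step size s

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale the gradient of s by 1/sqrt(numel * qp), as in the paper;
        # the forward value of `step` is unchanged by this trick.
        g = 1.0 / (x.numel() * self.qp) ** 0.5
        step = self.step * g + (self.step - self.step * g).detach()
        x_q = torch.clamp(x / step, self.qn, self.qp)
        # Straight-through estimator: round in forward, identity in backward
        x_q = x_q + (x_q.round() - x_q).detach()
        return x_q * step                     # dequantize back to float
```

In quantization-aware training, such a module wraps weights and activations; PyTorch's own workflow (`torch.ao.quantization`) and ONNX-based quantization replace this hand-written module with library-provided observers and quantizers.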

Session 5

Pruning and Sparsification

  • An overview of the main model sparsification methods, the motivation for why sparsification works, and the types of sparsity. Sparsification methods for LLMs.

Seminar:

  • Structured and unstructured pruning for VGG, iterative pruning, and magnitude-based pruning; see the sketch below.
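
A sketch using `torch.nn.utils.prune` (the pruning fractions are illustrative):

```python
import torch
import torch.nn.utils.prune as prune
import torchvision

model = torchvision.models.vgg11()

# Unstructured magnitude pruning: zero the 30% smallest-|w| weights per conv
for module in model.features:
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)

# Structured pruning: remove 20% of the rows (output units) of the first
# classifier layer, ranked by their L2 norm
prune.ln_structured(model.classifier[0], name="weight", amount=0.2, n=2, dim=0)

# Iterative pruning = repeat the calls above between fine-tuning epochs.
# Make the masks permanent once pruning is finished:
for module in model.features:
    if isinstance(module, torch.nn.Conv2d):
        prune.remove(module, "weight")
```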

PART 2: OPTIMIZING LLMs

Session 6

Low-Level and Algorithmic Optimization Methods for Large Language Models (LLMs)

Seminar:

  • Fine-tuning a quantized LLM; a QLoRA-style sketch follows.
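
A hedged sketch of one common approach, QLoRA-style fine-tuning with the Hugging Face transformers, peft, and bitsandbytes libraries (the libraries, checkpoint, and hyperparameters are assumptions for illustration, not necessarily what the seminar uses):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the frozen base model with 4-bit NF4 weights (bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",               # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach trainable low-rank (LoRA) adapters to the attention projections;
# only the adapters are updated, the 4-bit base weights stay frozen
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```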

Session 7

Tensor Factorization for Language Models

  • The main tensor decomposition (TD) methods for language models: what TD can achieve and when it is best applied.
  • Introduction to TD: general methods and concepts, an overview of modern TD-based model optimization methods, and a survey of libraries.

Seminar:

  • Replacing fully connected layers with a compressed low-rank representation obtained via SVD, as sketched below.
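
A minimal sketch (layer sizes and rank are illustrative): an `nn.Linear` with weight W is replaced by two smaller layers computing U_r(V_r x), where W ≈ U_r V_r is a truncated SVD:

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a Linear layer by a rank-`rank` SVD factorization W ≈ U_r V_r."""
    W = layer.weight.data                        # shape (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                 # (out, r); singular values absorbed
    V_r = Vh[:rank, :]                           # (r, in)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = V_r
    second.weight.data = U_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return nn.Sequential(first, second)

layer = nn.Linear(1024, 1024)                    # ~1.05M parameters
compressed = factorize_linear(layer, rank=64)    # ~131k parameters
x = torch.randn(2, 1024)
print((layer(x) - compressed(x)).abs().max())    # rank-64 approximation error
```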

PART 3: IN SEARCH OF GOOD MODELS

Session 8

Automatic Architecture Search

  • Methods of automatic architecture search, including the search for computationally efficient models.

Seminar:

  • Differentiable architecture search and evolutionary search; a DARTS-style sketch of the differentiable approach follows.
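
A minimal sketch of the differentiable part, in the spirit of DARTS (the candidate operations and sizes are illustrative): every edge computes a softmax-weighted mixture of candidate operations, so the architecture logits alpha can be optimized by gradient descent alongside the weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style edge: a softmax-weighted sum over candidate operations."""

    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.Identity(),                       # skip connection
        ])
        # Architecture parameters: one logit per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

edge = MixedOp(channels=16)
x = torch.randn(1, 16, 32, 32)
edge(x).sum().backward()
print(edge.alpha.grad)    # architecture logits receive gradients
# After the search, the operation with the largest alpha is kept on each edge.
```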