Add support for QLoRA / QAdapter training via bitsandbytes (#663)
This PR adds support for wrapping bitsandbytes' `Linear4bit` and
`Linear8bitLt` quantization layers with our LoRA implementation,
enabling training LoRA adapters on quantized models in QLoRA style.
calpt authored Apr 23, 2024
1 parent 233db31 commit 42c1753
Showing 9 changed files with 854 additions and 20 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -155,6 +155,7 @@ Currently, adapters integrates all architectures and methods listed below:
| (IA)^3 | [Liu et al. (2022)](https://arxiv.org/pdf/2205.05638.pdf) | [Docs](https://docs.adapterhub.ml/methods.html#ia-3) |
| UniPELT | [Mao et al. (2022)](https://arxiv.org/pdf/2110.07577.pdf) | [Docs](https://docs.adapterhub.ml/method_combinations.html#unipelt) |
| Prompt Tuning | [Lester et al. (2021)](https://aclanthology.org/2021.emnlp-main.243/) | [Docs](https://docs.adapterhub.ml/methods.html#prompt-tuning) |
| QLoRA | [Dettmers et al. (2023)](https://arxiv.org/pdf/2305.14314.pdf) | [Notebook](https://colab.research.google.com/github/Adapter-Hub/adapters/blob/main/notebooks/QLoRA_Llama_Finetuning.ipynb) |

## Supported Models

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -28,7 +28,7 @@ The framework consists of two main components:
Currently, we support the PyTorch versions of all models as listed on the `Model Overview <model_overview.html>`_ page.

.. toctree::
:maxdepth: 1
:maxdepth: 2
:caption: Getting Started

installation
2 changes: 1 addition & 1 deletion docs/quickstart.md
@@ -14,7 +14,7 @@ In the following, we will briefly go through some examples to showcase these met
`the 'Usage' section in Hugging Face's documentation <https://huggingface.co/docs/transformers/main/en/quicktour>`_.
```

## Initialize Model with Adapters
## Initialize a Model with Adapters

The `XAdapterModel` is the recommended model for training and inference of adapters:
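As an illustrative sketch (not part of this diff, and assuming a hypothetical adapter name and a BERT checkpoint), loading a model through the `AutoAdapterModel` class and activating an adapter might look like this:

```python
# Minimal sketch (illustrative only): load an AdapterModel class,
# add a new adapter, and activate it for the forward pass.
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")  # placeholder checkpoint
model.add_adapter("my_adapter")           # adds a bottleneck adapter with the default config
model.set_active_adapters("my_adapter")   # route the forward pass through the adapter
```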

6 changes: 6 additions & 0 deletions docs/training.md
@@ -215,3 +215,9 @@ trainer = AdapterTrainer(
When you migrate from previous versions, which use the `Trainer` class for both adapter training and full fine-tuning, note that the
specialized `AdapterTrainer` class does not have the parameters `do_save_full_model`, `do_save_adapters` and `do_save_adapter_fusion`.
```

## Quantized Model Training

_Adapters_ supports fine-tuning of quantized language models similar to [QLoRA (Dettmers et al., 2023)](https://arxiv.org/pdf/2305.14314.pdf) via the `bitsandbytes` library integrated into Transformers.
Quantized training is supported for LoRA-based adapters as well as bottleneck adapters and prefix tuning.
Please refer to [this notebook](https://colab.research.google.com/github/Adapter-Hub/adapters/blob/main/notebooks/QLoRA_Llama_Finetuning.ipynb) for a hands-on guide.
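For orientation, the block below is a minimal sketch of such a setup, not part of this diff: the checkpoint, the adapter name `qlora`, and the LoRA hyperparameters are illustrative placeholders, and the linked notebook remains the authoritative reference.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

import adapters
from adapters import LoRAConfig

# Load the base model in 4-bit NF4 quantization via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",       # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach a LoRA adapter on top of the quantized linear layers.
adapters.init(model)
model.add_adapter("qlora", config=LoRAConfig(r=8, alpha=16))
model.train_adapter("qlora")  # freezes the quantized base weights; only the LoRA parameters remain trainable
```

Training can then proceed with the `AdapterTrainer` setup shown earlier in this file.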