Add support for QLoRA / QAdapter training via bitsandbytes (#663)
This PR adds support for wrapping bitsandbytes' `Linear4bit` and
`Linear8bitLt` quantization layers with our LoRA implementation,
enabling training LoRA adapters on quantized models in QLoRA style.
calpt authored Apr 23, 2024
1 parent 233db31 commit 42c1753
Showing 9 changed files with 854 additions and 20 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -155,6 +155,7 @@ Currently, adapters integrates all architectures and methods listed below:
| (IA)^3 | [Liu et al. (2022)](https://arxiv.org/pdf/2205.05638.pdf) | [Docs](https://docs.adapterhub.ml/methods.html#ia-3) |
| UniPELT | [Mao et al. (2022)](https://arxiv.org/pdf/2110.07577.pdf) | [Docs](https://docs.adapterhub.ml/method_combinations.html#unipelt) |
| Prompt Tuning | [Lester et al. (2021)](https://aclanthology.org/2021.emnlp-main.243/) | [Docs](https://docs.adapterhub.ml/methods.html#prompt-tuning) |
| QLoRA | [Dettmers et al. (2023)](https://arxiv.org/pdf/2305.14314.pdf) | [Notebook](https://colab.research.google.com/github/Adapter-Hub/adapters/blob/main/notebooks/QLoRA_Llama_Finetuning.ipynb) |

## Supported Models

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -28,7 +28,7 @@ The framework consists of two main components:
Currently, we support the PyTorch versions of all models as listed on the `Model Overview <model_overview.html>`_ page.

.. toctree::
:maxdepth: 1
:maxdepth: 2
:caption: Getting Started

installation
2 changes: 1 addition & 1 deletion docs/quickstart.md
@@ -14,7 +14,7 @@ In the following, we will briefly go through some examples to showcase these met
`the 'Usage' section in Hugging Face's documentation <https://huggingface.co/docs/transformers/main/en/quicktour>`_.
```

## Initialize Model with Adapters
## Initialize a Model with Adapters

The `XAdapterModel` is the recommended model for training and inference of adapters:
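As an illustrative sketch (not part of this diff, and assuming a hypothetical adapter name and a BERT checkpoint), loading a model through the `AutoAdapterModel` class and activating an adapter might look like this:

```python
# Minimal sketch (illustrative only): load an AdapterModel class,
# add a new adapter, and activate it for the forward pass.
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")  # placeholder checkpoint
model.add_adapter("my_adapter")           # adds a bottleneck adapter with the default config
model.set_active_adapters("my_adapter")   # route the forward pass through the adapter
```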

6 changes: 6 additions & 0 deletions docs/training.md
@@ -215,3 +215,9 @@ trainer = AdapterTrainer(
When you migrate from previous versions, which use the `Trainer` class for both adapter training and full fine-tuning, note that the
specialized `AdapterTrainer` class does not have the parameters `do_save_full_model`, `do_save_adapters` and `do_save_adapter_fusion`.
```

## Quantized Model Training

_Adapters_ supports fine-tuning of quantized language models similar to [QLoRA (Dettmers et al., 2023)](https://arxiv.org/pdf/2305.14314.pdf) via the `bitsandbytes` library integrated into Transformers.
Quantized training is supported for LoRA-based adapters as well as bottleneck adapters and prefix tuning.
Please refer to [this notebook](https://colab.research.google.com/github/Adapter-Hub/adapters/blob/main/notebooks/QLoRA_Llama_Finetuning.ipynb) for a hands-on guide.
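For orientation, the block below is a minimal sketch of such a setup, not part of this diff: the checkpoint, the adapter name `qlora`, and the LoRA hyperparameters are illustrative placeholders, and the linked notebook remains the authoritative reference.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

import adapters
from adapters import LoRAConfig

# Load the base model in 4-bit NF4 quantization via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",       # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach a LoRA adapter on top of the quantized linear layers.
adapters.init(model)
model.add_adapter("qlora", config=LoRAConfig(r=8, alpha=16))
model.train_adapter("qlora")  # freezes the quantized base weights; only the LoRA parameters remain trainable
```

Training can then proceed with the `AdapterTrainer` setup shown earlier in this file.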