Add NLLB (M2M100) support #769
base: main
Conversation
Thanks a lot for working on this! It already looks pretty good overall; I left some comments regarding the open issues that are hopefully helpful.
Once that's done, please also add this new model type to the docs as described in the contributing guide, thanks!
head_types = [
    "classification",
    "multilabel_classification",
    "question_answering",
    "seq2seq_lm",
]
This defines the range of supported heads. Since I believe we'd only want to support sequence generation, you can remove everything except for "seq2seq_lm" here.
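In other words, the trimmed list suggested here would look like this (a sketch of the proposed change, not merged code):

head_types = [
    "seq2seq_lm",
]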
ParallelAdapterInferenceTestMixin,
ParallelTrainingMixin,
In case we don't want to support Parallel composition (which is totally fine), please remove these two mixins to disable the tests.
Otherwise, Parallel composition can be enabled by adding the model type here:

adapters/src/adapters/composition.py, line 121 in f0ca962:

SUPPORTED_MODELS = {
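Concretely, that would roughly mean listing the new model type under the Parallel entry of that dictionary. A minimal sketch, assuming the dictionary maps composition blocks to the model type strings they support and that the M2M100/NLLB model type string is "m2m_100":

# Inside adapters/src/adapters/composition.py, where Parallel is defined.
SUPPORTED_MODELS = {
    Parallel: [
        "bert",
        "bart",
        # ... other already-supported model types ...
        "m2m_100",  # assumed model type string for the M2M100/NLLB support added in this PR
    ],
}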
from .test_adapter_heads import PredictionHeadModelTestMixin


class M2M100AdapterTestBase(AdapterTestBase):
Since this model doesn't support text classification tasks (and we test text classification training by default in the test runs), we'd have to override the add_head() and dataset() methods here. That would look roughly like this:

adapters/tests/test_bert_generation.py, lines 29 to 70 in f0ca962:
def add_head(self, model, name, **kwargs):
    model.add_masked_lm_head(name)
    return self.default_input_samples_shape[-1]

def dataset(self, tokenizer=None):
    # setup tokenizer
    if tokenizer is None:
        tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name, use_fast=False)
        if tokenizer.pad_token is None:
            tokenizer.pad_token = tokenizer.eos_token

    def preprocess_function(examples):
        inputs = examples["document"]
        targets = examples["summary"]
        inputs = ["Summarize: " + inp for inp in inputs]
        model_inputs = tokenizer(inputs, padding="max_length", truncation=True, max_length=128)

        # Setup the tokenizer for targets
        with tokenizer.as_target_tokenizer():
            labels = tokenizer(targets, padding="max_length", truncation=True, max_length=128)

        # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore
        # padding in the loss.
        labels["input_ids"] = [
            [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"]
        ]

        model_inputs["labels"] = labels["input_ids"]
        return model_inputs

    data_args = {
        "task_name": "xsum",
        "path": "./tests/fixtures/samples/xsum/sample.json",
    }
    dataset = load_dataset("json", data_files=data_args["path"])
    train_dataset = dataset["train"]
    train_dataset = train_dataset.map(
        preprocess_function,
        batched=True,
        desc="Running tokenizer on train dataset",
    )
    return train_dataset
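For M2M100, a minimal sketch of the analogous override could reuse the summarization-style dataset() above and only swap the head. This assumes the new model class exposes add_seq2seq_lm_head(); the config class and tokenizer name below are illustrative, not the values from this PR:

from transformers import M2M100Config

class M2M100AdapterTestBase(AdapterTestBase):  # AdapterTestBase comes from the repo's shared test utilities
    config_class = M2M100Config
    tokenizer_name = "facebook/m2m100_418M"  # illustrative checkpoint

    def add_head(self, model, name, **kwargs):
        # M2M100 is generation-only, so attach a seq2seq LM head instead of a classification head
        model.add_seq2seq_lm_head(name)
        return self.default_input_samples_shape[-1]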
I implemented AdapterHub support for the Facebook NLLB model and its underlying M2M100 architecture. I've run all the relevant tests, auto-formatting, and quality checks.
The code passes 124 tests, skips 7, and fails 11. The 11 failing tests are all connected to Parallel composition blocks and flex heads, neither of which I implemented. As this is a machine translation model, it does not need classification heads on top of it, but I couldn't find how to disable the irrelevant head_types in the ADAPTER_MODEL_MAPPING dictionary so that these tests are skipped. Any advice on this is greatly appreciated!
Key addition:
M2M100AdapterModel class with the relevant WithAdapters and AdaptersMixin classes implemented.
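For context, a minimal usage sketch of the new class is shown below. It assumes the class is exported at the top level of the adapters package like the other adapter model classes; the checkpoint and adapter name are only illustrative.

from adapters import M2M100AdapterModel

# illustrative checkpoint; any M2M100/NLLB checkpoint should load the same way
model = M2M100AdapterModel.from_pretrained("facebook/m2m100_418M")

# add a bottleneck adapter plus a seq2seq LM head and activate them for training
model.add_adapter("translation")
model.add_seq2seq_lm_head("translation")
model.train_adapter("translation")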