- Perfectly reproduce and recover, save and load random seed state. Work with dataloader and trainer checkpoint
- Resume training (default). Loading existing parameters? (add an option) Work with trainer checkpoint
- Only generation and only evaluation. (`--do_train --do_test --do_eval`)
- Quick test the whole pipeline (and `max_length`). Work with dataloader. (`--quick_test` with lazy load using `fseek`)
- Logger with DDP
- Reminder through email (wandb)
- Hyper-parameter tuning (e.g. batch-size 16, 32, 64) (https://github.com/RUCAIBox/RecBole/blob/master/run_hyper.py, https://recbole.io/docs/user_guide/usage/parameter_tuning.html) (without saving model)
- Run on several random seeds and average their results (without saving model)
- Model deployment (https://clip-as-service.jina.ai/)
- Check `print` and `logger.info`
- Check `get_model`, `get_trainer`, `get_dataset` and `get_dataloader`
- Check `warnings.warn()` and `logger.warning()`
- Simplify import relation, add useful module in `__init__.py` (for example `PLM_MODELS`)
- Do not use `import *`
- Config check (user should add their own config in a file, e.g. `argument_list`)
- Print all the config and command line (is `argument_list.py` necessary?) (maybe 3 classes `general`, `model` and `dataset`)
- Simplify `init_logger`, remove `is_logger` and support user-defined filename
- Case of model class, model file and model yaml (same for dataset)
- Use `dataset` and `dataloader` from PyTorch, only with `source_id`, `source_text`, `target_id`, `target_text`. (optional `target_id`, `target_text`)
- Load tokenizer and tokenize text here (support tokenize and token2idx separately)
- Download and process dataset automatically
- Save processed files. How to check whether the config or files have changed? (maybe with md5 of config and files)
- *Max-token?
- `eval` and `repr`
- Valid target setting if the metric for best is not loss
- Add attribute `tokenized`?
- DAE (like BART)
- Masked Seq2Seq (like MASS)
- Masked span prediction (like T5)
- LM (like GPT-2) Decoder? Encoder-decoder?
- *PLM (like MPNet) one-tower?
- Support weighted sampling (change `sampler`?) (Note randomness!! https://pytorch.org/docs/stable/notes/randomness#dataloader)
- Support local reading (especially for pre-training, change `collate_fn`?)
- Multi-GPU with `accelerate`? Need to research! (PyTorch or accelerate)
  - check `find_unused_parameters`
  - will our scheduler be impacted
  - check
    - data parallel (model can fit in one GPU)
    - single node or several nodes
    - e.g. fine-tune or pre-train BART-large on a large dataset (16GB)
    - can save and load model and optimizer correctly
    - print log only once
    - see HF how to solve it:
      - https://github.com/huggingface/transformers/blob/main/examples/pytorch/summarization/run_summarization.py
      - https://github.com/huggingface/transformers/blob/main/examples/pytorch/summarization/run_summarization_no_trainer.py
- Fast generation (with https://github.com/microsoft/fastseq or https://github.com/bytedance/lightseq/tree/master/lightseq/training )
- Multi-GPU generation (divide data to multiple GPUs for generation.) Is it possible? Under DDP?
- *FP16 (HF? or PyTorch?)
- wandb to record loss and metric
- Support train and valid for several steps
- Support generation and evaluation during validation
- Checkpoint format? (following HF?)
  - model parameters (all) (to `cpu`)
  - optimizer (trained)
  - random state
  - config
  - valid results
- Save checkpoint and generated text every validation
- Check `_check_metrics`
- Add optimizer `AdamW` and `Adafactor` (for T5)
- Hyper-parameter for optimizer
- Only pass tuned parameters (`requires_grad=True`) to optimizer
- Simplify useless code
- `tqdm` with loss (metric) and `dynamic_ncols=True`
- Check `torch.no_grad()`, `model.train()` and `model.eval()`
- Move `optimizer` to `trainer` and change name to `scheduler`
- Automatically detect model name (case-insensitive)
- Simplify code using AutoModel and AutoTokenizer, following HF example
- Model without pre-trained weights
- Support `model_name`, `--tokenizer_name` and `--config_name`
- Check `__getattr__` in `abstract_model.py`
- Add OPT
- Add UniLM (reproduce results on SQuAD)
- Add MASS (reproduce results on SQuAD)
- Add CPT and Chinese BART (add one Chinese dataset and reproduce results)
- Add XLM (Add one translation dataset and reproduce results)
- Add MarianMT (the same)
- Test mBART, mT5 using the translation dataset
- Refactor RNN Seq2Seq (merge `Attention`)
- Refactor Copied Seq2Seq
- Add basic Transformer
- Model initialization (for PLM?)
- Add prompt tuning
- Add prefix tuning for GPT-2, BART, T5
- Add P-tuning v2 for GPT-2, BART, T5
- Add adapter for GPT-2, BART, T5
- Add LoRA for BART, T5
- Add LoRA, prompt tuning for GPT-2
- Add bias tuning for GPT-2, BART, T5
- Right prompt tuning
- Add CTRL
- Add PPLM
- Add non-autoregressive models
- Unify `base_evaluator`
- Refactor `files2rouge` with `try` and `except`, and remove empty lines
- `multi-bleu` traceback
- Add TED following https://github.com/PlusLabNLP/AESOP/blob/master/evaluation/eval.py
- Name and doc check
- Check `bert-score` HF logging
- Check and remake each dataset (especially CoQA, webnlg)
- `corpus.copy()`
- Support evaluation for different datasets and task. (how to specify the evaluation method?)
- Text summarization: CNN/Daily Mail (cnndm), XSum (xsum), SAMSum (samsum), and WLE (wle).
- Open-ended dialogue system: PersonaChat (pc), DailyDialog (dd), DSTC7-AVSD (da), and SGD (sgd).
- Data-to-text generation: WebNLG v2.1 (webnlg), WebNLG v3.0 (webnlg2), WikiBio (wikibio), E2E (e2e), DART (dart), and ToTTo (totto).
- Question generation: SQuAD (squadqg) and CoQA (coqaqg).
- Story generation: ROCStories (roc) and WritingPrompts (wp).
- Question answering: SQuAD (squad) and CoQA (coqa).
- Task-oriented dialogue system: MultiWOZ 2.0 (multiwoz).
- Commonsense generation: CommonGen (cg).
- Text simplification: WikiAuto + Turk/ASSET (wia).
- Paraphrase generation: Quora (quora) and ParaNMT (paranmt).
- Text style transfer: GYAFC-E&M (gyafc_em) and F&R (gyafc_fr).
- Construct a leaderboard for each dataset on the GitHub page, including common models, paper link, metric results, and generated files (theirs (official link) or our reproduced ones (provide config)).
- Text summarization: CNN/Daily Mail (cnndm), XSum (xsum), SAMSum (samsum), and WLE (wle).
- Open-ended dialogue system: PersonaChat (pc), DailyDialog (dd), DSTC7-AVSD (da), and SGD (sgd).
- Data-to-text generation: WebNLG v2.1 (webnlg), WebNLG v3.0 (webnlg2), WikiBio (wikibio), E2E (e2e), DART (dart), and ToTTo (totto).
- Question generation: SQuAD (squadqg) and CoQA (coqaqg).
- Story generation: ROCStories (roc) and WritingPrompts (wp).
- Question answering: SQuAD (squad) and CoQA (coqa).
- Task-oriented dialogue system: MultiWOZ 2.0 (multiwoz).
- Commonsense generation: CommonGen (cg).
- Text simplification: WikiAuto + Turk/ASSET (wia). https://arxiv.org/pdf/2005.00352v2.pdf https://arxiv.org/pdf/2110.08329v2.pdf
- Paraphrase generation: Quora (coming soon).
- Text style transfer: GYAFC-E&M and F&R (coming soon).
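
One of the dataset items above asks how to detect that the config or the raw files changed before reusing saved processed files, and suggests an md5 of both. A minimal sketch of that idea follows, using only the Python standard library. The helper names (`fingerprint`, `cache_is_valid`, `write_fingerprint`) and the `fingerprint.md5` stamp file are hypothetical and not part of the existing code base; the sketch also assumes the config is JSON-serializable.

```python
import hashlib
import json
from pathlib import Path

STAMP_NAME = "fingerprint.md5"  # hypothetical stamp file stored next to the processed data


def fingerprint(config, files):
    """Return an MD5 hex digest covering the config and the raw file contents."""
    md5 = hashlib.md5()
    # Serialize the config deterministically so that key order does not matter.
    md5.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    for path in sorted(str(f) for f in files):
        md5.update(path.encode("utf-8"))
        with open(path, "rb") as fh:
            # Read in 1 MiB chunks so large corpora are never fully loaded in memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                md5.update(chunk)
    return md5.hexdigest()


def cache_is_valid(cache_dir, config, files):
    """True if a stamp exists and matches the current config and files."""
    stamp = Path(cache_dir) / STAMP_NAME
    return stamp.exists() and stamp.read_text().strip() == fingerprint(config, files)


def write_fingerprint(cache_dir, config, files):
    """Record the current fingerprint after the processed files have been saved."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    (Path(cache_dir) / STAMP_NAME).write_text(fingerprint(config, files))
```

A dataset loader could call `cache_is_valid(...)` first, reprocess and call `write_fingerprint(...)` only when it returns `False`. Hashing only the config keys that affect preprocessing (for example tokenizer and `max_length`) would avoid needless reprocessing when unrelated options change.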