中文 | English

Mengzi

Although pre-trained language models (PLMs) have achieved remarkable improvements across a wide range of NLP tasks, they are expensive in terms of time and computational resources. This calls for studying how to train more efficient models that consume less computation while still delivering strong performance.

Instead of pursuing a larger scale, we are committed to developing lightweight yet more powerful models that can be trained with equal or less computation and are friendly to rapid deployment.

Based on linguistic information integration and training acceleration techniques, we have developed the Mengzi family of models. Because they share the same architecture as BERT, Mengzi models can serve as drop-in replacements for existing pre-trained models.

See Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese for details.

Update 2022-02-10

Thanks to @yingyibiao from the PaddlePaddle team for the PaddleNLP version of the model and documentation.

Note: The PaddleNLP version of the model is not a product of Langboat Technology and we are not responsible for its results or effectiveness.

Navigation

Introduction

| Model | Params | Usage Scenarios | Features |
|-------|--------|-----------------|----------|
| Mengzi-BERT-base | 110M | Natural language understanding tasks such as text classification, entity recognition, relation extraction, and reading comprehension | Same structure as BERT, so it can directly replace existing BERT weights (see the sketch after this table) |
| Mengzi-BERT-base-fin | 110M | Natural language understanding tasks in the financial domain | Further trained on a financial corpus, starting from Mengzi-BERT-base |
| Mengzi-T5-base | 220M | Conditional generation tasks such as copywriting and news generation | Same structure as T5; ships without downstream task heads and must be fine-tuned on a specific task before use. Unlike GPT, it is not suitable for text continuation |
| Mengzi-Oscar-base | 110M | Multimodal tasks such as image captioning and image-text retrieval | A multimodal model based on Mengzi-BERT-base, trained on millions of image-text pairs |
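
Because Mengzi-BERT-base keeps BERT's architecture, an existing Chinese BERT pipeline can usually be switched over by changing only the checkpoint name. Below is a minimal, illustrative sketch with Huggingface transformers; the classification head and label count are placeholders rather than part of the released checkpoint.

```python
# Drop-in replacement sketch (Huggingface transformers).
# Only the checkpoint name changes in an existing BERT-based pipeline;
# the classification head and num_labels here are illustrative placeholders.
from transformers import BertTokenizer, BertForSequenceClassification

checkpoint = "Langboat/mengzi-bert-base"  # previously e.g. "bert-base-chinese"

tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
```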

Quick Start

Mengzi-BERT

```python
# Loading with Huggingface transformers
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")
```

or

```python
# Loading with PaddleNLP
from paddlenlp.transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")
```
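
Once loaded, the model behaves like any other BERT encoder. The following is a minimal, illustrative inference sketch for the Huggingface transformers version; the sample sentence is only a placeholder.

```python
# Minimal encoding sketch (Huggingface transformers version); the input text is illustrative.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")
model.eval()

inputs = tokenizer("孟子是战国时期的思想家。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Token-level representations with shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```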

Mengzi-T5

```python
# Loading with Huggingface transformers
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")
```

or

```python
# Loading with PaddleNLP
from paddlenlp.transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")
```
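
As noted in the model table, Mengzi-T5-base ships without trained downstream heads and is intended to be fine-tuned before use. The snippet below only illustrates the generation API of the Huggingface transformers version; the prompt and decoding settings are placeholders, and meaningful output should not be expected from the raw checkpoint.

```python
# Generation API sketch (Huggingface transformers version); intended for use after
# fine-tuning on a specific task -- the raw checkpoint is not a text-continuation model.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")

inputs = tokenizer("新闻标题生成:孟子轻量化预训练模型发布", return_tensors="pt")  # illustrative prompt
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```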

Mengzi-Oscar

See the Mengzi-Oscar documentation for usage.

Dependencies

```shell
# Install Huggingface transformers
pip install transformers
```

or

```shell
# Install PaddleNLP
pip install paddlenlp
```

Downstream tasks

CLUE Scores

| Model | AFQMC | TNEWS | IFLYTEK | CMNLI | WSC | CSL | CMRC2018 | C3 | CHID |
|-------|-------|-------|---------|-------|-----|-----|----------|----|------|
| RoBERTa-wwm-ext | 74.30 | 57.51 | 60.80 | 80.70 | 67.20 | 80.67 | 77.59 | 67.06 | 83.78 |
| Mengzi-BERT-base | 74.58 | 57.97 | 60.68 | 82.12 | 87.50 | 85.40 | 78.54 | 71.70 | 84.16 |

The scores of RoBERTa-wwm-ext are taken from the CLUE baseline.

Corresponding hyperparameters

| Task | Learning rate | Global batch size | Epochs |
|------|---------------|-------------------|--------|
| AFQMC | 3e-5 | 32 | 10 |
| TNEWS | 3e-5 | 128 | 10 |
| IFLYTEK | 3e-5 | 64 | 10 |
| CMNLI | 3e-5 | 512 | 10 |
| WSC | 8e-6 | 64 | 50 |
| CSL | 5e-5 | 128 | 5 |
| CMRC2018 | 5e-5 | 8 | 5 |
| C3 | 1e-4 | 240 | 3 |
| CHID | 5e-5 | 256 | 5 |
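
To fine-tune Mengzi-BERT-base with a setting like the ones above, a standard sequence-classification loop is sufficient. The sketch below uses the Huggingface Trainer for a TNEWS-style task with the hyperparameters from the table (learning rate 3e-5, global batch size 128, 10 epochs); the dataset source, label count, preprocessing, and details such as warmup and sequence length are assumptions rather than the exact original setup.

```python
# Hedged fine-tuning sketch for a TNEWS-style classification task.
# Hyperparameters follow the table above (lr 3e-5, global batch size 128, 10 epochs);
# the dataset source, label count, and preprocessing are assumptions, not the official setup.
from datasets import load_dataset
from transformers import (BertTokenizer, BertForSequenceClassification,
                          Trainer, TrainingArguments)

dataset = load_dataset("clue", "tnews")  # assumed dataset source; adjust to your local CLUE copy
tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertForSequenceClassification.from_pretrained(
    "Langboat/mengzi-bert-base", num_labels=15)  # TNEWS has 15 news categories

def preprocess(batch):
    # Assumed max_length; the original runs may use a different sequence length.
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

encoded = dataset.map(preprocess, batched=True)

args = TrainingArguments(
    output_dir="mengzi-tnews",
    learning_rate=3e-5,
    per_device_train_batch_size=128,  # use gradient accumulation / multiple GPUs to reach a global batch of 128
    num_train_epochs=10,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  tokenizer=tokenizer)
trainer.train()
```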

Download Links

Contact Us

wangyulong[at]chuangxin[dot]com

Disclaimers

The contents of this project are for technical research purposes only and are not intended as a basis for any conclusive findings. Users are free to use the models as they wish within the scope of the license, but we are not responsible for direct or indirect damages resulting from the use of the contents of this project. The experimental results presented in the technical report only indicate performance under specific datasets and hyperparameter combinations and are not representative of the underlying nature of the individual models. Experimental results may vary with random seeds and computing equipment.

When using the models (including but not limited to modified use, direct use, or use through a third party), users shall not use them in any way that violates the laws and regulations of their jurisdiction or social ethics. Users are responsible for their own behavior and bear joint legal liability for disputes arising from their use of the models. We accept no liability arising from the use of the models.

We reserve the right to interpret, modify and update this Disclaimer.

Citation

```
@misc{zhang2021mengzi,
      title={Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese},
      author={Zhuosheng Zhang and Hanqing Zhang and Keming Chen and Yuhang Guo and Jingyun Hua and Yulong Wang and Ming Zhou},
      year={2021},
      eprint={2110.06696},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```