-
For Leaderboard submission:
- Zero-Shot Setting:
- Training data: None
- Test data:
test_sentences.txt
- Fine-Tuning Setting:
- Training data:
data/10k/
- Test data:
test_sentences.txt
-
All data is in
data
- We prepared the json format to help users understand the structure of our probe sets, e.g., which perturbation corresponds to each statement and what the logical form is. The actual testing/training files are in txt format, with A/B replaced by random entities.
- Full 253k data (noisy) in json format
data/RICA_253k_axiom2set.jsonl
- Human-Verified 10k data in json format
data/RICA_10k_axiom2set.jsonl
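The replacement of A/B with random entities mentioned above can be sketched as follows. This is a minimal illustration and not the script used to build the txt files: it assumes the placeholders appear as standalone capital letters `A` and `B`, and it makes up entity names as random lowercase strings. The helper names `make_entity` and `fill_entities`, and the example statement, are hypothetical.

```python
import random
import re
import string


def make_entity(rng, length=5):
    """Return a random lowercase string to act as a made-up entity name (assumption)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def fill_entities(statement, rng):
    """Replace the standalone placeholders A and B with two distinct random names."""
    name_a = make_entity(rng)
    name_b = make_entity(rng)
    while name_b == name_a:  # keep the two entities distinguishable
        name_b = make_entity(rng)
    mapping = {"A": name_a, "B": name_b}
    # \b keeps words that merely contain a capital A or B untouched
    return re.sub(r"\b[AB]\b", lambda m: mapping[m.group(0)], statement)


rng = random.Random(42)  # fixed seed so the substitution is reproducible
print(fill_entities("A is heavier than B, so A is harder to lift than B.", rng))
```

Every occurrence of a placeholder within one statement receives the same name, so the relation between the two entities is preserved.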
-
Testing Models
- Besides the following scripts, we also prepared a Jupyter notebook in
Probing_Examples.ipynb
that uses the most up-to-date Hugging Face pipeline for masked word prediction.
-
For zero-shot models
-
BERT
experiments/eval_bert_zeroshot.py
python eval_bert_zeroshot.py test_dir_name test_name(easy/hard/joint) filename seed
e.g.
python eval_bert_zeroshot.py joint_test_set joint bert-large-42 42
-
RoBERTa
experiments/eval_roberta_zeroshot.py
python eval_roberta_zeroshot.py test_dir_name test_name(easy/hard/joint) filename seed
e.g.
python eval_roberta_zeroshot.py joint_test_set joint roberta-large-42 42
-
GPT2
experiments/eval_gpt2_zeroshot.py
python eval_gpt2_zeroshot.py
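For a quick zero-shot check outside these scripts, masked word prediction can be sketched with the Hugging Face `fill-mask` pipeline. This is a minimal sketch and not the code in Probing_Examples.ipynb or the evaluation scripts: the default model name, the example statement, and the two candidate words are assumptions made for illustration, and the helper names `pick_choice` and `probe` are hypothetical.

```python
def pick_choice(predictions, choices):
    """Return the candidate word with the highest score among the predictions."""
    wanted = {c.lower() for c in choices}
    scored = [
        (p["score"], p["token_str"].strip().lower())
        for p in predictions
        if p["token_str"].strip().lower() in wanted
    ]
    if not scored:
        raise ValueError("none of the candidate words were scored")
    return max(scored)[1]


def probe(statement, choices, model_name="bert-large-uncased"):
    """Score the candidate words for the single [MASK] in the statement."""
    # Imported lazily so pick_choice can be used without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_name)
    # Use the tokenizer's own mask token in case it differs from [MASK].
    masked = statement.replace("[MASK]", fill.tokenizer.mask_token)
    # targets restricts scoring to the candidate words.
    predictions = fill(masked, targets=list(choices))
    return pick_choice(predictions, choices)


# Example call (downloads the model on first use); the statement is made up:
# probe("prindag is heavier than fluberg, so prindag is [MASK] difficult "
#       "to lift than fluberg.", ["more", "less"])
```

Restricting the pipeline to the candidate words turns the prediction into a choice between the two options rather than a search over the whole vocabulary.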
-
-
For finetuned models
-
BERT
experiments/eval_bert_finetuned.py
python eval_bert_finetuned.py test_data_dir model_dir output_dir #of_novel_entities model_name
e.g.
python eval_bert_finetuned.py human_curated_set 10k 10k_fine_tuned 1 bert-large-42
-
RoBERTa
experiments/eval_roberta_finetuned.py
python eval_roberta_finetuned.py test_data_dir model_dir output_dir #of_novel_entities model_name
e.g.
python eval_roberta_finetuned.py human_curated_set 10k 10k_fine_tuned 1 roberta-large-42
-
GPT2
experiments/run_generative_gpt2_on_easy.py
experiments/run_generative_gpt2_on_hard.py
experiments/run_generative_gpt2_on_joint.py
python run_generative_gpt2_on_easy.py #_of_novel_entities
e.g.
python run_generative_gpt2_on_easy.py 5
-
-
Finetuning BERT/RoBERTa for masked word prediction (MWP)
-
Finetuning code
train_mlm.py
is in happy-transformer/examples/
python train_mlm.py training_data_directory #_of_novel_entities output_filename model_name seed_number
e.g.
python train_mlm.py 10k 10 roberta-large-42 roberta-large 42
-
After finetuning, you can get the average binary score using
experiments/test_mlm.py
python test_mlm.py training_data_directory filename.txt
e.g.
python test_mlm.py 10k 10 10_roberta-large-42.txt
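As a rough illustration of what an average binary score could look like, the sketch below assumes each probe scores 1 when the model's predicted word matches the correct word and 0 otherwise, and that the average is taken over all probes. This reading of "binary score" is an assumption, and the function is hypothetical; it is not taken from test_mlm.py.

```python
def average_binary_score(predicted, gold):
    """Return the fraction of probes where the predicted word equals the gold word."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one probe is required")
    # Each probe contributes 1 for a match and 0 otherwise (assumed definition).
    hits = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predicted, gold)
    )
    return hits / len(gold)


# Made-up predictions for four probes, three of which match the gold word.
print(average_binary_score(["more", "less", "more", "less"],
                           ["more", "more", "more", "less"]))
```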
-
-
Finetuning GPT2
- You can finetune GPT2 using
experiments/fine_tune_GPT-2.sh