This is a list of 100 important natural language processing (NLP) papers that serious students and researchers working in the field should know about and read. This list is compiled by Masato Hagiwara. I welcome any feedback on this list.
This list is originally based on the answers to a Quora question I posted years ago: "What are the most important research papers which all NLP students should definitely read?" I thank all the people who contributed to the original post.
This list is far from complete or objective, and is evolving, as important papers are being published year after year. Please let me know via pull requests and issues if anything is missing.
A paper doesn't have to be a peer-reviewed conference/journal paper to appear here. We also include tutorial/survey-style papers and blog posts that are often easier to understand than the original papers.
- Avrim Blum and Tom Mitchell: Combining Labeled and Unlabeled Data with Co-Training, 1998.
- Marco Tulio Ribeiro et al.: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList, ACL 2020.
- Yoon Kim: Convolutional Neural Networks for Sentence Classification, 2014.
- Matthew E. Peters, et al.: Deep contextualized word representations, 2018.
- Yinhan Liu, et al.: RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019.
- Peter F. Brown, et al.: Class-Based n-gram Models of Natural Language, 1992.
- Tomas Mikolov, et al.: Efficient Estimation of Word Representations in Vector Space, 2013.
- Quoc V. Le and Tomas Mikolov: Distributed Representations of Sentences and Documents, 2014.
- Jeffrey Pennington, et al.: GloVe: Global Vectors for Word Representation, 2014.
- Piotr Bojanowski, et al.: Enriching Word Vectors with Subword Information, 2017.
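The word-embedding papers above (word2vec, GloVe, fastText) all represent words as dense vectors whose geometry encodes meaning, so similar words end up close together. A minimal sketch of that idea, using toy hand-made 3-dimensional vectors rather than any released embeddings (real models use hundreds of dimensions):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings" with hypothetical values, not from any trained model.
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

# Words with related meanings should score higher than unrelated ones.
print(cosine(emb["king"], emb["queen"]))  # high (close to 1)
print(cosine(emb["king"], emb["apple"]))  # low
```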
- Joshua Goodman: A bit of progress in language modeling, MSR Technical Report, 2001.
- Yee Whye Teh: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes, COLING/ACL 2006.
- Yee Whye Teh: A Bayesian interpretation of Interpolated Kneser-Ney, 2006.
- Yoshua Bengio, et al.: A Neural Probabilistic Language Model, J. of Machine Learning Research, 2003.
- Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks, 2015.
- Yoon Kim, et al.: Character-Aware Neural Language Models, 2015.
- Alec Radford, et al.: Language Models are Unsupervised Multitask Learners, 2019.
- Adwait Ratnaparkhi: A Maximum Entropy Model for Part-Of-Speech Tagging, EMNLP 1996.
- Eugene Charniak: A Maximum-Entropy-Inspired Parser, NAACL 2000.
- Dan Klein and Christopher Manning: Accurate Unlexicalized Parsing, ACL 2003.
- Joakim Nivre and Mario Scholz: Deterministic Dependency Parsing of English Text, COLING 2004.
- Ryan McDonald et al.: Non-Projective Dependency Parsing using Spanning-Tree Algorithms, EMNLP 2005.
- Daniel Andor et al.: Globally Normalized Transition-Based Neural Networks, 2016.
- Marti A. Hearst: Automatic Acquisition of Hyponyms from Large Text Corpora, COLING 1992.
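Hearst's paper above mines hyponyms with lexico-syntactic patterns such as "NP such as NP, NP, and NP". A minimal sketch of one such pattern over plain text (the regex and example sentence are my own illustration, much cruder than implementations that use real noun-phrase chunking):

```python
import re

def hearst_such_as(text):
    """Extract (hypernym, hyponym) pairs from the 'X such as Y, Z, and W' pattern."""
    # Very rough stand-in for noun phrases: single words only.
    pattern = re.compile(r"(\w+) such as ((?:\w+, )*(?:and )?\w+)")
    pairs = []
    for match in pattern.finditer(text):
        hypernym = match.group(1)
        hyponyms = re.split(r", and |, | and ", match.group(2))
        pairs.extend((hypernym, h) for h in hyponyms if h)
    return pairs

print(hearst_such_as("He plays instruments such as guitar, piano, and drums."))
# → [('instruments', 'guitar'), ('instruments', 'piano'), ('instruments', 'drums')]
```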
- Michael Collins and Yoram Singer: Unsupervised Models for Named Entity Classification, EMNLP 1999.
- Patrick Pantel and Dekang Lin: Discovering Word Senses from Text, SIGKDD 2002.
- Mike Mintz et al.: Distant supervision for relation extraction without labeled data, ACL 2009.
- Zhiheng Huang et al.: Bidirectional LSTM-CRF Models for Sequence Tagging, 2015.
- Xuezhe Ma and Eduard Hovy: End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, ACL 2016.
- Kevin Knight and Jonathan Graehl: Machine Transliteration, Computational Linguistics, 1998.
- Kishore Papineni, et al.: BLEU: a Method for Automatic Evaluation of Machine Translation, ACL 2002.
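The BLEU paper above scores a candidate translation by clipped n-gram precision against references, scaled by a brevity penalty. A minimal single-reference sketch (unigrams and bigrams only; the full metric uses up to 4-grams and multiple references):

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions * brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        precisions.append(clipped / total if total else 0.0)
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else exp(1 - len(reference) / len(candidate))
    return bp * exp(sum(log(p) for p in precisions) / max_n)

cand = "the cat is on the mat".split()
ref = "the cat sat on the mat".split()
print(round(bleu(cand, ref), 3))  # → 0.707
```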
- Philipp Koehn, Franz J. Och, and Daniel Marcu: Statistical Phrase-Based Translation, NAACL 2003.
- Philip Resnik and Noah A. Smith: The Web as a Parallel Corpus, Computational Linguistics, 2003.
- David Chiang: A Hierarchical Phrase-Based Model for Statistical Machine Translation, ACL 2005.
- Minh-Thang Luong, et al.: Effective Approaches to Attention-based Neural Machine Translation, 2015.
- Rico Sennrich et al.: Neural Machine Translation of Rare Words with Subword Units, ACL 2016.
- Jonas Gehring, et al.: Convolutional Sequence to Sequence Learning, 2017.
- Vincent Ng: Supervised Noun Phrase Coreference Research: The First Fifteen Years, ACL 2010.
- Kenton Lee et al.: End-to-end Neural Coreference Resolution, EMNLP 2017.
- Kevin Knight and Daniel Marcu: Summarization beyond sentence extraction, Artificial Intelligence 139, 2002.
- James Clarke and Mirella Lapata: Modeling Compression with Discourse Constraints, EMNLP-CoNLL 2007.
- Ryan McDonald: A Study of Global Inference Algorithms in Multi-Document Summarization, ECIR 2007.
- Alexander M. Rush, et al.: A Neural Attention Model for Sentence Summarization, EMNLP 2015.
- Abigail See et al.: Get To The Point: Summarization with Pointer-Generator Networks, ACL 2017.
- Pranav Rajpurkar et al.: SQuAD: 100,000+ Questions for Machine Comprehension of Text, EMNLP 2016.
- Minjoon Seo et al.: Bi-Directional Attention Flow for Machine Comprehension, ICLR 2017.
- Jiwei Li, et al.: Deep Reinforcement Learning for Dialogue Generation, EMNLP 2016.
- Marc’Aurelio Ranzato et al.: Sequence Level Training with Recurrent Neural Networks, ICLR 2016.
- Samuel R. Bowman et al.: Generating sentences from a continuous space, CoNLL 2016.
- Lantao Yu, et al.: SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient, AAAI 2017.