ReCoNLL is a revised version of CoNLL-2003. We manually fixed annotation errors, guided by the entity coverage ratio (ECR) measure proposed in the original work. Specifically, we corrected 65 sentences in the test set and 14 sentences in the training set.
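For intuition, the snippet below computes a simplified coverage statistic in the spirit of ECR: for each test-set entity surface form, the fraction of its training-set occurrences that carry the same entity type (forms unseen in training get a ratio of 0). This is only a hedged sketch; the function name `entity_coverage_ratio` and this exact definition are assumptions for illustration, not the formula from the original work.

```python
from collections import Counter, defaultdict

def entity_coverage_ratio(train_entities, test_entities):
    """train_entities / test_entities: iterables of (surface_form, entity_type) pairs.

    Returns a dict mapping each test (surface_form, entity_type) pair to the
    fraction of that form's training occurrences labeled with the same type
    (a simplified, assumed reading of "entity coverage").
    """
    # surface form -> Counter of entity types seen in training
    train_counts = defaultdict(Counter)
    for form, etype in train_entities:
        train_counts[form][etype] += 1

    ratios = {}
    for form, etype in test_entities:
        total = sum(train_counts[form].values())
        same = train_counts[form][etype]
        ratios[(form, etype)] = same / total if total else 0.0
    return ratios
```

Low-ratio entities point at mentions whose gold label disagrees with how the same surface form is labeled in training, which makes them natural candidates for manual review.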
PLONER (Person, Location, Organization NER) is designed to evaluate cross-domain generalization. We pick the samples that contain at least one of the three entity types (Person, Location, Organization) from representative datasets such as WNUT16, CoNLL03, and OntoNotes 5.0.
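The snippet below sketches this filtering step, assuming CoNLL-style input (one token and its BIO tag per line, blank lines between sentences); the file name and helper names are placeholders rather than part of any released PLONER tooling.

```python
# Keep only sentences containing at least one PER, LOC, or ORG entity.
KEEP_TYPES = {"PER", "LOC", "ORG"}

def read_sentences(path):
    """Yield sentences as lists of (token, tag) pairs from a CoNLL-style file."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("-DOCSTART-"):
                if sentence:
                    yield sentence
                    sentence = []
                continue
            cols = line.split()
            sentence.append((cols[0], cols[-1]))  # token and its NER tag

        if sentence:
            yield sentence

def has_target_entity(sentence):
    """True if any tag (e.g. B-PER, I-LOC) belongs to the kept entity types."""
    return any(tag.split("-")[-1] in KEEP_TYPES for _, tag in sentence if tag != "O")

if __name__ == "__main__":
    # "conll03.train.txt" is a placeholder path for illustration.
    kept = [s for s in read_sentences("conll03.train.txt") if has_target_entity(s)]
    print(f"kept {len(kept)} sentences containing PER/LOC/ORG entities")
```

Note that source corpora use different label inventories (for example, OntoNotes labels people as PERSON and distinguishes GPE from LOC), so in practice a label-mapping step would be applied before or alongside this filter.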
CoNLL2003 is one of the most widely used NER datasets. More details about the dataset can be found in the paper: Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition.
OntoNotes 5.0 covers several English genres: newswire (News), broadcast news (BN), broadcast conversation (BC), telephone conversation (Tele), and web data (Web). For a more detailed description of the dataset, please refer to the documentation: OntoNotes Release 5.0.
WNUT16 is a shared task on named entity recognition in Twitter. For more information about the dataset, please refer to the paper: Results of the WNUT16 Named Entity Recognition Shared Task.
CoNLL++ treats NER on CoNLL03 as a task of training with noisy annotations and testing on gold-standard annotations. We corrected 186 sentences in the test set. For more information about the dataset, please refer to the paper: CrossWeigh: Training Named Entity Tagger from Imperfect Annotations.