Translations:Manually annotated corpora/13/en

From Clarin K-Centre

Dutch Archaeology NER Training Dataset

A manually annotated NER dataset consisting of Dutch archaeological excavation reports. The following entity types are labelled: Artefacts, Time periods, Materials, Places (geographical locations), Archaeological contexts and Species. The dataset is provided in the BIO format, with one token per line and empty lines marking sentence boundaries. Each line contains the token, its PoS tag, its morphological segmentation and finally the label, separated by spaces. The PoS tags and morphological segmentations were assigned automatically by Frog.
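
The layout described above can be read with a few lines of Python. The following is only a minimal sketch, assuming exactly four space-separated columns per line (token, PoS tag, morphological segmentation, label) and labels of the form `B-TYPE`, `I-TYPE` and `O`. The sample lines, the `POS1`-style tags and the label names `MAT`, `ART` and `PER` are invented placeholders for illustration; the actual Frog tags and label names in the dataset may differ.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    pos: str
    morph: str
    label: str


def read_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Group token lines into sentences; an empty line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        # Assumption: four columns per line, as described in the text.
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        current.append(Token(*fields))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) pairs from the BIO labels of one sentence."""
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []
    for tok in sentence:
        prefix, _, etype = tok.label.partition("-")
        continues = prefix == "I" and etype == current_type
        if not continues:
            # Close the open entity, if any, before handling this token.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
            # A stray I- label without a preceding B- is treated as a start.
            if prefix in ("B", "I"):
                current_type = etype
        if current_type is not None:
            current_tokens.append(tok.text)
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical sample in the assumed layout (not taken from the dataset).
SAMPLE = """\
Een POS1 [een] O
stenen POS2 [steen][en] B-MAT
bijl POS3 [bijl] B-ART
uit POS4 [uit] O
de POS5 [de] O
late POS6 [laat][e] B-PER
bronstijd POS7 [brons][tijd] I-PER

Einde POS8 [einde] O
"""

sentences = read_sentences(SAMPLE.splitlines())
for sentence in sentences:
    print(extract_entities(sentence))
```

To read a real file, the same function can be given an open file handle, for example `read_sentences(open(path, encoding="utf-8"))`, where `path` points to a local copy of the dataset.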