Translations:Manually annotated corpora/13/nl: Difference between revisions

Latest revision as of 17:56, 27 May 2024

Message definition (Manually annotated corpora)

==Dutch Archaeology NER Training Dataset==
A manually annotated named entity recognition (NER) dataset consisting of Dutch archaeological excavation reports. The following entity types are labelled: Artefacts, Time periods, Materials, Places (geographical locations), Archaeological contexts and Species.
The dataset is provided in BIO format, with one token per line and empty lines marking sentence boundaries. Each line contains the token, its PoS tag, its morphological segmentation and finally the entity label, separated by spaces. The PoS tags and morphological segmentation were assigned by Frog.
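The layout described above (four space-separated fields per line, blank lines between sentences) can be read with a few lines of Python. This is a minimal sketch, not the dataset's official loader: the names `Token`, `read_bio` and `extract_entities` are hypothetical, and the sample tags (`B-MAT`, `B-ART`, `B-LOC`, the simplified PoS tags and the bracketed segmentations) are invented for illustration and may not match the labels actually used in the files. It also assumes that no field contains an internal space.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str   # the token itself
    pos: str    # PoS tag (assigned by Frog in the real dataset)
    morph: str  # morphological segmentation (assigned by Frog)
    label: str  # BIO label, e.g. "B-XXX", "I-XXX" or "O"


def read_bio(lines: Iterable[str]) -> List[List[Token]]:
    """Group lines into sentences; an empty line ends the current sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        if len(fields) != 4:
            # Assumption: exactly four fields, none containing spaces.
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        current.append(Token(*fields))
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences


def extract_entities(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) pairs from the B-/I- labels of one sentence."""
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    etype = None
    for tok in sentence:
        if tok.label.startswith("B-"):
            if words:
                spans.append((etype, " ".join(words)))
            words, etype = [tok.text], tok.label[2:]
        elif tok.label.startswith("I-") and etype == tok.label[2:]:
            words.append(tok.text)
        else:
            # "O", or an I- label that does not continue the open entity.
            if words:
                spans.append((etype, " ".join(words)))
            words, etype = [], None
    if words:
        spans.append((etype, " ".join(words)))
    return spans


# Invented sample data; tags and label names are illustrative only.
SAMPLE = """\
Een LID [een] O
bronzen ADJ [brons][en] B-MAT
bijl N [bijl] B-ART

Gevonden WW [ge][vind][en] O
in VZ [in] O
Den N [Den] B-LOC
Haag N [Haag] I-LOC
"""
```

Calling `read_bio(SAMPLE.splitlines())` yields one list of `Token` tuples per sentence, and `extract_entities` then joins each `B-`/`I-` run into a single entity string. For a real file, pass an open file handle to `read_bio` instead of the sample lines.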

Dutch Archaeology NER Training Dataset

A manually annotated NER (Named Entity Recognition) dataset consisting of Dutch archaeological excavation reports. The following entity types are labelled: Artefacts, Time periods, Materials, Places (geographical locations), Archaeological contexts and Species. The dataset is provided in BIO format, with one token per line and empty lines marking sentence boundaries. Each line contains a token, a PoS tag, the morphological segmentation and finally the label, separated by spaces. The PoS tags and the morphological segmentation were assigned by Frog.