Translations:Manually annotated corpora/2/en: Difference between revisions

From Clarin K-Centre

Latest revision as of 14:34, 14 March 2024

==Corpus Gesproken Nederlands==
The Spoken Dutch Corpus comprises a large number of samples of recorded speech, approximately 9 million words in total. The entire corpus has been transcribed orthographically, and the transcripts have been linked to the speech files. The orthographic transcription served as the starting point for the lemmatization and part-of-speech tagging of the corpus. For a selection of one million words, a verified broad phonetic transcription has been produced, and for this same part of the corpus the alignment of the transcripts with the speech files has been verified at the word level. In addition, a selection of one million words has been annotated syntactically. Finally, for a smaller part of the corpus, approximately 250,000 words, a prosodic annotation is available.
