All translations

Jump to navigation Jump to search

Enter a message name below to show all available translations.

Message

Found 2 translations.

NameCurrent message text
 h English (en)==Corpus Gesproken Nederlands==
The Spoken Dutch Corpus comprises a large number of samples of (recorded) spoken text (appr. 9 million words). The entire corpus has been transcribed orthographically, while the transcripts have been linked to the speech files. The orthographic transcription was used as the starting point for the lemmatization and part-of-speech tagging of the corpus. For a selection of one million words, a (verified) broad phonetic transcription has been produced, while for this part of the corpus also the alignment of the transcripts and the speech files has been verified at the word level. In addition, a selection of one million words has been annotated syntactically. Finally, for a more modest part of the corpus, approximately 250,000 words, a prosodic annotation is available.
 h Dutch (nl)==Corpus Gesproken Nederlands==
Het Corpus Gesproken Nederlands (CGN) is een verzameling van 900 uur (bijna 9 miljoen woorden) hedendaagse Nederlandse spraak, afkomstig van Vlamingen en Nederlanders. De spraakfragmenten (spontaan en voorbereid) zijn opgelijnd met diverse transcripties (o.a. orthografisch, fonetisch) en annotaties (syntactisch, POS-tags). Metadata, lexica, frequentielijsten en de corpusexploratiesoftware Corex behoren ook tot het CGN.