Corpus Gesproken Nederlands (CGN)
Revision as of 15:22, 18 February 2021
Properties
- 900 hours of spoken Dutch
- 1998 - 2004
- tagged, lemmatized, annotated (orthographic/phonetic)
- corpus exploration software (Corex)
- version 2.0.3
- project website
Description
The Corpus Gesproken Nederlands (Corpus Spoken Dutch) is a collection of 900 hours (almost 9 million words) of contemporary spoken Dutch from native speakers in Flanders and the Netherlands.
The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (e.g. syntax, part-of-speech tags). The corpus also includes metadata, lexica, frequency lists and Corex, a tool for exploring the data.
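CGN part-of-speech tags are commonly written as a head label followed by a parenthesised, comma-separated feature list, for example `N(soort,ev,basis,zijd,stan)` or `WW(pv,tgw,ev)`. The sketch below splits such a tag into its head and features. It assumes exactly that surface format (upper-case head, features without nested parentheses). The names `CgnTag` and `parse_cgn_tag` are hypothetical helpers for illustration, not part of Corex or the corpus distribution.

```python
import re
from typing import NamedTuple, Tuple


class CgnTag(NamedTuple):
    """A POS tag split into its head label and feature values (hypothetical helper)."""
    head: str
    features: Tuple[str, ...]


# Assumed tag shape: upper-case head label, then a parenthesised,
# comma-separated feature list that may be empty, e.g. "LET()".
_TAG_RE = re.compile(r"^([A-Z]+)\(([^()]*)\)$")


def parse_cgn_tag(tag: str) -> CgnTag:
    """Split a CGN-style tag such as 'N(soort,ev,basis,zijd,stan)'."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"not a CGN-style tag: {tag!r}")
    head, feats = match.groups()
    # Drop empty pieces so that "LET()" yields an empty feature tuple.
    features = tuple(f.strip() for f in feats.split(",") if f.strip())
    return CgnTag(head, features)


if __name__ == "__main__":
    print(parse_cgn_tag("N(soort,ev,basis,zijd,stan)"))
    print(parse_cgn_tag("LET()"))
```

Keeping the features as an ordered tuple preserves their position, which matters if the position of a value in the list carries meaning for a given head label.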
Download page
https://taalmaterialen.ivdnt.org/download/tstc-wablieft-corpus-1-2/