Corpus Gesproken Nederlands (CGN)

From Clarin K-Centre
Revision as of 15:24, 18 February 2021 by Laura (talk | contribs) (→‎Download page)

Properties

  • 900 hours of spoken Dutch
  • 1998 - 2004
  • tagged, lemmatized, annotated (orthographic/phonetic)
  • corpus exploration software (Corex)
  • version 2.0.3
  • project website

Description

The Corpus Gesproken Nederlands (Spoken Dutch Corpus) is a collection of 900 hours (almost 9 million words) of contemporary Dutch as spoken by native speakers in Flanders and the Netherlands.

The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). The corpus also includes metadata, lexica, frequency lists, and Corex, a tool for exploring the data.

Download page

https://taalmaterialen.ivdnt.org/download/tstc-corpus-gesproken-nederlands/