Corpus Gesproken Nederlands (CGN)

From Clarin K-Centre
Latest revision as of 09:00, 22 February 2021

Properties

  • 900 hours of spoken Dutch
  • 1998–2004
  • tagged, lemmatized, annotated (orthographic/phonetic)
  • corpus exploration software (Corex)
  • version 2.0.3
  • project website

Description

The Corpus Gesproken Nederlands (Corpus Spoken Dutch) is a collection of 900 hours (almost 9 million words) of contemporary spoken Dutch from native speakers in Flanders and the Netherlands.

The speech recordings are aligned with several transcriptions (e.g. orthographic and phonetic) and annotations (syntax, POS tags). The distribution also includes metadata, lexica, frequency lists, and Corex, a tool for exploring the data.

Download page

http://hdl.handle.net/10032/tm-a2-k6

Online search

You can search the corpus without downloading it at OpenSonar (https://portal.clarin.inl.nl/opensonar_frontend/opensonar/search). In Extended Mode, you can restrict the search to the Corpus Spoken Dutch only.