Corpus Gesproken Nederlands (CGN)

Revision as of 15:25, 18 February 2021

Properties

900 hours of spoken Dutch
1998 - 2004
tagged, lemmatized, annotated (orthographic/phonetic)
corpus exploration software (Corex)
version 2.0.3
project website

Description

The Corpus Gesproken Nederlands (Spoken Dutch Corpus, CGN) is a collection of 900 hours (almost 9 million words) of contemporary Dutch as spoken by native speakers in Flanders and the Netherlands.

The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). Also included are metadata, lexica, frequency lists, and Corex, a tool for exploring the data.

Download page

http://hdl.handle.net/10032/tm-a2-k6

Change in this revision (Laura, 15:24 to 15:25, 18 February 2021): the Download page link https://taalmaterialen.ivdnt.org/download/tstc-corpus-gesproken-nederlands/ was replaced with http://hdl.handle.net/10032/tm-a2-k6