Corpus Gesproken Nederlands (CGN)
Latest revision as of 09:00, 22 February 2021
Properties
- 900 hours of spoken Dutch
- 1998 - 2004
- tagged, lemmatized, annotated (orthographic/phonetic)
- corpus exploration software (Corex)
- version 2.0.3
- project website
Description
The Corpus Gesproken Nederlands (Corpus Spoken Dutch) is a collection of 900 hours (almost 9 million words) of contemporary spoken Dutch from native speakers in Flanders and the Netherlands.
The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). Also included are metadata, lexica, frequency lists, and Corex, a tool for exploring the data.
Download page
http://hdl.handle.net/10032/tm-a2-k6
Online search
You can search the corpus online, without downloading it, at OpenSonar (https://portal.clarin.inl.nl/opensonar_frontend/opensonar/search). In Extended Mode you can restrict the search to the Corpus Spoken Dutch.