Spoken corpora: Difference between revisions

Revision as of 09:19, 2 March 2021

Spoken corpora are corpora that consist of spoken data or material based on spoken data.

Corpus Gesproken Nederlands

(Spoken Dutch Corpus)

900 hours of spoken Dutch
1998 - 2004
tagged, lemmatized, annotated (orthographic/phonetic)
corpus exploration software (Corex)
version 2.0.3
Project website
Download page: http://hdl.handle.net/10032/tm-a2-k6
Online search: OpenSonar. In Extended Mode you can restrict the search to the Spoken Dutch Corpus.

Description

The Corpus Gesproken Nederlands (Spoken Dutch Corpus) is a collection of 900 hours (almost 9 million words) of contemporary spoken Dutch from native speakers in Flanders and the Netherlands.

The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). Metadata, lexica, frequency lists, and Corex, a tool for exploring the data, are included.

JASMIN-spraakcorpus: a corpus of contemporary Dutch (Dutch/Flemish) as spoken by children of different age groups, elderly people, and non-native speakers with different mother tongues, and in human-machine interaction

@@ Line 1: / Line 1: @@
 Spoken corpora are corpora that consist of spoken data or material based on spoken data.
-We have the following corpora:
+==Corpus Gesproken Nederlands==
+(Spoken Dutch Corpus)
+* 900 hours of spoken Dutch
+* 1998 - 2004
+* tagged, lemmatized, annotated (orthographic/phonetic)
+* corpus exploration software (Corex)
+* version 2.0.3
+* [https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/cgn_website/doc_English/start.htm Project website]
+* [http://hdl.handle.net/10032/tm-a2-k6 Download page]
+* Online search: [https://portal.clarin.inl.nl/opensonar_frontend/opensonar/search OpenSonar]. In ''Extended Mode'' you can restrict the search to the Spoken Dutch Corpus.
+=== Description ===
+The Corpus Gesproken Nederlands (Spoken Dutch Corpus) is a collection of 900 hours (almost 9 million words) of contemporary spoken Dutch from native speakers in Flanders and the Netherlands.
+The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). Metadata, lexica, frequency lists, and Corex, a tool for exploring the data, are included.
-* [[Corpus Gesproken Nederlands (CGN)]]: Spoken Dutch Corpus (Dutch/Flemish)
 * [[JASMIN-spraakcorpus]]: a corpus of contemporary Dutch (Dutch/Flemish) as spoken by children of different age groups, elderly people, and non-native speakers with different mother tongues, and in human-machine interaction