Spoken corpora: Difference between revisions

From Clarin K-Centre
Line 25:
* version 1.0 (2008)
* [https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/jasmin_lrec2008_en.pdf Recording Speech of Children, Non-Natives and Elderly People for HLT Applications: the JASMIN-CGN Corpus (LREC Proceedings 2008)]
* [http://hdl.handle.net/10032/tm-a2-j7 http://hdl.handle.net/10032/tm-a2-j7 Download page]
* [http://hdl.handle.net/10032/tm-a2-j7 Download page]

Revision as of 14:11, 11 March 2021

Spoken corpora are corpora that consist of spoken data or material based on spoken data.

Corpus Gesproken Nederlands

(Spoken Dutch Corpus) A corpus of almost 9 million words of contemporary Dutch as spoken by native speakers in Flanders and the Netherlands.

The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). Metadata, lexica, frequency lists, and the tool Corex, which can be used to explore the data, are included.

JASMIN-spraakcorpus

A corpus of contemporary Dutch (Dutch/Flemish) as spoken by children of different age groups, elderly people, and non-natives with different mother tongues. It also includes human-machine interaction.