Spoken corpora
Revision as of 09:19, 2 March 2021
Spoken corpora are corpora that consist of spoken data or material based on spoken data.
Corpus Gesproken Nederlands
(Spoken Dutch Corpus)
- 900 hours of spoken Dutch
- project period: 1998-2004
- POS-tagged and lemmatized, with orthographic and phonetic transcriptions
- corpus exploration software (Corex)
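- version 2.0.3
- Project website: https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/cgn_website/doc_English/start.htm
- Download page: http://hdl.handle.net/10032/tm-a2-k6
- Online search: OpenSonar (https://portal.clarin.inl.nl/opensonar_frontend/opensonar/search). In Extended Mode you can restrict the search to the Spoken Dutch Corpus.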
Description
The Corpus Gesproken Nederlands (Spoken Dutch Corpus) is a collection of 900 hours (almost 9 million words) of contemporary spoken Dutch from native speakers in Flanders and the Netherlands.
The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). Metadata, lexica, frequency lists, and Corex, a tool for exploring the data, are included.
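To make the idea of time-aligned transcriptions and annotations concrete, here is a minimal sketch in Python. It is not the actual CGN file format or tagset: the `AlignedToken` class, the `tokens_between` helper, the timestamps, the phonetic strings, and the POS labels are all hypothetical and serve only to illustrate how one stretch of audio can carry several annotation layers at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedToken:
    """One word with its time span and annotation layers (hypothetical layout)."""
    start: float        # seconds from the start of the recording
    end: float          # seconds from the start of the recording
    orthographic: str   # orthographic transcription
    phonetic: str       # phonetic transcription (placeholder notation)
    pos: str            # part-of-speech tag (placeholder labels, not the CGN tagset)
    lemma: str          # lemma


def tokens_between(tokens, t0, t1):
    """Return the tokens whose time span overlaps the interval [t0, t1)."""
    return [t for t in tokens if t.start < t1 and t.end > t0]


# Invented sample data, for illustration only.
sample = [
    AlignedToken(0.00, 0.35, "dat", "dAt", "PRON", "dat"),
    AlignedToken(0.35, 0.60, "is", "Is", "VERB", "zijn"),
    AlignedToken(0.60, 1.10, "goed", "xut", "ADJ", "goed"),
]

# Look up which words are spoken between 0.3 s and 0.5 s.
selected = tokens_between(sample, 0.3, 0.5)
print([t.orthographic for t in selected])
```

Because every layer hangs off the same time span, a query on one layer (for example, a lemma) can be mapped back to the matching audio fragment and to the other layers.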
- JASMIN-spraakcorpus: a corpus of contemporary Dutch (Dutch/Flemish) as spoken by children of different age groups, elderly people, and non-native speakers with different mother tongues, as well as in human-machine interaction