Spoken corpora

From Clarin K-Centre

Revision as of 12:26, 9 March 2021

Spoken corpora are corpora that consist of spoken data or of material derived from spoken data.

Corpus Gesproken Nederlands

(Spoken Dutch Corpus) Almost 9 million words of contemporary spoken Dutch, produced by native speakers in Flanders and the Netherlands.

The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). The corpus also includes metadata, lexica, frequency lists and Corex, a tool for exploring the data.

  • 900 hours of spoken Dutch
  • 1998 - 2004
  • tagged, lemmatized, annotated (orthographic/phonetic)
  • corpus exploration software (Corex)
  • version 2.0.3
  • Project website
  • Download page
  • Online search with OpenSonar. In Extended Mode you can restrict the search to the Spoken Dutch Corpus (see Corpus query for more information on OpenSonar).

JASMIN-spraakcorpus

A corpus of contemporary Dutch (as spoken in the Netherlands and Flanders) by children of different age groups, elderly people and non-native speakers with different mother tongues, including human-machine interaction.