Spoken corpora: Difference between revisions

From Clarin K-Centre
* [http://hdl.handle.net/10032/tm-a2-k6 Download page]
* [https://portal.clarin.inl.nl/opensonar_frontend/opensonar/search Online search with OpenSonar]. In ''Extended Mode'' you can restrict the search to the Spoken Dutch Corpus. (See [[Corpus querying]] for more information on OpenSonar.)
==IFA Spoken Language Corpus==
The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software using speech from 8 speakers in a variety of speaking styles. For a total of 50,000 words (41 minutes/speaker), speech acquisition and preparation took around 3 person-weeks per speaker.
* Version 1.0 (2001)
* 4.6 MB
* [http://hdl.handle.net/10032/tm-a2-n8 Download page]
* [https://www.fon.hum.uva.nl/IFA-SpokenLanguageCorpora/IFAcorpus/ Project website]


==JASMIN-spraakcorpus==

Revision as of 14:21, 11 March 2021

Spoken corpora are corpora that consist of spoken data or material based on spoken data.

Corpus Gesproken Nederlands

The Spoken Dutch Corpus contains almost 9 million words of contemporary spoken Dutch from native speakers in Flanders and the Netherlands.

The speech recordings are aligned with several transcriptions (e.g. orthographic, phonetic) and annotations (syntax, POS tags). Metadata, lexica, frequency lists, and Corex, a tool for exploring the data, are included.

IFA Spoken Language Corpus

The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software using speech from 8 speakers in a variety of speaking styles. For a total of 50,000 words (41 minutes/speaker), speech acquisition and preparation took around 3 person-weeks per speaker.

JASMIN-spraakcorpus

A corpus of contemporary Dutch (Dutch/Flemish) as spoken by children of different age groups, elderly people, and non-native speakers with different mother tongues, as well as speech from human-machine interaction.