L2 learner corpora
Revision as of 15:47, 16 November 2021
L2 learner corpora play a crucial role in second language research and pedagogy. They allow a systematic study of how learners acquire a second language at the lexical and syntactic levels, and of how that acquisition is influenced by the learner's native language. A special characteristic of this type of corpus is the markup of learner errors and prosodic features.
Corpus Ondertitelde UVN-Colleges (COUC)
This corpus contains 57 subtitled lectures (as of 2020-07-16) from the Universiteit van Nederland (UVN). Subtitles were added to existing video recordings of UVN lectures.
Unlike common subtitles, the subtitles generated in this project are a nearly 100% literal representation of the speech of the people in the recordings. They contain exact orthographic transcriptions of the successive words and thus show the peculiarities of the spoken language modality, which lacks the grammatical coherence typical of written texts. On the other hand, the transcriptions contain neither speaker noises (such as lip smacks or coughs) nor hesitation sounds such as "ehm". Punctuation marks were included for the sake of readability.
- 22 MB
- version 1.0 (2020)
- Download page
Meertalige Ondertiteldata 2BDutch
This product consists of the subtitle data belonging to the Dutch videos on the website www.2BDutch.nl. The 2BDutch website contains videos with subtitle options in various languages. With these videos, students of all levels of Dutch can practice their listening skills and learn new Dutch words. The subtitle data belonging to these videos can also be used for various language and speech technology applications including automatic translation and automatic speech recognition.
- 36 KB
- version 1.0 (2020)
- Download page
Multilingual Traditional Immersion and Native Corpus (MulTINCo)
MulTINCo includes spoken and (longitudinal) written data collected from French-speaking learners of Dutch and English as a second language (L2) in different educational settings (CLIL and traditional L2 classes). The database contains numerous background variables, as well as written productions in the learners’ first language (L1) (viz. French) and productions from native speakers of the learners’ L2 (viz. L1 Dutch and L1 English data).
- Corpus webpage: https://corpora.uclouvain.be/catalog/corpus/multinco
Modern Times
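Narrations based on an extract from the film Modern Times (Charlie Chaplin, 1936) by native speakers and learners of Dutch and French.
- Corpus webpage (currently dead): https://corpora.uclouvain.be/catalog/corpus/modern-times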