Corpora of academic texts
Corpora of academic texts contain scholarly publications, such as research papers, essays and abstracts published in academic journals, conference proceedings, theses written by undergraduate and graduate students, and scholarly monographs.
Corpus Ondertitelde UVN-Colleges (COUC)
This corpus contains 57 subtitled lectures (as of 2020-07-16) from the Universiteit van Nederland (UVN). Subtitles were added to existing video recordings of UVN lectures.
Unlike common subtitles, the subtitles generated in this project are a nearly 100% literal representation of the speech as spoken by the people in the recordings. They contain exact orthographic transcriptions of successive words and thus show the peculiarities of the spoken language modality, lacking the grammatical coherence typical of written texts. On the other hand, the transcriptions contain neither speaker noises (such as lip smacks or coughs) nor hesitation sounds such as "ehm". For the sake of readability, punctuation marks were included.
- 22 MB
- version 1.0 (2020)
- Download page
Corpus Nederlands door Natives (CNN)
Argumentative writing tasks written by second-year students.
SABeD corpus
The SABeD corpus collection project started on 1 March 2021, and the corpus is not yet available. The corpus of spoken academic Belgian Dutch will consist of at least 200 lectures.