Corpora of academic texts: Difference between revisions

From Clarin K-Centre
<languages/>
<translate>
<!--T:1-->
Corpora of academic texts contain scholarly writing: research papers, essays, and abstracts published in academic journals, conference proceedings, and edited volumes; theses written by undergraduate and graduate students; and scientific monographs.
 
<!--T:2-->
==Corpus Ondertitelde UVN-Colleges (COUC)==
This corpus contains 57 subtitled lectures (as of 2020-07-16) from the Universiteit van Nederland (UVN). The subtitles were added to existing video recordings of UVN lectures.
 
<!--T:3-->
Unlike ordinary subtitles, the subtitles produced in this project are a nearly 100% literal representation of the speech in the recordings. They contain exact orthographic transcriptions of successive words and therefore show the peculiarities of spoken language, lacking the grammatical coherence typical of written texts.
On the other hand, the transcriptions contain neither speaker noises (such as lip smacks or coughs) nor hesitation sounds such as "ehm". Punctuation marks were added for readability.
 
<!--T:4-->
*Size: 22 MB
*Version 1.0 (2020)
*[http://hdl.handle.net/10032/tm-a2-s3 Download page]
 
<!--T:5-->
==Corpus Nederlands door Natives (CNN)==
Argumentative writing tasks completed by second-year students.
* [https://corpora.uclouvain.be/catalog/corpus/corpus-nederlands-door-natives-cnn Corpus website]
 
<!--T:6-->
==SABeD corpus==
The SABeD corpus collection project started on 1 March 2021; the corpus is not yet available. This corpus of spoken academic Belgian Dutch will consist of at least 200 lectures.
 
<!--T:7-->
* [https://www.arts.kuleuven.be/ling/language-education-society/projects/sabed Project website]
</translate>

Latest revision as of 14:28, 13 March 2024
