Other corpora: Difference between revisions

From Clarin K-Centre
Line 22:
The COREA Coreference Corpus is a corpus of Dutch texts annotated with coreference relations.


*version 1.0.1 (2014)
*[https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/corea_lrec08_en.pdf Paper]
*[https://corea.tst-centrale.org/ Demo]
*[http://hdl.handle.net/10032/tm-a2-f9 Download page]
==D-Tuna-corpus==
The D-TUNA Corpus consists of 2400 written and (transcribed) spoken referential expressions.
*version 1.0 (2009)
*[https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/dtuna_documentatie_en.pdf Paper]
*[http://hdl.handle.net/10032/tm-a2-k5 Download page]

Revision as of 10:15, 7 December 2021

BasiLex-corpus

The BasiLex corpus is an annotated collection of texts written for children aged four to twelve.

BasiScript-corpus

The BasiScript Corpus is an annotated collection of texts written by children aged four to twelve.

CLiPS Stylometry Investigation (CSI) Corpus

The CSI corpus is a corpus of student texts in two genres, essays and reviews, that is expanded every year. It is intended primarily for stylometric research, but other applications are possible. Extensive metadata is available, both on the author (gender, age, sexual orientation, region of origin, personality profile) and on the document (timestamp, genre, veracity, sentiment, grade). The current version of the corpus was assembled in February 2016. Previous versions are available from the authors on request by e-mail.

COREA-coreferentiecorpus

The COREA Coreference Corpus is a corpus of Dutch texts annotated with coreference relations.

D-Tuna-corpus

The D-TUNA Corpus consists of 2400 written and (transcribed) spoken referential expressions.