Other corpora
Revision as of 15:35, 6 December 2021
BasiLex-corpus
The BasiLex corpus is an annotated collection of texts written for children aged four to twelve.
- version 1.0 (2015)
- Tellings, A., Hulsbosch, M., Vermeer, A. & van den Bosch, A. (2015). BasiLex: an 11.5-million words corpus of Dutch texts written for children. Computational Linguistics in the Netherlands Journal 4, 191-208
- Download page
BasiScript-corpus
The BasiScript corpus is an annotated collection of texts written by children aged four to twelve.
- version 1.0 (2015)
- Project page: https://www.narcis.nl/research/RecordID/OND1347377
- Download page: http://hdl.handle.net/10032/tm-a2-p2
CLiPS Stylometry Investigation (CSI) Corpus
The CSI corpus is a corpus of student texts in two genres, essays and reviews, that is expanded yearly. It is intended primarily for stylometric research, but other applications are possible. Extensive metadata is available, both on the author (gender, age, sexual orientation, region of origin, personality profile) and on the document (timestamp, genre, veracity, sentiment, grade). The current version of the corpus was assembled in February 2016. Previous versions are available from the authors on request by e-mail.
- Download page: https://zenodo.org/record/4639616#.Ya4sX9DMLZR