Children's language
Revision as of 14:04, 11 December 2024
Jasmin Speech corpus
BasiLex-corpus
The BasiLex corpus is an annotated collection of texts written for children aged four to twelve.
- version 1.0 (2015)
- Tellings, A., Hulsbosch, M., Vermeer, A. & van den Bosch, A. (2015). BasiLex: an 11.5-million words corpus of Dutch texts written for children. Computational Linguistics in the Netherlands Journal 4, 191-208
- Download page
BasiScript-corpus
The BasiScript corpus is an annotated collection of texts written by children aged four to twelve.
- version 1.0 (2015)
- Project page
- Download page
CHILDES
CHILDES contains a large collection of corpora: datasets of transcripts of child-adult interactions, typically annotated and searchable. They include conversations, storytelling, and other linguistic exchanges, gathered from children of various ages, language backgrounds, and contexts. A login is required.
- Index to CHILDES data from Dutch and Afrikaans: https://childes.talkbank.org/access/DutchAfrikaans/
- browse the Dutch database online
Subcorpora:
- Dutch-English De Houwer Corpus (https://childes.talkbank.org/access/Biling/DeHouwer.html): the study focuses on dialect features unique to the Antwerp area.
- The Asymmetries Project (https://childes.talkbank.org/access/DutchAfrikaans/Asymmetries.html): the collection contains Dutch language productions gathered in Groningen and neighboring towns in the northern Netherlands between 2007 and 2012.
- Aarssen/Bos (https://childes.talkbank.org/access/Frogs/Dutch-AarssenBos.html): this database contains 1021 transcripts collected in the Netherlands, Turkey, and Morocco by Jeroen Aarssen and Petra Bos at Tilburg University. The bilingual data (either Turkish-Dutch or Moroccan Arabic-Dutch) were collected as part of a longitudinal study of the development of bilingualism among Turkish and Moroccan children in the Netherlands.
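
CHILDES transcripts are distributed in the CHAT format, in which header lines start with `@`, utterance lines start with `*` followed by a speaker code (such as `CHI` for the target child), and annotation lines start with `%`. As a minimal sketch of how a downloaded transcript might be explored, the Python snippet below groups utterances by speaker. The function name `utterances_by_speaker` and the sample text are invented for illustration and do not come from any of the corpora listed above. The sketch also ignores continuation lines (lines that start with a tab), which real transcripts may contain.

```python
import re

# Utterance (main-tier) lines in CHAT: "*", a speaker code, ":", whitespace, then the utterance.
MAIN_TIER = re.compile(r"^\*(?P<speaker>[A-Z0-9]+):\s+(?P<utterance>.*)$")


def utterances_by_speaker(chat_text):
    """Group utterance lines by speaker code.

    Hypothetical helper for illustration: header lines ("@...") and
    annotation lines ("%...") are skipped, and continuation lines
    are not handled.
    """
    grouped = {}
    for line in chat_text.splitlines():
        match = MAIN_TIER.match(line)
        if match:
            grouped.setdefault(match["speaker"], []).append(match["utterance"].strip())
    return grouped


# Invented sample transcript, not taken from any CHILDES corpus.
SAMPLE = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, MOT Mother\n"
    "*MOT:\twat is dat ?\n"
    "*CHI:\tpoes .\n"
    "%com:\tchild points at the cat\n"
    "*CHI:\tpoes weg .\n"
    "@End\n"
)

if __name__ == "__main__":
    for speaker, utterances in utterances_by_speaker(SAMPLE).items():
        print(speaker, len(utterances), utterances)
```

For anything beyond a quick look, a dedicated CHAT parser is likely a better choice than a regular expression, since the format has further conventions that this sketch does not cover.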