Computer-mediated communication corpora
Computer-mediated communication covers public and private communicative acts online. Examples include posts on blogs and forums, comments on online news sites, social media and networking sites such as X and Facebook, mobile phone apps such as WhatsApp, email, and chat rooms.
Moroccorp
Moroccorp is a corpus of computer-mediated communication in Dutch by Moroccan-Dutch language users. It consists of ten million words of chat material, delivered as a .txt file of 82.4 MB.
- version 1.1
- data set from 2019 (version 1.0 from 2012)
- 82.4 MB
- Download page
- Ruette, T. and van de Velde, F. (2013) Moroccorp: tien miljoen woorden uit twee Marokkaans-Nederlandse chatkanalen. Lexikos 23: 456-475.
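Moroccorp is distributed as plain text, so you can take a first look at it using only the Python standard library. The sketch below streams the file line by line and counts word frequencies. It rests on several assumptions that are not part of the corpus documentation. The file name is hypothetical, the encoding is assumed to be UTF-8, and a "word" is approximated by a simple regular expression rather than any tokenization used by the corpus compilers.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: a "word" is a run of word characters (letters, digits, underscore),
# apostrophes or hyphens. This is a rough stand-in, not the corpus's own tokenization.
WORD_RE = re.compile(r"[\w'-]+")


def count_lines(lines: Iterable[str]) -> Counter:
    """Count lowercased word forms over an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(w.lower() for w in WORD_RE.findall(line))
    return counts


def count_words(path: str, encoding: str = "utf-8") -> Counter:
    """Stream a plain-text corpus file line by line, so it is never fully loaded.

    The UTF-8 default is an assumption; undecodable bytes are replaced
    instead of raising an error.
    """
    with open(path, encoding=encoding, errors="replace") as f:
        return count_lines(f)
```

For example, `count_words("moroccorp.txt").most_common(10)` (hypothetical file name) lists the ten most frequent lowercased word forms. The sum of all counts will only approximate the reported ten million words, because the total depends on the tokenization rule chosen here.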
SoNaR Nieuwe Media Corpus
The SoNaR New Media Corpus 1.0 contains new media texts collected within the STEVIN project SoNaR. The corpus contains text messages, tweets and chat messages. The texts were tokenized, POS-tagged and lemmatized.
- version 1.0
- data set from 2013
- 3.50 MB
- Download page
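The description above mentions three annotation layers: tokens, part-of-speech tags and lemmas. The Python sketch below shows one way to hold such annotations in memory and count lemmas. It does not reflect the corpus's actual distribution format. The tab-separated `word<TAB>pos<TAB>lemma` layout, the `Token` class and the CGN-style tags in the usage example are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Token:
    """One token with its annotation layers (hypothetical representation)."""
    word: str   # surface form after tokenization
    pos: str    # part-of-speech tag
    lemma: str  # dictionary form


def parse_tsv_line(line: str) -> Token:
    """Parse an assumed 'word<TAB>pos<TAB>lemma' line into a Token."""
    word, pos, lemma = line.rstrip("\n").split("\t")
    return Token(word, pos, lemma)


def lemma_frequencies(tokens: Iterable[Token], pos_prefix: str = "") -> Counter:
    """Count lemmas, optionally only for tokens whose tag starts with pos_prefix."""
    return Counter(t.lemma for t in tokens if t.pos.startswith(pos_prefix))
```

Counting lemmas rather than surface forms groups inflected forms such as "liep" and "loopt" under "lopen". With CGN-style tags, `lemma_frequencies(tokens, "WW")` would restrict the count to verbs.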