Computer-mediated communication corpora
Revision as of 13:18, 13 March 2024
Computer-mediated communication (CMC) comprises public and private online communication, such as blog and forum posts, comments on online news sites, posts on social media and networking sites such as Twitter and Facebook, messages in mobile phone applications such as WhatsApp, e-mail, and chat rooms.
Moroccorp
Moroccorp is a corpus of computer-mediated communication in Dutch by Moroccan-Dutch language users, consisting of ten million words of chat material. The data is delivered as a .txt file of 82.4 MB.
- version 1.1
- data set from 2019 (version 1.0 from 2012)
- 82.4 MB
- Download page
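- Ruette, T. and van de Velde, F. (2013) Moroccorp: tien miljoen woorden uit twee Marokkaans-Nederlandse chatkanalen. Lexikos 23: 456-475. https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/RuetteVandeVelde_2013final_Moroccorp_corpus_chattaal.pdf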
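Because Moroccorp is delivered as one large plain-text file, a first look at the data can be done by streaming it line by line instead of loading it into memory. The following is a minimal sketch only. The file name `moroccorp.txt`, the UTF-8 encoding, and the simple `\w+` tokenization are assumptions for illustration, not properties documented for the corpus, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is any run of Unicode word characters.
# Chat language (emoticons, URLs, nicknames) may need a richer tokenizer.
TOKEN_RE = re.compile(r"\w+")


def count_tokens(lines):
    """Count lowercased word tokens across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN_RE.findall(line.lower()))
    return counts


def count_tokens_in_file(path, encoding="utf-8"):
    """Stream a text file line by line and count its word tokens.

    The encoding is an assumption; undecodable bytes are replaced
    rather than raising an error.
    """
    with open(path, encoding=encoding, errors="replace") as handle:
        return count_tokens(handle)
```

With the corpus downloaded, a call such as `count_tokens_in_file("moroccorp.txt").most_common(10)` would list the ten most frequent tokens under this tokenization.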
SoNaR Nieuwe Media Corpus
The SoNaR New Media Corpus 1.0 contains new media texts collected within the STEVIN project SoNaR. The corpus contains text messages, tweets and chat messages. The texts were tokenized, POS-tagged and lemmatized.
- version 1.0
- data set from 2013
- 3.50 MB
- Download page