Computer-mediated communication corpora: Difference between revisions

From Clarin K-Centre
Jump to navigation Jump to search
No edit summary
No edit summary
Line 1: Line 1:
<translate>
Computer-mediated communication (CMC) constitutes public and private communication on-line, such as posts on blogs, forums, comments on online news sites, social media and networking sites such as Twitter and Facebook, mobile phone applications such as WhatsApp, e-mail and chat rooms.
Computer-mediated communication (CMC) constitutes public and private communication on-line, such as posts on blogs, forums, comments on online news sites, social media and networking sites such as Twitter and Facebook, mobile phone applications such as WhatsApp, e-mail and chat rooms.


Line 17: Line 18:
* 3.50 MB
* 3.50 MB
* [http://hdl.handle.net/10032/tm-a2-k3 Download page]
* [http://hdl.handle.net/10032/tm-a2-k3 Download page]
</translate>

Revision as of 13:17, 13 March 2024

Computer-mediated communication (CMC) constitutes public and private communication on-line, such as posts on blogs, forums, comments on online news sites, social media and networking sites such as Twitter and Facebook, mobile phone applications such as WhatsApp, e-mail and chat rooms.

Moroccorp

Moroccorp is a corpus of computer-mediated communication in Dutch by Moroccan-Dutch language users, consisting of ten million words of chat material. The data is delivered in a .txt file of 82.4 Mb.

SoNaR Nieuwe Media Corpus

The SoNaR New Media Corpus 1.0 contains new media texts collected within the STEVIN project SoNaR. The corpus contains text messages, tweets and chat messages. The texts were tokenized, POS-tagged and lemmatized.