Social media corpora: Difference between revisions

Revision as of 14:47, 15 November 2021

DALC Dutch Abusive Language Corpus

The Dutch Abusive Language Corpus v1.0 (DALC v1.0)

GitHub
Publication: Caselli, Tommaso, Schelhaas, Arjan, Weultjes, Marieke, Leistra, Folkert, van der Veen, Hylke, Timmerman, Gerben and Nissim, Malvina (2021). DALC: the Dutch Abusive Language Corpus. Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH). Association for Computational Linguistics

SoNaR New Media Corpus

The SoNaR New Media Corpus 1.0 contains new media texts collected within the STEVIN project SoNaR. The corpus contains text messages, tweets and chat messages. The texts were tokenized, POS-tagged and lemmatized.

Download page: http://hdl.handle.net/10032/tm-a2-k3

Whatsapp corpus Verheijen

WhatsApp data collected for the PhD research of Lieke Verheijen (Radboud University). Informed consent was obtained only from the contributor, not from the conversational partner. Consequently, the subcorpus contains only the contributions of the submitter.

Project website: https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:112987

@@ Line 5: / Line 5: @@
 * Publication: Caselli, Tommaso, Schelhaas, Arjan, Weultjes, Marieke, Leistra, Folkert, van der Veen, Hylke, Timmerman, Gerben and Nissim, Malvina (2021). DALC: the Dutch Abusive Language Corpus. Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH). Association for Computational Linguistics
-==Sonar new media==
+==SoNaR New Media Corpus==
-*download website: http://hdl.handle.net/10032/tm-a2-k3
+The SoNaR New Media Corpus 1.0 contains new media texts collected within the STEVIN project SoNaR. The corpus contains text messages, tweets and chat messages. The texts were tokenized, POS-tagged and lemmatized.
+* [http://hdl.handle.net/10032/tm-a2-k3 Download page]
 ==Whatsapp corpus Verheijen==
-*project website: https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:112987
+Whatsappdata collected for the PhD research of Lieke Verheijen (Radboud University). Informed consent was only obtained from the contributor and not from the conversational partner. Consequently, the subcorpus only contains contributions from the submitter.
+* [https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:112987 Project website]
