Social media corpora
Revision as of 11:25, 20 January 2022
DALC Dutch Abusive Language Corpus
The Dutch Abusive Language Corpus v1.0 (DALC v1.0)
- GitHub: https://github.com/tommasoc80/DALC
- Publication: Caselli, Tommaso, Schelhaas, Arjan, Weultjes, Marieke, Leistra, Folkert, van der Veen, Hylke, Timmerman, Gerben and Nissim, Malvina (2021). DALC: the Dutch Abusive Language Corpus. Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH). Association for Computational Linguistics. https://aclanthology.org/2021.woah-1.6/
SoNaR New Media Corpus
The SoNaR New Media Corpus 1.0 contains new media texts collected within the STEVIN project SoNaR. The corpus contains text messages, tweets and chat messages. The texts were tokenized, POS-tagged and lemmatized.
WhatsApp corpus Verheijen
WhatsApp data collected for the PhD research of Lieke Verheijen (Radboud University). Informed consent was obtained only from the contributors, not from their conversational partners. Consequently, the subcorpus contains only the contributors' own messages.