Embeddings
*[https://www.textgain.com/portfolio/geenstijl-embeddings/ Project page]
*[https://www.textgain.com/wp-content/uploads/2021/06/TGTR4-geenstijl.pdf Report]
*[https://www.textgain.com/projects/geenstijl/geenstijl_embeddings.zip Download page]
Revision as of 09:30, 4 March 2022
Word2Vec embeddings
Repository for the word embeddings described in "Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource", presented at LREC 2016.
BERT embeddings
GeenStijl.nl embeddings
The GeenStijl.nl embeddings were trained on over 8M messages from the controversial Dutch websites GeenStijl and Dumpert, yielding a word embedding model that captures the toxic language representations contained in the dataset. The trained word embeddings (±150MB) are released for free and may be useful for further study of toxic online discourse.
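As a rough illustration of how released word embeddings like these might be explored, the sketch below parses vectors and looks up nearest neighbours by cosine similarity. This page does not state the file format of the archives, so the sketch assumes the plain-text word2vec layout (a "count dimension" header line, then one word and its vector per line). The function names `load_word2vec_text`, `cosine` and `nearest` are hypothetical helpers, and the toy vectors are made up for the example.

```python
import io
import math


def load_word2vec_text(handle):
    """Parse word2vec text format (assumed): 'count dim' header, then 'word v1 ... vN'."""
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors, word, k=3):
    """Return the k words whose vectors are most similar to that of `word`."""
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]


# Toy data standing in for a real embeddings file (values are invented).
toy = io.StringIO("3 2\nkat 1.0 0.0\nhond 0.9 0.1\nauto 0.0 1.0\n")
vectors = load_word2vec_text(toy)
print(nearest(vectors, "kat", k=1))  # prints ['hond']
```

For the real files, which are far larger than this toy input, a dedicated library such as gensim (`KeyedVectors.load_word2vec_format`) is likely more practical, provided the archives turn out to use a word2vec-compatible format.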