Newspaper corpora: Difference between revisions

From Clarin K-Centre
Newspaper corpora are corpora which exclusively consist of newspaper material.


We have the following corpora:
==SumNL: summary-corpus==
The SumNL summary corpus is based on 30 clusters. Each cluster consists of a topic and 5 to 25 newspaper articles relevant to that topic. For each cluster, two summaries of different lengths were written, along with extracts consisting of ten sentences taken from the texts.
 
* Version: 1.0.1
* Data set from 2014
* Size: 1.60 MB
* [https://taalmaterialen.ivdnt.org/download/tstc-sumnl-samenvattingencorpus/ Download page]


* [[SumNL]]: summary-corpus
* [[Wablieft corpus]]: easy language
* [[Corpus VU-DNC (VU University Diachronic News text Corpus)]]

Revision as of 09:57, 2 March 2021

Newspaper corpora are corpora which exclusively consist of newspaper material.

SumNL: summary-corpus

The SumNL summary corpus is based on 30 clusters. Each cluster consists of a topic and 5 to 25 newspaper articles relevant to that topic. For each cluster, two summaries of different lengths were written, along with extracts consisting of ten sentences taken from the texts.