Newspaper corpora
Newspaper corpora consist exclusively of newspaper material.
SumNL: summary corpus
The SumNL summary corpus is based on 30 clusters. Each cluster consists of a topic and 5 to 25 newspaper articles relevant to that topic. For each cluster, two summaries of different lengths were written, together with extracts consisting of ten sentences taken from the articles.
- version 1.0.1
- data set from 2014
- 1.60 MB