Newspaper corpora: Difference between revisions
Line 1:
Newspaper corpora are corpora that consist exclusively of newspaper material.
==SumNL: summary-corpus==
The SumNL summary corpus is based on 30 clusters. Each cluster consists of a topic and 5 to 25 newspaper articles relevant to that topic. For each cluster, two summaries of different sizes were made, as well as extracts consisting of ten sentences taken from the texts.
* version 1.0.1
* data set from 2014
* 1.60 MB
* [https://taalmaterialen.ivdnt.org/download/tstc-sumnl-samenvattingencorpus/ Download page]
* [[Wablieft corpus]]: easy language
* [[Corpus VU-DNC (VU University Diachronic News text Corpus)]]
Revision as of 09:57, 2 March 2021
Newspaper corpora are corpora that consist exclusively of newspaper material.
SumNL: summary-corpus
The SumNL summary corpus is based on 30 clusters. Each cluster consists of a topic and 5 to 25 newspaper articles relevant to that topic. For each cluster, two summaries of different sizes were made, as well as extracts consisting of ten sentences taken from the texts.
- version 1.0.1
- data set from 2014
- 1.60 MB
- Download page: https://taalmaterialen.ivdnt.org/download/tstc-sumnl-samenvattingencorpus/
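
The cluster structure described above can be pictured as a small data model. The following is only a sketch in Python: the class name `Cluster`, its field names, and the helper `problems` are hypothetical and do not reflect the actual file format of the SumNL download. It also assumes that a cluster may carry several extracts, each of exactly ten sentences.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical in-memory view of one SumNL cluster (not the corpus file format)."""

    topic: str
    articles: list[str]        # 5 to 25 newspaper articles relevant to the topic
    summaries: list[str]       # two summaries of different sizes
    extracts: list[list[str]]  # assumed: each extract is a list of ten sentences

    def problems(self) -> list[str]:
        """Return the ways in which this cluster deviates from the description."""
        found = []
        if not 5 <= len(self.articles) <= 25:
            found.append(f"expected 5 to 25 articles, got {len(self.articles)}")
        if len(self.summaries) != 2:
            found.append(f"expected 2 summaries, got {len(self.summaries)}")
        for i, extract in enumerate(self.extracts):
            if len(extract) != 10:
                found.append(f"extract {i} has {len(extract)} sentences, expected 10")
        return found


def corpus_problems(clusters: list[Cluster]) -> list[str]:
    """Check the corpus-level count (30 clusters) and every individual cluster."""
    found = []
    if len(clusters) != 30:
        found.append(f"expected 30 clusters, got {len(clusters)}")
    for cluster in clusters:
        found.extend(f"{cluster.topic}: {p}" for p in cluster.problems())
    return found
```

A loader for the real download would have to map the distributed files onto such objects; how that mapping looks depends on the corpus documentation and is not covered here.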