Reference corpora
Corpus Hedendaags Nederlands
A collection of more than 800,000 texts taken from newspapers, magazines, news broadcasts and legal writings, dating from 1814 to 2013.
The corpus is a combination of the 5, 27 and 38 Million Words Corpora and the PAROLE Corpus, supplemented with newspaper texts from NRC and De Standaard (until 2013).
Online search
Lassy Large
The Lassy Large Corpus is a collection of written texts consisting of approximately 700 million words with automatically generated annotations. The lemmas and POS tags were generated with Tadpole (now Frog), and the syntactic dependency structures were generated with Alpino.
- Download
- Online treebank search with PaQu
- Online treebank search with Federated GrETEL