Reference corpora
Revision as of 15:11, 2 March 2021
Corpus Hedendaags Nederlands
A collection of more than 800,000 texts taken from newspapers, magazines, news broadcasts and legal writings (1814-2013).
The corpus is a combination of the 5, 27 and 38 Million Words Corpora and the PAROLE Corpus, supplemented with newspaper texts from NRC and De Standaard (until 2013).
Online search
Lassy Large
The Lassy Large Corpus is a collection of written texts comprising approximately 700 million words with automatically generated annotations. The lemmas and POS tags were generated with Tadpole (now Frog), and the syntactic dependency structures were generated with Alpino.
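- Download: https://taalmaterialen.ivdnt.org/download/tstc-lassy-groot-corpus
- Online treebank search with PaQu: https://paqu.let.rug.nl:8068/
- Online treebank search with Federated GrETEL: http://chn.gretel.ivdnt.org/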