Reference corpora: Difference between revisions

From Clarin K-Centre


*[https://taalmaterialen.ivdnt.org/download/tstc-lassy-groot-corpus Download]
*[https://paqu.let.rug.nl:8068/ Online treebank search] with PaQu
*[http://chn.gretel.ivdnt.org/ Online treebank search] with Federated GrETEL

Revision as of 15:11, 2 March 2021

Corpus Hedendaags Nederlands

A collection of more than 800,000 texts taken from newspapers, magazines, news broadcasts and legal writings (1814-2013).

The corpus is a combination of the 5, 27 and 38 Million Words Corpora and the PAROLE Corpus, supplemented with newspaper texts from NRC and De Standaard (until 2013).

Online search

Lassy Large

The Lassy Large Corpus is a collection of written texts comprising approximately 700 million words with automatically generated annotations. The lemmas and POS tags were generated with Tadpole (now Frog), and the syntactic dependency structures were generated with Alpino.