Reference corpora: Difference between revisions

From Clarin K-Centre
(Created page with "== Corpus Hedendaags Nederlands == A collection of more than 800,000 texts taken from newspapers, magazines, news broadcasts and legal writings (1814-2013). The corpus is a c...")
 
No edit summary
Line 5: Line 5:


===[http://chn.ivdnt.org/ Online search]===
== Lassy Large ==
The Lassy Large Corpus is a collection of written texts consisting of approximately 700 million words with automatically generated annotations.
The lemmas and POS tags were generated with Tadpole (now Frog), and the syntactic dependency structures were generated with Alpino.
*[https://taalmaterialen.ivdnt.org/download/tstc-lassy-groot-corpus Download]
*[https://paqu.let.rug.nl:8068/ Online treebank search]

Revision as of 15:09, 2 March 2021

Corpus Hedendaags Nederlands

A collection of more than 800,000 texts taken from newspapers, magazines, news broadcasts and legal writings (1814-2013).

The corpus is a combination of the 5, 27 and 38 Million Words Corpora and the PAROLE Corpus, supplemented with newspaper texts from NRC and De Standaard (until 2013).

Online search

Lassy Large

The Lassy Large Corpus is a collection of written texts consisting of approximately 700 million words with automatically generated annotations. The lemmas and POS tags were generated with Tadpole (now Frog), and the syntactic dependency structures were generated with Alpino.