Corpora and lexicons: Difference between revisions

From Clarin K-Centre



Revision as of 15:31, 26 November 2020

Brieven als Buit

Gysseling Corpus

The Gysseling Corpus is the collection of all 13th-century texts that served as source material for the Dictionary of Early Middle Dutch (VMNW). It consists mainly of official and literary texts that have been handed down in 13th-century manuscripts.

The texts are diplomatic editions, which means that the source texts have been rendered in modern script as accurately as possible. The corpus has been linguistically annotated with word classes and modern Dutch lemmas (entry words) to enhance its searchability. The annotation of the entire corpus has been manually verified.

Corpus of Contemporary Dutch

To monitor contemporary Dutch, the Dutch Language Institute has created the Corpus of Contemporary Dutch (CHN): a continually growing collection that already contains more than 800,000 texts from newspapers, magazines, news broadcasts and legal materials.

Contents of the CHN

We try to include sources that continually supply new text material. In principle, however, all text material used in the various projects of the Dutch Language Institute ends up in the CHN, including the ANW corpus (1970 to the present), compiled for our Dictionary of Contemporary Dutch.

From 1994 onwards, the Institute for Dutch Lexicology (INL), the predecessor of the Dutch Language Institute, put several corpora of contemporary Dutch online: the 5-, 27- and 38-million-word corpora, and the Dutch Parole Internet Corpus. The materials from these older corpora have been added to the CHN.