Wordlists

From Clarin K-Centre

Revision as of 13:28, 7 December 2021

Woordenlijst van de Nederlandse Taal

Since 1804, Dutch spelling has been fixed by the government. The official spelling comprises basic principles and specific rules, such as those for spelling vowels and consonants, the use of capital letters and of marks such as accents, hyphens, punctuation and apostrophes, the spelling of compounds with a linking sound (as in the Dutch words for "pancake" and "briefcase"), and the division of words into syllables. In addition, the government publishes a list of words. It includes words that are spelled according to the rules as well as words whose spelling is difficult to derive from the rules, for example words adopted from other languages.

At the end of 2015, the Woordenlijst van de Nederlandse Taal contained over 180,000 headwords. All of these words can be looked up in the online version of the Woordenlijst (woordenlijst.org), which also provides detailed information on hyphenation, inflection and conjugation.

SUBTLEX-NL

SUBTLEX-NL is a database of Dutch word frequencies based on 44 million words from film and television subtitles.

  • Project page: http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-nl
  • Reference: Keuleers, E., Brysbaert, M. & New, B. (2010). SUBTLEX-NL: A new frequency measure for Dutch words based on film subtitles. Behavior Research Methods, 42(3), 643-650.

CombiLex

CombiLex is a list of Dutch lemmas and word forms without further annotation. The lexicon contains over 213,000 unique lemmas, and over 442,000 unique entries when lemmas and word forms are counted together.

  • Documentation: https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/clex_documentatie_en.pdf
  • Download page: http://hdl.handle.net/10032/tm-a2-k2