
K-Dutch: Difference between revisions

From Clarin K-Centre
Marked this version for translation
 
(186 intermediate revisions by 6 users not shown)
Line 1: Line 1:
<span style="color:white">Mediawiki:Mainpage</span><br>
<languages/>
<translate>
<!--T:29-->
<span style="color:white">Mediawiki:Mainpage</span>


Welcome to [[K-DUTCH]], the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects...  
<!--T:30-->
[[File:K-centre-logo.jpg|frameless|right]]
Welcome to [[K-Dutch]], the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects...  


K-Dutch is (or will soon be) a [https://www.clarin.eu/content/knowledge-centres CLARIN Knowledge Centre]. It is hosted by the [https://www.ivdnt.org Instituut voor de Nederlandse Taal] (Dutch Language Institute), which is also a [https://www.clarin.eu/content/certified-centres CLARIN-B centre] and host of many resources for Dutch, which are generally freely available for research purposes.
==[[About]]== <!--T:159-->




==Linguistic topics==
<!--T:34-->
You are most welcome to contribute to these pages. Please contact [mailto://servicedesk@ivdnt.org servicedesk@ivdnt.org] with the subject line K-Dutch, and we will be in touch.


===Grammar===
==Linguistic topics== <!--T:105-->
====Phonology, Morphology and Syntax: Taalportaal====
Many aspects of Dutch linguistics are described on the [https://www.taalportaal.org/ Taalportaal website].


Taalportaal (or Language Portal) is an interactive knowledge base about Dutch, Frisian and Afrikaans. It provides access to a comprehensive and authoritative scientific grammar for these three languages.
===[[Grammar]]=== <!--T:35-->
Up to now there has been no comprehensive scientifically-based description of the grammars of Dutch, Frisian and Afrikaans. This is a serious shortcoming, considering that
* language is seen as an important part of cultural identity and cultural heritage
* a large number of people learn these languages as a second language
* educated speakers frequently lack grammatical knowledge of their native language
* Dutch and Afrikaans are an important object of study in linguistic theory and related fields of research


Taalportaal fills this gap by providing a thorough description of the '''phonology, morphology and syntax''' of the three languages.
<!--T:36-->
* [[Grammar#Phonology,_morphology_and_syntax:_Taalportaal|Phonology, morphology and syntax: Taalportaal]]


====Morphosyntax====
<!--T:106-->
*[https://dev.clarin.nl/node/1911 MIMORE]: Microcomparative Morphosyntax Research Tool
* [[Grammar#Morphosyntax|Morphosyntax]]
*[http://hdl.handle.net/11858/00-395C-0000-0000-6F63-1 MAND/FAND/GTRP-database]


====Syntactic Atlas of the Dutch dialects (SAND)====
<!--T:107-->
The Dynamic Syntactic Atlas of the Dutch dialects (DynaSAND) is an on-line tool for dialect syntax research. DynaSAND consists of a database, a search engine, a cartographic component and a bibliography.
* [[Grammar#Syntactic_Atlas_of_the_Dutch_dialects_(SAND)|Syntactic Atlas of the Dutch dialects (SAND)]]


*[http://hdl.handle.net/11858/00-395C-0000-0000-6EB4-4 Online search]
<!--T:108-->
* [[Grammar#Dutch_descriptive_grammar:_e-ANS_(in_Dutch)|Dutch descriptive grammar: e-ANS (in Dutch)]]


====Dutch descriptive grammar: e-ANS (in Dutch)====
<!--T:109-->
* [[Grammar#Grambank|Grambank]]


The General Dutch Grammar, or ANS (Algemene Nederlandse Spraakkunst), is the go-to reference grammar for the Dutch language. It is the most extensive description of the grammatical aspects of contemporary Dutch. Its target users are both native speakers and foreign speakers learning Dutch. The ANS was born out of a Belgian-Dutch cooperation and was first printed in 1984. The second and revised 1997 edition was digitized, resulting in the e-ANS.
===[[Lexicography]]=== <!--T:37-->


Lately, the Dutch Language Institute (INT) has been working on a new, user-friendly website for the ANS, while work was started on the revision of its contents by the Leiden University Center for Linguistics (LUCL), Ghent University, KU Leuven and Radboud University Nijmegen.
<!--T:38-->
* [[Lexicography#Dutch_dictionaries|Dutch dictionaries]]


From 2020 onwards, the further revision of the contents will also be coordinated by the INT. The first revised chapters of the General Dutch Grammar will appear online in 2020, describing prepositions, word order and negations, among other subjects.
<!--T:110-->
* [[Lexicography#Elexis|The Elexis Project]]


===[[Lexicography]]===
<!--T:111-->
====Dutch [[dictionaries]]====
* [https://ivdnt.org/wp-content/uploads/2021/02/The-Future-of-Academic-Lexicography-A-White-Paper.pdf White paper]: The Future of Academic Lexicography
We provide a special page with more details about different types of dictionaries that are available for Dutch


====Elexis====
===[[Terminology]]=== <!--T:39-->
ELEXIS is an acronym for European Lexicographic Infrastructure. This project is carried out as part of the Horizon 2020 programme and aims to create a durable infrastructure for e-lexicography. A large amount of high-quality semantic information is still kept in individual lexicographic sources spread across Europe. ELEXIS makes it possible to link, share, distribute and save all of these different European sources on a large scale. In addition, the project helps narrow the gap between communities with great lexicographic expertise and those with little.


* [http://www.elex.is/ Elexis project website]  
<!--T:67-->
*[[Terminology#Centre_of_Expertise_for_Dutch_Terminology|The Centre of Expertise for Dutch Terminology]]


====White Paper: The Future of Academic Lexicography====
<!--T:112-->
*[https://ivdnt.org/wp-content/uploads/2021/02/The-Future-of-Academic-Lexicography-A-White-Paper.pdf White paper]: The Future of Academic Lexicography
*[[Terminology#Academic_language|Academic language]]


===Terminology===
<!--T:113-->
The Centre of Expertise for Dutch Terminology (Expertisecentrum Nederlandstalige Terminologie or ENT) supports people and organisations involved with terminology. They can find terminological information and tools here, on the website of the Dutch Language Institute (INT). A newsletter is sent round several times a year, describing developments and events in the field of terminology.
*[[Terminology#Medical_terminology|Medical terminology]]


*[https://ivdnt.org/terminologie/expertisecentrum/ Centre of Expertise for Dutch Terminology]
<!--T:114-->
*[[Terminology#Dutch_as_a_scientific_language|Dutch as a scientific language]]


====Higher Education Terminology====
<!--T:115-->
HOTNeV is an acronym for Hoger Onderwijs Terminologie in Nederland en Vlaanderen (Higher Education Terminology in the Netherlands and Flanders). This project was prompted by a sharp increase in educational terms, generated by the EU’s education policy and implemented by the Tuning Project. HOTNeV has a dual purpose. Until now, Dutch equivalents for the English terminology were created mainly ad hoc, but this project focuses on the need to coordinate the provision of terms that have been approved by parties in the Dutch-speaking educational sector. It also wants to show the feasibility of this ambition.
*[[Terminology#Legal_terminology|Legal terminology]]


*[https://ivdnt.org/terminologie/onderwijsterminologie-hotnev/ HOTNeV website]
===[[Spelling]]=== <!--T:40-->


====Medical Terminology====
<!--T:68-->
The Medical Pilot is an experimental database in which a small part of the medical vocabulary is described at various levels, from scientific to accessible to people with low literacy, and in which differences between Flemish and Dutch terms are also shown.
*[[Spelling#Woordenlijst.org_(Official_Dutch_Word_List)|Woordenlijst.org (Official Dutch Word List)]]


*[http://hdl.handle.net/10032/tm-a2-s7 MedPilot website]
<!--T:116-->
*[[Spelling#Spelling_Certification_Mark|Spelling Certification Mark]]


===Spelling===  
==Linguistic resources: datasets== <!--T:41-->


====Woordenlijst.org (Official Dutch Word List)====
===[[Corpora]]=== <!--T:69-->


The Word List of the Dutch Language is available online for free at woordenlijst.org. In 2015, the online version grew from approximately 100,000 entries to roughly 168,000 entries. All words from the previous printed edition have been retained.
===Lexical resources=== <!--T:42-->


The newly added words are derived from text files collected at the Dutch Language Institute, containing newspaper texts, literary texts and texts from the internet. In addition, a selection was made from all words that had been looked up in vain in the online Word List.
<!--T:71-->
* [[Lexica]]


Since 2015, woordenlijst.org has been updated several times a year with hundreds of new words. At the end of 2019 it contained a total of 186,000 words. With all plural forms, diminutive forms, past tenses and past participles, the digital version of the Word List now contains information about approximately 680,000 word forms.
<!--T:117-->
* [[Dictionaries]]


*[https://woordenlijst.org Online version]
<!--T:118-->
* [[Conceptual resources]]


====Spelling Certification Mark====
<!--T:119-->
* [[Wordlists]]


The Spelling Certification Mark (Keurmerk Spelling) is a guarantee given by the Union for the Dutch Language (Nederlandse Taalunie) that a reference work can be used to look up the official spelling.
<!--T:120-->
* [[Embeddings]]


For the automatic spell check of word lists (for example provided by dictionary suppliers), the Dutch Language Institute uses the Spelling Certification Mark, also known as the HulK. Our spelling specialists manually correct the words the HulK does not recognize and add these to our own material. From then on the words can be processed automatically.
<!--T:121-->
* [[Lexica of terminology]]


Any word list compiled in accordance with the rules and principles of the official spelling receives the Spelling Certification Mark.
<!--T:122-->
* [[Ontologies]]


====Spelling tools====
===N-grams=== <!--T:43-->


*[https://dev.clarin.nl/node/1914 TiCCLops]: Text-Induced Corpus Clean-up online processing system
<!--T:72-->
* [[Character N-grams]]


==Linguistic resources: corpora and lexica==
==Tools for Dutch== <!--T:44-->
===Corpora===
* [[Newspaper corpora]]
* [[Parliamentary corpora]]
* [[Computer-mediated communication corpora]]
* [[Corpora of academic texts]]
* [[Historical corpora]]
* [[L2 learner corpora]]
* Literary corpora
* [[Manually annotated corpora]]
* [[Multimodal corpora]]
* [[Parallel corpora]]
* [[Reference corpora]]
* [[Spoken corpora]]


===Lexical Resources===
===Normalisation=== <!--T:73-->
* [[Lexica]]
* [[Dictionaries]]
* [[Conceptual Resources]]
* [[Wordlists]]


==Tools for Dutch==
<!--T:74-->
===Normalisation===
* [[Format conversion]]
<!--T:123-->
* [[Spell checking]]


===Language Learning===
<!--T:124-->
*[https://schrijfassistent.be Schrijfassistent]
*[https://piccl.ivdnt.org/ PICCL]: The Text-Induced Corpus Clean-up (TICCL) online processing system is part of PICCL (Philosophical Integrator of Computational and Corpus Libraries). TICCL performs spelling correction and OCR post-correction.
*[http://schrijfassistent.standaard.be/ Schrijfassistent] at De Standaard
*[https://www.nedbox.be NedBox]: Online exercises to learn Dutch
*[https://oefenen.nl/programma/soort/taal Oefenen.nl]: Online exercises to learn Dutch
*[http://woordcombinaties.ivdnt.org/ Woordcombinaties]: Verbs and their combination patterns


===Automatic linguistic annotation===
<!--T:125-->
*[https://lt3.ugent.be/normalisation-demo/ Normalisation demo]
 
===[[Language Learning Resources]]=== <!--T:45-->
 
===Automatic linguistic annotation=== <!--T:46-->
 
<!--T:76-->
* [[Basic language processing]]
<!--T:132-->
* [[Deep parsing]]
<!-- ===Information extraction!-->
Line 129: Line 130:
<!--* Text mining!-->


===Speech processing===
===Speech processing=== <!--T:47-->
* Speech recognition
* Speech synthesis


===Natural Language Processing===
<!--T:77-->
* [[Language Modeling]]
* [[Spoken language recognition]]
 
<!--T:134-->
* [[Speech recognition]]
 
<!--T:135-->
* [[Speech synthesis]]
 
===Natural Language Processing (NLP)=== <!--T:48-->
 
<!--T:78-->
* [[Language modeling]]
 
<!--T:136-->
* [[Machine translation]]
* [[Stylometry]]


===Resource querying===
<!--T:137-->
* [[Coreference resolution]]
 
<!--T:138-->
* [[Compound splitting]]
 
<!--T:139-->
* [[Word sense disambiguation]]
 
<!--T:140-->
* [[Text classification]]
 
<!--T:141-->
* [[Sentiment analysis]]
 
<!--T:142-->
* [[Readability]]
 
<!--T:143-->
* [[Text simplification]]
 
<!--T:144-->
* [[Clinical NLP]]
 
<!--T:157-->
* [[Syllabification]]
 
<!--T:149-->
* [https://gate.ac.uk/ GATE] (General Architecture for Text Engineering) is a Java suite of tools, originally developed at the University of Sheffield, that is used for many natural language processing tasks, including information extraction. ([https://cloud.gate.ac.uk/shopfront#tagged=Dutch Dutch services in GATE Cloud]).
 
===Resource querying=== <!--T:49-->
 
<!--T:79-->
* [[Corpus querying]]
<!--T:145-->
* [[Treebank querying]]


===Other===
===Terminology extraction=== <!--T:59-->
 
<!--T:160-->
*[https://termwerk.ivdnt.org/ Termwerk]. New online term extraction and term management system, with CLARIN login.
 
<!--T:84-->
* [https://termtreffer.org/ Termtreffer]. Ask for login at [mailto:terminologie@ivdnt.org terminologie@ivdnt.org].
 
<!--T:146-->
* [https://lt3.ugent.be/dterminer D-Terminer demo]. Terminology extraction for Dutch, English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)
 
===Terminology management=== <!--T:60-->
 
<!--T:85-->
* [https://iate.europa.eu/home IATE] (Interactive Terminology for Europe) is the shared terminology management system of the institutions of the European Union. It contains more than 7 million terms in 26 languages, covering more than 100 domains of EU legislation.
 
<!--T:161-->
===Word Usage Measurements===
* Woordpeiler: Temporal tendencies for words in contemporary Dutch. Application that shows graphs of the relative frequency of queried words over the course of time, starting from 2000.
 
<!--T:162-->
*[https://woordpeiler.ivdnt.org/ Website]
 
===Other=== <!--T:61-->
 
<!--T:86-->
* Previously unmentioned [[CLARIN projects]] at INT
* [https://webservices.cls.ru.nl/ Language and Speech Tools] at Radboud Nijmegen


==Helpdesk==
<!--T:148-->
* [https://www.opener-project.eu/ OpeNER] is a language analysis toolchain that helps (academic) researchers and companies make sense of natural language analysis. It consists of components that are easy to install, improve and configure, for example to detect the language of a text, determine the polarity of texts (sentiment analysis) or detect which topics a text covers. The supported languages are currently English, Spanish, Italian, German and Dutch.
 
<!--T:150-->
* [https://speech-repository.webcloud.ec.europa.eu/ Speech Repository] is an online e-learning tool. It contains video recordings of real-life speeches and tailor-made pedagogical speeches, which give interpreters and interpreting students an opportunity to practise and improve their interpretation skills.
 
<!--T:151-->
* [https://subworkshop.sourceforge.net/ Subtitle Workshop] is a free application for creating, editing, and converting text-based subtitle files.
 
<!--T:152-->
* [https://youdescribe.org/ YouDescribe] is a free, web-based platform for adding audio description to YouTube content.
 
<!--T:153-->
* [https://www.audacityteam.org/ Audacity] is an audio recording and editing software application that is open source.
 
<!--T:163-->
* [https://debias-tool.ails.ece.ntua.gr/ De-Bias] detects outdated and potentially harmful language in descriptions of cultural heritage collections.
 
<!--T:164-->
* [https://ai4culture.crosslang.dev/ui Occam] is an OCR and HTR tool with spelling correction; its supported languages include Dutch.
 
<!--T:165-->
* [https://www.conker.ai/ Conker] supports teachers in devising questions for assignments or evaluations. Using artificial intelligence and based on self-entered learning content or general themes, this tool automatically generates evaluation questions and assignments. Conker offers some free options.
 
<!--T:166-->
* [https://questionwell.org/ QuestionWell] supports teachers through artificial intelligence by generating sets of questions and answers for specific learning content that they themselves have added to the tool. QuestionWell offers some free options.
 
<!--T:167-->
*[https://resoomer.ai/nl/ Resoomer] is a website that automatically and instantly generates summaries of documents or text fragments submitted by users.
 
==Helpdesk== <!--T:62-->
 
<!--T:87-->
For information about Dutch: If you cannot find the answers to your questions on this wiki, you can send your question to [mailto://servicedesk@ivdnt.org servicedesk@ivdnt.org]. Your questions will be forwarded as soon as possible to the appropriate experts and you should receive an answer within two working days.


You can also ask us for information and assistance with the use of data and tools
<!--T:63-->
You can also ask us for information and assistance with the use of data and tools.


==Other Services==
==Other Services== <!--T:64-->
 
<!--T:88-->
* [[Best practice documents and guidelines]]
<!--T:154-->
* [[Internships]]
<!--T:155-->
* [[Consulting]]
<!--T:156-->
* [[CLARIN]] for Dutch


We will store answers to questions we receive in this wiki, which will grow into a repository of K-Dutch answers to your questions.
==Questions and Answers== <!--T:65-->
 
<!--T:89-->
On the [[Q&A|Questions and Answers page]] we keep track of all questions we receive concerning Dutch. This will grow into a repository of K-Dutch answers to your questions.
 
<!--T:66-->
Note that there is also a very active Discord server concerning Dutch NLP: https://discord.gg/jn94Ux5j
 
 
 
</translate>

Latest revision as of 16:16, 18 November 2025

Mediawiki:Mainpage

Welcome to K-Dutch, the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects...

About

You are most welcome to contribute to these pages. Please contact servicedesk@ivdnt.org with the subject line K-Dutch, and we will be in touch.

Linguistic topics

Grammar

Lexicography

Terminology

Spelling

Linguistic resources: datasets

Corpora

Lexical resources

N-grams

Tools for Dutch

Normalisation

  • PICCL: The Text-Induced Corpus Clean-up (TICCL) online processing system is part of PICCL (Philosophical Integrator of Computational and Corpus Libraries). TICCL performs spelling correction and OCR post-correction.

Language Learning Resources

Automatic linguistic annotation

Speech processing

Natural Language Processing (NLP)

  • GATE (General Architecture for Text Engineering) is a Java suite of tools, originally developed at the University of Sheffield, that is used for many natural language processing tasks, including information extraction. (Dutch services in GATE Cloud).

Resource querying

Terminology extraction

  • Termwerk. New online term extraction and term management system, with CLARIN login.
  • D-Terminer demo. Terminology extraction for Dutch, English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)

Terminology management

  • IATE (Interactive Terminology for Europe) is the shared terminology management system of the institutions of the European Union. It contains more than 7 million terms in 26 languages, covering more than 100 domains of EU legislation.

Word Usage Measurements

  • Woordpeiler: Temporal tendencies for words in contemporary Dutch. Application that shows graphs of the relative frequency of queried words over the course of time, starting from 2000.

Other

  • OpeNER is a language analysis toolchain that helps (academic) researchers and companies make sense of natural language analysis. It consists of components that are easy to install, improve and configure, for example to detect the language of a text, determine the polarity of texts (sentiment analysis) or detect which topics a text covers. The supported languages are currently English, Spanish, Italian, German and Dutch.
  • Speech Repository is an online e-learning tool. It contains video recordings of real-life speeches and tailor-made pedagogical speeches, which give interpreters and interpreting students an opportunity to practise and improve their interpretation skills.
  • Subtitle Workshop is a free application for creating, editing, and converting text-based subtitle files.
  • YouDescribe is a free, web-based platform for adding audio description to YouTube content.
  • Audacity is an audio recording and editing software application that is open source.
  • De-Bias detects outdated and potentially harmful language in descriptions of cultural heritage collections.
  • Occam is an OCR and HTR tool with spelling correction; its supported languages include Dutch.
  • Conker supports teachers in devising questions for assignments or evaluations. Using artificial intelligence and based on self-entered learning content or general themes, this tool automatically generates evaluation questions and assignments. Conker offers some free options.
  • QuestionWell supports teachers through artificial intelligence by generating sets of questions and answers for specific learning content that they themselves have added to the tool. QuestionWell offers some free options.
  • Resoomer is a website that automatically and instantly generates summaries of documents or text fragments submitted by users.

Helpdesk

For information about Dutch: If you cannot find the answers to your questions on this wiki, you can send your question to servicedesk@ivdnt.org. Your questions will be forwarded as soon as possible to the appropriate experts and you should receive an answer within two working days.

You can also ask us for information and assistance with the use of data and tools.

Other Services

Questions and Answers

On the Questions and Answers page we keep track of all questions we receive concerning Dutch. This will grow into a repository of K-Dutch answers to your questions.

Note that there is also a very active Discord server concerning Dutch NLP: https://discord.gg/jn94Ux5j