
K-Dutch: Difference between revisions

From Clarin K-Centre
Marked this version for translation
 
(280 intermediate revisions by 7 users not shown)
=K-DUTCH -- The Dutch Language Institute as CLARIN Knowledge Centre for Dutch=
<languages/>
<translate>
<!--T:29-->
<span style="color:white">Mediawiki:Mainpage</span>


K-DUTCH is the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects... K-DUTCH is hosted by the [https://www.ivdnt.org Instituut voor de Nederlandse Taal] (Dutch Language Institute), which is also a CLARIN-B centre and hosts many resources for Dutch, which are generally freely available.
<!--T:30-->
[[File:K-centre-logo.jpg|frameless|right]]
Welcome to [[K-Dutch]], the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects...  


== Targeted audience ==
==[[About]]== <!--T:159-->


The INT CLARIN K-centre is aimed at everyone interested in any aspect of the Dutch language.


* researchers in
<!--T:34-->
** linguistics,  
You are most welcome to contribute to these pages. Please contact [mailto:servicedesk@ivdnt.org servicedesk@ivdnt.org] with the subject line K-Dutch, and we will be in touch.
** humanities,  
** natural language processing


* general public
==Linguistic topics== <!--T:105-->


== Types of services offered ==
===[[Grammar]]=== <!--T:35-->


The Instituut voor de Nederlandse Taal (Dutch Language Institute) is the place for anyone who wants to know anything about the Dutch language through the centuries. It is a scholarly yet easily accessible institute that studies all aspects of the Dutch language, including its vocabulary, grammar and linguistic variation. The institute collects new Dutch words, updates important reference works such as the Algemene Nederlandse Spraakkunst, the main standard work on Dutch grammar, and creates terminology lists to make professional jargon accessible.
<!--T:36-->
* [[Grammar#Phonology,_morphology_and_syntax:_Taalportaal|Phonology, morphology and syntax: Taalportaal]]


The institute also takes a central position in the Dutch-speaking world (the Netherlands, Flanders, Suriname and the Netherlands Antilles) as a developer, keeper and distributor of corpora, lexica, dictionaries and grammars. With these sustainable language resources, all of them the result of scholarly methods, the Dutch Language Institute provides the necessary building blocks for the study of Dutch.
<!--T:106-->
* [[Grammar#Morphosyntax|Morphosyntax]]


The services the institute offers to the CLARIN community as a CLARIN Knowledge Centre are:
<!--T:107-->
* [[Grammar#Syntactic_Atlas_of_the_Dutch_dialects_(SAND)|Syntactic Atlas of the Dutch dialects (SAND)]]


* providing information on Dutch linguistics: grammar, spelling, morphology, historical Dutch, variation, dialect
<!--T:108-->
* providing information on which corpora, lexicons and tools are available and would be suitable in specific cases
* [[Grammar#Dutch_descriptive_grammar:_e-ANS_(in_Dutch)|Dutch descriptive grammar: e-ANS (in Dutch)]]
* providing consulting on how to solve problems related to natural language processing for Dutch


== List of modalities covered by the expertise of the centre (bulleted list of key words and phrases) ==
<!--T:109-->
* [[Grammar#Grambank|Grambank]]


* speech (audio)
===[[Lexicography]]=== <!--T:37-->
* text


== List of linguistic topics covered (bulleted list of key words and phrases) ==
<!--T:38-->
* [[Lexicography#Dutch_dictionaries|Dutch dictionaries]]


* [[dictionaries]]
<!--T:110-->
* [[spelling and grammar]]
* [[Lexicography#Elexis|The Elexis Project]]
* [[corpora and lexicons]]
* regional variation and dialects


== List of language processing topics covered (bulleted list of key words and phrases) ==
<!--T:111-->
* [https://ivdnt.org/wp-content/uploads/2021/02/The-Future-of-Academic-Lexicography-A-White-Paper.pdf White paper]: The Future of Academic Lexicography


* syntactic parsing
===[[Terminology]]=== <!--T:39-->
* lemmatizing
 
* part-of-speech tagging
<!--T:67-->
* morphological analysis
*[[Terminology#Centre_of_Expertise_for_Dutch_Terminology|The Centre of Expertise for Dutch Terminology]]
* spelling variation and expansion
 
* translation
<!--T:112-->
* named entity labeling
*[[Terminology#Academic_language|Academic language]]
 
<!--T:113-->
*[[Terminology#Medical_terminology|Medical terminology]]
 
<!--T:114-->
*[[Terminology#Dutch_as_a_scientific_language|Dutch as a scientific language]]
 
<!--T:115-->
*[[Terminology#Legal_terminology|Legal terminology]]
 
===[[Spelling]]=== <!--T:40-->
 
<!--T:68-->
*[[Spelling#Woordenlijst.org_(Official_Dutch_Word_List)|Woordenlijst.org (Official Dutch Word List)]]
 
<!--T:116-->
*[[Spelling#Spelling_Certification_Mark|Spelling Certification Mark]]
 
==Linguistic resources: datasets== <!--T:41-->
 
===[[Corpora]]=== <!--T:69-->
 
===Lexical resources=== <!--T:42-->
 
<!--T:71-->
* [[Lexica]]
 
<!--T:117-->
* [[Dictionaries]]
 
<!--T:118-->
* [[Conceptual resources]]
 
<!--T:119-->
* [[Wordlists]]
 
<!--T:120-->
* [[Embeddings]]
 
<!--T:121-->
* [[Lexica of terminology]]
 
<!--T:122-->
* [[Ontologies]]
 
===N-grams=== <!--T:43-->
 
<!--T:72-->
* [[Character N-grams]]
 
==Tools for Dutch== <!--T:44-->
 
===Normalisation=== <!--T:73-->
 
<!--T:74-->
* [[Format conversion]]
 
<!--T:123-->
* [[Spell checking]]
 
<!--T:124-->
*[https://piccl.ivdnt.org/ PICCL]: The Text-Induced Corpus Clean-up (TICCL) online processing system is part of PICCL (Philosophical Integrator of Computational and Corpus Libraries). TICCL performs spelling correction and OCR post-correction.
 
<!--T:125-->
*[https://lt3.ugent.be/normalisation-demo/ Normalisation demo]
 
===[[Language Learning Resources]]=== <!--T:45-->
 
===Automatic linguistic annotation=== <!--T:46-->
 
<!--T:76-->
* [[Basic language processing]]
 
<!--T:132-->
* [[Deep parsing]]
<!-- ===Information extraction!-->
<!--* Processing of historical variants of Dutch!-->
<!--* Text mining!-->
 
===Speech processing=== <!--T:47-->
 
<!--T:77-->
* [[Spoken language recognition]]
 
<!--T:134-->
* [[Speech recognition]]
 
<!--T:135-->
* [[Speech synthesis]]
 
===Natural Language Processing (NLP)=== <!--T:48-->
 
<!--T:78-->
* [[Language modeling]]
 
<!--T:136-->
* [[Machine translation]]
 
<!--T:137-->
* [[Coreference resolution]]
 
<!--T:138-->
* [[Compound splitting]]
 
<!--T:139-->
* [[Word sense disambiguation]]
 
<!--T:140-->
* [[Text classification]]
 
<!--T:141-->
* [[Sentiment analysis]]
 
<!--T:142-->
* [[Readability]]
 
<!--T:143-->
* [[Text simplification]]
 
<!--T:144-->
* [[Clinical NLP]]
 
<!--T:157-->
* [[Syllabification]]
 
<!--T:149-->
* [https://gate.ac.uk/ GATE] (General Architecture for Text Engineering) is a Java suite of tools, originally developed at the University of Sheffield, that is used for many natural language processing tasks, including information extraction ([https://cloud.gate.ac.uk/shopfront#tagged=Dutch Dutch services in GATE Cloud]).
 
===Resource querying=== <!--T:49-->
 
<!--T:79-->
* [[Corpus querying]]
 
<!--T:145-->
* [[Treebank querying]]
 
===Terminology extraction=== <!--T:59-->
 
<!--T:160-->
*[https://termwerk.ivdnt.org/ Termwerk]. New online term extraction and term management system, with CLARIN login.
 
<!--T:84-->
* [https://termtreffer.org/ Termtreffer]. Ask for login at [mailto:terminologie@ivdnt.org terminologie@ivdnt.org].
 
<!--T:146-->
* [https://lt3.ugent.be/dterminer D-Terminer demo]. Terminology extraction for Dutch, English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)
 
===Terminology management=== <!--T:60-->
 
<!--T:85-->
* [https://iate.europa.eu/home IATE] (Interactive Terminology for Europe) is the shared terminology management system of the institutions of the European Union. It contains more than 7 million terms in 26 languages, covering more than 100 domains of EU legislation.
 
<!--T:161-->
===Word Usage Measurements===
* Woordpeiler: temporal trends for words in contemporary Dutch. The application shows graphs of the relative frequency of queried words over time, starting from 2000.
 
<!--T:162-->
*[https://woordpeiler.ivdnt.org/ Website]
 
===Other=== <!--T:61-->
 
<!--T:86-->
* Previously unmentioned [[CLARIN projects]] at INT
 
<!--T:148-->
* [https://www.opener-project.eu/ OpeNER] is a language analysis toolchain that helps (academic) researchers and companies make sense of natural language analysis. It consists of components that are easy to install, improve and configure, for example to detect the language of a text, determine the polarity of texts (sentiment analysis), or detect which topics a text covers. The supported languages are currently English, Spanish, Italian, German and Dutch.
 
<!--T:150-->
* [https://speech-repository.webcloud.ec.europa.eu/ Speech Repository] is an online e-learning tool. It contains video recordings of real-life speeches and tailor-made pedagogical speeches, which give interpreters and interpreting students an opportunity to practise and improve their interpreting skills.
 
<!--T:151-->
* [https://subworkshop.sourceforge.net/ Subtitle Workshop] is a free application for creating, editing, and converting text-based subtitle files.
 
<!--T:152-->
* [https://youdescribe.org/ YouDescribe] is a free, web-based platform for adding audio description to YouTube content.
 
<!--T:153-->
* [https://www.audacityteam.org/ Audacity] is an open-source application for recording and editing audio.
 
<!--T:163-->
* [https://debias-tool.ails.ece.ntua.gr/ De-Bias] detects outdated and potentially harmful language in descriptions of cultural heritage collections.
 
<!--T:164-->
* [https://ai4culture.crosslang.dev/ui Occam] is an OCR and HTR tool with spelling correction; the supported languages include Dutch.
 
<!--T:165-->
* [https://www.conker.ai/ Conker] supports teachers in devising questions for assignments or evaluations. Using artificial intelligence and based on self-entered learning content or general themes, this tool automatically generates evaluation questions and assignments. Conker offers some free options.
 
<!--T:166-->
* [https://questionwell.org/ QuestionWell] supports teachers through artificial intelligence by generating sets of questions and answers for specific learning content that they themselves have added to the tool. QuestionWell offers some free options.
 
<!--T:167-->
*[https://resoomer.ai/nl/ Resoomer] is a website that automatically and instantly generates summaries of documents or text fragments submitted by users.
 
==Helpdesk== <!--T:62-->
 
<!--T:87-->
For information about Dutch: if you cannot find the answer to your question on this wiki, you can send it to [mailto:servicedesk@ivdnt.org servicedesk@ivdnt.org]. Your question will be forwarded as soon as possible to the appropriate experts, and you should receive an answer within two working days.
 
<!--T:63-->
You can also ask us for information and assistance with the use of data and tools.
 
==Other Services== <!--T:64-->
 
<!--T:88-->
* [[Best practice documents and guidelines]]
 
<!--T:154-->
* [[Internships]]
 
<!--T:155-->
* [[Consulting]]
 
<!--T:156-->
* [[CLARIN]] for Dutch
 
==Questions and Answers== <!--T:65-->
 
<!--T:89-->
On the [[Q&A|Questions and Answers page]] we keep track of all questions we receive concerning Dutch. This will grow into a repository of K-Dutch answers to your questions.
 
<!--T:66-->
Note that there is also a very active Discord server concerning Dutch NLP: https://discord.gg/jn94Ux5j
 
 
 
</translate>

Latest revision as of 16:16, 18 November 2025

Mediawiki:Mainpage

Welcome to K-Dutch, the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects...

About

You are most welcome to contribute to these pages. Please contact servicedesk@ivdnt.org with the subject line K-Dutch, and we will be in touch.

Linguistic topics

Grammar

Lexicography

Terminology

Spelling

Linguistic resources: datasets

Corpora

Lexical resources

N-grams

Tools for Dutch

Normalisation

  • PICCL: The Text-Induced Corpus Clean-up (TICCL) online processing system is part of PICCL (Philosophical Integrator of Computational and Corpus Libraries). TICCL performs spelling correction and OCR post-correction.

Language Learning Resources

Automatic linguistic annotation

Speech processing

Natural Language Processing (NLP)

  • GATE (General Architecture for Text Engineering) is a Java suite of tools, originally developed at the University of Sheffield, that is used for many natural language processing tasks, including information extraction (Dutch services in GATE Cloud).

Resource querying

Terminology extraction

  • Termwerk. New online term extraction and term management system, with CLARIN login.
  • D-Terminer demo. Terminology extraction for Dutch, English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)

Terminology management

  • IATE (Interactive Terminology for Europe) is the shared terminology management system of the institutions of the European Union. It contains more than 7 million terms in 26 languages, covering more than 100 domains of EU legislation.

Word Usage Measurements

  • Woordpeiler: temporal trends for words in contemporary Dutch. The application shows graphs of the relative frequency of queried words over time, starting from 2000.
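
The "relative frequency over time" measure mentioned for Woordpeiler can be illustrated with a minimal Python sketch. This is not Woordpeiler's implementation, which is not described on this page: the function name `relative_frequency_by_year`, the per-million scaling, the naive whitespace tokenisation and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter


def relative_frequency_by_year(documents, word, per=1_000_000):
    """Return {year: occurrences of `word` per `per` tokens}.

    `documents` is an iterable of (year, text) pairs. Tokenisation is a
    naive lowercase whitespace split (an assumption for illustration).
    """
    totals = Counter()  # total number of tokens seen per year
    hits = Counter()    # number of tokens equal to `word` per year
    target = word.lower()
    for year, text in documents:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years without any tokens to avoid dividing by zero.
    return {
        year: hits[year] * per / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


# Hypothetical toy data, not taken from any real corpus.
toy_corpus = [
    (2000, "de fiets staat buiten"),
    (2000, "de trein is laat"),
    (2001, "de fiets en de fiets"),
]

print(relative_frequency_by_year(toy_corpus, "fiets"))
# → {2000: 125000.0, 2001: 400000.0}
```

Normalising by the number of tokens per year is what makes the values comparable across years with different amounts of text; plotting the resulting mapping gives the kind of trend graph described above.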

Other

  • OpeNER is a language analysis toolchain that helps (academic) researchers and companies make sense of natural language analysis. It consists of components that are easy to install, improve and configure, for example to detect the language of a text, determine the polarity of texts (sentiment analysis), or detect which topics a text covers. The supported languages are currently English, Spanish, Italian, German and Dutch.
  • Speech Repository is an online e-learning tool. It contains video recordings of real-life speeches and tailor-made pedagogical speeches, which give interpreters and interpreting students an opportunity to practise and improve their interpreting skills.
  • Subtitle Workshop is a free application for creating, editing, and converting text-based subtitle files.
  • YouDescribe is a free, web-based platform for adding audio description to YouTube content.
  • Audacity is an open-source application for recording and editing audio.
  • De-Bias detects outdated and potentially harmful language in descriptions of cultural heritage collections.
  • Occam is an OCR and HTR tool with spelling correction; the supported languages include Dutch.
  • Conker supports teachers in devising questions for assignments or evaluations. Using artificial intelligence and based on self-entered learning content or general themes, this tool automatically generates evaluation questions and assignments. Conker offers some free options.
  • QuestionWell supports teachers through artificial intelligence by generating sets of questions and answers for specific learning content that they themselves have added to the tool. QuestionWell offers some free options.
  • Resoomer is a website that automatically and instantly generates summaries of documents or text fragments submitted by users.

Helpdesk

For information about Dutch: if you cannot find the answer to your question on this wiki, you can send it to servicedesk@ivdnt.org. Your question will be forwarded as soon as possible to the appropriate experts, and you should receive an answer within two working days.

You can also ask us for information and assistance with the use of data and tools.

Other Services

Questions and Answers

On the Questions and Answers page we keep track of all questions we receive concerning Dutch. This will grow into a repository of K-Dutch answers to your questions.

Note that there is also a very active Discord server concerning Dutch NLP: https://discord.gg/jn94Ux5j