K-Dutch: Difference between revisions

From Clarin K-Centre
(9 intermediate revisions by 3 users not shown)
Line 2: Line 2:
<translate>
<translate>
<!--T:29-->
<!--T:29-->
<span style="color:white">Mediawiki:Mainpage</span><br>
<span style="color:white">Mediawiki:Mainpage</span><br>        <span style="color:red">Due to a server migration in the Leiden University data centre, https://kdutch.ivdnt.org/ will be temporarily unavailable on '''27 November 2024'''. We apologise for the inconvenience!</span>


<!--T:30-->
<!--T:30-->
Line 21: Line 21:
You are most welcome to contribute to these pages. Please contact [mailto://servicedesk@ivdnt.org servicedesk@ivdnt.org] with the subject K-Dutch, and we will be in touch.
You are most welcome to contribute to these pages. Please contact [mailto://servicedesk@ivdnt.org servicedesk@ivdnt.org] with the subject K-Dutch, and we will be in touch.


==Linguistic topics==
==Linguistic topics== <!--T:105-->


===[[Grammar]]=== <!--T:35-->
===[[Grammar]]=== <!--T:35-->
Line 28: Line 28:
* [[Grammar#Phonology,_morphology_and_syntax:_Taalportaal|Phonology, morphology and syntax: Taalportaal]]
* [[Grammar#Phonology,_morphology_and_syntax:_Taalportaal|Phonology, morphology and syntax: Taalportaal]]


<!--T:106-->
* [[Grammar#Morphosyntax|Morphosyntax]]
* [[Grammar#Morphosyntax|Morphosyntax]]


<!--T:107-->
* [[Grammar#Syntactic_Atlas_of_the_Dutch_dialects_(SAND)|Syntactic Atlas of the Dutch dialects (SAND)]]
* [[Grammar#Syntactic_Atlas_of_the_Dutch_dialects_(SAND)|Syntactic Atlas of the Dutch dialects (SAND)]]


<!--T:108-->
* [[Grammar#Dutch_descriptive_grammar:_e-ANS_(in_Dutch)|Dutch descriptive grammar: e-ANS (in Dutch)]]
* [[Grammar#Dutch_descriptive_grammar:_e-ANS_(in_Dutch)|Dutch descriptive grammar: e-ANS (in Dutch)]]


<!--T:109-->
* [[Grammar#Grambank|Grambank]]
* [[Grammar#Grambank|Grambank]]


Line 41: Line 45:
* [[Lexicography#Dutch_dictionaries|Dutch dictionaries]]
* [[Lexicography#Dutch_dictionaries|Dutch dictionaries]]


<!--T:110-->
* [[Lexicography#Elexis|The Elexis Project]]
* [[Lexicography#Elexis|The Elexis Project]]


<!--T:111-->
* [https://ivdnt.org/wp-content/uploads/2021/02/The-Future-of-Academic-Lexicography-A-White-Paper.pdf White paper]: The Future of Academic Lexicography
* [https://ivdnt.org/wp-content/uploads/2021/02/The-Future-of-Academic-Lexicography-A-White-Paper.pdf White paper]: The Future of Academic Lexicography


Line 50: Line 56:
*[[Terminology#Centre_of_Expertise_for_Dutch_Terminology|The Centre of Expertise for Dutch Terminology]]
*[[Terminology#Centre_of_Expertise_for_Dutch_Terminology|The Centre of Expertise for Dutch Terminology]]


<!--T:112-->
*[[Terminology#Academic_language|Academic language]]
*[[Terminology#Academic_language|Academic language]]


<!--T:113-->
*[[Terminology#Medical_terminology|Medical terminology]]
*[[Terminology#Medical_terminology|Medical terminology]]


<!--T:114-->
*[[Terminology#Dutch_as_a_scientific_language|Dutch as a scientific language]]
*[[Terminology#Dutch_as_a_scientific_language|Dutch as a scientific language]]


<!--T:115-->
*[[Terminology#Legal_terminology|Legal terminology]]
*[[Terminology#Legal_terminology|Legal terminology]]


Line 63: Line 73:
*[[Spelling#Woordenlijst.org_(Official_Dutch_Word_List)|Woordenlijst.org (Official Dutch Word List)]]
*[[Spelling#Woordenlijst.org_(Official_Dutch_Word_List)|Woordenlijst.org (Official Dutch Word List)]]


<!--T:116-->
*[[Spelling#Spelling_Certification_Mark|Spelling Certification Mark]]
*[[Spelling#Spelling_Certification_Mark|Spelling Certification Mark]]


Line 68: Line 79:


===[[Corpora]]=== <!--T:69-->
===[[Corpora]]=== <!--T:69-->
<!--T:70-->
* [[Newspaper corpora]]: corpora exclusively consisting of newspaper text
<!--T:90-->
* [[Parliamentary corpora]]
<!--T:91-->
* [[Computer-mediated communication corpora]]
<!--T:92-->
* [[Corpora of academic texts]]
<!--T:93-->
* [[Historical corpora]]
<!--T:94-->
* [[L2 learner corpora]]
<!--T:95-->
* [[Manually annotated corpora]]
<!--T:96-->
* [[Multimodal corpora]]
<!--T:97-->
* [[Parallel corpora]]
<!--T:98-->
* [[Reference corpora]]
<!--T:99-->
* [[Social media corpora]]
<!--T:100-->
* [[Spoken corpora]]
<!--T:101-->
* [[Sign Language corpora]]
<!--T:102-->
* [[Propbanks]]: contains semantic role labels
<!--T:103-->
* [[Treebanks]]
<!--T:104-->
* [[Other corpora]]


===Lexical resources=== <!--T:42-->
===Lexical resources=== <!--T:42-->
Line 122: Line 85:
* [[Lexica]]
* [[Lexica]]


<!--T:117-->
* [[Dictionaries]]
* [[Dictionaries]]


<!--T:118-->
* [[Conceptual resources]]
* [[Conceptual resources]]


<!--T:119-->
* [[Wordlists]]
* [[Wordlists]]


<!--T:120-->
* [[Embeddings]]
* [[Embeddings]]


<!--T:121-->
* [[Lexica of terminology]]
* [[Lexica of terminology]]


<!--T:122-->
* [[Ontologies]]
* [[Ontologies]]


Line 146: Line 115:
* [[Format conversion]]
* [[Format conversion]]


<!--T:123-->
* [[Spell checking]]
* [[Spell checking]]


<!--T:124-->
*[https://dev.clarin.nl/node/1914 TiCCLops]: Text-Induced Corpus Clean-up online processing system: no longer available
*[https://dev.clarin.nl/node/1914 TiCCLops]: Text-Induced Corpus Clean-up online processing system: no longer available


<!--T:125-->
*[https://lt3.ugent.be/normalisation-demo/ Normalisation demo]
*[https://lt3.ugent.be/normalisation-demo/ Normalisation demo]


Line 157: Line 129:
*[https://schrijfassistent.be Schrijfassistent]
*[https://schrijfassistent.be Schrijfassistent]


*[http://schrijfassistent.standaard.be/ Schrijfassistent] at De Standaard
<!--T:127-->
 
*[https://www.nedbox.be NedBox]: Online exercises to learn Dutch
*[https://www.nedbox.be NedBox]: Online exercises to learn Dutch


<!--T:128-->
*[https://oefenen.nl/programma/soort/taal Oefenen.nl]: Online exercises to learn Dutch
*[https://oefenen.nl/programma/soort/taal Oefenen.nl]: Online exercises to learn Dutch


<!--T:129-->
*[http://woordcombinaties.ivdnt.org/ Woordcombinaties]: Verbs and their combination patterns
*[http://woordcombinaties.ivdnt.org/ Woordcombinaties]: Verbs and their combination patterns


<!--T:130-->
*[https://orientplus.ucll.be/ Orient+]: A serious game to enhance academic vocabulary
*[https://orientplus.ucll.be/ Orient+]: A serious game to enhance academic vocabulary


<!--T:131-->
*[https://www.taalwinkel.nl/ Taalwinkel]: Language Advice
*[https://www.taalwinkel.nl/ Taalwinkel]: Language Advice


Line 174: Line 149:
* [[Basic language processing]]
* [[Basic language processing]]


<!--T:132-->
* [[Deep parsing]]
* [[Deep parsing]]


<!--T:133-->
<!-- ===Information extraction!-->
<!-- ===Information extraction!-->
<!--* Processing of historical variants of Dutch!-->
<!--* Processing of historical variants of Dutch!-->
Line 185: Line 162:
* [[Spoken language recognition]]
* [[Spoken language recognition]]


<!--T:134-->
* [[Speech recognition]]
* [[Speech recognition]]


<!--T:135-->
* Speech synthesis
* Speech synthesis


Line 194: Line 173:
* [[Language modeling]]
* [[Language modeling]]


<!--T:136-->
* [[Machine translation]]
* [[Machine translation]]


<!--T:137-->
* [[Coreference resolution]]
* [[Coreference resolution]]


<!--T:138-->
* [[Compound splitting]]
* [[Compound splitting]]


<!--T:139-->
* [[Word sense disambiguation]]
* [[Word sense disambiguation]]


<!--T:140-->
* [[Text classification]]
* [[Text classification]]


<!--T:141-->
* [[Sentiment analysis]]
* [[Sentiment analysis]]


<!--T:142-->
* [[Readability]]
* [[Readability]]


<!--T:143-->
* [[Text simplification]]
* [[Text simplification]]


<!--T:144-->
* [[Clinical NLP]]
* [[Clinical NLP]]
<!--T:157-->
* [[Syllabification]]


===Resource querying=== <!--T:49-->
===Resource querying=== <!--T:49-->
Line 217: Line 208:
* [[Corpus querying]]
* [[Corpus querying]]


<!--T:145-->
* [[Treebank querying]]
* [[Treebank querying]]


Line 224: Line 216:
* [https://termtreffer.org/ Termtreffer]. Ask for login at [mailto:terminologie@ivdnt.org terminologie@ivdnt.org].
* [https://termtreffer.org/ Termtreffer]. Ask for login at [mailto:terminologie@ivdnt.org terminologie@ivdnt.org].


<!--T:146-->
* [https://lt3.ugent.be/dterminer D-Terminer demo]. Terminology extraction for Dutch,  
* [https://lt3.ugent.be/dterminer D-Terminer demo]. Terminology extraction for Dutch,  
English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)
English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)
Line 237: Line 230:
* Previously unmentioned [[CLARIN projects]] at INT
* Previously unmentioned [[CLARIN projects]] at INT


<!--T:147-->
* [https://webservices.cls.ru.nl/ Language and Speech Tools] at Radboud Nijmegen, e.g. [https://webservices.cls.ru.nl/tscan T-scan], an analysis tool that assesses the complexity of Dutch texts.
* [https://webservices.cls.ru.nl/ Language and Speech Tools] at Radboud Nijmegen, e.g. [https://webservices.cls.ru.nl/tscan T-scan], an analysis tool that assesses the complexity of Dutch texts.


<!--T:148-->
* [https://www.opener-project.eu/ OpeNER] is a language analysis toolchain that helps (academic) researchers and companies make sense of natural language analysis. It consists of components that are easy to install, improve and configure, e.g. to detect the language of a text, determine the polarity of texts (sentiment analysis) or detect which topics a text covers. The supported languages are currently English, Spanish, Italian, German and Dutch.
* [https://www.opener-project.eu/ OpeNER] is a language analysis toolchain that helps (academic) researchers and companies make sense of natural language analysis. It consists of components that are easy to install, improve and configure, e.g. to detect the language of a text, determine the polarity of texts (sentiment analysis) or detect which topics a text covers. The supported languages are currently English, Spanish, Italian, German and Dutch.


<!--T:149-->
* [https://gate.ac.uk/ GATE] (General Architecture for Text Engineering) is a Java suite of tools, originally developed at the University of Sheffield, that is used for many natural language processing tasks, including information extraction. ([https://cloud.gate.ac.uk/shopfront#tagged=Dutch Dutch services in GATE Cloud]).
* [https://gate.ac.uk/ GATE] (General Architecture for Text Engineering) is a Java suite of tools, originally developed at the University of Sheffield, that is used for many natural language processing tasks, including information extraction. ([https://cloud.gate.ac.uk/shopfront#tagged=Dutch Dutch services in GATE Cloud]).


<!--T:150-->
* [https://speech-repository.webcloud.ec.europa.eu/ Speech Repository] is an online e-learning tool. It contains video recordings of real-life speeches and tailor-made pedagogical speeches, which give interpreters and interpreting students an opportunity to practise and improve their interpretation skills.
* [https://speech-repository.webcloud.ec.europa.eu/ Speech Repository] is an online e-learning tool. It contains video recordings of real-life speeches and tailor-made pedagogical speeches, which give interpreters and interpreting students an opportunity to practise and improve their interpretation skills.


<!--T:151-->
* [https://subworkshop.sourceforge.net/ Subtitle Workshop] is a free application for creating, editing, and converting text-based subtitle files.  
* [https://subworkshop.sourceforge.net/ Subtitle Workshop] is a free application for creating, editing, and converting text-based subtitle files.  


<!--T:152-->
* [https://youdescribe.org/ YouDescribe] is a free, web-based platform for adding audio description to YouTube content.
* [https://youdescribe.org/ YouDescribe] is a free, web-based platform for adding audio description to YouTube content.


<!--T:153-->
* [https://www.audacityteam.org/ Audacity] is an open-source application for recording and editing audio.
* [https://www.audacityteam.org/ Audacity] is an open-source application for recording and editing audio.


Line 264: Line 264:
* [[Best practice documents and guidelines]]
* [[Best practice documents and guidelines]]


<!--T:154-->
* [[Internships]]
* [[Internships]]


<!--T:155-->
* [[Consulting]]
* [[Consulting]]


<!--T:156-->
* [[CLARIN]] for Dutch
* [[CLARIN]] for Dutch



Latest revision as of 11:38, 20 November 2024

Due to a server migration in the Leiden University data centre, https://kdutch.ivdnt.org/ will be temporarily unavailable on 27 November 2024. We apologise for the inconvenience!

Welcome to K-Dutch, the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects...

K-Dutch is a CLARIN Knowledge Centre. It is hosted by the Instituut voor de Nederlandse Taal (Dutch Language Institute), which is also a CLARIN-B centre and hosts many resources for Dutch that are, in general, freely available for research purposes. K-Dutch is an initiative of CLARIN-ERIC and CLARIN-BE.

The status of Dutch with respect to language technologies is described in

You are most welcome to contribute to these pages. Please contact servicedesk@ivdnt.org with the subject K-Dutch, and we will be in touch.

Linguistic topics

Grammar

Lexicography

Terminology

Spelling

Linguistic resources: datasets

Corpora

Lexical resources

N-grams

Tools for Dutch

Normalisation

  • TiCCLops: Text-Induced Corpus Clean-up online processing system: no longer available

Language Learning

  • NedBox: Online exercises to learn Dutch
  • Orient+: A serious game to enhance academic vocabulary

Automatic linguistic annotation


Speech processing

  • Speech synthesis

Natural Language Processing (NLP)

Resource querying

Terminology extraction

  • D-Terminer demo. Terminology extraction for Dutch, English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)

Terminology management

  • IATE (Interactive Terminology for Europe) is the shared terminology management system of the institutions of the European Union. It contains more than 7 million terms in 26 languages, covering more than 100 domains of EU legislation.

Other

  • OpeNER is a language analysis toolchain that helps (academic) researchers and companies make sense of natural language analysis. It consists of components that are easy to install, improve and configure, e.g. to detect the language of a text, determine the polarity of texts (sentiment analysis) or detect which topics a text covers. The supported languages are currently English, Spanish, Italian, German and Dutch.
  • GATE (General Architecture for Text Engineering) is a Java suite of tools, originally developed at the University of Sheffield, that is used for many natural language processing tasks, including information extraction. (Dutch services in GATE Cloud).
  • Speech Repository is an online e-learning tool. It contains video recordings of real-life speeches and tailor-made pedagogical speeches, which give interpreters and interpreting students an opportunity to practise and improve their interpretation skills.
  • Subtitle Workshop is a free application for creating, editing, and converting text-based subtitle files.
  • YouDescribe is a free, web-based platform for adding audio description to YouTube content.
  • Audacity is an open-source application for recording and editing audio.

Helpdesk

For information about Dutch: if you cannot find the answers to your questions on this wiki, you can send them to servicedesk@ivdnt.org. Your questions will be forwarded as soon as possible to the appropriate experts, and you should receive an answer within two working days.

You can also ask us for information and assistance with the use of data and tools.

Other Services

Questions and Answers

On the Questions and Answers page we keep track of all questions we receive concerning Dutch. This will grow into a repository of K-Dutch answers to your questions.

Note that there is also a very active Discord server concerning Dutch NLP: https://discord.gg/jn94Ux5j