K-Dutch

From Clarin K-Centre

<languages/>
<translate>
<span style="color:white">Mediawiki:Mainpage</span><br>

[[File:K-centre-logo.jpg|frameless|right]]
Welcome to [[K-Dutch]], the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects...  

K-Dutch is a [https://www.clarin.eu/content/knowledge-centres CLARIN Knowledge Centre]. It is hosted by the [https://www.ivdnt.org Instituut voor de Nederlandse Taal] (Dutch Language Institute), which is also a [https://www.clarin.eu/content/certified-centres CLARIN-B centre] and hosts many resources for Dutch, which are generally freely available for research purposes. K-Dutch is an initiative of [https://www.clarin.eu CLARIN-ERIC] and [https://clarin-be.ivdnt.org CLARIN-BE].

The status of Dutch with respect to language technologies is described in:
* Short version: [https://link.springer.com/chapter/10.1007/978-3-031-28819-7_12 Steurs, Vandeghinste and Daelemans (2023).] Language Report Dutch. In: Rehm, G., Way, A. (eds) ''European Language Equality''. Cognitive Technologies. Springer, Cham. <nowiki>https://doi.org/10.1007/978-3-031-28819-7_12</nowiki>
* Longer version: [https://european-language-equality.eu/wp-content/uploads/2022/03/ELE___Deliverable_D1_10__Language_Report_Dutch_.pdf Steurs, Vandeghinste and Daelemans (2022). Report on Dutch.] Project deliverable. European Language Equality.

You are most welcome to contribute to these pages: please contact [mailto:servicedesk@ivdnt.org servicedesk@ivdnt.org] with "K-Dutch" as the subject, and we will be in touch.

==Linguistic topics==
===[[Grammar]]===
* [[Grammar#Phonology,_Morphology_and_Syntax:_Taalportaal|Phonology, Morphology and Syntax: Taalportaal]]
* [[Grammar#Morphosyntax|Morphosyntax]]
* [[Grammar#Syntactic_Atlas_of_the_Dutch_dialects_(SAND)|Syntactic Atlas of the Dutch dialects (SAND)]]
* [[Grammar#Dutch_descriptive_grammar:_e-ANS_(in_Dutch)|Dutch descriptive grammar: e-ANS (in Dutch)]]
* [[Grammar#Grambank|Grambank]]
 
===[[Lexicography]]===
 
* [[Lexicography#Dutch_dictionaries|Dutch dictionaries]]
* [[Lexicography#Elexis|The Elexis Project]]
* [https://ivdnt.org/wp-content/uploads/2021/02/The-Future-of-Academic-Lexicography-A-White-Paper.pdf White paper]: The Future of Academic Lexicography
 
===[[Terminology]]===
*[[Terminology#Centre_of_Expertise_for_Dutch_Terminology|The Centre of Expertise for Dutch Terminology]]
*[[Terminology#Academic_Language|Academic Language]]
*[[Terminology#Medical_Terminology|Medical Terminology]]
*[[Terminology#Dutch_as_a_scientific_language|Dutch as a scientific language]]
*[[Terminology#Legal_Terminology|Legal Terminology]]
 
===[[Spelling]]===
*[[Spelling#Woordenlijst.org_(Official_Dutch_Word_List)|Woordenlijst.org (Official Dutch Word List)]]
*[[Spelling#Spelling_Certification_Mark|Spelling Certification Mark]]
 
==Linguistic resources: datasets==
===[[Corpora]]===
* [[Newspaper corpora]]: corpora consisting exclusively of newspaper text
* [[Parliamentary corpora]]
* [[Computer-mediated communication corpora]]
* [[Corpora of academic texts]]
* [[Historical corpora]]
* [[L2 learner corpora]]
* [[Manually annotated corpora]]
* [[Multimodal corpora]]
* [[Parallel corpora]]
* [[Reference corpora]]
* [[Social media corpora]]
* [[Spoken corpora]]
* [[Sign Language corpora]]
* [[Propbanks]]: corpora annotated with semantic role labels
* [[Treebanks]]
* [[Other corpora]]
 
===Lexical Resources===
* [[Lexica]]
* [[Dictionaries]]
* [[Conceptual Resources]]
* [[Wordlists]]
* [[Embeddings]]
* [[Lexica of terminology]]
* [[Ontologies]]
 
===N-grams===
* [[Character N-grams]]

==Tools for Dutch==
===Normalisation===
* [[Format conversion]]
* [[Spell checking]]
*[https://dev.clarin.nl/node/1914 TiCCLops]: Text-Induced Corpus Clean-up online processing system (no longer available)
*[https://lt3.ugent.be/normalisation-demo/ Normalisation Demo]

===Language Learning===
*[https://schrijfassistent.be Schrijfassistent]
*[http://schrijfassistent.standaard.be/ Schrijfassistent] at De Standaard
*[https://www.nedbox.be NedBox]: Online exercises to learn Dutch
*[https://oefenen.nl/programma/soort/taal Oefenen.nl]: Online exercises to learn Dutch
*[http://woordcombinaties.ivdnt.org/ Woordcombinaties]: Verbs and their combination patterns
*[https://orientplus.ucll.be/ Orient+]: A serious game to enhance academic vocabulary
*[https://www.taalwinkel.nl/ Taalwinkel]: Language Advice

===Automatic linguistic annotation===
* [[Basic language processing]]
* [[Deep parsing]]
<!-- ===Information extraction!-->
<!--* Processing of historical variants of Dutch!-->
<!--* Text mining!-->

===Speech processing===
* [[Spoken Language Recognition]]
* [[Speech recognition]]
* Speech synthesis

===Natural Language Processing===
* [[Language Modeling]]
* [[Machine translation]]
* [[Coreference resolution]]
* [[Compound splitting]]
* [[Word Sense Disambiguation]]
* [[Text classification]]
* [[Sentiment analysis]]
* [[Readability]]
* [[Clinical NLP]]
 
===Resource querying===
* [[Corpus querying]]
* [[Treebank querying]]
 
===Machine translation===
====Translation Engines====
Publicly available machine translation engines from or to Dutch:
*[https://www.deepl.com/translator DeepL]
*[https://translate.google.com/ Google translate]
*[https://www.bing.com/translator Bing Microsoft translator]
*[https://www.reverso.net/ Reverso]
*[https://webgate.ec.europa.eu/etranslation/public/welcome.html eTranslation from the European Union]
*[https://mateo.ivdnt.org/Translate MATEO No Language Left Behind]
 
====MT Evaluation====
*[https://mateo.ivdnt.org/Evaluate MATEO Machine Translation Evaluation Online]

===Terminology extraction===
* [https://termtreffer.org/ Termtreffer]. Request a login at [mailto:terminologie@ivdnt.org terminologie@ivdnt.org].
* [https://lt3.ugent.be/dterminer D-Terminer demo]. Terminology extraction for Dutch, English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)

===Terminology management===
* [https://iate.europa.eu/home IATE] (Interactive Terminology for Europe) is the shared terminology management system of the institutions of the European Union. It contains more than 7 million terms in 26 languages, covering more than 100 domains of EU legislation.

===Other===
* [[CLARIN projects]] at INT not mentioned elsewhere on this page
* [https://webservices.cls.ru.nl/ Language and Speech Tools] at Radboud University Nijmegen, e.g. [https://webservices.cls.ru.nl/tscan T-scan], a tool that analyses Dutch texts to assess their complexity.
* [https://www.opener-project.eu/ OpeNER] is a language analysis toolchain that helps (academic) researchers and companies "make sense out of natural language analysis". It consists of components that are easy to install, improve and configure, for example to detect the language of a text, determine the polarity of a text (sentiment analysis), or detect which topics a text covers. The supported languages currently are English, Spanish, Italian, German and Dutch.
* [https://gate.ac.uk/ GATE] (General Architecture for Text Engineering) is a Java suite of tools, originally developed at the University of Sheffield, that is used for many natural language processing tasks, including information extraction ([https://cloud.gate.ac.uk/shopfront#tagged=Dutch Dutch services in GATE Cloud]).
* [https://speech-repository.webcloud.ec.europa.eu/ Speech Repository] is an online e-learning tool. It contains video recordings of real-life speeches as well as speeches tailor-made as teaching material, which give interpreters and interpreting students an opportunity to practise and improve their interpreting skills.
* [https://subworkshop.sourceforge.net/ Subtitle Workshop] is a free application for creating, editing, and converting text-based subtitle files.
* [https://youdescribe.org/ YouDescribe] is a free, web-based platform for adding audio description to YouTube content.
* [https://www.audacityteam.org/ Audacity] is an open-source application for recording and editing audio.

==Helpdesk==
For information about Dutch: if you cannot find the answer to your question on this wiki, you can send it to [mailto:servicedesk@ivdnt.org servicedesk@ivdnt.org]. Your question will be forwarded to the appropriate experts as soon as possible, and you should receive an answer within two working days.

You can also ask us for information and assistance with the use of data and tools.

==Other Services==
* [[Best practice documents and guidelines]]
* [[Internships]]
* [[Consulting]]
* [[CLARIN]] for Dutch

==Questions and Answers==
On the [[Q&A|Questions and Answers page]] we keep track of all questions we receive concerning Dutch. This will grow into a repository of K-Dutch answers to your questions.

Note that there is also a very active Discord server concerning Dutch NLP: https://discord.gg/jn94Ux5j
</translate>

Latest revision as of 12:52, 13 March 2024
