K-Dutch: Difference between revisions

From Clarin K-Centre

Revision as of 14:56, 29 September 2023

Mediawiki:Mainpage

Welcome to K-Dutch, the place for anyone who wants to know anything about the Dutch language: linguistic properties, language advice, available tools and resources, etymology, dialects...

K-Dutch is a CLARIN Knowledge Centre. It is hosted by the Instituut voor de Nederlandse Taal (Dutch Language Institute), which is also a CLARIN-B centre and hosts many resources for Dutch that are, in general, freely available for research purposes. K-Dutch is an initiative of CLARIN-ERIC and CLARIN-BE.

The status of Dutch with respect to language technologies is described in Steurs, Vandeghinste and Daelemans (2022). Report on Dutch. Project deliverable. European Language Equality.

You are most welcome to contribute to these pages. Please contact servicedesk@ivdnt.org with the subject "K-Dutch", and we will be in touch.

Linguistic topics

Grammar

See the Grammar page for full details.

Lexicography

Dutch dictionaries

We provide a special page with more details about different types of dictionaries that are available for Dutch. See Dictionaries.

Elexis

ELEXIS is an acronym for European Lexicographic Infrastructure. This project is carried out as part of the Horizon 2020 programme and aims to create a durable infrastructure for e-lexicography. A large amount of high-quality semantic information is still kept in individual lexicographic sources spread out over Europe. ELEXIS makes it possible to link, share, distribute and store these different European sources on a large scale. The project also helps narrow the gap between communities with great lexicographic expertise and those with little.

White Paper: The Future of Academic Lexicography

Terminology

The Centre of Expertise for Dutch Terminology (Expertisecentrum Nederlandstalige Terminologie or ENT) supports people and organisations involved with terminology. They can find terminological information and tools here, on the website of the Dutch Language Institute (INT). A newsletter is sent round several times a year, describing developments and events in the field of terminology.

Higher Education Terminology

HOTNeV

HOTNeV is an acronym for Hoger Onderwijs Terminologie in Nederland en Vlaanderen (Higher Education Terminology in the Netherlands and Flanders). This project was prompted by a sharp increase in educational terms, generated by the EU’s education policy and implemented by the Tuning Project. HOTNeV has a dual purpose. Until now, Dutch equivalents for the English terminology were created mainly ad hoc, but this project focuses on the need to coordinate the provision of terms that have been approved by parties in the Dutch-speaking educational sector. It also wants to show the feasibility of this ambition.

Academic Phrases

A collection of academic phrases in the Academic Phrasesbank for Dutch, made by the Vrije Universiteit Amsterdam.

Medical Terminology: Medical Pilot

The Medical Pilot is an experimental database in which a small part of the medical vocabulary is described at various levels, from scientific to accessible to people with low literacy, and in which differences between Flemish and Dutch terms are also shown.

See also [1] for a medical dictionary.

Dutch as a scientific language

With the support of the Taalunie, corpus-based terminology lists for two different scientific domains have been made available. The lists are currently available as PDFs; a search interface is expected around late 2021.

Academic Dutch

Tools for training academic Dutch

Spelling

Woordenlijst.org (Official Dutch Word List)

The Word List of the Dutch Language is available online for free at woordenlijst.org. In 2015, the online version grew from approximately 100,000 entries to roughly 168,000 entries. All words from the previous printed edition have been retained.

The newly added words are derived from text files collected at the Dutch Language Institute, containing newspaper texts, literary texts and texts from the internet. In addition, a selection was made from all words that had been looked up in vain in the online Word List.

Since 2015, woordenlijst.org has been updated several times a year with hundreds of new words. At the end of 2019 it contained a total of 186,000 words. With all plural forms, diminutive forms, past tenses and past participles, the digital version of the Word List now contains information about approximately 680,000 word forms.

Spelling Certification Mark

The Spelling Certification Mark (Keurmerk Spelling) is a guarantee given by the Union for the Dutch Language (Taalunie) that a reference work can be used to look up the official spelling.

For the automatic spell check of word lists (for example provided by dictionary suppliers), the Dutch Language Institute uses the Spelling Certification Mark, also known as the HulK. Our spelling specialists manually correct the words the HulK does not recognize and add these to our own material. From then on the words can be processed automatically.

Any word list compiled in accordance with the rules and principles of the official spelling receives the Spelling Certification Mark.

Linguistic resources: datasets

Corpora

Lexical Resources

N-grams

Tools for Dutch

Normalisation

Language Learning

Automatic linguistic annotation

Speech processing

Natural Language Processing

Resource querying

Machine translation engines

Publicly available machine translation engines from or to Dutch

Terminology extraction

  • Termtreffer. Ask for login at terminologie@ivdnt.org.
  • D-Terminer demo. Terminology extraction for Dutch, English, French and German. (Rigouts Terryn, A. (2021). D-TERMINE: Data-driven Term Extraction Methodologies Investigated [Doctoral thesis]. Ghent University.)

Terminology management

  • IATE (Interactive Terminology for Europe) is the shared terminology management system of the institutions of the European Union. It contains more than 7 million terms in 26 languages, covering more than 100 domains of EU legislation.

Other

  • Previously unmentioned CLARIN projects at INT
  • Language and Speech Tools at Radboud Nijmegen, e.g. T-scan, a tool that analyses Dutch texts to assess their complexity.
  • OpeNER is a language analysis toolchain that helps (academic) researchers and companies make sense of natural language analysis. It consists of components that are easy to install, improve and configure, for example to detect the language of a text, determine the polarity of a text (sentiment analysis) or detect which topics a text covers. The supported languages are currently English, Spanish, Italian, German and Dutch.
  • GATE (General Architecture for Text Engineering) is a Java suite of tools originally developed at the University of Sheffield and used for many natural language processing tasks, including information extraction. (Dutch services in GATE Cloud).
  • Speech Repository is an online e-learning tool. It contains video recordings of real-life speeches and tailor-made pedagogical speeches, which give interpreters and interpreting students an opportunity to practise and improve their interpretation skills.

Helpdesk

For information about Dutch: If you cannot find the answers to your questions on this wiki, you can send your question to servicedesk@ivdnt.org. Your questions will be forwarded as soon as possible to the appropriate experts and you should receive an answer within two working days.

You can also ask us for information and assistance with the use of data and tools.

Other Services

Questions and Answers

On the Questions and Answers page we keep track of all questions we receive concerning Dutch. This will grow into a repository of K-Dutch answers to your questions.