Q&A: Difference between revisions

From Clarin K-Centre

Revision as of 08:00, 8 October 2021

This page lists the questions we received.

Do you have any domain-specific corpora?

The main page lists the different types of corpora we have. Domain-specific corpora include the Parliamentary corpora and the Corpora of academic texts. The Parallel corpora also include domain-specific corpora.

Are there literary texts available?

The Public Domain Page links to the downloadable public domain files in DBNL.

Is there a speech recognition engine available for Belgian Dutch?

Currently, the release of a Belgian Dutch speech recognizer that can be freely used for research is in preparation.

Which corpora are available for Automatic Simplification for Dutch?

There are currently no parallel corpora available in which regular Dutch has been simplified, so simplification cannot be treated straightforwardly as a machine translation problem.

If you are considering developing a form of unsupervised simplification, however, several corpora are available that can be regarded as written in a form of easy Dutch: the Wablieft-corpus (easy Belgian Dutch; http://hdl.handle.net/10032/tm-a2-q6), the Basilex-corpus (texts for children in Dutch primary schools; http://hdl.handle.net/10032/tm-a2-n4), and WAI-NOT (very easy Belgian Dutch; http://hdl.handle.net/10032/tm-a2-t9).