Translations:Parallel Monolingual Corpora/15/en

From Clarin K-Centre

2) The second dataset was created by UWV Nederland as part of the "Leesplank" project, with the aim of ensuring ethical and legal soundness. It comprises 2.87 million paragraphs, each paired with its corresponding simplified text. The paragraphs are based on the Dutch Wikipedia extract from Gigacorpus. The text was filtered and cleaned using GPT-4 1106 preview.