Translations:Parallel Monolingual Corpora/15/en

From Clarin K-Centre
Revision as of 14:02, 11 June 2024 by FuzzyBot (talk | contribs) (Importing a new version from external source)

2) The second dataset was created by UWV Nederland as part of the "Leesplank" project, with the aim of ensuring ethical and legal soundness. It comprises 2.87 million paragraphs, each paired with its simplified version. The paragraphs are based on the Dutch Wikipedia extract from Gigacorpus. The text was filtered and cleaned using GPT-4 1106 preview.