Translations:Simplification Data/30/en: Difference between revisions
Latest revision as of 10:29, 3 December 2024
The Synthetic Simplification Dataset was compiled within the Duidelijke Taal project and is based on the WR-P-E-I component (websites) of the SoNaR corpus. The dataset consists of three parts: 6,986 sentences from the SoNaR corpus, synthetic simplifications of those sentences created by GPT-4, and sentence pairs that each combine one SoNaR sentence with its simplified version.
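
The relationship between the three parts can be illustrated with a minimal Python sketch. It assumes the original sentences and their simplifications are already loaded as two parallel lists in the same order; the actual file format of the dataset is not described here. The names `SentencePair` and `build_pairs` and the placeholder strings are hypothetical and are not part of the dataset.

```python
from typing import List, NamedTuple


class SentencePair(NamedTuple):
    """One original sentence together with its simplified version (hypothetical type)."""
    original: str
    simplified: str


def build_pairs(originals: List[str], simplifications: List[str]) -> List[SentencePair]:
    """Align two parallel lists into sentence pairs.

    Assumes item i of `simplifications` is the simplified version of
    item i of `originals`; raises ValueError if the lengths differ.
    """
    if len(originals) != len(simplifications):
        raise ValueError("originals and simplifications must have the same length")
    return [SentencePair(o, s) for o, s in zip(originals, simplifications)]


# Placeholder strings only; these are not sentences from the dataset.
originals = ["<original sentence 1>", "<original sentence 2>"]
simplifications = ["<simplified sentence 1>", "<simplified sentence 2>"]

pairs = build_pairs(originals, simplifications)
```

With the full dataset loaded this way, `len(pairs)` would be expected to equal the 6,986 sentences mentioned above.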