
Translations:Simplification Data/30/en

From Clarin K-Centre

The Synthetic Simplification Dataset was compiled within the Duidelijke Taal project and is based on the WR-P-E-I component (websites) of the SoNaR corpus. The dataset consists of three parts: 6,986 original sentences from the SoNaR corpus, synthetic simplifications of those sentences generated by GPT-4, and sentence pairs, each consisting of one SoNaR sentence and its simplified version.
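The third part of the dataset links the first two: each original sentence is matched with its simplification. A minimal sketch of that relationship follows. It assumes the originals and simplifications are available as two in-memory lists aligned one-to-one by position; this page does not describe the actual file format or alignment scheme, and the names `SentencePair` and `build_pairs` as well as the placeholder strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned pair: an original SoNaR sentence and its GPT-4 simplification."""
    original: str
    simplified: str


def build_pairs(originals: list[str], simplified: list[str]) -> list[SentencePair]:
    """Pair sentences by position (assumes a one-to-one, order-preserving alignment)."""
    if len(originals) != len(simplified):
        raise ValueError(
            f"expected equal lengths, got {len(originals)} originals "
            f"and {len(simplified)} simplifications"
        )
    return [SentencePair(o, s) for o, s in zip(originals, simplified)]


# Hypothetical placeholder strings, not actual dataset content.
originals = ["<SoNaR sentence 1>", "<SoNaR sentence 2>"]
simplified = ["<simplified sentence 1>", "<simplified sentence 2>"]
pairs = build_pairs(originals, simplified)
print(len(pairs))  # 2
```

Under this assumption, the full dataset would yield 6,986 pairs, one per original sentence. The length check guards against silently truncated output, since `zip` stops at the shorter list.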