Translations:Reference corpora/8/en

From Clarin K-Centre

SoNaR-500 contains more than 500 million words of text from a variety of domains and genres. All texts were tokenized, part-of-speech (POS) tagged, and lemmatized, and named entities were labeled. All SoNaR-500 annotations were generated automatically.