Translations:Reference corpora/8/en

From Clarin K-Centre
Revision as of 15:57, 19 March 2024 by FuzzyBot (talk | contribs) (Importing a new version from external source)

SoNaR-500 contains more than 500 million words of text from various domains and genres. All texts were tokenized, part-of-speech (POS) tagged, and lemmatized, and named entities were labeled. All of these annotations were generated automatically.