Corpus querying

From Clarin K-Centre
Revision as of 08:57, 22 February 2021 by Vincent

Autosearch (http://portal.clarin.inl.nl/autocorp/)

This demonstrator lets users define one or more corpora and upload data for them, after which the corpora automatically become searchable in a private workspace.

Users can upload text data annotated with lemma and part-of-speech tags in TEI or FoLiA format, either as a single XML file or as an archive (zip or tar.gz) containing several XML files. Corpus size is currently limited (25 MB per uploaded file; 500,000 tokens per corpus), but these limits may be raised later. The search application is powered by the INL BlackLab corpus search engine. The search interface is the same as the one used in, for example, the Corpus of Contemporary Dutch (Corpus Hedendaags Nederlands).
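
The limits above can be checked locally before uploading. The following is a minimal sketch in Python, not part of Autosearch itself. It assumes that each token is encoded as a `w` element (as is common in both TEI and FoLiA) and that "25 MB" means 25 × 1024 × 1024 bytes; the function names `count_tokens` and `limit_problems` are hypothetical. How Autosearch actually counts tokens and measures file size may differ.

```python
import xml.etree.ElementTree as ET

# Limits quoted on this page. Interpreting "MB" as 1024 * 1024 bytes is an assumption.
MAX_FILE_BYTES = 25 * 1024 * 1024
MAX_CORPUS_TOKENS = 500_000


def count_tokens(source):
    """Count elements whose local name is 'w' in one XML document.

    `source` is a file name or a binary file object. Treating every
    <w> element as exactly one token is an assumption of this sketch.
    """
    total = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}'.
        if elem.tag.rsplit("}", 1)[-1] == "w":
            total += 1
        elem.clear()  # release memory for large documents
    return total


def limit_problems(upload_sizes, token_count):
    """Return a list of human-readable limit violations (empty if none).

    `upload_sizes` maps each file to be uploaded (XML file or archive)
    to its size in bytes; `token_count` is the total for the corpus.
    """
    problems = []
    for name, size in upload_sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 25 MB per-file limit")
    if token_count > MAX_CORPUS_TOKENS:
        problems.append(f"corpus has {token_count} tokens, above the 500,000 limit")
    return problems
```

A typical use would be to sum `count_tokens` over all XML files of a corpus, collect the sizes of the files or archives to be uploaded (for example with `os.path.getsize`), and pass both to `limit_problems`; an empty list suggests the upload stays within the stated limits.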