
Translations:Q&A/106/en

From Clarin K-Centre
FuzzyBot
Importing a new version from external source
 

Latest revision as of 16:00, 13 November 2025

* I assume that is/uncertain marks an occurrence that was hard to make out (this is a transcribed spoken corpus) and was therefore transcribed as is/uncertain.
* I’ve double-checked the frequency of “is” from CGN.woordvorm.txt against the online version of CGN: [https://portal.clarin.ivdnt.org/opensonar_frontend/opensonar/search/hits?filter=Corpus_title%3A%28%22CGN%22%29&first=0&group=hit%3Aword%3Ai&number=20&patt=%5B%5D&interface=%7B%22form%22%3A%22explore%22%2C%22exploreMode%22%3A%22ngram%22%7D Here]. The count there is the same, and it amounts to 1.41% of all tokens.
* The three numbers in the SONAR500 list are the frequency of the word, its cumulative frequency, and its cumulative relative frequency. SONAR500 contains about 500 million tokens. The total frequency of “is” amounts to 1.09% of the corpus, which is actually quite close to the 1.41% in CGN.
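As a rough illustration of how the three SONAR500 columns relate to each other, and of how a percentage such as 1.41% follows from a raw count, a minimal Python sketch is given below. It assumes a frequency list of (word, count) pairs already sorted by descending count. The function names and the toy counts are hypothetical: they are not actual CGN or SONAR500 figures, and they do not reflect the real layout of CGN.woordvorm.txt.

```python
from itertools import accumulate


def frequency_table(counts):
    """Return (word, freq, cum_freq, cum_rel_freq) rows for a frequency list.

    Assumes `counts` is a list of (word, frequency) pairs sorted by
    descending frequency. The cumulative relative frequency is given
    here as a fraction of the total (0.0 to 1.0), not as a percentage.
    """
    total = sum(freq for _, freq in counts)
    # Running total of the frequencies, in list order.
    cumulative = accumulate(freq for _, freq in counts)
    return [
        (word, freq, cum, cum / total)
        for (word, freq), cum in zip(counts, cumulative)
    ]


def relative_frequency(counts, word):
    """Share of all tokens taken up by `word` (0.0 if it does not occur)."""
    total = sum(freq for _, freq in counts)
    return dict(counts).get(word, 0) / total


# Hypothetical toy counts for illustration only (not real CGN/SONAR500 data).
toy_counts = [("de", 500), ("en", 300), ("is", 150), ("het", 50)]

for row in frequency_table(toy_counts):
    print(row)
print(f"'is': {relative_frequency(toy_counts, 'is'):.2%} of all tokens")
```

With a complete frequency list, the cumulative relative frequency in the last row is always 1.0, and the share of “is” is simply its count divided by the corpus total (about 500 million tokens for SONAR500). Whether variants such as is/uncertain are merged into “is” before counting would shift that share slightly.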