Translations:Q&A/84/en
We are comparing the Dutch and Turkish translations of the Linguistic Inquiry and Word Count (LIWC) dictionaries. Do you know of any corpora that would be suitable? I found several candidates on OPUS (https://opus.nlpl.eu/) and downloaded the TED2020 talks. However, these are .xml files with paragraph/line IDs, and I need plain .txt files. Would you have a script, or another way, to convert them automatically and remove the unnecessary tags?
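One possible approach is a short Python script that uses only the standard library. The sketch below assumes that each sentence in the downloaded files sits inside an `<s id="...">` element, optionally split into `<w>` word elements; that layout is an assumption about the files, so please check one of them in a text editor and adjust `sentence_tag` if the tag name differs. The function names `xml_to_lines` and `convert_folder` are made up for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def lines_from_root(root, sentence_tag="s"):
    """Return the text of every <sentence_tag> element, one string per sentence."""
    lines = []
    for elem in root.iter(sentence_tag):
        # itertext() yields all text inside the element and drops the tags;
        # split()/join() collapses line breaks and repeated spaces.
        text = " ".join(" ".join(elem.itertext()).split())
        if text:  # skip empty sentences
            lines.append(text)
    return lines


def xml_to_lines(xml_text, sentence_tag="s"):
    """Same as lines_from_root, but starting from an XML string."""
    return lines_from_root(ET.fromstring(xml_text), sentence_tag)


def convert_folder(in_dir, out_dir, sentence_tag="s"):
    """Write one .txt file (one sentence per line) for each .xml file in in_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for xml_file in sorted(Path(in_dir).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        lines = lines_from_root(root, sentence_tag)
        target = out_path / (xml_file.stem + ".txt")
        target.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

To use it, call for example `convert_folder("ted2020_xml", "ted2020_txt")`, where both folder names are placeholders for your own paths. The IDs are dropped and each sentence ends up on its own line, which should be easy to feed into a word-count tool. Very large files may be slow to load this way, since the whole document is read into memory at once.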