Compound splitting
Latest revision as of 15:29, 6 November 2024
Compound splitter demo
A compound splitter splits compounds into their component parts, e.g. liefde+s+drank or [post+zegel]+verzamelaar. This demo accepts Dutch input of up to 500 characters. You can enter either running text or single words (one word per line). If you are interested in using the compound splitter for other purposes, contact Lieve.Macken@UGent.be.
- Lieve Macken and Arda Tezcan. 2018. “Dutch Compound Splitting for Bilingual Terminology Extraction.” In Multiword Units in Machine Translation and Translation Technology, ed. Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor, and Violeta Seretan. Vol. 341. John Benjamins, pp. 148–162. https://biblio.ugent.be/publication/7126122
- Demo: https://lt3.ugent.be/compound-splitter-demo/
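The internals of the UGent splitter are not described on this page. As a rough illustration of what an analysis like liefde+s+drank involves, the Python sketch below performs a lexicon-based split that allows an optional Dutch linking element (such as the s in liefdesdrank) between parts. The toy `LEXICON`, the `LINKERS` tuple and the `MIN_PART` threshold are hypothetical choices made for this example; this is not the method used by the demo. Unlike the bracketed [post+zegel]+verzamelaar example, the sketch returns a flat split.

```python
from functools import lru_cache
from typing import Optional, Tuple

# Hypothetical toy lexicon; a real splitter would need a large Dutch word list.
LEXICON = frozenset({"liefde", "drank", "post", "zegel", "verzamelaar"})

# Some common Dutch linking elements ("" means no linker). An illustrative
# assumption, not an exhaustive list.
LINKERS = ("", "s", "en", "e")

MIN_PART = 3  # minimum length of a lexicon part, to avoid spurious splits


def split_compound(word: str, lexicon=LEXICON) -> Optional[str]:
    """Split `word` into lexicon parts and linkers, joined with '+'.

    Prefers the analysis with the fewest tokens (linkers count as tokens);
    returns None when no analysis covers the whole word.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        rest = word[start:]
        if rest in lexicon:  # the remainder is itself a known word
            return (rest,)
        candidates = []
        for end in range(start + MIN_PART, len(word) - MIN_PART + 1):
            part = word[start:end]
            if part not in lexicon:
                continue
            for linker in LINKERS:
                if word.startswith(linker, end):
                    # Recursively analyse what follows the part and its linker.
                    tail = best(end + len(linker))
                    if tail is not None:
                        candidates.append((part,) + ((linker,) if linker else ()) + tail)
        return min(candidates, key=len) if candidates else None

    parts = best(0)
    return "+".join(parts) if parts else None


print(split_compound("liefdesdrank"))          # liefde+s+drank
print(split_compound("postzegelverzamelaar"))  # post+zegel+verzamelaar
```

Memoising `best` on the start offset keeps the search from re-analysing the same suffix repeatedly. A realistic splitter would also need frequency information to choose between competing analyses rather than simply preferring the fewest tokens.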
CharSplit: an n-gram-based compound splitter
A Python module that splits a compound into its body and head. German and Dutch are currently supported.
- Webpage: https://pypi.org/project/compound-split/
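CharSplit's exact scoring is not described on this page. The Python sketch below only illustrates the general idea of an n-gram-based body/head split: a training word list yields counts of word-initial and word-final character n-grams, and each split point is scored by how plausible the body's last n-gram is as a word ending and the head's first n-gram as a word beginning. The word list, `n = 3`, the minimum part length and the product score are assumptions for illustration; this is not CharSplit's actual model or API.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

N = 3        # character n-gram length (assumption)
MIN_LEN = 3  # minimum length of body and head (assumption)


def train(words: Iterable[str], n: int = N) -> Tuple[Counter, Counter]:
    """Count word-initial and word-final character n-grams in a word list."""
    starts, ends = Counter(), Counter()
    for w in words:
        w = w.lower()
        if len(w) >= n:
            starts[w[:n]] += 1
            ends[w[-n:]] += 1
    return starts, ends


def best_split(compound: str, starts: Counter, ends: Counter,
               n: int = N, min_len: int = MIN_LEN) -> Optional[Tuple[str, str]]:
    """Return the highest-scoring (body, head) split, or None if none scores > 0."""
    compound = compound.lower()
    n_starts = sum(starts.values()) or 1
    n_ends = sum(ends.values()) or 1
    best, best_score = None, 0.0
    for i in range(min_len, len(compound) - min_len + 1):
        body, head = compound[:i], compound[i:]
        # Relative frequency of the body's last n-gram as a word ending
        # and of the head's first n-gram as a word beginning.
        p_end = ends[body[-n:]] / n_ends
        p_start = starts[head[:n]] / n_starts
        score = p_end * p_start
        if score > best_score:
            best, best_score = (body, head), score
    return best


# Hypothetical toy training data.
starts, ends = train(["post", "zegel", "kast", "zeep", "zes", "regel", "gast", "boek"])
print(best_split("postzegel", starts, ends))  # ('post', 'zegel')
print(best_split("gastboek", starts, ends))   # ('gast', 'boek')
```

Because the scores come from n-gram statistics rather than whole-word lookup, the sketch can split compounds whose full form never occurs in the training list, such as gastboek above. Real training data would be a large corpus word list, and smoothing would be needed for unseen n-grams.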
Wordbuilder
- Vincent Vandeghinste. 2002. “Lexicon Optimization: Maximizing Lexical Coverage in Speech Recognition through Automated Compounding.” In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002). ELRA, Paris. https://www.aclweb.org/anthology/L02-1004/