Parallel corpora: Difference between revisions

Revision as of 14:02, 11 March 2021

PacoMT Parallel Corpora

During the STEVIN project PaCo-MT (Parse and Corpus-based Machine Translation), two existing parallel corpora were enriched with syntactic annotations and node alignments. The annotations were generated automatically.

Language Pairs: English to Dutch, Dutch to English, French to Dutch, Dutch to French.

version 1.0
data set from 2014
38.8 MB
Download page
Project website

The Dutch Parallel Corpus

The Dutch Parallel Corpus (DPC) is a 10-million-word, sentence-aligned parallel corpus for the language pairs Dutch-English and Dutch-French, with Dutch as the central language.

The corpus contains five different text types and is balanced with respect to text type and translation direction. The entire corpus has been aligned at the sentence level and further enriched with linguistic information (lemmas and PoS tags). A small subset of the Dutch-English part has also been manually aligned at the sub-sentential level.

@@ Line 2: / Line 2: @@
 During the STEVIN project PaCo-MT (Parse and Corpus-based Machine Translation), two existing parallel corpora were enriched with syntactic annotations and node alignments. The annotations were generated automatically.
-Language Pairs:
+Language Pairs: English to Dutch, Dutch to English, French to Dutch, Dutch to French.
-*English to Dutch
-*Dutch to English
-*French to Dutch
-*Dutch to French
 *version 1.0
@@ Line 13: / Line 9: @@
 *[http://hdl.handle.net/10032/tm-a2-f7 Download page]
 *[http://www.ccl.kuleuven.be/Projects/PACO/paco.php Project website]
 ==The Dutch Parallel Corpus==
