Parallel corpora: Difference between revisions

From Clarin K-Centre


*[http://dpc.inl.nl/indexd.php Online search]
*[http://hdl.handle.net/10032/tm-a2-h3 Download page]

Revision as of 11:37, 22 February 2021

The Dutch Parallel Corpus

The Dutch Parallel Corpus (DPC) is a 10-million-word, sentence-aligned parallel corpus for the language pairs Dutch-English and Dutch-French, with Dutch as the central language.

The corpus contains five different text types and is balanced with respect to text type and translation direction. The entire corpus has been aligned at sentence level and further enriched with linguistic information (lemmas and part-of-speech tags). A small subset of the Dutch-English part has also been manually aligned at the sub-sentential level.