Parliamentary corpora: Difference between revisions

From Clarin K-Centre
(Marked this version for translation)
<translate>
<!--T:1-->
We currently have no specific Dutch parliamentary corpora available, but have worked on this topic in the framework of [https://www.clarin.eu/content/parlamint-towards-comparable-parliamentary-corpora ParlaMint], a project that aims to bring together as many parliamentary corpora of different European languages as possible.

<!--T:2-->
To this end, the different datasets must be converted to a uniform format and provided with linguistic information. The INT has implemented this for the proceedings of the bilingual [https://www.dekamer.be/kvvcr/index.cfm Belgian Federal Parliament (French & Dutch)]. The aim of the project is to provide suitable research data for targeted observations of trends, opinions and decision-making. This will be tested by conducting a case study of the debate on the COVID-19 pandemic.

<!--T:3-->
* [https://www.clarin.si/repository/xmlui/handle/11356/1432 Multilingual comparable data set available]

== European Parliament data == <!--T:4-->

<!--T:5-->
[https://opus.nlpl.eu/Europarl.php Europarl data] on the OPUS website: a parallel corpus extracted from the European Parliament website by Philipp Koehn (University of Edinburgh). The main intended use is to aid statistical machine translation research.
</translate>

Revision as of 12:42, 13 March 2024
