Historical corpora
<languages/>
<translate>
== Nederlab == <!--T:1-->
<!--T:38-->
A user-friendly, tool-enriched, open-access web interface that aims to contain all digitized texts relevant to the Dutch national heritage and to the history of Dutch language and culture (c. 800 to the present).
<!--T:2-->
*[https://www.nederlab.nl/ Project website]
*[https://dev.clarin.nl/node/4234 CLAPOP description]
==Before 12th century: Corpus of Old Dutch== <!--T:3-->
<!--T:4-->
The Corpus of Old Dutch is the collection of all texts in Old Dutch that served as source material for the Dictionary of Old Dutch (ONW). The texts date from the period between 475 and 1200.
<!--T:5-->
The Old Dutch texts that Maurits Gysseling collected and transcribed form the basis of this collection. They have been supplemented with texts such as the Mittelfränkische Reimbibel, glosses such as the Malbergse glossen to the Lex Salica, and anthroponymic and toponymic material. The corpus has been annotated with word classes and lemmas, and the annotation of the entire corpus has been manually verified.
<!--T:6-->
What is Old Dutch?
<!--T:7-->
Old Dutch is the collective term for several related dialects that, just like Old English, Old Frisian, Old Saxon, and Old High German, developed out of West Germanic around the beginning of the fifth century. It was spoken in an area that does not entirely correspond with the current Dutch-speaking region.
<!--T:8-->
Differentiating between Old Dutch, Old Saxon, and Old Frisian is sometimes difficult. The editors of the Dictionary of Old Dutch, who were responsible for compiling the corpus, applied a liberal admission policy. Nevertheless, not all texts from Gysseling’s original Old Dutch collection were incorporated into the corpus. One example is the Heliand, a poem that was left out because it was written in Old Saxon.
<!--T:9-->
*[https://ivdnt.org/corpora-lexica/corpus-oudnederlands/ Project page]
*[https://corpusoudnederlands.ivdnt.org/corpus-frontend/ONL/search/ Online search]
==13th century: Gysseling Corpus== <!--T:10-->
<!--T:11-->
The Gysseling Corpus is the collection of all 13th-century texts that served as source material for the Dictionary of Early Middle Dutch (VMNW). The corpus consists mainly of official and literary texts from the thirteenth century that have been handed down in 13th-century manuscripts.
<!--T:12-->
The texts are diplomatic editions, which means that the source texts have been rendered in modern script as accurately as possible. The corpus has been linguistically annotated with word classes and modern Dutch lemmas (entry words) to make it easier to search. The annotation of the entire corpus has been manually verified.
<!--T:13-->
*[http://corpusgysseling.ivdnt.org/corpus-frontend/Gysseling/search Online search]
*[http://hdl.handle.net/10032/tm-a2-j4 Download page]
*[https://ivdnt.org/corpora-lexica/corpus-gysseling/ Project page]
==14th - 16th century: Corpus of Middle Dutch== <!--T:14-->
<!--T:15-->
The Corpus of Middle Dutch is a collection of rhyming texts and prose from the period 1300-1550. It contains classics such as Beatrijs, Van den vos Reynaerde, the abele spelen, the stories about King Arthur and about Charlemagne, and all texts from the famous Gruuthuuse manuscript (including the Egidius song). It also contains many lesser-known or less researched texts, such as prose adaptations of the rhyming knight’s tales (the so-called ‘folk books’), collections of songs such as the Antwerp Songbook, various Bible translations, hagiographies, books of prayer, chronicles, and all kinds of religious, didactic and scientific treatises, medical manuals and recipes.
<!--T:16-->
The corpus was compiled mainly from critical, academically sound text editions. In time, it will be annotated with word classes and lemmas to make it easier to search.
<!--T:17-->
*[http://hdl.handle.net/10032/tm-a2-j6 Download page]
*[http://corpusmiddelnederlands.ivdnt.org/corpus-frontend/MNL/search/ Online search]
*[https://ivdnt.org/corpora-lexica/corpus-middelnederlands/ Project page]
==17th century: Newspaper Corpus== <!--T:18-->
<!--T:19-->
The Couranten Corpus comprises the seventeenth-century Dutch newspapers available on Delpher (delpher.nl/kranten). The oldest surviving newspapers were published in 1618. The Koninklijke Bibliotheek in The Hague scanned the newspapers for the Delpher website. In a citizen science project, all newspapers were transcribed and corrected by more than 300 volunteers of the Stichting Vrijwilligersnetwerk Nederlandse Taal, led by Nicoline van der Sijs. Subsequently, metadata were added and checked, for instance on genre (advertisements, national news, international news).
<!--T:20-->
This sizeable corpus currently contains the contents of 13 newspapers, comprising 109,532 articles and 18,926,425 words. The information in these newspapers is of interest to researchers from various disciplines, ranging from historians to historical linguists, literary scholars and art historians.
<!--T:21-->
In the future, transcriptions of newly digitised newspapers from the seventeenth century and of newspapers from the eighteenth century will be added to the Couranten Corpus.
<!--T:22-->
The first online version of the Couranten Corpus was released on 12 May 2022.
<!--T:23-->
*[https://couranten.ivdnt.org/corpus-frontend/couranten/search/ Online search]
*[https://ivdnt.org/corpora-lexica/courantencorpus/ Project page]
==17th - 19th century: Letters as Loot== <!--T:24-->
<!--T:25-->
Approximately 40,000 Dutch letters from the second half of the 17th to the early 19th century have been gathering dust for centuries in British archives. They were sent home from abroad by sailors and others, and in the other direction by those who stayed behind and needed to keep in touch with their loved ones. Many letters did not reach their destinations: they were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between the Netherlands and England. These confiscated letters of men, women and even children are priceless material for historical linguists. They give access to the still largely unknown everyday Dutch of the past, the colloquial Dutch of people from the middle and lower classes.
<!--T:26-->
The first extensive sociolinguistic analysis of these Dutch letters was conducted in the Letters as Loot research programme (2008-2013) at Leiden University. This research concentrated on a selection of about one thousand Dutch private letters from the late seventeenth and late eighteenth centuries, written by more than 700 different letter writers.
<!--T:27-->
*[https://taalmaterialen.ivdnt.org/download/brieven-als-buit-2021/ Download page]
*[http://brievenalsbuit.ivdnt.org/corpus-frontend/BaB/search/ Online search]
*[https://www.universiteitleiden.nl/en/research/research-projects/humanities/letters-as-loot.-towards-a-non-standard-view-on-the-history-of-dutch Project page]
==17th - 19th century: Letters as Loot-2== <!--T:28-->
<!--T:29-->
Letters as Loot-2 is a spin-off of the Letters as Loot research programme (2008-2013) at Leiden University. This corpus is an addition to the original Letters as Loot corpus. It comprises more than 1300 Dutch letters that were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between the Netherlands and England from the second half of the 17th to the early 19th century.
<!--T:30-->
*[https://taalmaterialen.ivdnt.org/download/brieven-als-buit2/ Download page]
*[http://brievenalsbuit2.ivdnt.org/corpus-frontend/BaBa/search/ Online search]
== 17th - 19th century: Letters as Loot - Gold Standard == <!--T:31-->
<!--T:39-->
Letters as Loot - Gold Standard contains ca. 1000 source files from the Letters as Loot programme (directed by Prof. Dr. M.J. van der Wal), each enriched with main part-of-speech tags and modern lemmata.
<!--T:32-->
* [https://www.universiteitleiden.nl/onderzoek/onderzoeksprojecten/geesteswetenschappen/brieven-als-buit#tab-2 Project page]
* [http://hdl.handle.net/10032/Tm-a2-a7 Download page]
== 20th century: The VU-DNC Corpus== <!--T:33-->
<!--T:40-->
A diachronic Dutch newspaper corpus (VU Free University Dutch Newspaper Corpus).
(More info under [[Newspaper corpora]])
<!--T:34-->
* [https://portal.clarin.inl.nl/vu-dnc/index.html Corpus webpage]
== Public Domain Data @ DBNL == <!--T:35-->
<!--T:41-->
A corpus of public domain books and texts available from the Royal Library of the Netherlands.
<!--T:36-->
* [https://dbnl.org/letterkunde/pd/index.php Download page]
== Delpher: historical newspapers, magazines, books and radio bulletins == <!--T:37-->
<!--T:42-->
Delpher is a freely accessible website, developed and operated by the Koninklijke Bibliotheek, featuring digitized historical Dutch newspapers, books, magazines and radio bulletins from libraries, museums and other heritage institutions.
* [https://www.delpher.nl Delpher.nl]
==Dutch Renaissance poetry corpus== <!--T:43-->
<!--T:44-->
This corpus contains alexandrines and iambic pentameters written by a selection of Dutch Renaissance poets (late 16th and 17th centuries). Its creation and annotation were part of a PhD project at the Meertens Institute (https://www.meertens.knaw.nl), which was funded by the Koninklijke Nederlandse Akademie van Wetenschappen (KNAW).
<!--T:45-->
*[https://github.com/mirsdes/Dutch_Renaissance_poetry_corpus GitHub page]
</translate>
Latest revision as of 11:53, 3 September 2024