Translations:Language modeling/13/nl: Difference between revisions
From Clarin K-Centre
Revision as of 12:45, 24 June 2024 by Vincent (talk | contribs)
Created page with "* [https://openai.com/ GPT-3] * [https://huggingface.co/docs/transformers/model_doc/mbart MBart]"
(No difference)
Information about message (contribute)
This message has no documentation. If you know where or how this message is used, you can help other translators by adding documentation to this message.
Message definition (Language modeling)
* [https://huggingface.co/CohereLabs/aya-expanse-8b Aya Expanse 8B]: Aya Expanse 8B is an open-weight research release of a model with highly advanced multilingual capabilities. Model architecture: Aya Expanse 8B is an auto-regressive, decoder-only language model that uses an optimized transformer architecture. Post-training includes supervised finetuning, preference training, and model merging. Original paper: https://arxiv.org/abs/2412.04261
* [https://huggingface.co/EuropeanParliament/EUBERT EUBERT]: This is a pretrained uncased BERT model trained on a vast corpus of documents registered by the European Publications Office. EUBERT serves as a starting point for building more specific natural language understanding models. Its versatility makes it suitable for a wide range of tasks, including but not limited to text classification, question answering, and language understanding. Model architecture: BERT (Bidirectional Encoder Representations from Transformers)
* [https://huggingface.co/utter-project/EuroLLM-1.7B EuroLLM-1.7B]: The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages. EuroLLM-1.7B is a 1.7B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets. EuroLLM-1.7B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with a focus on general instruction-following and machine translation. Model type: A 1.7B parameter multilingual transformer LLM (a loading sketch in Python follows this list). Original paper: https://arxiv.org/abs/2409.16235
* [https://huggingface.co/utter-project/EuroLLM-9B EuroLLM-9B]: The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages. EuroLLM-9B is a 9B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets. EuroLLM-9B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with a focus on general instruction-following and machine translation. Model type: A 9B parameter multilingual transformer LLM. Original paper: https://arxiv.org/abs/2409.16235
* [https://huggingface.co/BSC-LT/salamandra-7b-instruct Salamandra-7B-instruct]: Salamandra is a highly multilingual model pre-trained from scratch that comes in three sizes (2B, 7B, and 40B parameters), each with base and instruction-tuned variants. This model is the 7B instruction-tuned version. Model type: transformer-based decoder-only language model pre-trained from scratch on 12.875 trillion tokens of highly curated data. The pre-training corpus contains text in 35 European languages and code. Original paper: https://arxiv.org/abs/2502.08489
* [https://github.com/tiiuae/falcon-h1 Falcon H1]: Falcon-H1 is the latest evolution in the Falcon family of large language models, built on a hybrid architecture in which each block integrates both State Space Models (SSMs) and attention mechanisms. Falcon-H1 was initially trained with support for 18 core languages, including Dutch, with scalability to 100+ languages, and achieves state-of-the-art multilingual and reasoning performance in instruction following, maths, coding, and multilingual tasks. Original paper: https://arxiv.org/abs/2507.22448
* [https://neo-babel.github.io/ Neo Babel]: This is a novel multilingual image generation framework. The model is trained using a combination of large-scale multilingual pretraining and high-resolution instruction tuning. Original paper: https://arxiv.org/abs/2507.06137v1
* [https://huggingface.co/lightonai/alfred-40b-1023 Alfred-40b-1023]: Alfred-40B-1023 can be used as a chat model or as an instruct model. It has limited capabilities in Dutch. Model type: Causal decoder-only.
* [https://huggingface.co/docs/transformers/model_doc/mbart MBart]: Multilingual Denoising Pre-training for Neural Machine Translation (see the translation sketch after this list)
* [https://huggingface.co/docs/transformers/v4.14.1/model_doc/mt5 mT5]: A massively multilingual pre-trained text-to-text transformer
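
The instruction-tuned checkpoints in the list above are published on the Hugging Face Hub and can be loaded with the transformers library. The following is a minimal sketch, not taken from this page: it assumes the transformers package is installed and that the EuroLLM-1.7B-Instruct checkpoint ships a chat template; the Dutch prompt and generation settings are purely illustrative.

<syntaxhighlight lang="python">
# Minimal sketch: load a multilingual instruct model from the list above and prompt it in Dutch.
# The model id comes from the list; the prompt and generation settings are example values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-1.7B-Instruct"  # instruct variant mentioned in the list entry

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build the prompt with the model's chat template (assumed to be present for the instruct variant).
messages = [{"role": "user", "content": "Vat de volgende tekst samen in twee zinnen: ..."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
</syntaxhighlight>

The same pattern applies to the other decoder-only models listed above (Salamandra, Falcon-H1, Alfred) by swapping the model id, subject to each model's own licence and hardware requirements.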
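
MBart and mT5 differ from the models above in that they are encoder-decoder (sequence-to-sequence) models; a typical use is machine translation rather than chat. The sketch below translates English to Dutch with an mBART-50 checkpoint; the specific checkpoint name facebook/mbart-large-50-many-to-many-mmt is an assumption for illustration and is not listed on this page.

<syntaxhighlight lang="python">
# Sketch: English -> Dutch translation with an mBART-50 checkpoint (encoder-decoder model).
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_id = "facebook/mbart-large-50-many-to-many-mmt"  # assumed example checkpoint
tokenizer = MBart50TokenizerFast.from_pretrained(model_id, src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(model_id)

text = "Language models are trained on large multilingual text corpora."
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start with the Dutch language token so the output is generated in Dutch.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["nl_XX"],
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
</syntaxhighlight>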