
Speech recognition: Difference between revisions

From Clarin K-Centre
Marked this version for translation
 
(8 intermediate revisions by 2 users not shown)
Line 2: Line 2:
<translate>
<translate>


== BAS Web Services== <!--T:10-->
<!--T:17-->
This page contains information on Dutch speech recognition systems.
 
==Online services== <!--T:18-->
 
===BAS Web Services=== <!--T:10-->


<!--T:11-->
<!--T:11-->
Line 13: Line 18:


<!--T:12-->
<!--T:12-->
*[https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface Webinterface]
*[https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface Webinterface] (requires CLARIN login)
 
===Digital Europe Speech-to-Text=== <!--T:25-->
 
<!--T:26-->
Speech recognition built by the European Commission. Requires an EU login.
 
<!--T:27-->
*[https://language-tools.ec.europa.eu/SpeechServices/Transcription Website]
 
===LaMachine webservices=== <!--T:1-->
 
<!--T:28-->
LaMachine is end-of-life and being deprecated. See [https://github.com/proycon/LaMachine/issues/214 this post] for reasons and alternative solutions.


===Speech Recognition for Belgian Dutch: NeLF=== <!--T:2-->


<!--T:1-->
<!--T:13-->
==LaMachine webservices==
API and browser access to a state-of-the-art speech recognition system for Belgian Dutch, including dialect speech recognition, developed by KU Leuven and UGent.
There are several speech recognition [https://webservices.cls.ru.nl/ web services] at Radboud University


<!--T:2-->
<!--T:14-->
==Speech Recognition for Belgian Dutch==
Requires a login. You can request one, but access is granted only after manual approval.
Since April 2022, there is a new ASR engine available, specifically suited for speech recognition for Belgian Dutch. It is running at KU Leuven.


<!--T:3-->
<!--T:15-->
*[https://www.spraak.org/webservice/dutch_asr/ Online webservice]
[https://www.nelfproject.be/web_service.php NeLF Website]
*[https://clinjournal.org/clinj/article/view/119 Scientific publication about speech recognition engine]


<!--T:4-->
<!--T:16-->
==HENSOLDT ANALYTICS Speech-to-text for Dutch==
===HENSOLDT ANALYTICS Speech-to-text for Dutch (demo)===
The [https://european-language-grid.eu European Language Grid] hosts this speech recognition service, with a demo at
[https://live.european-language-grid.eu/catalogue/tool-service/20900 https://live.european-language-grid.eu/catalogue/tool-service/20900]
[https://live.european-language-grid.eu/catalogue/tool-service/23090/try%20out/ https://live.european-language-grid.eu/catalogue/tool-service/23090/try%20out/]
 
===Microsoft Transcriber=== <!--T:9-->
 
<!--T:19-->
* in Word 365
*[https://support.microsoft.com/nl-nl/office/uw-opnamen-transcriberen-7fc2efec-245e-45f0-b053-2a97531ecf57 Website in Dutch]


<!--T:5-->
==To install== <!--T:20-->
==Punctuation Insertion==
 
As ASR output often consists of unpunctuated streams of words, you may want to insert punctuation automatically.
===noScribe=== <!--T:21-->


<!--T:6-->
<!--T:22-->
*[https://huggingface.co/oliverguhr/fullstop-dutch-sonar-punctuation-prediction?text=hervatting+van+de+zitting+ik+verklaar+de+zitting+van+het+europees+parlement+die+op+vrijdag+17+december+werd+onderbroken+te+zijn+hervat HuggingFace model]
*AI-based software that transcribes interviews for qualitative social research or journalistic use
*[https://github.com/VincentCCL/Segment_FullStop/blob/main/Segment_FullStop.py Python script that accepts txt file as input and returns punctuated txt as output]
*free and open source (GPL-3.0)
*runs completely locally on your computer
* can distinguish different speakers and understands around 60 languages
* includes a nice editor to review, verify and correct the resulting transcript
* standing on the shoulders of giants: Whisper from OpenAI, faster-whisper by Guillaume Klein and pyannote from Hervé Bredin
* [https://github.com/kaixxx/noScribe GitHub page]


<!--T:7-->
<!--T:23-->
==Whisper model from OpenAI==
===Whisper model from OpenAI===
ASR for multiple languages, including Dutch, is available from Whisper. The full model can be downloaded.


Line 50: Line 77:
*[https://www.youtube.com/watch?v=ABFqbY_rmEk YouTube video] explaining how to install Whisper on your Windows machine


<!--T:9-->
<!--T:24-->
==Microsoft Transcriber==
==Leaderboard==
*[https://support.microsoft.com/nl-nl/office/uw-opnamen-transcriberen-7fc2efec-245e-45f0-b053-2a97531ecf57 Website in Dutch]
* [https://opensource-spraakherkenning-nl.github.io/ASR_NL_results/UT/N-Best/nbest_res.html Website]
 
<!--T:5-->
==Punctuation Insertion==
As ASR output often consists of unpunctuated streams of words, you may want to insert punctuation automatically.
 
<!--T:6-->
*[https://huggingface.co/oliverguhr/fullstop-dutch-sonar-punctuation-prediction?text=hervatting+van+de+zitting+ik+verklaar+de+zitting+van+het+europees+parlement+die+op+vrijdag+17+december+werd+onderbroken+te+zijn+hervat HuggingFace model]
*[https://github.com/VincentCCL/Segment_FullStop/blob/main/Segment_FullStop.py Python script that accepts txt file as input and returns punctuated txt as output]
 
</translate>
</translate>

Latest revision as of 18:05, 13 November 2025

This page contains information on Dutch speech recognition systems.

Online services

BAS Web Services

The BAS Web Services are a rich set of tools for speech sciences and technology. Tools include:

  • Automated speech recognition, including several models for Dutch
  • Anonymizer
  • Audio segmentation tool on the basis of transcripts
  • Speaker diarisation
  • Voice activity detection

Digital Europe Speech-to-Text

Speech recognition built by the European Commission. Requires an EU login.

LaMachine webservices

LaMachine is end-of-life and being deprecated. See this post for reasons and alternative solutions.

Speech Recognition for Belgian Dutch: NeLF

API and browser access to a state-of-the-art speech recognition system for Belgian Dutch, including dialect speech recognition, developed by KU Leuven and UGent.

Requires a login. You can request one, but access is granted only after manual approval.

NeLF Website

HENSOLDT ANALYTICS Speech-to-text for Dutch (demo)

The European Language Grid hosts this speech recognition service, with a demo at https://live.european-language-grid.eu/catalogue/tool-service/23090/try%20out/

Microsoft Transcriber

To install

noScribe

  • AI-based software that transcribes interviews for qualitative social research or journalistic use
  • free and open source (GPL-3.0)
  • runs completely locally on your computer
  • can distinguish different speakers and understands around 60 languages
  • includes a nice editor to review, verify and correct the resulting transcript
  • standing on the shoulders of giants: Whisper from OpenAI, faster-whisper by Guillaume Klein and pyannote from Hervé Bredin
  • GitHub page

Whisper model from OpenAI

ASR for multiple languages, including Dutch, is available from Whisper. The full model can be downloaded.
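
A minimal sketch of local transcription with the open-source `openai-whisper` Python package, assuming it is installed (`pip install openai-whisper`) together with ffmpeg. The model size `small` and the helper names `join_segments` and `transcribe_dutch` are illustrative assumptions, not something this page prescribes.

```python
from typing import Iterable, Mapping


def join_segments(segments: Iterable[Mapping[str, object]]) -> str:
    """Join the 'text' field of Whisper-style segments into one string."""
    parts = (str(seg["text"]).strip() for seg in segments)
    return " ".join(p for p in parts if p)


def transcribe_dutch(audio_path: str, model_name: str = "small") -> str:
    """Transcribe one audio file, forcing Dutch as the language."""
    # Assumption: the openai-whisper package and ffmpeg are installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path, language="nl")
    return join_segments(result["segments"])
```

Calling `transcribe_dutch` with the path to a recording would return the transcript as a single string. Passing `language="nl"` skips automatic language detection, which may help on short or noisy recordings.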

Leaderboard

Punctuation Insertion

As ASR output often consists of unpunctuated streams of words, you may want to insert punctuation automatically.
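
A minimal sketch of punctuation insertion with the `oliverguhr/fullstop-dutch-sonar-punctuation-prediction` model from HuggingFace, assuming the third-party `deepmultilingualpunctuation` package (`pip install deepmultilingualpunctuation`) is used to load it. The helper names `collapse_whitespace` and `punctuate_dutch` are hypothetical.

```python
def collapse_whitespace(text: str) -> str:
    """Flatten line breaks and repeated spaces in raw ASR output."""
    return " ".join(text.split())


def punctuate_dutch(text: str) -> str:
    """Restore punctuation in unpunctuated Dutch text."""
    # Assumption: the deepmultilingualpunctuation package is installed.
    from deepmultilingualpunctuation import PunctuationModel

    model = PunctuationModel(
        model="oliverguhr/fullstop-dutch-sonar-punctuation-prediction"
    )
    return model.restore_punctuation(collapse_whitespace(text))
```

The model is downloaded on first use. For long transcripts it may be preferable to create the `PunctuationModel` once and reuse it, rather than reloading it on every call as this sketch does.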