- Natural Language Processing Techniques
- Topic Modeling
- Phonetics and Phonology Research
- Speech Recognition and Synthesis
- Text Readability and Simplification
- Multimodal Machine Learning Applications
- Speech and Audio Processing
- Speech and dialogue systems
- Sentiment Analysis and Opinion Mining
- Authorship Attribution and Profiling
- Education and Teacher Training
- Ethics and bioethics in healthcare
- Higher Education Teaching and Evaluation
- Hate Speech and Cyberbullying Detection
- Ultrasonics and Acoustic Wave Propagation
- Computational and Text Analysis Methods
- Music and Audio Processing
- Comparative International Legal Studies
- Children's Physical and Motor Development
- Neurobiology of Language and Bilingualism
- Educational Practices and Policies
- Ergonomics and Musculoskeletal Disorders
- Early Childhood Education and Development
- Domain Adaptation and Few-Shot Learning
- Misinformation and Its Impacts
University of Groningen
2020-2025
Association for Computational Linguistics
2024
The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic...
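As an illustration of how such a monolingual model is typically used, the sketch below loads a Dutch BERT checkpoint for token classification via Hugging Face transformers; the checkpoint name and label count are assumptions, not part of the abstract.

```python
# Minimal sketch: load a monolingual Dutch BERT for token classification (e.g. POS
# tagging). The checkpoint name "GroNLP/bert-base-dutch-cased" and num_labels=17
# (Universal POS tags) are assumptions for illustration.
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "GroNLP/bert-base-dutch-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=17)

# The classification head is freshly initialised and would still need fine-tuning
# on labelled Dutch data; this only shows the forward pass.
inputs = tokenizer("Groningen ligt in het noorden van Nederland.", return_tensors="pt")
logits = model(**inputs).logits  # (1, sequence_length, num_labels)
```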
Peeking into the inner workings of BERT has shown that its layers resemble the classical NLP pipeline, with progressively more complex tasks being concentrated in later layers. To investigate to what extent these results also hold for a language other than English, we probe a Dutch BERT-based model and the multilingual BERT model on Dutch NLP tasks. In addition, through a deeper analysis of part-of-speech tagging, we show that, within a given task, information is spread over different parts of the network and the pipeline might not be as neat as it seems....
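A minimal layer-wise probing setup, in the spirit of the analysis described above but not the paper's exact protocol, could look as follows; the example sentences and labels are invented.

```python
# Illustrative layer-wise probing: collect the hidden state of every layer for one
# token position and fit a separate linear probe per layer, so per-layer accuracy on
# a task such as POS tagging can be compared across the network.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoTokenizer, AutoModel

name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True)

def layer_features(sentence, token_index):
    """One feature vector per layer (embeddings + Transformer layers) for a token."""
    with torch.no_grad():
        out = model(**tokenizer(sentence, return_tensors="pt"))
    return [h[0, token_index].numpy() for h in out.hidden_states]

# Invented probing data: (sentence, token position, POS label).
examples = [("De kat slaapt .", 1, "NOUN"), ("Wij slapen nu .", 2, "VERB"),
            ("De hond blaft .", 1, "NOUN"), ("Zij blaffen hard .", 2, "VERB")]
features = [layer_features(s, i) for s, i, _ in examples]
labels = [lab for _, _, lab in examples]
probes = [LogisticRegression(max_iter=1000).fit(layer, labels)
          for layer in zip(*features)]  # one probe per layer
```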
Cross-lingual transfer learning with large multilingual pre-trained models can be an effective approach for low-resource languages with no labeled training data. Existing evaluations of zero-shot cross-lingual generalisability use datasets with English training data, and test data in a selection of target languages. We explore a more extensive setup with 65 different source languages and 105 target languages for part-of-speech tagging. Through our analysis, we show that pre-training on both the source and target language, as well as matching language families, writing systems, word...
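The zero-shot setup described here boils down to tagging target-language text with a model that only ever saw source-language labels; a minimal sketch, with a placeholder checkpoint path standing in for any POS-fine-tuned multilingual model, is shown below.

```python
# Conceptual zero-shot cross-lingual POS tagging: a multilingual model fine-tuned on
# source-language POS data is applied unchanged to target-language sentences.
# "path/to/pos-model" is a placeholder; substitute any token-classification checkpoint
# fine-tuned for POS tagging.
from transformers import pipeline

tagger = pipeline("token-classification", model="path/to/pos-model")

# No target-language labels are used for training; predictions would be scored
# against a held-out annotated test set in the target language.
for token in tagger("Dit is een zin in de doeltaal."):
    print(token["word"], token["entity"])
```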
Variation in speech is often quantified by comparing phonetic transcriptions of the same utterance. However, manually transcribing speech is time-consuming and error prone. As an alternative, therefore, we investigate the extraction of acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between Norwegian dialect speakers. For comparison with earlier studies, we evaluate how well these differences match...
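One common way to turn such frame-level embeddings into a pronunciation distance is to align two recordings of the same word with dynamic time warping; the sketch below assumes a publicly available English wav2vec 2.0 checkpoint rather than the paper's exact models.

```python
# Hedged sketch: frame-level wav2vec 2.0 hidden states for two recordings of the same
# word are aligned with dynamic time warping (DTW); the normalised alignment cost is
# used as a word-based pronunciation difference.
import numpy as np
import torch
from scipy.spatial.distance import cdist
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-base-960h"  # assumed checkpoint choice
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name)

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Frame-level hidden states (frames x dims) for a mono 16 kHz waveform."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        return model(inputs.input_values).last_hidden_state[0].numpy()

def pronunciation_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Plain DTW over cosine frame distances, normalised by the two lengths."""
    cost = cdist(a, b, metric="cosine")
    acc = np.full((len(a) + 1, len(b) + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[len(a), len(b)] / (len(a) + len(b))
```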
Automatic Speech Recognition (ASR) performance for low-resource languages is still far behind that of higher-resource languages such as English, due to a lack of sufficient labeled data. State-of-the-art methods deploy self-supervised transfer learning, in which a model pre-trained on large amounts of unlabeled data is fine-tuned using little labeled data in the target language. In this paper, we present and examine a method for fine-tuning an SSL-based model in order to improve performance for Frisian and its regional dialects (Clay Frisian, Wood Frisian, and South Frisian). We show that ASR...
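A standard version of that fine-tuning recipe attaches a fresh CTC head to a multilingual pre-trained wav2vec 2.0 model and trains it on a small amount of transcribed speech; the checkpoint name, vocabulary size, and dummy batch below are assumptions.

```python
# Minimal sketch of SSL-based ASR fine-tuning: a self-supervised wav2vec 2.0 model gets
# a new CTC output layer sized to a (hypothetical) target-language character vocabulary
# and is updated on transcribed speech. Shown with one step on dummy data.
import torch
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",  # assumed multilingual pre-trained checkpoint
    vocab_size=35,                      # assumed: target-language character set size
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # common practice: keep the convolutional encoder frozen

audio = torch.randn(1, 16000)                # one second of dummy 16 kHz audio
labels = torch.tensor([[4, 8, 15, 16, 23]])  # dummy character IDs for the transcript
loss = model(input_values=audio, labels=labels).loss
loss.backward()  # a real setup would wrap this in an optimiser loop over the corpus
```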
Large generative language models have been very successful for English, but other languages lag behind, in part due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings that are aligned with the original English embeddings. Additionally, we scale up complexity by transforming...
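The core of the adaptation can be expressed in a few lines: freeze the Transformer blocks of GPT-2 and leave only the (tied) lexical embeddings trainable. The sketch below is illustrative and skips the new-language tokenizer and training loop.

```python
# Sketch of lexical-embedding-only adaptation: all GPT-2 parameters are frozen except
# the token embedding matrix; because the LM head is tied to it, relearning the
# embeddings against a new-language tokenizer also adapts the output layer.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

for param in model.parameters():
    param.requires_grad = False
for param in model.transformer.wte.parameters():
    param.requires_grad = True  # only the lexical embeddings are updated

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} of {total:,} parameters")
```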
For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this process. We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language. By combining the new lexical layers and fine-tuned Transformer layers, we achieve high performance for both target languages. With high language similarity, 10MB...
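The recombination step this abstract describes amounts to plugging a retrained lexical layer into a tagger whose Transformer layers were fine-tuned in the source language; a rough sketch with placeholder checkpoint paths (both hypothetical) is given below.

```python
# Illustrative recombination of independently trained parts: copy the word-embedding
# matrix retrained on the target variety into a model fine-tuned for POS tagging in the
# source language. Both paths are placeholders; the models must share architecture,
# hidden size, and vocabulary size for this to work.
from transformers import AutoModel, AutoModelForTokenClassification

pos_tagger = AutoModelForTokenClassification.from_pretrained("path/to/source-pos-tagger")
target_lexical = AutoModel.from_pretrained("path/to/target-adapted-encoder")

pos_tagger.get_input_embeddings().weight.data.copy_(
    target_lexical.get_input_embeddings().weight.data
)
# The resulting model can now tag target-variety text zero-shot.
```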
The COVID-19 pandemic has witnessed the implementation of exceptional measures by governments across the world to counteract its impact. This work presents the initial results of an on-going project, EXCEPTIUS, aiming to automatically identify, classify and compare exceptional measures across 32 countries in Europe. To this goal, we created a corpus of legal documents with sentence-level annotations of eight different classes of measures that are implemented in these countries. We evaluated multiple multi-label classifiers on the manually...
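As a toy illustration of the multi-label formulation (each sentence may describe several measure types at once), a simple one-vs-rest classifier can be set up as follows; the sentences and class names are invented, not taken from the EXCEPTIUS corpus.

```python
# Toy multi-label sentence classification: one binary classifier per measure class,
# trained on TF-IDF features. Data and class names below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

sentences = [
    "Schools shall remain closed until further notice.",
    "Gatherings of more than ten people are prohibited and schools are closed.",
    "Travellers entering the country must self-isolate for ten days.",
]
labels = [{"school_closure"}, {"school_closure", "gathering_ban"}, {"quarantine"}]

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(labels)

classifier = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
classifier.fit(sentences, y)

prediction = classifier.predict(["All schools are to be shut next week."])[0]
print([cls for cls, on in zip(binarizer.classes_, prediction) if on])
```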
This paper contributes to ongoing scholarly debates on the merits and limitations of computational legal text analysis by reflecting on the results of a research project documenting exceptional COVID-19 management measures in Europe. The variety of measures adopted by countries characterized by different legal systems and natural languages, as well as the rapid evolution of such measures, pose considerable challenges for the manual textual analysis methods traditionally used in the social sciences. To address these challenges, we develop a supervised...
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks. The total of nine tasks includes four tasks that were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of language models to a strong baseline and can be referred to in the future, even when assessing different sets of models. Through a comparison of 14 pre-trained language models (mono- and multi-lingual, of varying sizes),...
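A plausible reading of the RER metric is the average reduction in error relative to the baseline's error on each task; the small sketch below follows that reading with invented scores and may differ from DUMB's exact definition in detail.

```python
# Sketch of a Relative Error Reduction (RER) computation: per task, convert scores to
# errors, measure the reduction relative to the baseline error, and average over tasks.
# Scores are invented; positive RER means the model beats the baseline on average.
def relative_error_reduction(model_scores, baseline_scores):
    reductions = []
    for task, score in model_scores.items():
        baseline_error = 1.0 - baseline_scores[task]
        model_error = 1.0 - score
        reductions.append((baseline_error - model_error) / baseline_error)
    return sum(reductions) / len(reductions)

baseline = {"pos": 0.95, "ner": 0.88, "sentiment": 0.93}   # hypothetical baseline scores
candidate = {"pos": 0.96, "ner": 0.90, "sentiment": 0.92}  # hypothetical model scores
print(f"RER: {relative_error_reduction(candidate, baseline):+.2%}")
```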
Acoustic-to-articulatory inversion (AAI) is the process of inferring vocal tract movements from acoustic speech signals. Despite its diverse potential applications, AAI research in languages other than English is scarce due to the challenges of collecting articulatory data. In recent years, self-supervised learning (SSL) based representations have shown great potential for addressing low-resource tasks. We utilize wav2vec 2.0 representations and English articulatory data to train AAI systems and investigate their effectiveness on a different language:...
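A bare-bones version of an SSL-based AAI system is a regression head on top of frame-level wav2vec 2.0 features; the sketch below uses dummy audio and an assumed number of articulatory channels.

```python
# Hedged AAI sketch: map frame-level wav2vec 2.0 representations to articulator
# positions (e.g. EMA sensor coordinates) with a small regression head. The checkpoint
# and the 12 output channels (6 sensors x 2 coordinates) are assumptions.
import torch
from torch import nn
from transformers import Wav2Vec2Model

encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
regressor = nn.Linear(encoder.config.hidden_size, 12)

audio = torch.randn(1, 16000)  # one second of dummy 16 kHz speech
with torch.no_grad():
    frames = encoder(audio).last_hidden_state  # (1, num_frames, hidden_size)
trajectories = regressor(frames)               # (1, num_frames, 12) articulatory targets
# Training would minimise e.g. MSE between predicted and measured trajectories.
```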