- Natural Language Processing Techniques
- Topic Modeling
- Speech Recognition and Synthesis
- Speech and Dialogue Systems
- Text Readability and Simplification
- Multimodal Machine Learning Applications
- Semantic Web and Ontologies
- Music and Audio Processing
- Speech and Audio Processing
- ICT in Developing Communities
- Authorship Attribution and Profiling
- Mobile Crowdsensing and Crowdsourcing
- Machine Learning in Bioinformatics
- Neural Networks and Applications
- Translation Studies and Practices
- Phonetics and Phonology Research
- Algorithms and Data Compression
- Text and Document Classification Technologies
- Linguistic Variation and Morphology
- Biomedical Text Mining and Ontologies
- Language, Linguistics, Cultural Analysis
- Healthcare Systems and Practices
- Names, Identity, and Discrimination Research
- Multilingual Education and Policy
- Domain Adaptation and Few-Shot Learning
Administration for Community Living
2023
Tokyo Institute of Technology
2023
Laboratoire d'Informatique de Grenoble
2013-2023
IT University of Copenhagen
2023
American Jewish Committee
2023
RIKEN Center for Advanced Intelligence Project
2023
Mongolia International University
2023
Naver (South Korea)
2019-2022
GIPSA-Lab
2015-2021
Université Grenoble Alpes
2011-2021
We describe a new challenge aimed at discovering subword and word units from raw speech. This is a follow-up to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed.
We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task. Previous works investigated the extreme case where the source language transcription is available neither during learning nor decoding, but we also study a midway case where the transcription is available at training time only. In that case, a single model is trained to decode source speech into target text in a single pass. Experimental results show that it is possible to train compact and efficient end-to-end speech translation models in this setup. We also distribute the corpus and hope that our baseline will be...
Language models have become a key step to achieve state-of-the-art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely demonstrated for English using contextualized representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin...
Today, the growth of the aging population in Europe requires an increasing number of health care professionals and facilities for aged persons. Medical telemonitoring at home (and, more generally, telemedicine) improves the patient's comfort and reduces hospitalization costs. Using sound surveillance as an alternative solution to video telemonitoring, this paper deals with the detection and classification of alarming sounds in a noisy environment. The proposed sound analysis system can detect distress or everyday sounds everywhere...
The project Breaking the Unwritten Language Barrier (BULB), which brings together linguists and computer scientists, aims at supporting linguists in documenting unwritten languages. In order to achieve this, we develop tools tailored to the needs of documentary linguists by building upon technology and expertise from the area of natural language processing, most prominently automatic speech recognition and machine translation. As a development and test bed for this, we have chosen three less-resourced African languages of the Bantu family: Basaa, Myene...
Simultaneous machine translation consists in starting output generation before the entire input sequence is available. Wait-k decoders offer a simple but efficient approach to this problem. They first read k source tokens, after which they alternate between producing a target token and reading another source token. We investigate the behavior of wait-k decoding in low-resource settings for spoken corpora using IWSLT datasets. We improve the training of these models using unidirectional encoders and training across multiple values...
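A minimal sketch of the wait-k reading/writing schedule described in this abstract, assuming a generic `emit_token` callable that produces one target token from the source prefix read so far (the function name and interface are illustrative, not the models used in the paper):

```python
# Illustrative wait-k simultaneous decoding loop (not the paper's exact implementation).
from typing import Callable, Iterator, List, Optional

def wait_k_decode(
    source_stream: Iterator[str],
    emit_token: Callable[[List[str], List[str]], Optional[str]],
    k: int,
    eos: str = "</s>",
) -> List[str]:
    """Read k source tokens first, then alternate writing one target token and reading one source token."""
    source: List[str] = []
    target: List[str] = []
    stream_done = False

    def read_one() -> None:
        nonlocal stream_done
        try:
            source.append(next(source_stream))
        except StopIteration:
            stream_done = True

    # Initial waiting phase: read k source tokens.
    for _ in range(k):
        read_one()

    # Alternate: write one target token, then read one more source token.
    while True:
        token = emit_token(source, target)   # hypothetical model call
        if token is None or token == eos:
            break
        target.append(token)
        if not stream_done:
            read_one()
    return target
```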
Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists in freezing the pretrained parameters of a model and injecting lightweight modules between layers, resulting in the addition of only a small number of task-specific trainable parameters. While adapter tuning was investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for speech translation (ST). Starting from different pre-trained models (a ST model trained on parallel data...
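As a rough illustration of the adapter idea described above (freeze the backbone, insert small bottleneck layers), here is a minimal PyTorch-style sketch; the dimensions, placement and surrounding model are assumptions, not the paper's architecture:

```python
# Minimal bottleneck adapter sketch (illustrative, not the paper's exact architecture).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: LayerNorm -> down-projection -> ReLU -> up-projection."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(self.norm(x))))

def add_adapters(pretrained_layers: nn.ModuleList, d_model: int) -> nn.ModuleList:
    """Freeze the pretrained layers and pair each one with a trainable adapter."""
    for p in pretrained_layers.parameters():
        p.requires_grad = False  # only the adapter parameters remain trainable
    return nn.ModuleList(Adapter(d_model) for _ in pretrained_layers)
```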
This paper presents our work in automatic speech recognition (ASR) in the context of under-resourced languages, with an application to Vietnamese. Different techniques for bootstrapping acoustic models are presented. First, we present the use of acoustic-phonetic unit distances and the potential of crosslingual acoustic modeling for under-resourced languages. Experimental results on Vietnamese showed that, with only a few hours of target language data, context-independent modeling worked better than context-dependent modeling. However, it was outperformed by the latter one,...
Most speech and language technologies are trained with massive amounts of speech and text information. However, most of the world's languages do not have such resources or a stable orthography. Systems constructed under these almost zero resource conditions are not only promising for speech technology but also for computational language documentation. The goal of computational language documentation is to help field linguists to (semi-)automatically analyze and annotate audio recordings of endangered and unwritten languages. Example tasks are automatic phoneme discovery or lexicon...
This paper reports on our ongoing efforts to collect speech data in under-resourced or endangered languages of Africa. Data collection is carried out using an improved version of the Android application Aikuma developed by Steven Bird and colleagues. Features were added to the app in order to facilitate the collection of parallel speech data, in line with the requirements of the French-German ANR/DFG BULB (Breaking the Unwritten Language Barrier) project. The resulting app, called Lig-Aikuma, runs on various mobile phones and tablets and proposes a range of different...
This paper addresses the problem of automatic detection and recognition of impulsive sounds, such as glass breaks, human screams, gunshots, explosions or door slams. A complete detection and recognition system is described and evaluated on a sound database containing more than 800 signals distributed among six different classes. Emphasis is set on robust techniques, allowing the use of this system in a noisy environment. The detection algorithm, based on a median filter, features a highly robust performance even under important background noise conditions. In the recognition stage,...
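The abstract does not detail the detection algorithm, but a median-filter based impulsive-sound detector along these general lines can be sketched as follows; the frame size, filter length, threshold and decision rule are assumptions of this sketch, not the paper's settings:

```python
# Illustrative median-filter based impulsive sound detection (parameters are assumptions).
import numpy as np
from scipy.signal import medfilt

def detect_impulses(signal: np.ndarray, sr: int, frame_ms: float = 20.0,
                    median_frames: int = 101, ratio_db: float = 10.0) -> np.ndarray:
    """Return indices of frames whose energy exceeds the median-filtered background level."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    # A long median filter tracks the slowly varying background noise level,
    # so short impulsive events stand out as large positive deviations.
    background_db = medfilt(energy_db, kernel_size=median_frames)
    return np.flatnonzero(energy_db - background_db > ratio_db)
```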
Jérémy Ferrero, Laurent Besacier, Didier Schwab, Frédéric Agnès. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017.
Self-supervised learning from raw speech has been proven beneficial to improve automatic speech recognition (ASR). We investigate here its impact on end-to-end automatic speech translation (AST) performance. We use a contrastive predictive coding (CPC) model pre-trained on unlabeled speech as a feature extractor for a downstream AST task. We show that self-supervised pre-training is particularly efficient in low resource settings and that fine-tuning CPC models on the AST training data further improves performance. Even in higher resource settings, ensembling AST models trained...
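For context, the contrastive predictive coding objective mentioned here (predict future latent frames against negatives) can be sketched roughly as below; the negative sampling scheme, where negatives are the other time steps of the same utterance, is a simplification and not the exact recipe used for the pre-trained model in the paper:

```python
# Minimal sketch of a CPC-style InfoNCE objective for one prediction offset k (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

def cpc_infonce_loss(context: torch.Tensor, latents: torch.Tensor,
                     predictor: nn.Linear, k: int) -> torch.Tensor:
    """context, latents: (batch, time, dim). Predict z_{t+k} from c_t, scoring it
    against the latents at all other time steps of the same utterance as negatives."""
    B, T, D = latents.shape
    preds = predictor(context[:, : T - k])      # (B, T-k, D) predictions of z_{t+k}
    targets = latents[:, k:]                    # (B, T-k, D) true future latents

    # Score every prediction against every candidate latent in the utterance.
    scores = torch.einsum("btd,bsd->bts", preds, targets)    # (B, T-k, T-k)
    labels = torch.arange(T - k, device=scores.device).expand(B, -1)
    return F.cross_entropy(scores.reshape(-1, T - k), labels.reshape(-1))
```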
Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR) this leads to a high out-of-vocabulary (OOV) rate for a typical lexicon size, and hence to a potential increase in WER. This is even more pronounced for dialects, where additional affixes are often introduced and the available data are typically sparse. To address this problem we introduce a simple word decomposition algorithm which only requires a text corpus and a predefined list of affixes. Using this algorithm to create the lexicon for an Iraqi Arabic ASR system results in about...
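A simple decomposition algorithm in that spirit, greedily stripping one prefix and/or suffix and keeping a split only if the remaining stem is well attested in the corpus, might look like the sketch below; the frequency threshold, the "+" affix marker and the greedy ordering are assumptions, not the paper's exact method:

```python
# Illustrative affix-stripping word decomposition (threshold and ordering are assumptions).
from collections import Counter
from typing import List

def decompose(word: str, stem_counts: Counter, prefixes: List[str],
              suffixes: List[str], min_count: int = 5) -> List[str]:
    """Split off one prefix and/or suffix if the remaining stem is frequent enough."""
    parts = [word]
    for pre in sorted(prefixes, key=len, reverse=True):
        stem = word[len(pre):]
        if word.startswith(pre) and stem_counts[stem] >= min_count:
            parts = [pre + "+", stem]      # keep the prefix as a separate marked unit
            break
    head, stem = parts[:-1], parts[-1]
    for suf in sorted(suffixes, key=len, reverse=True):
        base = stem[: -len(suf)]
        if stem.endswith(suf) and stem_counts[base] >= min_count:
            return head + [base, "+" + suf]
    return head + [stem]
```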
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.
We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data by using auxiliary parallel language pairs. For this problem, the standard procedure so far to leverage the monolingual data is _back-translation_, which is computationally costly and hard to tune. In this paper we propose instead to use _denoising adapters_, adapter layers with a denoising objective, on top of pre-trained mBART-50. In addition to the modularity and flexibility of such an approach, we show that the resulting translations...
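A denoising objective of the kind mentioned here can be approximated by corrupting the input text and training only the adapter parameters to reconstruct the clean sentence. The corruption scheme below (random token masking plus light local shuffling) is an illustrative stand-in for mBART-style noising, not the paper's exact recipe:

```python
# Illustrative text-noising function for a denoising objective (not the paper's exact recipe).
import random
from typing import List

def noise(tokens: List[str], mask_token: str = "<mask>",
          mask_prob: float = 0.3, shuffle_window: int = 3) -> List[str]:
    """Randomly mask tokens and lightly shuffle them within a small window."""
    noisy = [mask_token if random.random() < mask_prob else t for t in tokens]
    out: List[str] = []
    for i in range(0, len(noisy), shuffle_window):
        chunk = noisy[i : i + shuffle_window]
        random.shuffle(chunk)              # local reordering within the window
        out.extend(chunk)
    return out

# Training sketch: feed noise(sentence) to the frozen pre-trained encoder-decoder and
# update only the adapter parameters so that the original sentence is reconstructed.
```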
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
This paper describes our word-level QE system for the WMT 2014 shared task on the Spanish-English pair. Compared to 2013, this year's task is different due to the lack of SMT setting information and additional resources. We report how we overcome this challenge and retain the most important features which performed well last year in our system. Novel features related to the availability of multiple systems' output (a new point this year) are also proposed and experimented with, along with the baseline feature set. The system is optimized in several ways: tuning the classification...
We investigate the behaviour of attention in neural models of visually grounded speech trained on two languages: English and Japanese. Experimental results show that attention focuses on nouns and that this holds true for two very typologically different languages. We also draw parallels between artificial neural attention and human attention, and show that attention focuses on word endings as it has been theorised for human attention. Finally, we investigate how monolingual models can be used to perform cross-lingual speech-to-speech retrieval. For both languages, we enriched existing bilingual (speech-image) corpora with...
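As a rough illustration of the cross-lingual retrieval step: once speech segments in both languages are embedded in a comparable space (here assumed to be obtained via the shared image modality), retrieval reduces to nearest-neighbour search by cosine similarity. The embedding provenance is an assumption of this sketch:

```python
# Illustrative cosine-similarity retrieval between two sets of speech embeddings.
import numpy as np

def cross_lingual_retrieve(queries: np.ndarray, candidates: np.ndarray,
                           top_k: int = 5) -> np.ndarray:
    """queries: (Nq, D) embeddings in language A; candidates: (Nc, D) in language B.
    Returns, for each query, the indices of the top_k most similar candidates."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = q @ c.T                              # cosine similarities, shape (Nq, Nc)
    return np.argsort(-sims, axis=1)[:, :top_k]
```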