Hanan Aldarmaki

ORCID: 0000-0003-1706-1777
Research Areas
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • Topic Modeling
  • Speech and Audio Processing
  • Speech and dialogue systems
  • Sentiment Analysis and Opinion Mining
  • Music and Audio Processing
  • Text Readability and Simplification
  • Infant Health and Development
  • Semantic Web and Ontologies
  • Network Packet Processing and Optimization
  • Advanced Chemical Sensor Technologies
  • Multimodal Machine Learning Applications
  • Authorship Attribution and Profiling
  • Advanced Text Analysis Techniques
  • Library Science and Information Systems
  • Mathematics, Computing, and Information Processing
  • Imbalanced Data Classification Techniques
  • Infrastructure Maintenance and Monitoring
  • Seismology and Earthquake Studies
  • Neural Networks and Applications
  • Text and Document Classification Technologies
  • Machine Learning and Data Classification
  • Biomedical Text Mining and Ontologies

Mohamed bin Zayed University of Artificial Intelligence
2023-2025

IT University of Copenhagen
2023

Tokyo Institute of Technology
2023

Administration for Community Living
2023

American Jewish Committee
2023

United Arab Emirates University
2019-2022

George Washington University
2015-2019

Software (Spain)
2019

Hanan Aldarmaki, Mona Diab. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1391 article EN 2019-01-01

10.1109/icassp49660.2025.10890051 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

With the growing influence of Large Language Models (LLMs), there is increasing interest in integrating speech representations with them to enable more seamless multi-modal processing and understanding. This study introduces a novel approach that leverages self-supervised speech representations in combination with instruction-tuned LLMs for speech-to-text translation. The proposed modality adapter aligns the extracted features using English-language data. Our experiments demonstrate this method effectively preserves semantic...

10.48550/arxiv.2502.09284 preprint EN arXiv (Cornell University) 2025-02-13
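The abstract above describes a modality adapter that maps self-supervised speech features into an LLM's input space. As a rough sketch with hypothetical dimensions and a simple stack-and-project adapter (the paper's actual architecture may differ), the idea looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_speech, d_llm, k = 50, 768, 1024, 4  # hypothetical sizes; k = frame-stacking factor

feats = rng.normal(size=(T, d_speech))    # self-supervised speech features (one per frame)

# Downsample in time by stacking k consecutive frames into one wider vector.
T_trim = (T // k) * k
stacked = feats[:T_trim].reshape(T_trim // k, k * d_speech)

# A trainable linear projection into the LLM's embedding dimension
# plays the role of the modality adapter in this sketch.
W = rng.normal(size=(k * d_speech, d_llm)) * 0.01
llm_inputs = stacked @ W                  # sequence fed into the LLM's input space
```

In practice the projection would be trained (e.g., on English-language data, as the abstract notes) while the speech encoder and LLM stay mostly frozen.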

Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a mapping between a pair of languages given their independently-learned monolingual word embeddings. The proposed method exploits local and global structures in the vector spaces to align them such that similar words are mapped to each other....

10.1162/tacl_a_00014 article EN cc-by Transactions of the Association for Computational Linguistics 2018-12-01
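The paper above finds word correspondences without supervision by exploiting structural similarity; once candidate pairs exist, fitting the actual map between the two embedding spaces is commonly posed as an orthogonal Procrustes problem. A minimal numpy sketch of that mapping step on synthetic data (an illustration of the mapping formulation, not the paper's unsupervised algorithm):

```python
import numpy as np

def orthogonal_map(X, Y):
    # Solve min_W ||X W - Y||_F subject to W orthogonal (Procrustes):
    # the solution is U V^T from the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
src = rng.normal(size=(100, 50))            # "source-language" word vectors
true_W, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # hidden rotation
tgt = src @ true_W                          # "target-language" counterparts

W = orthogonal_map(src, tgt)                # recovers the rotation exactly here
```

With noisy, partially correct pairs (as produced by an unsupervised matching step) the recovered map is approximate rather than exact, but the computation is the same.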

Nada Almarwani, Hanan Aldarmaki, Mona Diab. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1380 article EN cc-by 2019-01-01

Text word embeddings that encode distributional semantics work by modeling contextual similarities of frequently occurring words. Acoustic word embeddings, on the other hand, typically encode low-level phonetic similarities. Semantic embeddings for spoken words have been previously explored using algorithms analogous to Word2Vec, but the resulting vectors still mainly encoded phonetic rather than semantic features. In this paper, we examine the assumptions and architectures used in previous works and show experimentally how shallow...

10.21437/interspeech.2024-2181 article EN Interspeech 2024 2024-09-01

We evaluated various compositional models, from bag-of-words representations to RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks. Our results confirm that weighted vector averaging can outperform context-sensitive models in most benchmarks, but structural features encoded by RNNs can also be useful in certain classification tasks. We analyzed some of the datasets to identify the aspects of meaning they measure and the characteristics that explain their performance variance.

10.48550/arxiv.1806.04713 preprint EN other-oa arXiv (Cornell University) 2018-01-01
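Weighted vector averaging, which the abstract above reports as a strong baseline, takes only a few lines. The embeddings and corpus frequencies below are toy values, and the smooth inverse-frequency weighting is one common choice, not necessarily the exact scheme evaluated in the paper:

```python
import numpy as np

# Toy vocabulary: 2-d word embeddings and unigram frequencies (hypothetical values).
emb = {"the": np.array([1.0, 0.0]),
       "cat": np.array([0.0, 1.0]),
       "sat": np.array([0.5, 0.5])}
freq = {"the": 0.05, "cat": 0.001, "sat": 0.002}
a = 1e-3  # smoothing constant: smaller a downweights frequent words more

def sentence_vector(words):
    # Weighted average of word vectors: rare, contentful words
    # contribute more than frequent function words.
    weights = [a / (a + freq[w]) for w in words]
    vecs = [wt * emb[w] for wt, w in zip(weights, words)]
    return np.sum(vecs, axis=0) / len(words)

v = sentence_vector(["the", "cat", "sat"])
```

Here "the" gets weight ~0.02 while "cat" gets 0.5, so the sentence vector is dominated by the content words.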

We present a new and improved part-of-speech tagger for Arabic text that incorporates a set of novel features and constraints. This framework is presented within the MADAMIRA software suite, a state-of-the-art toolkit for Arabic language processing. Starting from a linear SVM model with basic lexical features, we add a range of features derived from morphological analysis and clustering methods. We show that using these features significantly improves part-of-speech tagging accuracy, especially for unseen words, which results in better generalization across...

10.18653/v1/w15-3222 article EN cc-by 2015-01-01

10.21437/interspeech.2023-2344 article EN Interspeech 2023 2023-08-14

Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic. Omitting diacritics leads to an increase in the number of homographs: different words with the same spelling. Diacritic restoration could theoretically help disambiguate these words, but in practice, the increased sparsity causes performance degradation in NLP applications. In this paper, we propose approaches for automatically marking a subset of words for diacritic restoration,...

10.18653/v1/w19-4606 preprint EN 2019-01-01
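The homograph effect described above is easy to demonstrate: stripping Arabic diacritics (which are Unicode combining marks) collapses distinct words onto one spelling. A small self-contained illustration:

```python
import unicodedata

def strip_diacritics(word):
    # Arabic short-vowel diacritics are combining marks (Unicode category "Mn");
    # removing them leaves only the base consonant skeleton.
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")

# Two distinct diacritized words ("he knew" vs. "knowledge") collapse to the
# same undiacritized spelling, i.e. they become homographs once diacritics
# are omitted in writing.
same = strip_diacritics("عَلِمَ") == strip_diacritics("عِلْم")
```

Selective restoration, as proposed in the paper, would re-insert diacritics only where such collisions actually cause ambiguity.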

We present a matrix factorization model for learning cross-lingual representations of sentences. Using sentence-aligned corpora, the proposed model learns distributed representations by factoring the given data into language-dependent factors and one shared factor. As a result, input sentences from both languages can be mapped into fixed-length vectors and then compared directly using the cosine similarity measure, which achieves 0.8 Pearson correlation on Spanish-English semantic textual similarity.

10.18653/v1/s16-1101 article EN Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) 2016-01-01
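Once sentences from both languages are mapped into fixed-length vectors in the shared space, comparison reduces to cosine similarity. A minimal sketch with hypothetical 3-dimensional vectors for an aligned English/Spanish sentence pair:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product of the vectors divided by the
    # product of their norms; 1.0 means identical direction.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical fixed-length sentence vectors in the shared space.
en = np.array([0.20, 0.70, 0.10])
es = np.array([0.25, 0.65, 0.05])

sim = cosine(en, es)  # close to 1 for semantically similar sentences
```

Semantic textual similarity evaluations correlate these scores against human judgments, which is where the 0.8 Pearson figure above comes from.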

Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps improve monolingual performance and avoids some of the drawbacks of multi-lingual modeling on resource-rich languages. However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune...

10.48550/arxiv.2310.07423 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models. We evaluate three alignment frameworks applied to these models: joint modeling, representation transfer learning, and sentence mapping, using parallel text to guide the alignment. Our results support a scalable approach for the modular alignment of sentence embeddings, where we observe better performance compared to joint models in intrinsic and extrinsic evaluations,...

10.18653/v1/s19-1006 article EN cc-by 2019-01-01

We present a matrix factorization model for learning cross-lingual representations. Using sentence-aligned corpora, the proposed model learns distributed representations by factoring the given data into language-dependent factors and one shared factor. Moreover, it can quickly learn to incorporate more than two languages without undermining the quality of the monolingual components. The model achieves an accuracy of 88% on English-to-German document classification and 0.8 Pearson correlation on Spanish-English semantic textual similarity. While...

10.18653/v1/w16-1201 article EN cc-by 2016-01-01
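The shared-factor structure described above (each language's data equals a language-dependent factor times one shared factor) can be illustrated with synthetic data. The truncated SVD below recovers the shared factor up to an invertible linear transform; it illustrates the model form, not the paper's training procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.normal(size=(10, 200))     # shared factor: one column per aligned sentence pair
W_en = rng.normal(size=(50, 10))   # language-dependent factor (English)
W_es = rng.normal(size=(60, 10))   # language-dependent factor (Spanish)
X_en, X_es = W_en @ S, W_es @ S    # observed sentence-aligned data

# Stacking the two languages gives [X_en; X_es] = [W_en; W_es] @ S, so a
# rank-10 factorization of the stacked matrix exposes a shared factor.
X = np.vstack([X_en, X_es])
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W_hat = U[:, :10]                  # stacked language-dependent factors
S_hat = np.diag(s[:10]) @ Vt[:10]  # shared factor, up to a linear transform
```

Because every language block is tied to the same `S`, adding a third language only adds one more `W` block, which is why the model extends cheaply beyond two languages.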

This paper introduces Mixat: a dataset of Emirati speech code-mixed with English. Mixat was developed to address the shortcomings of current speech recognition resources when applied to Emirati speech, and in particular, to bilingual Emirati speakers who often mix and switch between their local dialect and English. The data set consists of 15 hours of speech derived from two public podcasts featuring native Emirati speakers, one of which is in the form of conversations between a host and a guest. Therefore, the collection contains examples of Emirati-English code-switching in both formal and natural...

10.48550/arxiv.2405.02578 preprint EN arXiv (Cornell University) 2024-05-04

Neural multi-channel speech enhancement models, in particular those based on the U-Net architecture, demonstrate promising performance and generalization potential. These models typically encode input channels independently and integrate them during later stages of the network. In this paper, we propose a novel modification of these models by incorporating relative channel information from the outset, where each channel is processed in conjunction with a reference channel through stacking. This strategy exploits comparative differences...

10.48550/arxiv.2410.05019 preprint EN arXiv (Cornell University) 2024-10-07
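The stacking strategy in the abstract above pairs each input channel with a reference channel before encoding, so the network sees relative information from the first layer. A shape-level sketch with hypothetical dimensions, taking the first microphone as the reference:

```python
import numpy as np

# Hypothetical multi-channel input: (channels, time) signal frames.
channels, time = 4, 100
x = np.random.randn(channels, time)

ref = x[0]  # designate the first microphone as the reference channel

# Stack each channel together with the reference, yielding one
# (2, time) pair per channel: shape (channels, 2, time).
stacked = np.stack([np.stack([ch, ref]) for ch in x])
```

Each `(2, time)` pair can then be fed to a shared encoder, which can exploit inter-channel differences (e.g., phase and level cues relative to the reference) from the outset instead of only at a late fusion stage.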

Speech recognition and speech synthesis models are typically trained separately, each with its own set of learning objectives, training data, and model parameters, resulting in two distinct large networks. We propose a parameter-efficient approach to learning ASR and TTS jointly via a multi-task objective and shared parameters. Our evaluation demonstrates that the performance of our multi-task model is comparable to that of individually trained models while significantly saving computational and memory costs (~50% reduction in the total number of parameters required...

10.48550/arxiv.2410.18607 preprint EN arXiv (Cornell University) 2024-10-24
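The reported ~50% parameter reduction follows directly from sharing one trunk across the two tasks. A back-of-the-envelope check with hypothetical parameter counts (not the paper's actual architecture):

```python
# Hypothetical parameter counts, in millions: two separately trained models
# each carry a full network, while the joint model shares one trunk and
# keeps only small task-specific heads for ASR and TTS.
trunk, head = 180, 10

separate = 2 * trunk               # ASR model + TTS model, trained independently
shared = trunk + 2 * head          # one shared trunk + two task heads

reduction = 1 - shared / separate  # roughly half the parameters
```

The smaller the task-specific heads relative to the shared trunk, the closer the saving approaches the 50% limit.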

Developing robust automatic speech recognition (ASR) systems for Arabic, a language characterized by its rich dialectal diversity and often considered low-resource in speech technology, demands effective strategies to manage its complexity. This study explores three critical factors influencing ASR performance: the role of dialectal coverage in pre-training, the effectiveness of dialect-specific fine-tuning compared to a multi-dialectal approach, and the ability to generalize to unseen dialects. Through extensive experiments across...

10.48550/arxiv.2411.05872 preprint EN arXiv (Cornell University) 2024-11-07