Tasnim Mohiuddin

ORCID: 0009-0003-0955-7200
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Multimodal Machine Learning Applications
  • Speech Recognition and Synthesis
  • Text Readability and Simplification
  • Domain Adaptation and Few-Shot Learning
  • Text and Document Classification Technologies
  • Advanced Text Analysis Techniques
  • Video Analysis and Summarization
  • Generative Adversarial Networks and Image Synthesis
  • Handwritten Text Recognition Techniques

Nanyang Technological University
2018-2021

Han Cheol Moon, Tasnim Mohiuddin, Shafiq Joty, Chi Xu. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1231 article EN cc-by 2019-01-01

Tasnim Mohiuddin, Shafiq Joty. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1386 article EN 2019-01-01

Most of the successful and predominant methods for Bilingual Lexicon Induction (BLI) are mapping-based, where a linear mapping function is learned with the assumption that the word embedding spaces of different languages exhibit similar geometric structures (i.e., approximately isomorphic). However, several recent studies have criticized this simplified assumption, showing that it does not hold in general even for closely related languages. In this work, we propose a novel semi-supervised method to learn cross-lingual embeddings...

10.18653/v1/2020.emnlp-main.215 article EN cc-by 2020-01-01

We present Fanar, a platform for Arabic-centric multimodal generative AI systems that supports language, speech and image generation tasks. At the heart of Fanar are Fanar Star and Fanar Prime, two highly capable Arabic Large Language Models (LLMs) that are best in class on well-established benchmarks for similar-sized models. Fanar Star is a 7B (billion) parameter model that was trained from scratch on nearly 1 trillion clean and deduplicated Arabic, English and Code tokens. Fanar Prime is a 9B parameter model continually trained on the Gemma-2 base model on the same token set. Both models...

10.48550/arxiv.2501.13944 preprint EN arXiv (Cornell University) 2025-01-18

Crosslingual word embeddings learned from monolingual embeddings have a crucial role in many downstream tasks, ranging from machine translation to transfer learning. Adversarial training has shown impressive success in learning crosslingual embeddings and the associated word translation task without any parallel data by mapping monolingual embeddings to a shared space. However, recent work has shown superior performance for non-adversarial methods in more challenging language pairs. In this article, we investigate an adversarial autoencoder for unsupervised word translation and propose two novel...

10.1162/coli_a_00374 article EN cc-by-nc-nd Computational Linguistics 2020-03-23

M Saiful Bari, Tasnim Mohiuddin, Shafiq Joty. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.154 article EN cc-by 2021-01-01

Neural Machine Translation (NMT) models are typically trained on heterogeneous data that are concatenated and randomly shuffled. However, not all of the training data are equally useful to the model. Curriculum training aims to present the data to the NMT models in a meaningful order. In this work, we introduce a two-stage curriculum training framework for NMT where we fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring that considers prediction scores of the emerging NMT model. Through comprehensive experiments on six language pairs...

10.18653/v1/2022.findings-emnlp.113 article EN cc-by 2022-01-01

Participants in an asynchronous conversation (e.g., forum, e-mail) interact with each other at different times, performing certain communicative acts, called speech acts (e.g., question, request). In this article, we propose a hybrid approach to speech act recognition in asynchronous conversations. Our approach works in two main steps: a long short-term memory recurrent neural network (LSTM-RNN) first encodes each sentence separately into a task-specific distributed representation, and this is then used in a conditional random field (CRF) model...

10.1162/coli_a_00339 article EN cc-by-nc-nd Computational Linguistics 2018-09-18

The success of Neural Machine Translation (NMT) largely depends on the availability of large bitext training corpora. Due to the lack of such large corpora in low-resource language pairs, NMT systems often exhibit poor performance. Extra relevant monolingual data often helps, but acquiring it could be quite expensive, especially for low-resource languages. Moreover, domain mismatch between bitext (train/test) and monolingual data might degrade the performance. To alleviate such issues, we propose AUGVIC, a novel data augmentation framework which exploits vicinal...

10.18653/v1/2021.findings-acl.267 article EN cc-by 2021-01-01

Although coherence modeling has come a long way in developing novel models, their evaluation on downstream applications for which they are purportedly developed has largely been neglected. With the advancements made by neural approaches in applications such as machine translation (MT), summarization and dialog systems, the need for coherence evaluation of these tasks is now more crucial than ever. However, coherence models are typically evaluated only on synthetic tasks, which may not be representative of their performance in downstream applications. To investigate how representative the synthetic tasks are of downstream use cases, we...

10.18653/v1/2021.eacl-main.308 article EN cc-by 2021-01-01

Transfer learning has yielded state-of-the-art (SoTA) results in many supervised NLP tasks. However, annotated data for every target task in every target language is rare, especially for low-resource languages. We propose UXLA, a novel unsupervised data augmentation framework for zero-resource transfer learning scenarios. In particular, UXLA aims to solve cross-lingual adaptation problems from a source language task distribution to an unknown target language task distribution, assuming no training label in the target language. At its core, UXLA performs simultaneous self-training...

10.48550/arxiv.2004.13240 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Tasnim Mohiuddin, Thanh-Tung Nguyen, Shafiq Joty. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1134 article EN 2019-01-01

Most of the successful and predominant methods for bilingual lexicon induction (BLI) are mapping-based, where a linear mapping function is learned with the assumption that the word embedding spaces of different languages exhibit similar geometric structures (i.e., approximately isomorphic). However, several recent studies have criticized this simplified assumption, showing that it does not hold in general even for closely related languages. In this work, we propose a novel semi-supervised method to learn cross-lingual embeddings...

10.48550/arxiv.2004.13889 preprint EN public-domain arXiv (Cornell University) 2020-01-01

Recently, neural approaches to coherence modeling have achieved state-of-the-art results in several evaluation tasks. However, we show that most of these models often fail on harder tasks with more realistic application scenarios. In particular, the existing models underperform on tasks that require the model to be sensitive to local contexts such as candidate ranking in conversational dialogue and in machine translation. In this paper, we propose a unified coherence model that incorporates sentence grammar, inter-sentence coherence relations, and global coherence patterns into...

10.48550/arxiv.1909.00349 preprint EN cc-by arXiv (Cornell University) 2019-01-01

Recent advancements in speech-language models have yielded significant improvements in speech tokenization and synthesis. However, effectively mapping the complex, multidimensional attributes of speech into discrete tokens remains challenging. This process demands acoustic, semantic, and contextual information for precise speech representations. Existing speech representations generally fall into two categories: acoustic tokens from audio codecs and semantic tokens from speech self-supervised learning models. Although recent efforts have unified acoustic and semantic tokens for improved...

10.48550/arxiv.2410.15017 preprint EN arXiv (Cornell University) 2024-10-19

This paper presents a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. This challenge focuses on detecting machine-generated vs. human-authored essays for academic purposes. The task is defined as follows: "Given an essay, identify whether it is generated by a machine or authored by a human." The challenge involves two languages: English and Arabic. During the evaluation phase, 25 teams submitted systems for English and 21...

10.48550/arxiv.2412.18274 preprint EN arXiv (Cornell University) 2024-12-24

The ability to edit images in a realistic and visually appealing manner is a fundamental requirement in various computer vision applications. In this paper, we present ImEW, a unified framework designed for solving image editing tasks. ImEW utilizes off-the-shelf foundation models to address four essential editing tasks: object removal, object translation, object replacement, and generative fill beyond the image frame. These tasks are accomplished by leveraging the capabilities of state-of-the-art foundation models, namely the Segment Anything Model,...

10.1145/3607827.3616840 article EN 2023-10-26

We propose a novel coherence model for written asynchronous conversations (e.g., forums, emails), and show its applications in coherence assessment and thread reconstruction tasks. We conduct our research in two steps. First, we propose improvements to the recently proposed neural entity grid model by lexicalizing its entity transitions. Then, we extend the model to asynchronous conversations by incorporating the underlying conversational structure in the entity grid representation and feature computation. Our model achieves state-of-the-art results on standard coherence assessment tasks in monologue and conversations, outperforming existing models. We also...

10.48550/arxiv.1805.02275 preprint EN other-oa arXiv (Cornell University) 2018-01-01