Khyati Mahajan

ORCID: 0000-0002-6233-2583
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Sentiment Analysis and Opinion Mining
  • Text Readability and Simplification
  • Speech and dialogue systems
  • Misinformation and Its Impacts
  • Digital Communication and Language
  • Advanced Text Analysis Techniques
  • Translation Studies and Practices
  • Antiplatelet Therapy and Cardiovascular Diseases
  • Mental Health via Writing
  • Education and Critical Thinking Development
  • Mental Health Research Topics
  • Educational Assessment and Pedagogy
  • Fuzzy Logic and Control Systems
  • Hate Speech and Cyberbullying Detection
  • Intelligent Tutoring Systems and Adaptive Learning
  • Cell Adhesion Molecules Research
  • Digital Mental Health Interventions
  • Target Tracking and Data Fusion in Sensor Networks
  • Complex Network Analysis Techniques
  • Orthopaedic implants and arthroplasty
  • Neural Networks and Applications
  • Service-Oriented Architecture and Web Services
  • Discourse Analysis in Language Studies

University of North Carolina at Charlotte
2019-2023

Dhirubhai Ambani Institute of Information and Communication Technology
2018

Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa...

10.18653/v1/2021.gem-1.10 preprint ID cc-by 2021-01-01

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides...

10.48550/arxiv.2102.01672 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Erfan Al-Hossami, Razvan Bunescu, Ryan Teehan, Laurel Powell, Khyati Mahajan, Mohsen Dorodchi. Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023). 2023.

10.18653/v1/2023.bea-1.57 article EN cc-by 2023-01-01

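We present a comprehensive survey of available corpora for multi-party dialogue. We survey over 300 publications related to multi-party dialogue and catalogue all available corpora in a novel taxonomy. We analyze methods of data collection for multi-party dialogue corpora and identify several lacunae in existing approaches used to collect such dialogue. We present this survey, the first to focus exclusively on multi-party dialogue corpora, to motivate research in this area. Through our discussion of existing data collection methods, we identify desiderata and guiding principles for multi-party data collection to contribute further towards advancing this area of dialogue research.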

10.18653/v1/2021.sigdial-1.36 article EN cc-by 2021-01-01

We present Community Connect, a custom social media platform for conducting controlled experiments of human behavior. The key distinguishing factor of Community Connect is the ability to control the visibility of user posts based on the groups they belong to, allowing careful and controlled investigation into how information propagates through a social network. We release this platform as a resource to the broader community, to facilitate research on data collected through controlled experiments on social networks.

10.1145/3437963.3441698 article EN 2021-03-06

Instruction finetuning (IFT) is critical for aligning Large Language Models (LLMs) to follow instructions. Numerous effective IFT datasets have been proposed in the recent past, but most focus on high resource languages such as English. In this work, we propose a fully synthetic, novel taxonomy (Evol) guided Multilingual, Multi-turn instruction finetuning dataset, called M2Lingual, to better align LLMs on a diverse set of languages and tasks. M2Lingual contains a total of 182K IFT pairs that are built upon diverse seeds, covering 70...

10.48550/arxiv.2406.16783 preprint EN arXiv (Cornell University) 2024-06-24

Multilingual LLMs have achieved remarkable benchmark performance, but we find they continue to underperform on non-Latin script languages across contemporary LLM families. This discrepancy arises from the fact that LLMs are pretrained with orthographic scripts, which are dominated by Latin characters that obscure their shared phonology with non-Latin scripts. We propose leveraging phonemic transcriptions as complementary signals to induce script-invariant representations. Our study demonstrates that integrating phonemic signals improves...

10.48550/arxiv.2411.02398 preprint EN arXiv (Cornell University) 2024-11-04

We highlight the contribution of emotional and moral language towards information contagion online. We find that retweet count on Twitter is significantly predicted by the use of negative emotions with specific moral language. We find that a tweet is less likely to be retweeted (hence less engagement and less potential for contagion) when it has emotions expressed as anger along with a specific type of moral language, known as authority-vice. Conversely, when sadness is expressed with authority-vice, the tweet is more likely to be retweeted. Our findings indicate how emotional and moral language can interact in predicting information contagion.

10.18653/v1/2020.winlp-1.34 article EN cc-by 2020-01-01

Recent research in the field of conversational AI has emphasized the need for standardization of metrics used for evaluation. In this work, we focus on evaluation methods for multi-party dialogue systems. We present an expanded taxonomy focusing on evaluation based on dimensions that address challenges associated with the presence of multiple participants. We also survey the evaluation methods utilized in current research, and present our findings with regards to inconsistencies within existing work. Furthermore, we discuss the subsequent need to have more consistent evaluation methodologies...

10.18653/v1/2022.inlg-main.23 article EN cc-by 2022-01-01

We present our work on augmenting dialog act recognition capabilities utilizing synthetically generated data. Our work is motivated by the limitations of current dialog act datasets, and the need to adapt for new domains as well as ambiguity in utterances written by humans. We list our observations and findings towards how synthetically generated data can contribute meaningfully towards more robust dialogue act recognition models extending to new domains. Our major finding shows that synthetic data, which is linguistically varied, can be very useful towards this goal and increase the performance from (0.39,...

10.18653/v1/2022.gem-1.44 article EN cc-by 2022-01-01

Lolo Aboufoul, Khyati Mahajan, Tiffany Gallicano, Sara Levens, Samira Shaikh. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop. 2021.

10.18653/v1/2021.acl-srw.31 article EN cc-by 2021-01-01