Anjana Arunkumar

ORCID: 0000-0003-3513-8600
Research Areas
  • Explainable Artificial Intelligence (XAI)
  • Topic Modeling
  • Data Visualization and Analytics
  • Adversarial Robustness in Machine Learning
  • Natural Language Processing Techniques
  • Software Engineering Research
  • Anomaly Detection Techniques and Applications
  • Software System Performance and Reliability
  • Smart Grid Security and Resilience
  • Time Series Analysis and Forecasting
  • Data Management and Algorithms
  • Computational Physics and Python Applications
  • Machine Learning and Data Classification
  • Power System Optimization and Stability
  • Aesthetic Perception and Analysis
  • Advanced Text Analysis Techniques
  • Healthcare Operations and Scheduling Optimization
  • Creativity in Education and Neuroscience
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Mind Wandering and Attention
  • Neural Networks and Applications
  • Text Readability and Simplification
  • Online Learning and Analytics
  • Operations Management Techniques

Arizona State University
2021-2025

Northeastern University
2025

Joseph Eye Hospital
2011

Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Atharva Naik, Arjun Ashok, Arut Selvan Dhanasekaran, Anjana Arunkumar, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Kuntal Kumar Pal, Maitreya Patel, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Savan Doshi, Shailaja Keyur Sampat, Siddhartha...

10.18653/v1/2022.emnlp-main.340 article EN cc-by Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 2022-01-01

Large language models (LLMs) have gained widespread popularity due to their ability to perform ad-hoc natural language processing (NLP) tasks with simple prompts. Part of the appeal of LLMs is their approachability to the general public, including individuals with little technical expertise in NLP. However, prompts can vary significantly in terms of linguistic structure, context, and other semantics, and modifying one or more of these aspects can result in significant differences in task performance. Non-expert users may find it challenging...

10.1109/tvcg.2025.3535332 article EN IEEE Transactions on Visualization and Computer Graphics 2025-01-01

How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and composition. This large collection enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow...

10.48550/arxiv.2204.07705 preprint EN cc-by arXiv (Cornell University) 2022-01-01
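
The instruction-based setup described above pairs a task definition with worked examples before each test instance. A minimal sketch of such a prompt builder (the template below is an illustrative assumption, not the benchmark's exact schema):

```python
def build_instruction_prompt(definition, examples, instance):
    """Compose an instruction-style prompt from a task definition,
    a few (input, output) demonstrations, and a new test instance.
    The field names and layout here are illustrative, not the
    benchmark's exact template."""
    parts = [f"Definition: {definition}"]
    for i, (inp, out) in enumerate(examples, 1):
        parts.append(f"Example {i}\nInput: {inp}\nOutput: {out}")
    # Leave the final Output: field empty for the model to complete.
    parts.append(f"Now complete the following.\nInput: {instance}\nOutput:")
    return "\n\n".join(parts)

prompt = build_instruction_prompt(
    "Classify the sentiment of the sentence as positive or negative.",
    [("A wonderful, heartfelt film.", "positive")],
    "The plot never goes anywhere.",
)
```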

Large Language Models (LLMs) have gained widespread popularity due to their ability to perform ad-hoc Natural Language Processing (NLP) tasks with a simple natural language prompt. Part of the appeal of LLMs is their approachability to the general public, including individuals with no prior technical experience in NLP techniques. However, prompts can vary significantly in terms of linguistic structure, context, and other semantics. Modifying one or more of these aspects can result in significant differences in task performance. Non-expert...

10.48550/arxiv.2304.01964 preprint EN other-oa arXiv (Cornell University) 2023-01-01

User experience in data visualization is typically assessed through post-viewing self-reports, but these overlook the dynamic cognitive processes during interaction. This study explores the use of mind wandering -- a phenomenon where attention spontaneously shifts from the primary task to internal, task-related thoughts or unrelated distractions -- as a measure of exploration. Participants reported mind wandering while viewing visualizations from a pre-labeled database and then provided quantitative ratings of trust,...

10.1109/tvcg.2024.3456344 article EN IEEE Transactions on Visualization and Computer Graphics 2024-01-01

Models that top leaderboards often perform unsatisfactorily when deployed in real world applications; this has necessitated rigorous and expensive pre-deployment model testing. A hitherto unexplored facet of model performance is: are we doing equitable evaluation? In this paper, we introduce a task-agnostic method to probe benchmarks by weighting samples based on their 'difficulty' level. We find that benchmarks can be adversarially attacked and that top-performing models may not always be the best models. We subsequently propose alternate...

10.1609/aaai.v35i15.17599 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18
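
The difficulty-weighted probing idea can be sketched as follows; the specific scheme (sample weight proportional to an assumed per-sample difficulty score) is an illustrative choice, not the paper's exact formulation:

```python
def weighted_accuracy(correct, difficulty):
    """Difficulty-weighted accuracy: harder samples count more.

    correct    -- list of 0/1 outcomes per sample
    difficulty -- list of difficulty scores in [0, 1] per sample
    (illustrative weighting, assumed for this sketch)
    """
    total = sum(difficulty)
    if total == 0:
        # Degenerate case: no difficulty signal, fall back to plain accuracy.
        return sum(correct) / len(correct)
    return sum(c * d for c, d in zip(correct, difficulty)) / total

# Two models with identical plain accuracy (2/3) separate under weighting:
easy_solver = weighted_accuracy([1, 1, 0], [0.1, 0.2, 0.9])  # solves easy samples
hard_solver = weighted_accuracy([0, 1, 1], [0.1, 0.2, 0.9])  # solves hard samples
```

Under this weighting a model that only solves easy samples scores lower than a model that solves the hard ones, even when plain accuracy cannot tell them apart.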

Neural language models have achieved human-level performance on several NLP datasets. However, recent studies have shown that these models are not truly learning the desired task; rather, their high performance is attributed to overfitting using spurious biases, which suggests that the capabilities of AI systems have been over-estimated. We introduce a generic formula for a Data Quality Index (DQI) to help dataset creators create datasets free of such unwanted biases. We evaluate this formula using a recently proposed approach for adversarial filtering,...

10.48550/arxiv.2005.00816 preprint EN other-oa arXiv (Cornell University) 2020-01-01

How do people internalize visualizations: as images or as information? In this study, we investigate the nature of internalization for visualizations (i.e., how the mind encodes visualizations in memory) and how memory encoding affects retrieval. This exploratory work examines the influence of various design elements on a user's perception of a chart. Specifically, which design elements lead to perceptions of a visualization as an image (which aims to provide visual references, evoke emotions, express creativity, and inspire philosophic thought) or as information...

10.1109/tvcg.2023.3326919 article EN IEEE Transactions on Visualization and Computer Graphics 2023-01-01

Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and `hack' datasets, in lieu of learning generalizable features like humans. In order to stop the inflation in model performance -- and thus the overestimation of AI systems' capabilities -- we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.

10.48550/arxiv.2007.06898 preprint EN other-oa arXiv (Cornell University) 2020-01-01

The electrical power grid is a critical infrastructure, with disruptions in transmission having severe repercussions on daily activities across multiple sectors. To identify, prevent, and mitigate such events, grids are being refurbished as 'smart' systems that include the widespread deployment of GPS-enabled phasor measurement units (PMUs). PMUs provide fast, precise, and time-synchronized measurements of voltage and current, enabling real-time wide-area monitoring and control. However, the potential...

10.1109/tvcg.2022.3209380 article EN IEEE Transactions on Visualization and Computer Graphics 2022-01-01

A `state of the art' model surpasses humans on a benchmark B, but fails on similar benchmarks C, D, and E. What does B have that the other benchmarks do not? Recent research provides the answer: spurious bias. However, developing a model to solve benchmarks B through E does not guarantee that it will solve future benchmarks. To progress towards a model that `truly learns' an underlying task, we need to quantify the differences between successive benchmarks, as opposed to existing binary and black-box approaches. We propose a novel approach to this underexplored task of quantifying...

10.48550/arxiv.2008.03964 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Alluvial diagrams are a popular technique for visualizing flow and relational data. However, successfully reading and interpreting the data shown in an alluvial diagram is likely influenced by factors such as data volume, complexity, and chart layout. To understand how chart consumption is impacted by its visual features, we conduct two crowdsourced user studies with a set of charts of varying complexity to examine (i) participant performance on analysis tasks, and (ii) the perceived complexity of the charts. Using the study results, we employ Bayesian modelling...

10.1109/vis49827.2021.9623282 article EN 2021-10-01

User experience in data visualization is typically assessed through post-viewing self-reports, but these overlook the dynamic cognitive processes during interaction. This study explores the use of mind wandering -- a phenomenon where attention spontaneously shifts from the primary task to internal, task-related thoughts or unrelated distractions -- as a measure of exploration. Participants reported mind wandering while viewing visualizations from a pre-labeled database and then provided quantitative ratings of trust, engagement,...

10.48550/arxiv.2408.03576 preprint EN arXiv (Cornell University) 2024-08-07

Electric transmission power grids are being revamped with the widespread deployment of GPS-enabled phasor measurement units (PMUs) for real-time wide-area monitoring and control via precise, time-synchronized measurements of voltage and current. Large, concurrently produced volumes of noisy data hinder PMU usability, particularly for the analysis of oscillation and load fluctuation events in the grid. We examine visualization challenges in the electric grid domain and develop PMUVis, a platform that supports scalable network topology...

10.1109/mcg.2022.3171506 article EN IEEE Computer Graphics and Applications 2022-04-29

Delays and avoidable waiting in hospitals are a major concern for providers and patients. This article explains how a queueing model can be applied to reduce unwanted delays in providing quality services at an eye hospital, using bottleneck analysis. A process map was developed, and 1413 samples were included in this study. The entry and exit time at each stage was measured, and queuing theory server utilization rates were calculated. The study revealed significant bottlenecks. Using mathematical simulations, appropriate solutions were also developed.

10.17010/pijom/2011/v4i4/62414 article EN Prabandhan Indian Journal of Management 2011-04-01
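
For readers unfamiliar with queueing metrics, a generic M/M/1 sketch (standard textbook formulas, not the article's specific hospital model) shows how server utilization and expected waiting follow from arrival and service rates:

```python
def mm1_metrics(arrival_rate, service_rate):
    """Basic M/M/1 single-server queue metrics.

    arrival_rate (lambda) and service_rate (mu) in customers per hour;
    a stable queue requires lambda < mu.
    """
    rho = arrival_rate / service_rate            # server utilization
    if rho >= 1:
        raise ValueError("unstable queue: arrivals exceed service capacity")
    lq = rho ** 2 / (1 - rho)                    # mean number waiting in queue
    wq = lq / arrival_rate                       # mean wait in queue (Little's law)
    return {"utilization": rho, "avg_queue_len": lq, "avg_wait_hours": wq}

# e.g. 12 patients/hour arriving at a station that serves 15/hour:
m = mm1_metrics(12, 15)
# utilization 0.8, mean queue length 3.2, mean wait 0.267 h (16 minutes)
```

A utilization near 1 makes queue length and wait blow up, which is exactly the signature of a bottleneck station.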

Even though deep neural models have achieved superhuman performance on many popular benchmarks, they have failed to generalize to OOD or adversarial datasets. Conventional approaches aimed at increasing robustness include developing increasingly large models and augmentation with large-scale datasets. However, orthogonal to these trends, we hypothesize that a smaller, high-quality dataset is what we need. Our hypothesis is based on the fact that deep neural networks are data-driven models, and that data leads/misleads models. In this work, we propose an empirical...

10.48550/arxiv.2203.06404 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Several benchmarks have been built with heavy investment in resources to track our progress in NLP. Thousands of papers published in response to those benchmarks have competed to top leaderboards, with models often surpassing human performance. However, recent studies have shown that models triumph over several popular benchmarks just by overfitting on spurious biases, without truly learning the desired task. Despite this finding, benchmarking, while trying to tackle bias, still relies on workarounds, which do not fully utilize the resources invested in benchmark...

10.48550/arxiv.2210.07566 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Evaluation of models on benchmarks is unreliable without knowing the degree of sample hardness; this subsequently overestimates the capability of AI systems and limits their adoption in real world applications. We propose a Data Scoring task that requires the assignment of a score between 0 and 1 to each unannotated sample of a benchmark, where 0 signifies easy and 1 signifies hard. Our use of unannotated samples in this design is inspired by humans, who can determine a question's difficulty without knowing its correct answer. This also rules out the use of methods involving model-based...

10.48550/arxiv.2210.07631 preprint EN other-oa arXiv (Cornell University) 2022-01-01
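
A toy scorer in the spirit of this task is sketched below; the heuristic (rarer vocabulary reads as harder) is purely illustrative and is not the paper's method:

```python
from collections import Counter

def score_hardness(questions):
    """Assign each question a hardness score in [0, 1] without labels.

    Illustrative heuristic: questions that use rarer words, relative to
    the benchmark's own vocabulary, are treated as harder. Raw scores
    are min-max normalized so the easiest question maps to 0 and the
    hardest to 1.
    """
    tokenized = [q.lower().split() for q in questions]
    freq = Counter(w for toks in tokenized for w in toks)
    # Raw score: mean inverse corpus frequency of the question's words.
    raw = [sum(1 / freq[w] for w in toks) / len(toks) for toks in tokenized]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * len(raw)  # all questions equally hard/easy
    return [(r - lo) / (hi - lo) for r in raw]
```

Note the scorer never sees an answer key, matching the constraint that hardness must be judged from the question alone.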

Recent research has shown that language models exploit `artifacts' in benchmarks to solve tasks, rather than truly learning them, leading to inflated model performance. In pursuit of creating better benchmarks, we propose VAIDA, a novel benchmark creation paradigm for NLP that focuses on guiding crowdworkers, an under-explored facet of addressing benchmark idiosyncrasies. VAIDA facilitates sample correction by providing realtime visual feedback and recommendations to improve sample quality. Our approach is domain, model,...

10.48550/arxiv.2302.04434 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Cross-task generalization is a significant outcome that defines mastery in natural language understanding. Humans show remarkable aptitude for this, and can solve many different types of tasks, given definitions in the form of textual instructions and a small set of examples. Recent work with pre-trained language models mimics this learning style: users can define and exemplify a task for the model to attempt as a series of prompts or instructions. While prompting approaches have led to higher cross-task generalization compared to traditional supervised...

10.48550/arxiv.2304.06184 preprint EN other-oa arXiv (Cornell University) 2023-01-01