- Complex Network Analysis Techniques
- Topic Modeling
- Advanced Text Analysis Techniques
- Opinion Dynamics and Social Influence
- Sentiment Analysis and Opinion Mining
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- Scientometrics and Bibliometrics Research
- Computational and Text Analysis Methods
- Misinformation and Its Impacts
- Hate Speech and Cyberbullying Detection
- Social Media and Politics
- Biomedical Text Mining and Ontologies
- Data Quality and Management
- Wikis in Education and Collaboration
- Web Data Mining and Analysis
- Public Relations and Crisis Communication
- Text and Document Classification Technologies
- Software Engineering Research
- Spam and Phishing Detection
- Social Capital and Networks
- Geographic Information Systems Studies
- Mental Health Research Topics
- Crime Patterns and Interventions
- Multimodal Machine Learning Applications
University of Illinois Urbana-Champaign
2016-2025
Technical University of Munich
2024-2025
National Center for Supercomputing Applications
2014-2021
Microsoft Research (United Kingdom)
2019
Carnegie Mellon University
2005-2012
It was recently reported that men self-cite >50% more often than women across a wide variety of disciplines in the bibliographic database JSTOR. Here, we replicate this finding in a sample of 1.6 million papers from Author-ity, a version of PubMed with computationally disambiguated author names. More importantly, we show that the gender effect largely disappears when accounting for prior publication count in a multidimensional statistical model. Gender has the weakest effect on the probability of self-citation among an extensive set...
We investigate the relationship between basic principles of human morality and the expression of opinions in user-generated text data. We assume that people's backgrounds, culture, and values are associated with their perceptions and expressions of everyday topics, and that language use reflects these perceptions. While personal values and social effects are abstract and complex concepts, they have practical implications and are relevant for a wide range of NLP applications. To extract human values (in this paper, morality) and measure social effects (morality and stance), we...
Scholars have often relied on name initials to resolve ambiguities in large‐scale coauthorship network research. This approach bears the risk of incorrectly merging or splitting author identities. The use of initial‐based disambiguation has been justified by the assumption that such errors would not affect research findings too much. This paper tests that assumption by analyzing networks from five academic fields (biology, computer science, nanoscience, neuroscience, and physics) and an interdisciplinary journal, PNAS ....
Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
The Internet of Battlefield Things (IoBT) might be one of the most expensive cyber-physical systems of the next decade, yet much research remains to develop its fundamental enablers. A challenge that distinguishes the IoBT from its civilian counterparts is resilience to a larger spectrum of threats.
Crisis response involves extensive planning and coordination within and across a multitude of agencies and organisations. This study explores how on‐the‐ground crisis response efforts align with guidelines. These guidelines are key to the effectiveness of the response. To this end, we construct, analyse and compare emergency networks by using network analysis and natural language processing methods. Differences between plans and practice, that is, false positives (actions delivered but not prescribed) and false negatives...
Prior analyses and assessments of the impact of scientific research have mainly relied on analyzing its scope within academia and its influence within scholarly circles. However, by not considering the broader societal, economic, and policy implications of research projects, these studies overlook the ways in which scientific discoveries contribute to technological innovation, public health improvements, environmental sustainability, and other areas of real-world application. We expand upon this prior work by developing and validating...
Examining the alignment of large language models (LLMs) has become increasingly important, particularly when these systems fail to operate as intended. This study explores the challenge of aligning LLMs with human intentions and values, with a specific focus on their political inclinations. Previous research has highlighted LLMs' propensity to display political leanings and their ability to mimic certain political parties' stances on various issues. However, the extent and conditions under which LLMs deviate from empirical positions have not been thoroughly...
Gender biases in scholarly metrics remain a persistent concern, despite numerous bibliometric studies exploring their presence and absence across productivity, impact, acknowledgment, and self-citations. However, methodological inconsistencies, particularly in author name disambiguation and gender identification, limit the reliability and comparability of these studies, potentially perpetuating misperceptions and hindering effective interventions. A review of 70 relevant publications over the past 12 years reveals...
Multiple studies have linked diversity in scientific collaborations to innovative and impactful research. Here, we explore how different diversity indices (ethnicity, gender, academic age, and topical expertise) interact and thereby influence research impact. Leveraging nearly 900,000 biomedical journal articles from PubMed, published in major journals between 1991 and 2014, we investigate the nuanced relationships among these indices and their collective influence on research outcomes. By systematically varying model parametrizations, we assess...
The popularity and availability of Twitter as a service and a data source have fueled the interest in sentiment analysis. Previous research has shed light on the challenges that contextualizing effects and linguistic complexities pose for the accurate classification of tweets. We test the effect of adding manually-annotated, corpus-based hashtags to a lexicon, finding that this step in combination with negation detection increases prediction accuracy by about 7%. We then use our enhanced model to identify and rank the candidates of the Republican...
Structural balance theory assumes triads in networks to gravitate toward stable configurations. The theory has been verified for undirected graphs. Since real-world social networks are often directed, we introduce a novel method for considering both transitivity and sign consistency for calculating balance in signed digraphs. We test our approach on graphs that we constructed by using different methods for identifying edge signs: natural language processing to infer signs from underlying text data, and self-reported survey data. Our...
The Ukraine-Russia conflict has brought sizable detrimental impact to the global energy, food, finance, and manufacturing industries, as well as to many affected people. In this paper, we use Twitter (now X) to automatically identify who needs what from text data and how the types of needs that we categorized and standardized evolved throughout the conflict. Our findings suggest that Ukraine expresses a need for weapons, Russia for land, Europe for gas, and America for leadership. The majority of needs expressed on Twitter during the conflict are related to the categories...
Big social data have enabled new opportunities for evaluating the applicability of social science theories that were formulated decades ago and were often based on small- to medium-sized samples. Data coupled with powerful computing has the potential to replace the statistical practice of sampling and estimating effects by measuring phenomena in full populations. Preparing these data for analysis and conducting analytics involves a plethora of decisions, some of which are already embedded in previously collected data and built tools. These decisions...
An upcoming frontier for distributed computing might literally save lives in future military operations. In civilian scenarios, significant efficiencies were gained from interconnecting devices into networked services and applications that automate much of everyday life, from smart homes to intelligent transportation. The ecosystem of such applications and services is collectively called the Internet of Things (IoT). Can similar benefits be gained in a military context by developing an IoT for the battlefield? This paper describes unique challenges as...
Balance theory explains how network structural configurations relate to tension in social systems, which are commonly modeled as static undirected signed graphs. We expand this modeling approach by incorporating directionality of edges and considering three levels of analysis for balance assessment: triads, subgroups, and the whole network. For triad-level balance, we develop a new measure by utilizing semicycles that satisfy the condition of transitivity. For subgroup-level balance, we propose measures...