- Topic Modeling
- Natural Language Processing Techniques
- Sentiment Analysis and Opinion Mining
- Mental Health via Writing
- Misinformation and Its Impacts
- Advanced Text Analysis Techniques
- Particle physics theoretical and experimental studies
- Dark Matter and Cosmic Phenomena
- Hate Speech and Cyberbullying Detection
- Speech and dialogue systems
- Authorship Attribution and Profiling
- Social and Intergroup Psychology
- Text Readability and Simplification
- Software Engineering Research
- Mental Health Research Topics
- Model Reduction and Neural Networks
- Speech Recognition and Synthesis
- Personality Traits and Psychology
- Cosmology and Gravitation Theories
- Computational and Text Analysis Methods
- Privacy-Preserving Technologies in Data
- Machine Learning in Healthcare
- AI in Service Interactions
- Artificial Intelligence in Healthcare and Education
- Semantic Web and Ontologies
- University of Bonn (2023-2025)
- Fraunhofer Society (2024)
- Lamarr Institute for Machine Learning and Artificial Intelligence (2024)
- Center for Information Technology (2024)
- Bonn Aachen International Center for Information Technology (2024)
- Philipps University of Marburg (2021-2023)
- Universitätsklinikum Gießen und Marburg (2023)
- Hess (United States) (2022)
- Hessian Center for Artificial Intelligence (2022)
- Indraprastha Institute of Information Technology Delhi (2021)
Physics beyond the Standard Model predicts the possible existence of new particles that can be searched for at the low-energy frontier in the sub-eV range. The OSQAR photon regeneration experiment looks for "Light Shining through a Wall" from the quantum oscillation of optical photons into "Weakly Interacting Sub-eV Particles", such as axions or Axion-Like Particles (ALPs), in a 9 T transverse magnetic field over the unprecedented length of $2 \times 14.3$ m. In 2014, this experiment was run with outstanding sensitivity, using...
Writing style allows NLP tools to adjust to the traits of an author. In this paper, we explore the relation between stylistic and syntactic features and authors' age and income. We confirm our hypothesis that for numerous feature types writing style is predictive of income even beyond age. We analyze the predictive power of writing style features in a regression task on two data sets of around 5,000 Twitter users each. Additionally, we use our validated features to study daily variations in writing style of users from distinct income groups. Temporal stylistic patterns not only provide novel psychological insight into user...
Coarse-grained semantic categories such as supersenses have proven useful for a range of downstream tasks such as question answering or machine translation. To date, no effort has been put into integrating the supersenses into distributional word representations. We present a novel joint embedding model of words and supersenses, providing insights into the relationship between words and supersenses in the same vector space. Using these embeddings in a deep neural network model, we demonstrate that the supersense enrichment leads to a significant improvement...
Most NLP models today treat language as universal, even though socio- and psycholinguistic research shows that the communicated message is influenced by the characteristics of the speaker as well as the target audience. This paper surveys the landscape of personalization in natural language processing and related fields, and offers a path forward to mitigate the decades of deviation of NLP tools from sociolinguistic findings, allowing them to flexibly process the “natural” language of each user rather than enforcing a uniform treatment. It outlines a possible direction...
Recent psychological studies indicate that individuals exhibiting suicidal ideation increasingly turn to social media rather than mental health practitioners. Contextualizing the build-up of such ideation is critical for the identification of users at risk. In this work, we focus on identifying suicidal intent in tweets by augmenting linguistic models with emotional phases modeled from users' historical context. We propose PHASE, a time- and phase-aware framework that adaptively learns features from a user's historical emotional spectrum on Twitter...
Correlations between input parameters play a crucial role in many scientific classification tasks, since these are often related to fundamental laws of nature. For example, in high energy physics, one of the common deep learning use-cases is the classification of signal and background processes in particle collisions. In such cases, the fundamental principles of the correlations between observables are better understood than the actual distributions of the observables themselves. In this work, we present a new adversarial attack algorithm called Random Distribution Shuffle Attack...
Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, Daniel Preoţiuc-Pietro. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.
Ramit Sawhney, Harshit Joshi, Rajiv Ratn Shah, Lucie Flek. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
The proliferation of ideological movements into extremist factions via social media has become a global concern. While radicalization has been studied extensively within the context of specific ideologies, our ability to accurately characterize extremism in more generalizable terms remains underdeveloped. In this paper, we propose a novel method for extracting and analyzing discourse across a range of online community forums. By focusing on verbal behavioral signatures of traits, we develop a framework...
While Large Language Models (LLMs) have shown impressive capabilities in math problem-solving tasks, their robustness to noisy inputs is not well-studied. In this work, we propose ArithmAttack to examine how robust the LLMs are when they encounter noisy prompts that contain extra noise in the form of punctuation marks. While being easy to implement, ArithmAttack does not cause any information loss, since words are not added or deleted from the context. We evaluate the robustness of seven LLMs, including LLama3, Mistral, and Mathstral, on noisy GSM8K and MultiArith...
Large Language Models (LLMs) have shown impressive performance in various NLP tasks. However, there are concerns about their reliability in different domains of linguistic variations. Many works have proposed robustness evaluation measures for local adversarial attacks, but we need globally robust models unbiased to different language styles. We take a broader approach to explore a wider range of variations across sociodemographic dimensions to perform structured reliability tests on the reasoning capacity of language models. We extend the SocialIQA...
Large Language Models (LLMs) are trained on Web data that might contain spelling errors made by humans. But do they become robust to similar real-world noise? In this paper, we investigate the effect of real-world spelling mistakes on the performance of 9 language models, with parameters ranging from 0.2B to 13B, in 3 different NLP tasks, namely Natural Language Inference (NLI), Named Entity Recognition (NER), and Intent Classification (IC). We perform our experiments on 6 languages and build a dictionary of real-world noise for them using the Wikipedia edit...
In this work, we evaluate annotator disagreement in Word-in-Context (WiC) tasks, exploring the relationship between contextual meaning and disagreement as part of the CoMeDi shared task competition. While prior studies have modeled disagreement by analyzing annotator attributes with single-sentence inputs, this shared task incorporates WiC to bridge the gap between sentence-level semantic representation and annotator judgment variability. We describe three different methods that we developed for the shared task, including a feature enrichment approach that combines concatenation, element-wise...
Equitable access to reliable health information is vital for public health, but the quality of online health resources varies by language, raising concerns about inconsistencies in Large Language Models (LLMs) for healthcare. In this study, we examine the consistency of responses provided by LLMs to health-related questions across English, German, Turkish, and Chinese. We largely expand the HealthFC dataset by categorizing health-related questions by disease type and broadening its multilingual scope with Turkish and Chinese translations. We reveal significant...
Conceptual operationalizations of empathy in NLP are varied, with some having specific behaviors and properties, while others are more abstract. How these variations relate to one another and capture properties observable in text remains unclear. To provide insight into this, we analyze the transfer performance of empathy models adapted to empathy tasks with different theoretical groundings. We study (1) the dimensionality of empathy definitions, (2) the correspondence between the defined dimensions and the measured/observed properties, and (3) the conduciveness of the data to represent...
This study focuses on personality prediction of protagonists in novels based on the Five-Factor Model of personality. We present and publish a novel collaboratively built dataset of fictional character personality and design our task as a text classification problem. We incorporate a range of semantic features, including WordNet and VerbNet sense-level information and word vector representations. We evaluate three machine learning models based on the speech, actions and predicatives of the main characters, and show that especially the lexical-semantic features...
Social media enable users to share their feelings and emotional struggles. They also offer an opportunity to provide community support to suicidal users. Recent studies on suicide risk assessment have explored the user's historic timeline and information from their social network to analyze their emotional state. However, such methods often require a large amount of user-centric data. A less intrusive alternative is to only use conversation trees arising from online community responses. Modeling conversations between the person in distress...
Recent theoretical and experimental studies highlight the possibility of new fundamental particle physics beyond the Standard Model that can be probed by sub-eV energy experiments. The OSQAR photon regeneration experiment looks for "Light Shining through a Wall" (LSW) from the quantum oscillation of optical photons into "Weakly Interacting Sub-eV Particles" (WISPs), like axions or axion-like particles (ALPs), in a 9 T transverse magnetic field over the unprecedented length of $2 \times 14.3$ m. No excess events...
With more than 22 million articles, the largest collaborative knowledge resource never sleeps, experiencing several article edits every second. Over one fifth of these articles describe individual people, the majority of whom are still alive. Such articles are, by their nature, prone to corruption and vandalism. Manual quality assurance by experts can barely cope with this massive amount of data. Can it be effectively replaced by feedback from the crowd? Can we provide meaningful support for quality assurance with automated text processing...
Online, social media communication is often ambiguous, and it can encourage speed and inattentiveness. We investigated whether Actively Open Minded Thinking (AOT), a dispositional willingness to seek out new or potentially threatening information, may help users avoid these pitfalls. In Study 1, we determined that correctly assessing social media authors’ traits was positively predicted by raters’ AOT. In Study 2, we used data-driven methods to devise a three-dimensional picture of the online behaviors of people high and low...
Contemporary sentiment analysis approaches rely heavily on lexicon-based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess the suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams where a polar word switches polarity. Our bigrams are scored using Lexicographers Mutual Information and leveraging large automatically obtained corpora. Our score matches human perception...
People associate certain behaviors with certain social groups. These stereotypical beliefs consist of both accurate and inaccurate associations. Using large-scale, data-driven methods with social media as a context, we isolate stereotypes by using verbal expression. Across four categories (gender, age, education level, and political orientation), we identify words and phrases that lead people to incorrectly guess the social category of the writer. Although raters often correctly categorize authors, they overestimate the importance of some...
Existing sarcasm detection systems focus on exploiting linguistic markers, context, or user-level priors. However, social studies suggest that the relationship between the author and the audience can be equally relevant for sarcasm usage and interpretation. In this work, we propose a framework jointly leveraging (1) a user context from their historical tweets together with (2) the social information from a user’s neighborhood in an interaction graph, to contextualize the interpretation of the post. We distinguish perceived and self-reported...