- Sentiment Analysis and Opinion Mining
- Topic Modeling
- Advanced Text Analysis Techniques
- Hate Speech and Cyberbullying Detection
- Spam and Phishing Detection
- Natural Language Processing Techniques
- Explainable Artificial Intelligence (XAI)
- Biomedical Text Mining and Ontologies
- Text and Document Classification Technologies
- Social and Intergroup Psychology
- Adversarial Robustness in Machine Learning
- Mental Health via Writing
- Computational and Text Analysis Methods
- Humor Studies and Applications
- Misinformation and Its Impacts
- Mental Health Research Topics
- Artificial Intelligence in Healthcare and Education
- Social Media and Politics
- Media Influence and Politics
- Machine Learning in Bioinformatics
- Personality Traits and Psychology
- Data Visualization and Analytics
- Text Readability and Simplification
- Complex Network Analysis Techniques
- Semantic Web and Ontologies
National Research Council Canada
2013-2023
National Academies of Sciences, Engineering, and Medicine
2008-2021
University of Helsinki
2021
Tel Aviv University
2021
Technical University of Darmstadt
2021
University of Copenhagen
2021
Edinburgh Napier University
2021
Universitat Pompeu Fabra
2021
University of Amsterdam
2021
University of Antwerp
2021
We describe a state-of-the-art sentiment analysis system that detects (a) the sentiment of short informal textual messages such as tweets and SMS (message-level task) and (b) the sentiment of a word or phrase within a message (term-level task). The system is based on a supervised statistical text classification approach leveraging a variety of surface-form, semantic, and sentiment features. The sentiment features are primarily derived from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. To...
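As a rough illustration of how a sentiment lexicon might be derived from hashtag-labeled tweets, the sketch below scores each word by the difference between its pointwise mutual information (PMI) with the positive and negative classes. The abstract above does not state the scoring formula, so PMI, the add-one smoothing, and the function name `build_sentiment_lexicon` are all assumptions made for this example, not the authors' confirmed method.

```python
import math
from collections import Counter


def build_sentiment_lexicon(labeled_tweets, smoothing=1.0):
    """Score words by PMI(word, positive) - PMI(word, negative).

    labeled_tweets: iterable of (tokens, label) pairs, where label is
    "positive" or "negative" (e.g. inferred from a hashtag such as #good).
    This scoring rule is an assumption made for illustration.
    """
    counts = {"positive": Counter(), "negative": Counter()}
    for tokens, label in labeled_tweets:
        counts[label].update(tokens)

    vocab = set(counts["positive"]) | set(counts["negative"])
    # Smoothed class totals, so unseen (word, class) pairs do not divide by zero.
    total_pos = sum(counts["positive"].values()) + smoothing * len(vocab)
    total_neg = sum(counts["negative"].values()) + smoothing * len(vocab)

    lexicon = {}
    for word in vocab:
        pos = counts["positive"][word] + smoothing
        neg = counts["negative"][word] + smoothing
        # The PMI difference simplifies to a log ratio of class-conditional rates.
        lexicon[word] = math.log2((pos / total_pos) / (neg / total_neg))
    return lexicon


# Hypothetical toy corpus: tokens paired with a hashtag-derived label.
toy_tweets = [
    (["great", "day"], "positive"),
    (["awful", "day"], "negative"),
]
toy_lexicon = build_sentiment_lexicon(toy_tweets)
```

Under this scheme, a positive score means a word occurs relatively more often in positively labeled tweets, and a score near zero means it is uninformative.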
Here for the first time we present a shared task on detecting stance from tweets: given a tweet and a target entity (person, organization, etc.), automatic natural language systems must determine whether the tweeter is in favor of the given target, against the given target, or whether neither inference is likely. The target of interest may or may not be referred to in the tweet, and it may or may not be the target of opinion. Two tasks are proposed. Task A is a traditional supervised classification task where 70% of the annotated data for a target is used as training and the rest for testing. For Task B, we use as test data all of the instances for a new target (not used in Task A) and no...
We present the SemEval-2018 Task 1: Affect in Tweets, which includes an array of subtasks on inferring the affectual state of a person from their tweet. For each task, we created labeled data from English, Arabic, and Spanish tweets. The individual tasks are: 1. emotion intensity regression, 2. emotion intensity ordinal classification, 3. valence (sentiment) regression, 4. valence ordinal classification, and 5. emotion classification. Seventy-five teams (about 200 team members) participated in the shared task. We summarize the methods, resources, and tools used by the participating teams, with...
Reviews depict sentiments of customers towards various aspects of a product or service. Some of these aspects can be grouped into coarser aspect categories. SemEval-2014 had a shared task (Task 4) on aspect-level sentiment analysis, in which over 30 teams participated. In this paper, we describe our submissions, which stood first in detecting aspect categories, first in detecting sentiment towards aspect categories, third in detecting aspect terms, and first and second in detecting sentiment towards aspect terms in the laptop and restaurant domains, respectively.
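In this paper, we describe how we created two state-of-the-art SVM classifiers, one to detect the sentiment of messages such as tweets and SMS (message-level task) and one to detect the sentiment of a term within a message (term-level task). Our submissions stood first in both tasks on tweets, obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. We implemented a variety of surface-form, semantic, and sentiment features. We also generated two large word-sentiment association lexicons, one from tweets with sentiment-word hashtags, and one from tweets with emoticons. In the message-level task, the lexicon-based features provided a gain of 5 F-score points over all others. Both of our systems can be...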
We can often detect from a person’s utterances whether he or she is in favor of or against a given target entity: one’s stance toward the target. However, a person may express the same stance toward a target by using negative or positive language. Here for the first time we present a dataset of tweet–target pairs annotated for both stance and sentiment. The targets may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. Partitions of this dataset were used as training and test sets in a SemEval-2016 shared task competition. We propose a simple stance detection system that outperforms...
Detecting emotions in microblogs and social media posts has applications for industry, health, and security. Statistical, supervised automatic methods for emotion detection rely on text that is labeled for emotions, but such data are rare and available for only a handful of basic emotions. In this article, we show that emotion‐word hashtags are good manual labels of emotions in tweets. We also propose a method to generate a large lexicon of word–emotion associations from this emotion‐labeled tweet corpus. This is the first lexicon with real‐valued...
In this paper, we describe the 2015 iteration of the SemEval shared task on Sentiment Analysis in Twitter. This was the most popular sentiment analysis shared task to date, with more than 40 teams participating in each of the last three years. This year's shared task competition consisted of five sentiment prediction subtasks. Two were reruns from previous years: (A) sentiment expressed by a phrase in the context of a tweet, and (B) overall sentiment of a tweet. We further included three new subtasks asking to predict (C) the sentiment towards a topic in a single tweet, (D) the overall sentiment towards a topic in a set of tweets, and (E) the degree of prior polarity of a phrase.
Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems. Further, there is no benchmark dataset for examining inappropriate biases in systems. Here for the first time, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. We use the dataset to examine 219 automatic sentiment analysis systems that took part in a recent shared task, SemEval-2018...
Objective: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge.
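Sentiment analysis research has predominantly been on English texts. Thus there exist many sentiment resources for English, but less so for other languages. Approaches to improve sentiment analysis in a resource-poor focus language include: (a) translate the focus language text into a resource-rich language such as English, and apply a powerful English sentiment analysis system on the text, and (b) translate resources such as sentiment labeled corpora and sentiment lexicons from English into the focus language, and use them as additional resources in the focus-language sentiment analysis system. In this paper we systematically examine both options. We use Arabic social media posts as a stand-in for the focus language text. We show that...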
Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text (e.g., in journal publications), which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage,...
Rating scales are a widely used method for data annotation; however, they present several challenges, such as difficulty in maintaining inter- and intra-annotator consistency. Best–worst scaling (BWS) is an alternative method of annotation that is claimed to produce high-quality annotations while keeping the required number of annotations similar to that of rating scales. However, the veracity of this claim has never been systematically established. Here for the first time, we set up an experiment that directly compares the rating scale method with BWS. We show that with the same...
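To make the BWS idea concrete, here is a minimal sketch of one simple way to turn best-worst annotations into real-valued scores: for each item, take the number of times it was chosen as best, subtract the number of times it was chosen as worst, and divide by the number of times it was shown. The abstract above does not say how scores are computed, so this counting rule, the input layout, and the name `bws_scores` are assumptions for illustration only.

```python
from collections import Counter


def bws_scores(annotations):
    """Convert best-worst annotations into scores in [-1, 1].

    annotations: iterable of (items, best, worst), where `items` is the
    tuple of terms shown to an annotator and `best` / `worst` are the
    terms they picked. The counting rule is an assumed, simple scheme.
    """
    shown, best_count, worst_count = Counter(), Counter(), Counter()
    for items, best, worst in annotations:
        if best not in items or worst not in items:
            raise ValueError("best and worst must be among the shown items")
        shown.update(items)
        best_count[best] += 1
        worst_count[worst] += 1
    # score = (times chosen best - times chosen worst) / times shown
    return {
        item: (best_count[item] - worst_count[item]) / shown[item]
        for item in shown
    }


# Hypothetical annotations over 4-tuples of terms.
toy_annotations = [
    (("good", "bad", "okay", "awful"), "good", "awful"),
    (("good", "okay", "fine", "bad"), "good", "bad"),
]
toy_scores = bws_scores(toy_annotations)
```

Sorting items by these scores yields a ranking, which is the kind of output that can then be compared against ratings from a conventional scale.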
Mohammad Salameh, Saif Mohammad, Svetlana Kiritchenko. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.
One may express favor (or disfavor) towards a target by using positive or negative language. Here for the first time we present a dataset of tweets annotated for whether the tweeter is in favor of or against pre-chosen targets, as well as for sentiment. These targets may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. We develop a simple stance detection system that outperforms all 19 teams that participated in a recent shared task competition on the same dataset (SemEval-2016 Task #6). Additionally, access to both stance and sentiment annotations allows us...
Access to word-sentiment associations is useful for many applications, including sentiment analysis, stance detection, and linguistic analysis. However, manually assigning fine-grained sentiment association scores to words has many challenges with respect to keeping annotations consistent. We apply the annotation technique of Best-Worst Scaling to obtain real-valued sentiment association scores for words and phrases in three different domains: general English, English Twitter, and Arabic Twitter. We show that on all three domains the ranking of words by sentiment remains remarkably consistent...
This paper describes state-of-the-art statistical systems for automatic sentiment analysis of tweets. In a SemEval-2014 shared task (Task 9), our submissions obtained highest scores in the term-level sentiment classification subtask on both the 2013 and 2014 tweets test sets. In the message-level sentiment classification task, our submissions obtained highest scores on the LiveJournal blog posts test set, sarcastic tweets test set, and the 2013 SMS test set. These systems build on our SemEval-2013 sentiment analysis systems (Mohammad et al., 2013), which ranked first in both the term- and message-level subtasks in 2013. Key improvements over the 2013 systems are in the handling of negation. We create separate...
We present a shared task on automatically determining sentiment intensity of a word or a phrase. The words and phrases are taken from three domains: general English, English Twitter, and Arabic Twitter. The phrases include those composed of negators, modals, and degree adverbs as well as phrases formed by words with opposing polarities. For each of the three domains, we assembled the datasets that include multi-word phrases and their constituent words, both manually annotated for real-valued sentiment intensity scores. The three datasets were presented as the test sets for three separate tasks (each focusing on a specific...
Abstract. Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR...
Large language models (LLMs) have advanced to a point that even humans have difficulty discerning whether a text was generated by another human, or by a computer. However, knowing whether a text was produced by human or artificial intelligence (AI) is important to determining its trustworthiness, and has applications in many domains including detecting fraud and academic dishonesty, as well as combating the spread of misinformation and political propaganda. The task of AI-generated text (AIGT) detection is therefore both very challenging, and highly...