- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Sentiment Analysis and Opinion Mining
- Multimodal Machine Learning Applications
- Speech and dialogue systems
- Text Readability and Simplification
- Misinformation and Its Impacts
- Software Engineering Research
- Text and Document Classification Technologies
- Spam and Phishing Detection
- Hate Speech and Cyberbullying Detection
- Semantic Web and Ontologies
- Data Mining Algorithms and Applications
- Machine Learning and Data Classification
- Machine Learning and Algorithms
- Social Media and Politics
- Expert finding and Q&A systems
- Domain Adaptation and Few-Shot Learning
- Wikis in Education and Collaboration
- Complex Network Analysis Techniques
- Web Data Mining and Analysis
- AI-based Problem Solving and Planning
- Algorithms and Data Compression
- Anomaly Detection Techniques and Applications
Cornell University
2014-2023
The University of Texas at Dallas
2023
Bellevue Hospital Center
2022
Meta (Israel)
2021
Microsoft (United States)
2018-2019
Microsoft Research (United Kingdom)
2018
IBM Research - Zurich
2018
University of Pittsburgh
2018
University of North Carolina at Greensboro
2018
National Centre of Scientific Research "Demokritos"
2018
We present a noun phrase coreference system that extends the work of Soon et al. (2001) and, to our knowledge, produces the best results to date on the MUC-6 and MUC-7 coreference resolution data sets: F-measures of 70.4 and 63.4, respectively. Improvements arise from two sources: extra-linguistic changes to the learning framework and a large-scale expansion of the feature set to include more sophisticated linguistic knowledge.
We study automatic question generation for sentences from text passages in reading comprehension. We introduce an attention-based sequence learning model for the task and investigate the effect of encoding sentence- vs. paragraph-level information. In contrast to all previous work, our model does not rely on hand-crafted rules or a sophisticated NLP pipeline; it is instead trainable end-to-end via sequence-to-sequence learning. Automatic evaluation results show that our system significantly outperforms...
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, Janyce Wiebe. Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). 2014.
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). 2015.
OpinionFinder is a system that performs subjectivity analysis, automatically identifying when opinions, sentiments, speculations, and other private states are present in text. Specifically, OpinionFinder aims to identify subjective sentences and to mark various aspects of the subjectivity in these sentences, including the source (holder) of the subjectivity and words that are included in phrases expressing positive or negative sentiments.
Consumers increasingly rate, review and research products online. Consequently, websites containing consumer reviews are becoming targets of opinion spam. While recent work has focused primarily on manually identifiable instances of opinion spam, in this work we study deceptive opinion spam: fictitious opinions that have been deliberately written to sound authentic. Integrating work from psychology and computational linguistics, we develop and compare three approaches to detecting deceptive opinion spam, and ultimately develop a classifier that is nearly 90% accurate on our...
Recurrent neural networks (RNNs) are connectionist models of sequential data that are naturally applicable to the analysis of natural language. Recently, "depth in space" (as an orthogonal notion to "depth in time") in RNNs has been investigated by stacking multiple layers of RNNs and shown empirically to bring a temporal hierarchy to the architecture. In this work we apply these deep RNNs to the task of opinion expression extraction formulated as a token-level sequence-labeling task. Experimental results show that deep, narrow RNNs outperform traditional...
Recent systems have been developed for sentiment classification, opinion recognition, and opinion analysis (e.g., detecting polarity and strength). We pursue another aspect of opinion analysis: identifying the sources of opinions, emotions, and sentiments. We view this problem as an information extraction task and adopt a hybrid approach that combines Conditional Random Fields (Lafferty et al., 2001) and a variation of AutoSlog (Riloff, 1996a). While CRFs model source identification as a sequence tagging task, AutoSlog learns extraction patterns. Our results...
Determining the polarity of a sentiment-bearing expression requires more than a simple bag-of-words approach. In particular, words or constituents within the expression can interact with each other to yield a particular overall polarity. In this paper, we view such subsentential interactions in light of compositional semantics, and present a novel learning-based approach that incorporates structural inference motivated by compositional semantics into the learning procedure. Our experiments show that (1) simple heuristics based on compositional semantics can perform better...
The problem of event extraction requires detecting the event trigger and extracting its corresponding arguments. Existing work in event argument extraction typically relies heavily on entity recognition as a preprocessing/concurrent step, causing the well-known problem of error propagation. To avoid this issue, we introduce a new paradigm for event extraction by formulating it as a question answering (QA) task that extracts the event arguments in an end-to-end manner. Empirical results demonstrate that our framework outperforms prior methods substantially; in addition, it is...
Consumers' purchase decisions are increasingly influenced by user-generated online reviews. Accordingly, there has been growing concern about the potential for posting deceptive opinion spam: fictitious reviews that have been deliberately written to sound authentic, to deceive the reader. In this paper, we explore generalized approaches for identifying online deceptive opinion spam based on a new gold standard dataset, which is comprised of data from three different domains (i.e., Hotel, Restaurant, Doctor), each of which contains three types of reviews,...
In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and...
Consumers' purchase decisions are increasingly influenced by user-generated online reviews. Accordingly, there has been growing concern about the potential for posting deceptive opinion spam: fictitious reviews that have been deliberately written to sound authentic, to deceive the reader. But while this practice has received considerable public attention and concern, relatively little is known about the actual prevalence, or rate, of deception in online review communities, and less still about the factors that influence it.
We present a novel attention-based recurrent neural network for joint extraction of entity mentions and relations. We show that attention along with a long short term memory (LSTM) network can extract semantic relations between entity mentions without having access to dependency trees. Experiments on Automatic Content Extraction (ACE) corpora show that our model significantly outperforms the feature-based joint model by Li and Ji (2014). We also compare our model with an end-to-end tree-based LSTM model (SPTree) by Miwa and Bansal (2016) and show that our model performs within 1% on entity mentions and 2% on relations. Our fine-grained...
We investigate the efficacy of topic model based approaches to two multi-aspect sentiment analysis tasks: multi-aspect sentence labeling and multi-aspect rating prediction. For sentence labeling, we propose a weakly-supervised approach that utilizes only minimal prior knowledge, in the form of seed words, to enforce a direct correspondence between topics and aspects. This correspondence is used to label sentences with performance that approaches a fully supervised baseline. For multi-aspect rating prediction, we find that overall ratings can be used in conjunction with our sentence labelings to achieve reasonable performance compared to a fully supervised baseline. When...
Arzoo Katiyar, Claire Cardie. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
We present DREAM, the first dialogue-based multiple-choice reading comprehension data set. Collected from English as a Foreign Language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our data set contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension data sets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence,...