- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Speech and Dialogue Systems
- Intelligent Tutoring Systems and Adaptive Learning
- Multimodal Machine Learning Applications
- Software Engineering Research
- Hate Speech and Cyberbullying Detection
- Sentiment Analysis and Opinion Mining
- Advanced Text Analysis Techniques
- Second Language Acquisition and Learning
- Public Relations and Crisis Communication
- Authorship Attribution and Profiling
- Complex Network Analysis Techniques
- Online Learning and Analytics
- Multi-Agent Systems and Negotiation
- Language, Metaphor, and Cognition
- Educational Technology and Assessment
- Misinformation and Its Impacts
- Video Analysis and Summarization
- Humor Studies and Applications
- Data Quality and Management
- Innovative Teaching and Learning Methods
- Explainable Artificial Intelligence (XAI)
- Computational and Text Analysis Methods
- Dataminr (United States), 2020-2023
- University of Illinois Urbana-Champaign, 2023
- University of Washington, 2022
- Yahoo (United Kingdom), 2010-2021
- Carnegie Mellon University, 2021
- University of Maryland, College Park, 2018-2021
- University of Copenhagen, 2020-2021
- Bar-Ilan University, 2021
- University of Helsinki, 2021
- Tel Aviv University, 2021
Detection of abusive language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions; however, these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language,...
Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics.
Although word and character n-grams have been used as features in different NLP applications, no systematic comparison or analysis has shown the power of character-based features for detecting abusive language. In this study, we investigate the effectiveness of such features for abusive language detection in user-generated online comments, and show that such methods outperform previous state-of-the-art approaches and other strong baselines.
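The contrast between word and character n-grams mentioned in this abstract can be illustrated with a minimal sketch. The function names `char_ngrams` and `word_unigrams`, the n-gram range of 2 to 4, and the example strings are assumptions made here for illustration; they are not taken from the paper and do not reproduce its feature set or classifier.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_unigrams(text):
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


# Hypothetical example: a plain spelling and an obfuscated variant.
plain, obfuscated = "stupid", "stup1d"

# The two spellings share no word-level feature...
shared_words = set(word_unigrams(plain)) & set(word_unigrams(obfuscated))

# ...but they still overlap on several character n-grams.
shared_chars = set(char_ngrams(plain)) & set(char_ngrams(obfuscated))
```

In this toy case the word-level overlap is empty while the character-level overlap is not, which is one intuition for why character-based features can be robust to spelling variation in user comments.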
With the recent popularity of animated GIFs on social media, there is a need for ways to index them with rich meta-data. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high quality dataset, we developed a series of novel quality controls to validate free-form text input from crowd-workers. We show...
We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG), for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and how they can improve. JFLEG fulfills the need for a new gold...
NOTE: A New Edition of This Title is Available: Automated Grammatical Error Detection for Language Learners, Second Edition
This report presents work on the development of a new corpus of non-native English writing. It will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring. In this report, the corpus is described in detail.
In this paper we describe a methodology for detecting preposition errors in the writing of non-native English speakers. Our system performs at 84% precision and close to 19% recall on a large set of student essays. In addition, we address the problem of annotation and evaluation in this domain by showing how current approaches of using only one rater can skew system evaluation. We present a sampling approach to circumvent some of the issues that complicate evaluation of error detection systems.
Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, Yao Qian. Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications. 2017.
Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.
This paper presents ongoing work on the detection of preposition errors of non-native speakers of English. Since prepositions account for a substantial proportion of all grammatical errors by ESL (English as a Second Language) learners, developing an NLP application that can reliably detect these types of errors will provide an invaluable learning resource to ESL students. To address this problem, we use a maximum entropy classifier combined with rule-based filters to detect preposition errors in a corpus of student essays. Although our work is preliminary, we achieve...
This paper presents an empirical study of linguistic formality. We perform an analysis of humans’ perceptions of formality in four different genres. These findings are used to develop a statistical model for predicting formality, which is evaluated under different feature settings and genres. We apply our model to an investigation of formality in online discussion forums, and present findings consistent with theories of formality and linguistic coordination.
Jinho D. Choi, Joel Tetreault, Amanda Stent. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.
Emojis are an extremely common occurrence in mobile communications, but their meaning is open to interpretation. We investigate motivations for their usage in mobile messaging in the US. This study asked 228 participants for the last time that they used one or more emojis in a conversational message, and collected that message, along with a description of the emojis' intended meaning and function. We discuss functional distinctions between: adding additional emotional or situational meaning, adjusting tone, making a message more engaging to the recipient, conversation...
Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. The detection of sarcasm in social media platforms has been applied in the past mainly to textual utterances where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles, or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages where audiovisual content is integrated with the text, making the analysis of a mode in isolation...
Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection...
In this paper, we describe and evaluate two state-of-the-art systems for identifying and correcting writing errors involving English articles and prepositions. Criterion SM, developed by Educational Testing Service, and ESL Assistant, developed by Microsoft Research, both use machine learning techniques to build models of article and preposition usage which enable them to identify errors and suggest corrections to the writer. We evaluated the effects of these systems on users in two studies. In one, Criterion provided feedback about article errors to native and non-native speakers who were...
Michael Heilman, Aoife Cahill, Nitin Madnani, Melissa Lopez, Matthew Mulholland, Joel Tetreault. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2014.
The field of grammatical error correction (GEC) has grown substantially in recent years, with research directed at both evaluation metrics and improved system performance against those metrics. One unvisited assumption, however, is the reliance of GEC evaluation on error-coded corpora, which contain specific labeled corrections. We examine current practices and show that GEC’s reliance on such corpora unnaturally constrains annotation and automatic evaluation, resulting in (a) sentences that do not sound acceptable to native...
Dependency parsers are among the most crucial tools in natural language processing as they have many important applications in downstream tasks such as information retrieval, machine translation and knowledge acquisition. We introduce the Yara Parser, a fast and accurate open-source dependency parser based on the arc-eager algorithm and beam search. It achieves an unlabeled accuracy of 93.32 on the standard WSJ test set, which ranks it among the top dependency parsers. At its fastest, Yara can parse about 4000 sentences per second when in greedy...
Recent work in Dialogue Act classification has treated the task as a sequence labeling problem using hierarchical deep neural networks. We build on this prior work by leveraging the effectiveness of a context-aware self-attention mechanism coupled with a hierarchical recurrent neural network. We conduct extensive evaluations on standard Dialogue Act classification datasets and show significant improvement over state-of-the-art results on the Switchboard Dialogue Act (SwDA) Corpus. We also investigate the impact of different utterance-level representation learning methods and show that our method...