Joel Tetreault

ORCID: 0009-0003-3552-842X
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Speech and dialogue systems
  • Intelligent Tutoring Systems and Adaptive Learning
  • Multimodal Machine Learning Applications
  • Software Engineering Research
  • Hate Speech and Cyberbullying Detection
  • Sentiment Analysis and Opinion Mining
  • Advanced Text Analysis Techniques
  • Second Language Acquisition and Learning
  • Public Relations and Crisis Communication
  • Authorship Attribution and Profiling
  • Complex Network Analysis Techniques
  • Online Learning and Analytics
  • Multi-Agent Systems and Negotiation
  • Language, Metaphor, and Cognition
  • Educational Technology and Assessment
  • Misinformation and Its Impacts
  • Video Analysis and Summarization
  • Humor Studies and Applications
  • Data Quality and Management
  • Innovative Teaching and Learning Methods
  • Explainable Artificial Intelligence (XAI)
  • Computational and Text Analysis Methods

Dataminr (United States)
2020-2023

University of Illinois Urbana-Champaign
2023

University of Washington
2022

Yahoo (United Kingdom)
2010-2021

Carnegie Mellon University
2021

University of Maryland, College Park
2018-2021

University of Copenhagen
2020-2021

Bar-Ilan University
2021

University of Helsinki
2021

Tel Aviv University
2021

Detection of abusive language in user generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions, however these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language,...

10.1145/2872427.2883062 article EN 2016-04-11

Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics.

10.18653/v1/n18-1012 article EN cc-by 2018-01-01

Although word and character n-grams have been used as features in different NLP applications, no systematic comparison or analysis has shown the power of character-based features for detecting abusive language. In this study, we investigate the effectiveness of such features for abusive language detection in user-generated online comments, and show that such methods outperform previous state-of-the-art approaches and other strong baselines.

10.18653/v1/w16-3638 article EN cc-by 2016-01-01
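
The character-based features mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' feature extractor; the function name `char_ngrams`, the n-gram range, and the sample string are assumptions chosen only to show what character n-gram counts look like.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count every character n-gram of length n_min..n_max in text.

    Hypothetical helper for illustration; the range 2..4 is an assumption,
    not a setting taken from the paper.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the string.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Character n-grams still share pieces across obfuscated spellings,
# which is one reason they are tried for noisy user-generated text.
features = char_ngrams("h8 speech", 2, 3)
print(features.most_common(3))
```

Counts like these would typically be fed to a standard classifier as a sparse feature vector.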

With the recent popularity of animated GIFs on social media, there is need for ways to index them with rich meta-data. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high quality dataset, we developed a series of novel quality controls to validate free-form text input from crowd-workers. We show...

10.1109/cvpr.2016.502 article EN 2016-06-01

We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and how they can improve. JFLEG fulfills the need for a new gold...

10.18653/v1/e17-2037 article EN cc-by 2017-01-01

NOTE ⁃ A New Edition of This Title is Available: Automated Grammatical Error Detection for Language Learners, Second

10.2200/s00275ed1v01y201006hlt009 article EN Synthesis lectures on human language technologies 2010-01-01

This report presents work on the development of a new corpus of non-native English writing. It will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring. In this report, the corpus is described in detail.

10.1002/j.2333-8504.2013.tb02331.x article EN ETS Research Report Series 2013-12-01

In this paper we describe a methodology for detecting preposition errors in the writing of non-native English speakers. Our system performs at 84% precision and close to 19% recall on a large set of student essays. In addition, we address the problem of annotation and evaluation in this domain by showing how current approaches of using only one rater can skew system evaluation. We present a sampling approach to circumvent some of the issues that complicate evaluation of error detection systems.

10.3115/1599081.1599190 article EN 2008-01-01
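
To make the precision and recall figures in the abstract above concrete, here is a small sketch of how the two measures are computed and combined into an F-score. The function names and the raw counts are hypothetical; the counts were invented to land near the reported 84% precision and 19% recall and are not data from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up counts: 84 errors flagged correctly, 16 flagged wrongly, 358 missed.
p, r = precision_recall(tp=84, fp=16, fn=358)
print(f"precision={p:.2f} recall={r:.2f} F1={f_score(p, r):.2f}")
```

A system tuned this way flags few errors but is usually right when it does, a trade-off that makes sense when false alarms would mislead learners.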

Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, Yao Qian. Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications. 2017.

10.18653/v1/w17-5007 article EN cc-by 2017-01-01

Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2097 article EN cc-by 2015-01-01

This paper presents ongoing work on the detection of preposition errors of non-native speakers of English. Since prepositions account for a substantial proportion of all grammatical errors by ESL (English as a Second Language) learners, developing an NLP application that can reliably detect these types of errors will provide an invaluable learning resource to ESL students. To address this problem, we use a maximum entropy classifier combined with rule-based filters to detect preposition errors in a corpus of student essays. Although our work is preliminary, we achieve...

10.3115/1654629.1654635 article EN 2007-01-01

This paper presents an empirical study of linguistic formality. We perform an analysis of humans’ perceptions of formality in four different genres. These findings are used to develop a statistical model for predicting formality, which is evaluated under different feature settings and genres. We apply our model to an investigation of formality in online discussion forums, and present findings consistent with theories of formality and linguistic coordination.

10.1162/tacl_a_00083 article EN cc-by Transactions of the Association for Computational Linguistics 2016-12-01

Jinho D. Choi, Joel Tetreault, Amanda Stent. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.

10.3115/v1/p15-1038 article EN cc-by 2015-01-01

Emojis are an extremely common occurrence in mobile communications, but their meaning is open to interpretation. We investigate motivations for their usage in mobile messaging in the US. This study asked 228 participants for the last time that they used one or more emojis in a conversational message, and collected that message, along with a description of the emojis' intended meaning and function. We discuss functional distinctions between: adding additional emotional or situational meaning, adjusting tone, making a message more engaging to the recipient, conversation...

10.1145/2935334.2935370 article EN 2016-08-26

Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. The detection of sarcasm in social media platforms has been applied in the past mainly to textual utterances where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles, or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages where audiovisual content is integrated with the text, making the analysis of a mode in isolation...

10.1145/2964284.2964321 preprint EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection...

10.1109/cvpr42600.2020.01469 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

In this paper, we describe and evaluate two state-of-the-art systems for identifying and correcting writing errors involving English articles and prepositions. Criterion SM, developed by Educational Testing Service, and ESL Assistant, developed by Microsoft Research, both use machine learning techniques to build models of article and preposition usage which enable them to identify errors and suggest corrections to the writer. We evaluated the effects of these systems on users in two studies. In one, Criterion provided feedback about article errors to native and non-native speakers who were...

10.1177/0265532210364391 article EN Language Testing 2010-07-01

Michael Heilman, Aoife Cahill, Nitin Madnani, Melissa Lopez, Matthew Mulholland, Joel Tetreault. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2014.

10.3115/v1/p14-2029 article EN cc-by 2014-01-01

The field of grammatical error correction (GEC) has grown substantially in recent years, with research directed at both evaluation metrics and improved system performance against those metrics. One unvisited assumption, however, is the reliance of GEC evaluation on error-coded corpora, which contain specific labeled corrections. We examine current practices and show that GEC’s reliance on such corpora unnaturally constrains annotation and automatic evaluation, resulting in (a) sentences that do not sound acceptable to native...

10.1162/tacl_a_00091 article EN cc-by Transactions of the Association for Computational Linguistics 2016-12-01

Dependency parsers are among the most crucial tools in natural language processing as they have many important applications in downstream tasks such as information retrieval, machine translation and knowledge acquisition. We introduce the Yara Parser, a fast and accurate open-source dependency parser based on the arc-eager algorithm and beam search. It achieves an unlabeled accuracy of 93.32 on the standard WSJ test set which ranks it among the top dependency parsers. At its fastest, Yara can parse about 4000 sentences per second when in greedy...

10.48550/arxiv.1503.06733 preprint EN other-oa arXiv (Cornell University) 2015-01-01

Recent work in Dialogue Act classification has treated the task as a sequence labeling problem using hierarchical deep neural networks. We build on this prior work by leveraging the effectiveness of a context-aware self-attention mechanism coupled with a hierarchical recurrent neural network. We conduct extensive evaluations on standard Dialogue Act classification datasets and show significant improvement over state-of-the-art results on the Switchboard Dialogue Act (SwDA) Corpus. We also investigate the impact of different utterance-level representation learning methods and show that our method...

10.48550/arxiv.1904.02594 preprint EN other-oa arXiv (Cornell University) 2019-01-01