NFDI4DS | UHH-SEMS - Publication Details

Joel Tetreault

ORCID: 0009-0003-3552-842X

About

Contact & Profiles

A5038106045

Research Areas

Natural Language Processing Techniques
Topic Modeling
Text Readability and Simplification
Speech and dialogue systems
Intelligent Tutoring Systems and Adaptive Learning
Multimodal Machine Learning Applications
Software Engineering Research
Hate Speech and Cyberbullying Detection
Sentiment Analysis and Opinion Mining
Advanced Text Analysis Techniques
Second Language Acquisition and Learning
Public Relations and Crisis Communication
Authorship Attribution and Profiling
Complex Network Analysis Techniques
Online Learning and Analytics
Multi-Agent Systems and Negotiation
Language, Metaphor, and Cognition
Educational Technology and Assessment
Misinformation and Its Impacts
Video Analysis and Summarization
Humor Studies and Applications
Data Quality and Management
Innovative Teaching and Learning Methods
Explainable Artificial Intelligence (XAI)
Computational and Text Analysis Methods

Dataminr (United States)
2020-2023

University of Illinois Urbana-Champaign
2023

University of Washington
2022

Yahoo (United Kingdom)
2010-2021

Carnegie Mellon University
2021

University of Maryland, College Park
2018-2021

University of Copenhagen
2020-2021

Bar-Ilan University
2021

University of Helsinki
2021

Tel Aviv University
2021

Abusive Language Detection in Online User Content

OPENALEX - Publications

Chikashi Nobata Joel Tetreault Achint Thomas Yashar Mehdad Yi Chang

Detection of abusive language in user generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions, however these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language,...

10.1145/2872427.2883062 article EN 2016-04-11

Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer

Sudha Rao Joel Tetreault

Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges using

10.18653/v1/n18-1012 article EN cc-by 2018-01-01

Do Characters Abuse More Than Words?

Yashar Mehdad Joel Tetreault

Although word and character n-grams have been used as features in different NLP applications, no systematic comparison or analysis has shown the power of character-based features for detecting abusive language. In this study, we investigate the effectiveness of such features for abusive language detection in user-generated online comments, and show that such methods outperform previous state-of-the-art approaches and other strong baselines.

10.18653/v1/w16-3638 article EN cc-by 2016-01-01

TGIF: A New Dataset and Benchmark on Animated GIF Description

Yuncheng Li Yale Song Liangliang Cao Joel Tetreault Larry Goldberg and 2 more

With the recent popularity of animated GIFs on social media, there is need for ways to index them with rich meta-data. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high quality dataset, we developed a series of novel quality controls to validate free-form text input from crowd-workers. We show...

10.1109/cvpr.2016.502 article EN 2016-06-01

JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction

Courtney Napoles Keisuke Sakaguchi Joel Tetreault

We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and how they can improve. JFLEG fulfills the need for a new gold...

10.18653/v1/e17-2037 article EN cc-by 2017-01-01

Automated Grammatical Error Detection for Language Learners

Claudia Leacock Martin Chodorow Michael Gamon Joel Tetreault

NOTE ⁃ A New Edition of This Title is Available: Automated Grammatical Error Detection for Language Learners, Second

10.2200/s00275ed1v01y201006hlt009 article EN Synthesis lectures on human language technologies 2010-01-01

TOEFL11: A CORPUS OF NON‐NATIVE ENGLISH

Daniel Blanchard Joel Tetreault Derrick Higgins Aoife Cahill Martin Chodorow

ABSTRACT This report presents work on the development of a new corpus of non‐native English writing. It will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring. In this report, the corpus is described in detail.

10.1002/j.2333-8504.2013.tb02331.x article EN ETS Research Report Series 2013-12-01

The ups and downs of preposition error detection in ESL writing

Joel Tetreault Martin Chodorow

In this paper we describe a methodology for detecting preposition errors in the writing of non-native English speakers. Our system performs at 84% precision and close to 19% recall on a large set of student essays. In addition, we address the problem of annotation and evaluation in this domain by showing how current approaches of using only one rater can skew system evaluation. We present a sampling approach to circumvent some of the issues that complicate evaluation of error detection systems.

10.3115/1599081.1599190 article EN 2008-01-01

A Report on the 2017 Native Language Identification Shared Task

Shervin Malmasi Keelan Evanini Aoife Cahill Joel Tetreault Robert Pugh and 3 more

Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, Yao Qian. Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications. 2017.

10.18653/v1/w17-5007 article EN cc-by 2017-01-01

Ground Truth for Grammaticality Correction Metrics

Courtney Napoles Keisuke Sakaguchi Matt Post Joel Tetreault

Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2097 article EN cc-by 2015-01-01

Detection of grammatical errors involving prepositions

Martin Chodorow Joel Tetreault Na-Rae Han

This paper presents ongoing work on the detection of preposition errors of non-native speakers of English. Since prepositions account for a substantial proportion of all grammatical errors by ESL (English as a Second Language) learners, developing an NLP application that can reliably detect these types of errors will provide an invaluable learning resource to ESL students. To address this problem, we use a maximum entropy classifier combined with rule-based filters to detect preposition errors in a corpus of student essays. Although our work is preliminary, we achieve...

10.3115/1654629.1654635 article EN 2007-01-01

An Empirical Analysis of Formality in Online Communication

Ellie Pavlick Joel Tetreault

This paper presents an empirical study of linguistic formality. We perform an analysis of humans’ perceptions of formality in four different genres. These findings are used to develop a statistical model for predicting formality, which is evaluated under different feature settings and genres. We apply our model to an investigation of formality in online discussion forums, and present findings consistent with theories of formality and linguistic coordination.

10.1162/tacl_a_00083 article EN cc-by Transactions of the Association for Computational Linguistics 2016-12-01

It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool

Jinho D. Choi Joel Tetreault Amanda Stent

Jinho D. Choi, Joel Tetreault, Amanda Stent. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.

10.3115/v1/p15-1038 article EN cc-by 2015-01-01

Sender-intended functions of emojis in US messaging

Henriette Cramer Paloma de Juan Joel Tetreault

Emojis are an extremely common occurrence in mobile communications, but their meaning is open to interpretation. We investigate motivations for their usage in mobile messaging in the US. This study asked 228 participants for the last time that they used one or more emojis in a conversational message, and collected that message, along with a description of the emojis' intended meaning and function. We discuss functional distinctions between: adding additional emotional or situational meaning, adjusting tone, making a message more engaging to the recipient, conversation...

10.1145/2935334.2935370 article EN 2016-08-26

Detecting Sarcasm in Multimodal Social Platforms

Rossano Schifanella Paloma de Juan Joel Tetreault Liangliang Cao

Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. The detection of sarcasm in social media platforms has been applied in the past mainly to textual utterances where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles, or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages where audiovisual content is integrated with the text, making the analysis of a mode in isolation...

10.1145/2964284.2964321 preprint EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

Multimodal Categorization of Crisis Events in Social Media

Mahdi Abavisani Liwei Wu Shengli Hu Joel Tetreault Alejandro Jaimes

Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection...

10.1109/cvpr42600.2020.01469 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

The utility of article and preposition error correction systems for English language learners: Feedback and assessment

Martin Chodorow Michael Gamon Joel Tetreault

In this paper, we describe and evaluate two state-of-the-art systems for identifying and correcting writing errors involving English articles and prepositions. Criterion SM, developed by Educational Testing Service, and ESL Assistant, developed by Microsoft Research, both use machine learning techniques to build models of article and preposition usage which enable them to identify errors and suggest corrections to the writer. We evaluated the effects of these systems on users in two studies. In one, Criterion provided feedback about article errors to native and non-native speakers who were...

10.1177/0265532210364391 article EN Language Testing 2010-07-01

Predicting Grammaticality on an Ordinal Scale

Michael Heilman Aoife Cahill Nitin Madnani Melissa Lopez Matthew Mulholland and 1 more

Michael Heilman, Aoife Cahill, Nitin Madnani, Melissa Lopez, Matthew Mulholland, Joel Tetreault. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2014.

10.3115/v1/p14-2029 article EN cc-by 2014-01-01

Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality

Keisuke Sakaguchi Courtney Napoles Matt Post Joel Tetreault

The field of grammatical error correction (GEC) has grown substantially in recent years, with research directed at both evaluation metrics and improved system performance against those metrics. One unvisited assumption, however, is the reliance of GEC evaluation on error-coded corpora, which contain specific labeled corrections. We examine current practices and show that GEC’s reliance on such corpora unnaturally constrains annotation and automatic evaluation, resulting in (a) sentences that do not sound acceptable to native...

10.1162/tacl_a_00091 article EN cc-by Transactions of the Association for Computational Linguistics 2016-12-01

Yara Parser: A Fast and Accurate Dependency Parser

Mohammad Sadegh Rasooli Joel Tetreault

Dependency parsers are among the most crucial tools in natural language processing as they have many important applications in downstream tasks such as information retrieval, machine translation and knowledge acquisition. We introduce the Yara Parser, a fast and accurate open-source dependency parser based on the arc-eager algorithm and beam search. It achieves an unlabeled accuracy of 93.32 on the standard WSJ test set which ranks it among the top dependency parsers. At its fastest, Yara can parse about 4000 sentences per second when in greedy...

10.48550/arxiv.1503.06733 preprint EN other-oa arXiv (Cornell University) 2015-01-01

Dialogue Act Classification with Context-Aware Self-Attention

Vipul Raheja Joel Tetreault

Recent work in Dialogue Act classification has treated the task as a sequence labeling problem using hierarchical deep neural networks. We build on this prior work by leveraging the effectiveness of a context-aware self-attention mechanism coupled with a hierarchical recurrent neural network. We conduct extensive evaluations on standard Dialogue Act classification datasets and show significant improvement over state-of-the-art results on the Switchboard Dialogue Act (SwDA) Corpus. We also investigate the impact of different utterance-level representation learning methods and show that our method...

10.48550/arxiv.1904.02594 preprint EN other-oa arXiv (Cornell University) 2019-01-01
