Rebecca Hwa

ORCID: 0000-0003-1158-7014
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Text Readability and Simplification
  • Sentiment Analysis and Opinion Mining
  • Biomedical Text Mining and Ontologies
  • Multimodal Machine Learning Applications
  • Speech and dialogue systems
  • Machine Learning and Algorithms
  • Software Engineering Research
  • Advanced Text Analysis Techniques
  • Algorithms and Data Compression
  • Domain Adaptation and Few-Shot Learning
  • Speech Recognition and Synthesis
  • Semantic Web and Ontologies
  • Text and Document Classification Technologies
  • Data Visualization and Analytics
  • Machine Learning and Data Classification
  • Language, Metaphor, and Cognition
  • Advanced Neural Network Applications
  • Video Analysis and Summarization
  • Authorship Attribution and Profiling
  • Human Pose and Action Recognition
  • Advanced Graph Neural Networks
  • Reinforcement Learning in Robotics
  • Writing and Handwriting Education

University of Pittsburgh
2014-2024

Laboratoire d'Informatique de Paris-Nord
2016

Carnegie Mellon University
2010

University of California, San Diego
2007

University of Maryland, College Park
2001-2005

Harvard University
1998-2000

Harvard University Press
1998-1999

Cornell University
1994

This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-training, in which two parsers are iteratively re-trained on each other's output; and a semi-supervised approach, corrected co-training, in which a human corrects each parser's output before adding it to the training data. The selection of labeled training examples is an integral part of both frameworks. We propose several selection methods based on the criteria of minimizing errors in the data and maximizing training utility. We show...

10.3115/1073445.1073476 article EN 2003-01-01

Broad coverage, high quality parsers are available for only a handful of languages. A prerequisite for developing broad coverage parsers for more languages is the annotation of text with the desired linguistic representations (also known as "treebanking"). However, syntactic annotation is a labor intensive and time-consuming process, and it is difficult to find linguistically annotated text in sufficient quantities. In this article, we explore using parallel text to help solving the problem of creating syntactic annotation in more languages. The central idea is to annotate the English side of a parallel corpus, project...

10.1017/s1351324905003840 article EN Natural Language Engineering 2005-09-21

There has been a recent swell of interest in the automatic identification and extraction of opinions and emotions in text. In this paper, we present the first experimental results classifying the intensity of opinions and other types of subjectivity and classifying the subjectivity of deeply nested clauses. We use a wide range of features, including new syntactic features developed for opinion recognition. We vary the learning algorithm and the feature organization to explore the effect this has on the classification task. In 10-fold cross-validation experiments using support vector regression, we achieve...

10.1111/j.1467-8640.2006.00275.x article EN Computational Intelligence 2006-05-01

We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training material is in a different domain to either the raw sentences or the testing material. We show that bootstrapping continues to be useful, even though no manually produced parses from the target domain are used.

10.3115/1067807.1067851 article EN 2003-01-01

Corpus-based statistical parsing relies on using large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes to use sample selection to find helpful training examples and reduce human effort spent on annotating less informative ones. We consider several criteria for predicting whether unlabeled data might be a helpful training example. Experiments are performed across two syntactic learning tasks and within the single task of parsing across two learning models to compare the effect of different...

10.1162/0891201041850894 article EN Computational Linguistics 2004-09-01

Recently, statistical machine translation models have begun to take advantage of higher level linguistic structures such as syntactic dependencies. Underlying these models is an assumption about the directness of translational correspondence between sentences in the two languages; however, the extent to which this assumption is valid and useful is not well understood. In this paper, we present an empirical study that quantifies the degree to which syntactic dependencies are preserved when parses are projected directly from English to Chinese. Our results show...

10.3115/1073083.1073149 article EN 2001-01-01

Natural Language Processing applications often require large amounts of annotated training data, which are expensive to obtain. In this paper we investigate the applicability of Co-training to train classifiers that predict emotions in spoken dialogues. In order to do so, we have first applied the wrapper approach with Forward Selection and Naïve Bayes, to reduce the dimensionality of our feature set. Our results show that Co-training can be highly effective when a good set of features is chosen.

10.3115/1219044.1219072 article EN 2004-01-01

Images and text in advertisements interact in complex, non-literal ways. The two channels are usually complementary, with each channel telling a different part of the story. Current approaches, such as image captioning methods, only examine literal, redundant relationships, where image and text show exactly the same content. To understand more complex relationships, we first collect a dataset of advertisement interpretations for whether the image and slogan in the same visual advertisement form a parallel (conveying the same message without literally saying the same thing) or non-parallel...

10.48550/arxiv.1807.08205 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Corpus-based grammar induction relies on using many hand-parsed sentences as training examples. However, the construction of a training corpus with detailed syntactic analysis for every sentence is a labor-intensive task. We propose to use sample selection methods to minimize the amount of annotation needed in the training data, thereby reducing the workload of the human annotators. This paper shows that the amount of annotated training data can be reduced by 36% without degrading the quality of the induced grammars.

10.3115/1117794.1117800 article EN 2000-01-01

Corpus-based grammar induction generally relies on hand-parsed training data to learn the structure of the language. Unfortunately, the cost of building large annotated corpora is prohibitively expensive. This work aims to improve the induction strategy when there are few labels in the training data. We show that the most informative linguistic constituents are the higher nodes in the parse trees, typically denoting complex noun phrases and sentential clauses. They account for only 20% of all constituents. For inducing grammars from sparsely labeled...

10.3115/1034678.1034699 article EN 1999-01-01

The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores...

10.1093/jamia/ocx070 article EN Journal of the American Medical Informatics Association 2017-06-10

This paper presents ArgRewrite, a corpus of between-draft revisions of argumentative essays. Drafts are manually aligned at the sentence level, and the writer’s purpose for each revision is annotated with categories analogous to those used in argument mining and discourse analysis. The corpus should enable advanced research in writing comparison and revision analysis, as demonstrated via our own studies of student revision behavior and of automatic revision purpose prediction.

10.18653/v1/p17-1144 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

We present The Chinese Room, a visualization interface that allows users to explore and interact with a multitude of linguistic resources in order to decode and correct poor machine translations. The target users are not bilingual and are not familiar with machine translation technologies. We investigate the ability of our system to assist such users in decoding and correcting faulty translations. We found that by collaborating with our application, end-users can overcome many difficult errors and disambiguate translated passages that were otherwise baffling. We also examine the utility...

10.1111/j.1467-8659.2009.01443.x article EN Computer Graphics Forum 2009-06-01

While intelligent writing assistants have become more common, they typically have little support for revision behavior. We present ArgRewrite, a novel web-based revision assistant that focuses on rewriting analysis. The system supports two major functionalities: 1) to assist students as they revise, the system automatically extracts and analyzes revisions; 2) to assist teachers, the system provides an overview of students' revisions and allows teachers to correct the analyzed results, ensuring that students get the correct feedback.

10.18653/v1/n16-3008 article EN cc-by 2016-01-01

We present the design and evaluation of a web-based intelligent writing assistant that helps students recognize their revisions of argumentative essays. To understand how our revision assistant can best support students, we have implemented four versions of our system with differences in the unit span (sentence versus sub-sentence) of analysis and the level of feedback provided (none, binary, or detailed revision purpose categorization). We first discuss the design decisions behind relevant components of the system, then analyze the efficacy of the different versions through...

10.1145/3411764.3445683 preprint EN 2021-05-06

The lack of annotated data is an obstacle to the development of many natural language processing applications; the problem is especially severe when the data is non-English. Previous studies suggested the possibility of acquiring resources for non-English languages by bootstrapping from high quality English NLP tools and parallel corpora; however, the success of these approaches seems limited for dissimilar language pairs. In this paper, we propose a novel approach of combining a bootstrapped resource with a small amount of manually annotated data. We compare...

10.3115/1220575.1220682 article EN 2005-01-01

Previous studies have shown automatic evaluation metrics to be more reliable when compared against many human translations. However, multiple human references may not always be available. It is more common to have only a single human reference (extracted from parallel texts) or no reference at all. Our earlier work suggested that one way to address this problem is to train a metric to evaluate a sentence by comparing it against pseudo references, or imperfect "references" produced by off-the-shelf MT systems. In this paper, we further examine the approach both in...

10.3115/1626394.1626424 article EN 2008-01-01

Many idiomatic expressions can be interpreted figuratively or literally depending on their contexts. This paper proposes an unsupervised learning method for recognizing the intended usages of idioms. We treat the usages as a latent variable in probabilistic models and train them in a linguistically motivated feature space. Crucially, we show that distributional semantics is a helpful heuristic for distinguishing the literal usage of idioms, giving us a way to formulate a literal usage metric to estimate the likelihood that the idiom is intended literally....

10.18653/v1/d18-1199 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

10.18653/v1/n16-1040 article EN Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016-01-01