- Natural Language Processing Techniques
- Topic Modeling
- Semantic Web and Ontologies
- Text Readability and Simplification
- Syntax, Semantics, Linguistic Variation
- Speech and Dialogue Systems
- Advanced Text Analysis Techniques
- Language, Discourse, Communication Strategies
- Neurobiology of Language and Bilingualism
- Language and Cultural Evolution
- Multimodal Machine Learning Applications
- Reading and Literacy Development
- Speech Recognition and Synthesis
- Language Development and Disorders
- Logic, Reasoning, and Knowledge
- Sentiment Analysis and Opinion Mining
- EEG and Brain-Computer Interfaces
- Neuroscience and Music Perception
- Domain Adaptation and Few-Shot Learning
- Biomedical Text Mining and Ontologies
- Neural Dynamics and Brain Function
- Authorship Attribution and Profiling
- Wikis in Education and Collaboration
- Neural Networks and Applications
- Data Quality and Management
University of Rochester
2018-2025
Mississippi State University
2024
Johns Hopkins University
2016-2021
University of Maryland, College Park
2015
Aaron Steven White, Drew Reisinger, Keisuke Sakaguchi, Tim Vieira, Sheng Zhang, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016.
Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, Benjamin Van Durme. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
We develop a probabilistic model of S(emantic)-selection that encodes both the notion of systematic mappings from semantic type signature to syntactic distribution—i.e., projection rules—and the notion of selectional noise—e.g., C(ategory)-selection, L(exical)-selection, and/or other independent processes. We train this model on data from a large-scale judgment study assessing the acceptability of 1,000 English clause-taking verbs in 50 distinct syntactic frames, finding that the model infers coherent type signatures. We focus on the type signatures relevant to interrogative...
Rachel Rudinger, Aaron Steven White, Benjamin Van Durme. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
Propositional attitude verbs, such as think and want, have long held interest for both theoretical linguists and language acquisitionists because their syntactic, semantic, and pragmatic properties display complex interactions that have proven difficult to fully capture from either perspective. This paper explores the granularity with which these verbs’ semantic properties are recoverable from their syntactic distributions, using three behavioral experiments aimed at explicitly quantifying the relationship between the two...
We ask whether text understanding has progressed to the point where we may extract event information through incremental refinement of bleached statements derived from annotation manuals. Such a capability would allow for the trivial construction and extension of an extraction framework by intended end-users via declarations such as, “Some person was born in some location at some time.” We introduce an example model that employs such statements, with experiments illustrating that it can extract events under closed ontologies and generalize...
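The bleached-statement idea can be illustrated with a toy helper that fills the slots of such a statement with extracted spans. The slot names, fillers, and function are hypothetical, meant only to show the refinement step, not the authors' model:

```python
# A bleached statement of the kind an annotation manual might declare.
BLEACHED = "Some person was born in some location at some time."

def instantiate(statement, fillers):
    """Replace bleached slots (e.g., 'Some person') with extracted
    text spans, turning the manual's statement into a concrete
    event description (illustrative helper only)."""
    out = statement
    for slot, span in fillers.items():
        # Replace only the first occurrence so repeated slot
        # types could, in principle, be filled separately.
        out = out.replace(slot, span, 1)
    return out
```

For example, filling all three slots yields a fully refined event statement.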
We present a novel semantic framework for modeling temporal relations and event durations that maps pairs of events to real-valued scales. We use this framework to construct the largest temporal dataset to date, covering the entirety of the Universal Dependencies English Web Treebank. We train models for jointly predicting fine-grained temporal relations and event durations. We report strong results on our data and show the efficacy of a transfer-learning approach for predicting categorical relations.
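One appeal of mapping events to real-valued scales is that categorical relations fall out of the geometry. A minimal sketch, assuming each event is represented as a (begin, end) pair on a shared timeline (the relation names here are illustrative, not the paper's label set):

```python
def allen_relation(e1, e2):
    """Derive a categorical temporal relation from two events
    given as real-valued (begin, end) pairs on one timeline."""
    b1, end1 = e1
    b2, end2 = e2
    if end1 < b2:
        return "before"       # e1 finishes before e2 starts
    if end2 < b1:
        return "after"        # e2 finishes before e1 starts
    if b1 == b2 and end1 == end2:
        return "equal"        # identical intervals
    if b1 <= b2 and end1 >= end2:
        return "contains"     # e1 covers e2
    if b2 <= b1 and end2 >= end1:
        return "contained-by" # e2 covers e1
    return "overlaps"         # partial overlap
```

Durations come for free as `end - begin`, which is what makes joint prediction of relations and durations natural in this representation.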
Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, Benjamin Van Durme. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018.
Theories of clause selection that aim to explain the distribution of interrogative and declarative complement clauses often take as a starting point that predicates like think, believe, hope, and fear are incompatible with interrogative complements. After discussing experimental evidence against the generalizations on which these theories rest, I give corpus evidence that even the core data are faulty: these predicates are in fact compatible with interrogative complements, suggesting that any theory predicting they should not be must be jettisoned.
We investigate which patterns of lexically triggered doxastic, bouletic, neg(ation)-raising, and veridicality inferences are (un)attested across clause-embedding verbs in English. To carry out this investigation, we use a multiview mixed effects mixture model to discover the inference patterns captured in three lexicon-scale judgment datasets: two existing datasets, MegaVeridicality and MegaNegRaising, which capture veridicality and neg-raising inferences across a wide swath of the English lexicon, and a new dataset, MegaIntensionality, which similarly captures...
We investigate neural models’ ability to capture lexicosyntactic inferences: inferences triggered by the interaction of lexical and syntactic information. We take the task of event factuality prediction as a case study and build a factuality judgment dataset for all English clause-embedding verbs in various syntactic contexts. We use this dataset, which we make publicly available, to probe the behavior of current state-of-the-art neural systems, showing that these systems make certain systematic errors that are clearly visible through the lens of factuality prediction.
We introduce five new natural language inference (NLI) datasets focused on temporal reasoning. We recast four existing datasets annotated for event duration—how long an event lasts—and event ordering—how events are temporally arranged—into more than one million NLI examples. We use these datasets to investigate how well neural models trained on a popular NLI corpus capture these forms of temporal reasoning.
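The recasting recipe can be sketched concretely: each duration annotation yields one NLI pair per duration category, with the premise held fixed and the label determined by whether the hypothesis names the gold category. The category inventory and hypothesis template below are hypothetical stand-ins, not the released data format:

```python
# Hypothetical duration categories for illustration.
DURATIONS = ["seconds", "minutes", "hours", "days", "weeks"]

def recast_duration(sentence, event, gold_duration):
    """Recast one event-duration annotation into NLI examples:
    one hypothesis per category, entailed only for the gold one."""
    examples = []
    for d in DURATIONS:
        examples.append({
            "premise": sentence,
            "hypothesis": f"The {event} lasted for {d}.",
            "label": "entailed" if d == gold_duration else "not-entailed",
        })
    return examples
```

Applied across four source datasets, a per-annotation fan-out like this is how a modest amount of annotation expands into over a million NLI examples.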
We investigate the relationship between the frequency with which verbs are found in particular subcategorization frames and the acceptability of those verbs in those frames, focusing on subordinate clause-taking verbs, such as think, want, and tell. We show that verbs’ frame frequency distributions are poor predictors of their frame acceptability—explaining, at best, less than ⅓ of the total information about acceptability across the lexicon—and, further, that common matrix factorization techniques used to model verb learning fare only marginally better.
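A toy version of the frequency-versus-acceptability comparison can be run in a few lines. This sketch uses squared correlation between log frequency and acceptability as a stand-in for the paper's information-theoretic measure; the data arrays are hypothetical:

```python
import numpy as np

def explained_fraction(freqs, accept):
    """Fraction of variance in acceptability judgments explained
    by log frame frequency, over parallel arrays of verb-frame
    pairs (a toy proxy, not the paper's exact measure)."""
    x = np.log1p(np.asarray(freqs, dtype=float))
    y = np.asarray(accept, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2
```

On real judgment data the interesting outcome is how far this value falls below 1: the paper's finding is that frequency leaves most of the acceptability signal unexplained.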
Patrick Xia, Guanghui Qin, Siddharth Vashishtha, Yunmo Chen, Tongfei Chen, Chandler May, Craig Harman, Kyle Rawlins, Aaron Steven White, Benjamin Van Durme. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. 2021.
Mahsa Yarmohammadi, Shijie Wu, Marc Marone, Haoran Xu, Seth Ebner, Guanghui Qin, Yunmo Chen, Jialiang Guo, Craig Harman, Kenton Murray, Aaron Steven White, Mark Dredze, Benjamin Van Durme. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.
We show that when analyzing data from inference judgment tasks, it can be important to incorporate into one's analysis regime an explicit representation of the semantics of the natural language prompt used to guide participants in the task. To demonstrate this, we conduct two experiments within an existing experimental paradigm focused on measuring factive inferences, while manipulating the prompts participants receive in small but semantically potent ways. In statistical model comparisons couched within a framework of probabilistic dynamic...
In recent years, it has become clear that EEG indexes the comprehension of natural, narrative speech. One particularly compelling demonstration of this fact can be seen by regressing EEG responses to speech against measures of how individual words in that speech linguistically relate to their preceding context. This approach produces a so-called temporal response function that displays a centro-parietal negativity reminiscent of the classic N400 component of the event-related potential. One shortcoming of previous implementations is that they have...
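The core regression behind a temporal response function can be sketched compactly: the EEG signal at each time point is modeled as a weighted sum of a word-level predictor at a set of lags, and the fitted weights over lags form the TRF. This is a minimal numpy sketch assuming a single predictor, nonnegative lags in samples, and a ridge penalty; real analyses use toolboxes with cross-validated regularization:

```python
import numpy as np

def estimate_trf(stimulus, eeg, lags, lam=1.0):
    """Estimate a temporal response function by time-lagged ridge
    regression: eeg[t] ~ sum_j w[j] * stimulus[t - lags[j]].
    Returns one weight per lag (the TRF)."""
    stimulus = np.asarray(stimulus, dtype=float)
    eeg = np.asarray(eeg, dtype=float)
    T = len(stimulus)
    # Design matrix: one column per lagged copy of the predictor.
    X = np.zeros((T, len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j] = stimulus[:T - lag]
    # Ridge solution: w = (X'X + lam * I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
```

With a contextual predictor such as word surprisal, the weight profile over lags is what exhibits the N400-like centro-parietal negativity described above.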
Fine-tuning is known to improve NLP models by adapting an initial model trained on more plentiful but less domain-salient examples to data in a target domain. Such domain adaptation is typically done using one stage of fine-tuning. We demonstrate that gradually fine-tuning in a multi-stage process can yield substantial further gains and can be applied without modifying the model or the learning objective.
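The multi-stage idea can be made concrete with a data schedule: each stage keeps all target-domain data while linearly shrinking the out-of-domain portion, so the training distribution drifts toward the target domain. The linear shrinkage below is one simple choice, not necessarily the authors' exact schedule:

```python
def gradual_stages(source, target, n_stages=3):
    """Build a multi-stage fine-tuning schedule: every stage
    contains all target-domain examples plus a linearly
    decreasing slice of source-domain examples."""
    stages = []
    for k in range(n_stages):
        # Fraction of source data retained: 1.0 at stage 0, 0.0 at the last.
        frac = 1.0 - k / (n_stages - 1) if n_stages > 1 else 0.0
        n_src = int(round(frac * len(source)))
        stages.append(list(target) + list(source[:n_src]))
    return stages
```

Fine-tuning then proceeds stage by stage on these mixtures, with no change to the model architecture or loss, which matches the abstract's claim that the method needs no modification beyond the data schedule.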
We present a novel semantic framework for modeling linguistic expressions of generalization—generic, habitual, and episodic statements—as combinations of simple, real-valued referential properties of predicates and their arguments. We use this framework to construct a dataset covering the entirety of the Universal Dependencies English Web Treebank. We probe the efficacy of type-level and token-level information—including hand-engineered features and static (GloVe) and contextual (ELMo) word embeddings—for predicting expressions of generalization.
We introduce a transductive model for parsing into Universal Decompositional Semantics (UDS) representations, which jointly learns to map natural language utterances to UDS graph structures and annotate the graphs with decompositional semantic attribute scores. We also introduce a strong pipeline model for parsing into the UDS graph structure, and show that our transductive parser performs comparably while additionally performing attribute prediction. By analyzing the attribute prediction errors, we find that the model captures relationships between attribute groups.
We propose the semantic proto-role linking model, which jointly induces both predicate-specific semantic roles and predicate-general semantic proto-roles based on semantic property likelihood judgments. We use this model to empirically evaluate Dowty’s thematic proto-role theory.
We present a novel iterative extraction model, IterX, for extracting complex relations, or templates, i.e., N-tuples representing a mapping from named slots to spans of text within a document. Documents may feature zero or more instances of a template of any given type, and the task entails identifying the templates in a document and extracting each template's slot values. Our imitation learning approach casts the problem as a Markov decision process (MDP), and relieves the need to use predefined template orders to train an extractor. It leads...