- Natural Language Processing Techniques
- Topic Modeling
- Advanced Text Analysis Techniques
- Software Engineering Research
- Speech and dialogue systems
- Text Readability and Simplification
- Sentiment Analysis and Opinion Mining
- Semantic Web and Ontologies
- Multimodal Machine Learning Applications
- Authorship Attribution and Profiling
- Artificial Intelligence in Games
- Linguistics and terminology studies
- Language, Metaphor, and Cognition
- Text and Document Classification Technologies
- Wikis in Education and Collaboration
- Syntax, Semantics, Linguistic Variation
- Advanced Malware Detection Techniques
- Expert finding and Q&A systems
- Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
- Software Testing and Debugging Techniques
- Knowledge Management and Sharing
- Web Data Mining and Analysis
- Language and cultural evolution
- Second Language Acquisition and Learning
- Data Quality and Management
- IT University of Copenhagen, 2023
- Google (United States), 2020-2023
- Tokyo Institute of Technology, 2023
- Administration for Community Living, 2023
- American Jewish Committee, 2023
- Google (United Kingdom), 2020-2023
- Bar-Ilan University, 2021
- University of Helsinki, 2021
- Tel Aviv University, 2021
- Technical University of Darmstadt, 2021
We present a series of experiments on automatically identifying the sense of implicit discourse relations, i.e. relations that are not marked with a connective such as "but" or "because". We work with a corpus of implicit relations in newspaper text and report results on a test set that is representative of the naturally occurring distribution of senses. We use several linguistically informed features, including polarity tags, Levin verb classes, length of verb phrases, modality, context, and lexical features. In addition, we revisit past approaches using pairs...
The most widely adopted approaches for evaluation of summary content follow some protocol for comparing a summary with gold-standard human summaries, which are traditionally called model summaries. This paradigm falls short when human summaries are not available and becomes less accurate when only a single model summary is available. We propose three novel techniques. Two of them are model-free and do not rely on a gold standard for the assessment. The third technique improves automatic evaluations by expanding the set of model summaries with chosen system summaries. We show that quantifying...
The LSDSem’17 shared task is the Story Cloze Test, a new evaluation for story understanding and script learning. This test provides a system with a four-sentence story and two possible endings, and the system must choose the correct ending to the story. Successful narrative understanding (getting closer to human performance of 100%) requires systems to link various levels of semantics to commonsense knowledge. A total of eight systems participated in the task, with a variety of approaches including...
We present a fully automatic method for content selection evaluation in summarization that does not require the creation of human model summaries. Our work capitalizes on the assumption that the distribution of words in the input and in an informative summary should be similar to each other. Results on a large-scale evaluation from the Text Analysis Conference show that input-summary comparisons are very effective for evaluating content selection. Our methods rank participating systems similarly to manual model-based pyramid evaluation and to judgments of responsiveness. The best feature,...
Great writing is rare and highly admired. Readers seek out articles that are beautifully written, informative and entertaining. Yet information-access technologies lack capabilities for predicting article quality at this level. In this paper we present first experiments on article quality prediction in the science journalism domain. We introduce a corpus of great pieces of science journalism, along with typical articles from the genre. We implement features to capture aspects of writing, including surprising, visual and emotional content, as well...
The ability to convey relevant and faithful information is critical for many tasks in conditional generation, yet it remains elusive for neural seq-to-seq models, whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. We propose a new conceptualization of text plans as a sequence of question-answer (QA) pairs and enhance existing datasets (e.g., for summarization) with QA...
Sentiment analysis is pivotal in extracting insights from textual data, enabling organizations to understand customer opinions, market trends, and brand perception. This study introduces a novel approach, SentimentLP, which integrates Leptotila optimization (LPO) with gradient boosting machines (GBM) for sentiment analysis tasks. The proposed framework leverages LPO’s dynamic capabilities to enhance the GBM models’ performance in classification. Through iterative refinement and adaptive learning, SentimentLP...
Matt Grenander, Yue Dong, Jackie Chi Kit Cheung, Annie Louis. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
We revisit a pragmatic inference problem in dialog: understanding indirect responses to questions. Humans can interpret ‘I’m starving.’ in response to ‘Hungry?’, even without direct cue words such as ‘yes’ and ‘no’. In dialog systems, allowing natural responses rather than closed vocabularies would be similarly beneficial. However, today’s systems are only as sensitive to these moves as their language model allows. We create and release the first large-scale English corpus, ‘Circa’, with 34,268 (polar question, answer) pairs...
Hannah Rohde, Anna Dickinson, Nathan Schneider, Christopher N. L. Clark, Annie Louis, Bonnie Webber. Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016). 2016.
General Video Game Playing (GVGP) algorithms are usually focused on winning and maximizing score, but combining different objectives could turn out to be a solution that has not been deeply investigated yet. This paper presents the results obtained when five GVGP agents play a set of games using heuristics with different objectives: winning, exploration, discovery of the elements presented in the game (and the interactions with them), and acquisition of knowledge in order to accurately estimate the outcome of each possible interaction. The results show...
Online forum discussions proceed differently from face-to-face conversations, and any single thread on an online forum contains posts on different subtopics. This work aims to characterize the content of a forum thread as a conversation tree of topics. We present models that jointly perform two tasks: segment a thread into subparts, and assign a topic to each part. Our core idea is a definition of topic structure using probabilistic grammars. By leveraging the flexibility of grammar formalisms, Context-Free Grammars and Linear Context-Free Rewriting Systems, our models create...
In order to summarize a document, it is often useful to have a background set of documents from the domain to serve as a reference for determining new and important information in the input document. We present a model based on Bayesian surprise, which provides an intuitive way to identify surprising information in a summarization input with respect to a background corpus. Specifically, the method quantifies the degree to which pieces of the input change one's beliefs about the world represented in the background. We develop systems for generic and update summarization based on this idea. Our method is competitive in content selection...