- Topic Modeling
- Natural Language Processing Techniques
- Authorship Attribution and Profiling
- Semantic Web and Ontologies
- Hate Speech and Cyberbullying Detection
- Mechanics and Biomechanics Studies
- Biomedical Text Mining and Ontologies
- Multimodal Machine Learning Applications
- Robotic Mechanisms and Dynamics
- Engineering Technology and Methodologies
- Software Engineering Research
- Humor Studies and Applications
- Discourse Analysis and Cultural Communication
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Social Media and Politics
- Language, Communication, and Linguistic Studies
- Handwritten Text Recognition Techniques
- Opinion Dynamics and Social Influence
- Ethics and Social Impacts of AI
- Advanced Graph Neural Networks
- Economic and Technological Systems Analysis
- Domain Adaptation and Few-Shot Learning
- Names, Identity, and Discrimination Research
- Machine Learning in Healthcare
- Ural State University of Economics, 2024
- Bauman Moscow State Technical University, 2013-2023
- Huawei Technologies (China), 2023
- Institute of Machines Science, 2021
- Microsoft Research (United Kingdom), 2021
- University of Southern California, 2020
- University of Massachusetts Lowell, 2015-2019
- Saarland University, 2018
- Georgia Institute of Technology, 2018
- Universitat Politècnica de València, 2018
Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
State-of-the-art models using deep neural networks have become very good at learning an accurate mapping from inputs to outputs. However, they still lack generalization capabilities in conditions that differ from the ones encountered during training. This is even more challenging in specialized, knowledge-intensive domains, where training data is limited. To address this gap, we introduce MedNLI, a dataset annotated by doctors performing a natural language inference (NLI) task, grounded in the medical history...
This paper demonstrates the effectiveness of a Long Short-Term Memory language model in our initial efforts to generate unconstrained rap lyrics. The goal of this model is to generate lyrics that are similar in style to that of a given rapper, but not identical to existing lyrics: this is the task of ghostwriting. Unlike previous work, which defines explicit templates for lyric generation, our model defines its own rhyme scheme, line length, and verse length. Our experiments show that a Long Short-Term Memory language model produces better "ghostwritten" lyrics than a baseline model.
We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes in people's lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators (such as first names and pronouns) in different semantic representations of online biographies. Additionally, we quantify the bias that remains when these indicators are "scrubbed," and describe proxy behavior...
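A minimal Python sketch can show what "scrubbing" explicit gender indicators looks like. It assumes that the subject's first name is known for each biography. It also uses a small hypothetical pronoun list. The study's actual scrubbing rules are not given in the abstract.

```python
import re

# Hypothetical pronoun list; the study's exact indicator set is not specified here.
GENDERED_PRONOUNS = {"he", "she", "him", "her", "his", "hers", "himself", "herself"}


def scrub(bio, first_name, placeholder="_"):
    """Replace the subject's first name and gendered pronouns with a placeholder."""
    target_name = first_name.lower()

    def replace(match):
        word = match.group(0)
        if word.lower() in GENDERED_PRONOUNS or word.lower() == target_name:
            return placeholder
        return word

    # Visit every alphabetic token and swap out the explicit gender indicators.
    return re.sub(r"[A-Za-z]+", replace, bio)


print(scrub("Maria is a nurse. She loves her work.", "Maria"))
# prints "_ is a nurse. _ loves _ work."
```

This kind of token-level removal affects only the explicit indicators. Any other words that correlate with gender are left in place.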
This paper describes a new shared task for humor understanding that attempts to eschew the ubiquitous binary approach to humor detection and focus on comparative humor ranking instead. The task is based on a dataset of funny tweets posted in response to hashtags, collected from the 'Hashtag Wars' segment of the TV show @midnight. The results are evaluated in two subtasks that require participants to generate either the correct pairwise comparisons of tweets (subtask A) or the correct ranking of the tweets in terms of how funny they are (subtask B). Seven teams participated in subtask A, and five teams participated in subtask B. The best accuracy in subtask A was 0.675...
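Pairwise-comparison accuracy for subtask A can be sketched in a few lines of Python. The sketch assumes the following, none of which is taken from the official scorer:

- Each tweet carries a hypothetical integer gold funniness level.
- Pairs with tied gold levels are skipped.
- A prediction is a boolean "first tweet is funnier".

```python
from itertools import combinations


def gold_pairs(labels):
    """Build (a, b, a_is_funnier) for every pair whose gold levels differ."""
    pairs = []
    for a, b in combinations(sorted(labels), 2):
        if labels[a] != labels[b]:  # tied pairs carry no comparative signal
            pairs.append((a, b, labels[a] > labels[b]))
    return pairs


def pairwise_accuracy(pairs, predict):
    """Fraction of pairs on which predict(a, b) matches the gold ordering."""
    correct = sum(predict(a, b) == gold for a, b, gold in pairs)
    return correct / len(pairs)


labels = {"t1": 2, "t2": 1, "t3": 0}       # hypothetical gold funniness levels
scores = {"t1": 0.9, "t2": 0.2, "t3": 0.5}  # hypothetical model scores


def predict(a, b):
    return scores[a] > scores[b]


print(round(pairwise_accuracy(gold_pairs(labels), predict), 3))  # 0.667
```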
In this paper, we propose to use a set of simple, architecturally uniform LSTM-based models to recover different kinds of temporal relations from text. Using the shortest dependency path between entities as input, the same architecture is used to extract intra-sentence, cross-sentence, and document creation time relations. A "double-checking" technique reverses entity pairs in classification, boosting the recall of positive cases and reducing misclassifications between opposite classes. An efficient pruning algorithm resolves conflicts...
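The "double-checking" idea classifies a pair in both argument orders and then reconciles the two answers. The Python sketch below illustrates one way to do that; it is not the paper's procedure. The following are all assumptions:

- the label set and its inverse mapping;
- the `classify(a, b)` interface returning `(label, confidence)`;
- the tie-breaking rule.

```python
# Inverse of each relation when the two arguments are swapped (assumed label set).
INVERSE = {
    "BEFORE": "AFTER",
    "AFTER": "BEFORE",
    "INCLUDES": "IS_INCLUDED",
    "IS_INCLUDED": "INCLUDES",
    "SIMULTANEOUS": "SIMULTANEOUS",
    "NONE": "NONE",
}


def double_check(classify, e1, e2):
    """Classify (e1, e2) and (e2, e1), then merge the two decisions."""
    fwd_label, fwd_conf = classify(e1, e2)
    rev_label, rev_conf = classify(e2, e1)
    rev_as_fwd = INVERSE[rev_label]  # express the reversed answer in forward order
    if fwd_label == rev_as_fwd:
        return fwd_label
    if fwd_label == "NONE":
        return rev_as_fwd  # keep a positive found in either direction (recall)
    if rev_as_fwd == "NONE":
        return fwd_label
    # Opposite positive labels (e.g. BEFORE vs AFTER): trust the more confident one.
    return fwd_label if fwd_conf >= rev_conf else rev_as_fwd


# Hypothetical classifier outputs for two entity pairs.
SCORES = {
    ("e1", "e2"): ("NONE", 0.6), ("e2", "e1"): ("AFTER", 0.7),
    ("e3", "e4"): ("BEFORE", 0.55), ("e4", "e3"): ("BEFORE", 0.9),
}


def lookup(a, b):
    return SCORES[(a, b)]


print(double_check(lookup, "e1", "e2"))  # BEFORE
print(double_check(lookup, "e3", "e4"))  # AFTER
```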
In order to determine argument structure in text, one must understand how individual components of the overall argument are linked. This work presents the first neural network-based approach to link extraction in argument mining. Specifically, we propose a novel architecture that applies Pointer Network sequence-to-sequence attention modeling to structural prediction in discourse parsing tasks. We then develop a joint model that extends this architecture to simultaneously address the link extraction task and the classification of argument components. The proposed joint model achieves...
Alexey Romanov, Anna Rumshisky, Anna Rogers, David Donahue. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
There is a growing body of work that proposes methods for mitigating bias in machine learning systems. These methods typically rely on access to protected attributes such as race, gender, or age. However, this raises two significant challenges: (1) protected attributes may not be available, or it may not be legal to use them, and (2) it is often desirable to simultaneously consider multiple protected attributes, as well as their intersections. In the context of occupation classification, we propose a method for discouraging correlation between the predicted probability of an...
The paper proposes a novel machine learning-based approach to the pathfinding problem on extremely large graphs. This method leverages diffusion distance estimation via a neural network and uses beam search for pathfinding. We demonstrate its efficiency by finding solutions for 4x4x4 and 5x5x5 Rubik's cubes with unprecedentedly short solution lengths, outperforming all available solvers and introducing the first machine learning solver beyond the 3x3x3 case. In particular, it surpasses every single case of the combined best...
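The combination of a distance estimate with beam search can be sketched generically in Python. Here `estimate` is any callable that scores distance to the goal. In the paper that role is played by a neural network trained to estimate diffusion distance, which is not reproduced here. The toy puzzle (sorting a tuple with adjacent swaps) and its misplaced-element heuristic are hypothetical stand-ins for a Rubik's cube and the trained model.

```python
import heapq


def beam_search(start, goal, neighbors, estimate, beam_width=64, max_steps=100):
    """Return a list of moves from start to goal, or None if none is found.

    neighbors(state) yields (move, next_state); estimate(state) scores how far
    a state is from the goal (lower is closer).
    """
    if start == goal:
        return []
    beam = [(start, [])]
    seen = {start}
    for _ in range(max_steps):
        candidates = []
        for state, path in beam:
            for move, nxt in neighbors(state):
                if nxt == goal:
                    return path + [move]
                if nxt not in seen:
                    seen.add(nxt)
                    candidates.append((estimate(nxt), nxt, path + [move]))
        if not candidates:
            return None
        # Keep only the beam_width states the estimator rates closest to the goal.
        best = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
        beam = [(state, path) for _, state, path in best]
    return None


GOAL = (0, 1, 2, 3)


def swap_neighbors(state):
    """Toy move set: move i swaps positions i and i + 1."""
    for i in range(len(state) - 1):
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield i, tuple(s)


def misplaced(state):
    """Toy stand-in for a learned distance estimator."""
    return sum(a != b for a, b in zip(state, GOAL))


print(beam_search((1, 0, 3, 2), GOAL, swap_neighbors, misplaced, beam_width=4))  # [0, 2]
```

The beam width trades search breadth for memory. A wider beam keeps more candidate states alive when the estimator is inaccurate.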
Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, Adam Kalai. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
This paper describes the winning system for SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor. Humor detection has up until now been predominantly addressed using feature-based approaches. Our system utilizes recurrent deep learning methods with dense embeddings to predict humorous tweets from the @midnight show #HashtagWars. In order to include both meaning and sound in the analysis, GloVe embeddings are combined with a novel phonetic representation to serve as input to an LSTM component. The output is combined with a character-based...
This paper addresses the problem of representation learning. Using an autoencoder framework, we propose and evaluate several loss functions that can be used as an alternative to the commonly used cross-entropy reconstruction loss. The proposed loss functions use similarities between words in the embedding space and can be used to train any neural model for text generation. We show that the introduced loss functions amplify the semantic diversity of reconstructed sentences while preserving the original meaning of the input. We test the derived autoencoder-generated representations on...
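A similarity-based reconstruction loss can be illustrated in plain Python. This sketch is one plausible variant and not necessarily any of the paper's loss functions. It replaces the one-hot target of cross-entropy with a soft target spread over the vocabulary according to cosine similarity to the gold word's embedding. The temperature value and the toy 2-d embeddings are assumptions.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    log_total = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_total for x in xs]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_loss(logits, target_idx, embeddings, temperature=0.1):
    """Cross-entropy against a soft target derived from embedding similarity.

    Each vocabulary word w gets target mass proportional to
    exp(cos(e_target, e_w) / temperature), so predicting a near-synonym of the
    gold word costs less than predicting an unrelated word.
    """
    sims = [cosine(embeddings[target_idx], e) / temperature for e in embeddings]
    target = [math.exp(lp) for lp in log_softmax(sims)]
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs))


# Toy vocabulary "good", "great", "table" with hypothetical 2-d embeddings.
EMB = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
near_miss = similarity_loss([0.0, 5.0, 0.0], 0, EMB)  # model favors "great"
far_miss = similarity_loss([0.0, 0.0, 5.0], 0, EMB)   # model favors "table"
print(near_miss < far_miss)  # True
```

With a plain one-hot cross-entropy, both mistakes would cost the same. The similarity-weighted target is what separates them.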
In this work, we present a new dataset for computational humor, specifically comparative humor ranking, which attempts to eschew the ubiquitous binary approach to humor detection. The dataset consists of tweets that are humorous responses to a given hashtag. We describe the motivation for this dataset, as well as the collection process, which includes a description of our semi-automated system for data collection. We also present initial experiments using both unsupervised and supervised approaches. Our best system achieved 63.7% accuracy, suggesting that the task is much...
This article deals with the composite threat-practices that change the dispositions of the interlocutors' emotional states within the communicative performative construct of threat and may extend the space of this construct. The construct allows us to study and organize the possible emotional states of the I-speaker and the I-hearer generated by threats in combination with additional elements of the menasive space. The article also identifies a number of points of cognitive complexity in threat-acts. These can soften the effect on the emotional state or strengthen the complex pragmatic effect of threats.
This paper describes the SimiHawk system submission from UMass Lowell for the core Semantic Textual Similarity task at SemEval-2016. We built four systems: a small feature-based system that leverages word alignment and machine translation quality evaluation metrics, two end-to-end LSTM-based systems, and an ensemble system. The LSTM-based systems used either a simple LSTM architecture or a Tree-LSTM structure. We found that of the three base systems, the feature-based model obtained the best results, outperforming each LSTM-based model's correlation...
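Semantic Textual Similarity submissions are scored by Pearson correlation with the gold similarity scores. The Python sketch below shows that metric next to a simple ensemble. The ensemble averages the base systems' per-pair predictions. That averaging rule is an assumption for illustration, not the SimiHawk combination method, and all scores are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ensemble(*system_scores):
    """Average the per-pair similarity scores of several base systems."""
    return [sum(scores) / len(scores) for scores in zip(*system_scores)]


# Hypothetical gold scores and base-system predictions for five sentence pairs.
gold = [0.0, 1.0, 2.0, 3.0, 4.0]
feature_based = [0.5, 1.0, 2.5, 2.5, 4.5]
lstm = [1.0, 0.5, 2.0, 3.5, 3.0]
ensemble = average_ensemble(feature_based, lstm)

for name, preds in [("feature", feature_based), ("lstm", lstm), ("ensemble", ensemble)]:
    print(name, round(pearson(gold, preds), 3))
```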
Language generation tasks that seek to mimic the human ability to use language creatively are difficult to evaluate, since one must consider creativity, style, and other non-trivial aspects of the generated text. The goal of this paper is to develop evaluation methods for one such task, ghostwriting of rap lyrics, and to provide an explicit, quantifiable foundation for the goals and future directions of this task. Ghostwriting must produce text that is similar in style to the emulated artist, yet distinct in content. We develop a novel evaluation methodology that addresses...