- Natural Language Processing Techniques
- Topic Modeling
- Speech and Dialogue Systems
- Text Readability and Simplification
- Semantic Web and Ontologies
- Authorship Attribution and Profiling
- Sentiment Analysis and Opinion Mining
- Hate Speech and Cyberbullying Detection
- Advanced Text Analysis Techniques
- Speech Recognition and Synthesis
- Spam and Phishing Detection
- Text and Document Classification Technologies
- Biomedical Text Mining and Ontologies
- Translation Studies and Practices
- Algorithms and Data Compression
- Syntax, Semantics, Linguistic Variation
- Lexicography and Language Studies
- Multimodal Machine Learning Applications
- Misinformation and Its Impacts
- Linguistic Variation and Morphology
- Handwritten Text Recognition Techniques
- Linguistics and Terminology Studies
- Language and Cultural Evolution
- Digital Humanities and Scholarship
- Domain Adaptation and Few-Shot Learning
Indiana University
2015-2024
Indiana University Bloomington
2014-2024
Université du Québec à Montréal
2023
Tokyo University of Foreign Studies
2023
Institut Alfred Fournier
2023
Institut de Cancérologie de l'Ouest
2019
University of Colorado System
2017
Association for Computational Linguistics
2013
University of Tübingen
2000-2007
Mercator Institute for China Studies
1998
Parsing unrestricted text is useful for many language technology applications but requires parsing methods that are both robust and efficient. MaltParser is a language-independent system for data-driven dependency parsing that can be used to induce a parser for a new language from a treebank sample in a simple yet flexible manner. Experimental evaluation confirms that MaltParser can achieve robust, efficient, and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.
Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden. Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection. 2017.
Dependency-based methods for syntactic parsing have become increasingly popular in natural language processing in recent years. This book gives a thorough introduction to the methods that are most widely used today. After an introduction to dependency grammar and dependency parsing, followed by a formal characterization of the dependency parsing problem, the book surveys the three major classes of parsing models that are in current use: transition-based, graph-based, and grammar-based models. It continues with a chapter on evaluation and one on the comparison of different methods, and it closes with a few words on current trends...
This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios.
The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world's languages. The project releases annotated morphological data using a universal tagset, the UniMorph schema. Each inflected form is associated with a lemma, which typically carries its underlying lexical meaning, and a bundle of morphological features from our schema. Additional supporting data and tools are also released on a per-language basis when available. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins...
Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world's languages. In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) for Chinese, called the Original Chinese Natural Language Inference dataset (OCNLI). Unlike recent attempts at extending NLI to other languages, our dataset does not rely on any automatic translation or...
Can Liu, Wen Li, Bradford Demarest, Yue Chen, Sara Couture, Daniel Dakota, Nikita Haduong, Noah Kaufman, Andrew Lamont, Manan Pancholi, Kenneth Steimel, Sandra Kübler. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). 2016.
This paper describes the UM-IU@LING system for SemEval 2019 Task 6: OffensEval. We take a mixed approach to identify and categorize hate speech in social media. In subtask A, we fine-tuned a BERT-based classifier to detect abusive content in tweets, achieving a macro F1 score of 0.8136 on the test data, thus reaching 3rd rank out of 103 submissions. In subtasks B and C, we used a linear SVM with selected character n-gram features. For subtask C, our system could identify the target of abuse with a macro F1 score of 0.5243, ranking it 27th out of 65 submissions.
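As a rough illustration of two notions this abstract relies on, the sketch below counts character n-grams in a string and computes a macro-averaged F1 score. It is a minimal standard-library example and not the authors' system: the function names `char_ngrams` and `macro_f1`, the n-gram range, and the `OFF`/`NOT` labels in the usage lines are assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count every character n-gram of length n_min..n_max in text.

    The default range (2 to 4) is an illustrative assumption.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels and predictions, for illustration only.
gold = ["OFF", "NOT", "NOT", "OFF"]
pred = ["OFF", "NOT", "OFF", "OFF"]
score = macro_f1(gold, pred)
features = char_ngrams("abc", 2, 3)
```

Macro averaging weights every class equally regardless of its frequency, which matters when one class is much rarer than the others.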
This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show that there is a big difference in parsing performance when trained on the Negra and on the TüBa-D/Z treebanks. Parser performance for the models trained on TüBa-D/Z is comparable to parsing results for English with the Stanford parser when trained on the Penn treebank. This comparison at least suggests that German is not harder...
This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the literature, in particular the annotation graph model of Bird and Liberman (2001) and the pie-in-the-sky scheme for semantic annotation.
The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and between constituent and dependency representations. In this paper, we describe the task and the data sets. In addition, we provide an overview of the test results and a first analysis.
The "White Replacement" conspiracy theory, that governments and corporations are "replacing" white people, is linked to several mass shootings. Given its recent ubiquity in elite rhetoric, concerns have arisen about the popularity of this conspiracy theory among the United States public. Further, political scientists have noted a need to understand why people believe or act upon this conspiracy theory. Using a 2022 US national survey (n = 2001), we find that a third of Americans agree that leaders are replacing white people with people of color. These beliefs are related...
We investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In such tasks, it is a common approach to use word or part-of-speech n-grams. This results in a large set of features, out of which only a small subset may be good indicators for the sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show...