NFDI4DS | UHH-SEMS - Publication Details

Sandra Kübler

ORCID: 0000-0003-0885-5436

About

Contact & Profiles

A5009027177

Research Areas

Natural Language Processing Techniques
Topic Modeling
Speech and dialogue systems
Text Readability and Simplification
Semantic Web and Ontologies
Authorship Attribution and Profiling
Sentiment Analysis and Opinion Mining
Hate Speech and Cyberbullying Detection
Advanced Text Analysis Techniques
Speech Recognition and Synthesis
Spam and Phishing Detection
Text and Document Classification Technologies
Biomedical Text Mining and Ontologies
Translation Studies and Practices
Algorithms and Data Compression
Syntax, Semantics, Linguistic Variation
Lexicography and Language Studies
Multimodal Machine Learning Applications
Misinformation and Its Impacts
Linguistic Variation and Morphology
Handwritten Text Recognition Techniques
Linguistics and terminology studies
Language and cultural evolution
Digital Humanities and Scholarship
Domain Adaptation and Few-Shot Learning

Indiana University
2015-2024

Indiana University Bloomington
2014-2024

Université du Québec à Montréal
2023

Tokyo University of Foreign Studies
2023

Institut Alfred Fournier
2023

Institut de Cancérologie de l'Ouest
2019

University of Colorado System
2017

Association for Computational Linguistics
2013

University of Tübingen
2000-2007

Mercator Institute for China Studies
1998

MaltParser: A language-independent system for data-driven dependency parsing

OPENALEX - Publications

Joakim Nivre Johan Hall Jens Nilsson Atanas Chanev Gülşen Eryiğit and 3 more

Parsing unrestricted text is useful for many language technology applications but requires parsing methods that are both robust and efficient. MaltParser is a language-independent system for data-driven dependency parsing that can be used to induce a parser for a new language from a treebank sample in a simple yet flexible manner. Experimental evaluation confirms that MaltParser can achieve robust, efficient and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.

10.1017/s1351324906004505 article EN Natural Language Engineering 2007-01-12

SAMAR: Subjectivity and sentiment analysis for Arabic social media

Muhammad Abdul-Mageed Mona Diab Sandra Kübler

10.1016/j.csl.2013.03.001 article EN Computer Speech & Language 2013-03-26

CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages

Ryan Cotterell Christo Kirov John Sylak-Glassman Géraldine Walther Ekaterina Vylomova and 6 more

Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden. Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection. 2017.

10.18653/v1/k17-2001 article EN cc-by 2017-01-01

Dependency Parsing

Sandra Kübler Ryan McDonald Joakim Nivre

Dependency-based methods for syntactic parsing have become increasingly popular in natural language processing in recent years. This book gives a thorough introduction to the methods that are most widely used today. After an introduction to dependency grammar and dependency parsing, followed by a formal characterization of the dependency parsing problem, the book surveys the three major classes of parsing models that are in current use: transition-based, graph-based, and grammar-based models. It continues with a chapter on evaluation and one on the comparison of different methods, and it closes with a few words on current trends...

10.2200/s00169ed1v01y200901hlt002 article EN Synthesis lectures on human language technologies 2009-01-01

Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages

Djamé Seddah Reut Tsarfaty Sandra Kübler Marie Candito Jinho D. Choi and 18 more

This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios.

10.18653/v1/w13-4917 preprint EN 2013-01-01

UniMorph 2.0: Universal Morphology

Christo Kirov Ryan Cotterell John Sylak-Glassman Géraldine Walther Ekaterina Vylomova and 8 more

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world's languages. The project releases annotated morphological data using a universal tagset, the UniMorph schema. Each inflected form is associated with a lemma, which typically carries its underlying lexical meaning, and a bundle of morphological features from our schema. Additional supporting data and tools are also released on a per-language basis when available. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins...

10.48550/arxiv.1810.11101 preprint EN other-oa arXiv (Cornell University) 2018-01-01

OCNLI: Original Chinese Natural Language Inference

Hai Hu Kyle Richardson Liang Xu Lu Li Sandra Kübler and 1 more

Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world's languages. In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI). Unlike recent attempts at extending NLI to other languages, our dataset does not rely on any automatic translation or...

10.18653/v1/2020.findings-emnlp.314 article EN cc-by 2020-01-01

Context in abusive language detection: On the interdependence of context and annotation of user comments

H. H. Silva López Sandra Kübler

10.1016/j.dcm.2024.100848 article EN Discourse Context & Media 2025-01-07

IUCL at SemEval-2016 Task 6: An Ensemble Model for Stance Detection in Twitter

Can Liu Wen Li Bradford Demarest Yue Chen Sara Couture and 7 more

Can Liu, Wen Li, Bradford Demarest, Yue Chen, Sara Couture, Daniel Dakota, Nikita Haduong, Noah Kaufman, Andrew Lamont, Manan Pancholi, Kenneth Steimel, Sandra Kübler. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). 2016.

10.18653/v1/s16-1064 article EN cc-by Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) 2016-01-01

UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs

Jian Zhu Zuoyu Tian Sandra Kübler

This paper describes the UM-IU@LING’s system for the SemEval 2019 Task 6: OffensEval. We take a mixed approach to identify and categorize hate speech in social media. In subtask A, we fine-tuned a BERT based classifier to detect abusive content in tweets, achieving a macro F1 score of 0.8136 on the test data, thus reaching the 3rd rank out of 103 submissions. In subtasks B and C, we used a linear SVM with selected character n-gram features. For subtask C, our system could identify the target of abuse with a macro F1 score of 0.5243, ranking it 27th out of 65 submissions.

10.18653/v1/s19-2138 preprint EN cc-by 2019-01-01

Is it really that difficult to parse German?

Sandra Kübler Erhard Hinrichs Wolfgang Maier

This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show that there is a big difference in parsing performance when trained on the Negra and on the TüBa-D/Z treebanks. Parser performance for the models trained on TüBa-D/Z is comparable to parsing results for the English Penn treebank. This comparison at least suggests that German is not harder...

10.3115/1610075.1610093 article EN 2006-01-01

A unified representation for morphological, syntactic, semantic, and referential annotations

Erhard Hinrichs Sandra Kübler Karin Naumann

This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and referential information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the literature, in particular the annotation graph model of Bird and Liberman (2001) and the pie-in-the-sky scheme for semantic annotation.

10.3115/1608829.1608832 article EN 2005-01-01

The PaGe 2008 shared task on parsing German

Sandra Kübler

The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and between constituent and dependency representations. In this paper, we describe the task and the data sets. In addition, we provide an overview of the test results and a first analysis.

10.3115/1621401.1621409 article EN 2008-01-01

Belief in White Replacement

Casey Klofstad Olyvia R. Christley Amanda B. Diekman Sandra Kübler Adam Enders and 9 more

The "White Replacement" conspiracy theory, that governments and corporations are "replacing" white people, is linked to several mass shootings. Given its recent ubiquity in elite rhetoric, concerns have arisen about the popularity of this conspiracy theory among the United States public. Further, political scientists have noted a need to understand why people believe or act upon this conspiracy theory. Using a 2022 US national survey (n = 2001), we find that a third of Americans agree that leaders are replacing white people with people of color. These beliefs are related...

10.1080/21565503.2024.2342834 article EN Politics Groups and Identities 2024-05-07

To use or not to use: Feature selection for sentiment analysis of highly imbalanced data

Sandra Kübler Can Liu Zeeshan Ali Sayyed

We investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In such tasks, it is a common approach to use word or part-of-speech n-grams. This results in a large set of features, out of which only a small subset may be good indicators for the sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show...

10.1017/s1351324917000298 article EN Natural Language Engineering 2017-08-07
