Andreas Niekler

ORCID: 0000-0002-3036-3318
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Computational and Text Analysis Methods
  • Advanced Text Analysis Techniques
  • Machine Learning and Algorithms
  • Semantic Web and Ontologies
  • Digital Humanities and Scholarship
  • Wikis in Education and Collaboration
  • Linguistics and Terminology Studies
  • Language and Cultural Evolution
  • Data Mining Algorithms and Applications
  • Hate Speech and Cyberbullying Detection
  • Mineral Processing and Grinding
  • Data Visualization and Analytics
  • Complex Network Analysis Techniques
  • Data Analysis with R
  • Linguistic Research and Analysis
  • Economic and Social Issues
  • Web Data Mining and Analysis
  • Biomedical Text Mining and Ontologies
  • Sustainability and Climate Change Governance
  • Geographic Information Systems Studies
  • Machine Learning and Data Classification
  • Machine Learning in Materials Science
  • AI in Service Interactions

Leipzig University
2014-2023

Cal Humanities
2022

Institut für Informationsverarbeitung
2017

Leipzig/Halle Airport
2015

Leipzig University of Applied Sciences
2012

Latent Dirichlet allocation (LDA) topic models are increasingly being used in communication research. Yet, questions regarding the reliability and validity of the approach have received little attention thus far. In applying LDA to textual data, researchers need to tackle at least four major challenges that affect these criteria: (a) appropriate pre-processing of the text collection; (b) adequate selection of model parameters, including the number of topics to be generated; (c) evaluation of the model's reliability; (d)...

10.1080/19312458.2018.1430754 article EN Communication Methods and Measures 2018-02-16

Active learning is the iterative construction of a classification model through targeted labeling, enabling significant labeling cost savings. As most research on active learning has been carried out before transformer-based language models (“transformers”) became popular, despite its practical importance, comparably few papers have investigated how transformers can be combined with active learning to date. This can be attributed to the fact that using state-of-the-art query strategies for transformers induces a prohibitive runtime overhead,...

10.18653/v1/2022.findings-acl.172 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2022 2022-01-01

Natural language processing (NLP) and neural networks (NNs) have both undergone significant changes in recent years. For active learning (AL) purposes, NNs are, however, less commonly used, despite their current popularity. By using the superior text classification performance of NNs for AL, we can either increase a model's performance using the same amount of data or reduce the data and therefore the required annotation efforts while keeping the same performance. We review AL for text classification using deep neural networks (DNNs) and elaborate on two main causes which used to hinder the adoption:...

10.48550/arxiv.2008.07267 preprint EN cc-by arXiv (Cornell University) 2020-01-01

The increasing use of text as data in environmental research offers valuable opportunities, but the inherent biases within textual sources like news, social media, or disaster reports necessitate moving beyond purely descriptive analyses. While NLP techniques like topic modeling and categorical annotations can identify emergent patterns, they often fail to elucidate the underlying causal mechanisms driving observed phenomena, especially the complex interplay of anthropogenic activities, societal structures,...

10.5194/egusphere-egu25-4468 preprint EN 2025-03-14

Topic modeling enables researchers to explore large document corpora. Large corpora, however, can be extremely costly to model in terms of time and computing resources. In order to circumvent this problem, two techniques have been suggested: (1) to model random document samples, and (2) to prune the vocabulary of the corpus. Although frequently applied, there has been no systematic inquiry into how the application of these techniques affects the respective models. Using three empirical corpora with different characteristics (news articles, websites,...

10.5117/ccr2020.2.001.maie article EN cc-by Computational Communication Research 2020-10-01

Christopher Schröder, Lydia Müller, Andreas Niekler, Martin Potthast. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. 2023.

10.18653/v1/2023.eacl-demo.11 preprint EN cc-by 2023-01-01

Political science has always critically discussed, and continues to discuss, the qualitative development of Western democracies, and the still comparatively young debate on post-democracy has attracted considerable attention outside academic circles as well. This is due, among other things, to the fact that the diagnosis of post-democracy brings together everyday, intuitively shareable observations that can be summarized as follows: political decisions are said to increasingly...

10.3224/zpth.v4i1.13868 article DE ZPTh – Zeitschrift für Politische Theorie 2013-09-09

This paper presents the "Leipzig Corpus Miner", a technical infrastructure for supporting qualitative and quantitative content analysis. The infrastructure aims at the integration of 'close reading' procedures on individual documents with procedures of 'distant reading', e.g. lexical characteristics of large document collections. Therefore information retrieval systems, lexicometric statistics and machine learning procedures are combined in a coherent framework which enables qualitative data analysts to make use of state-of-the-art Natural Language...

10.48550/arxiv.1707.03253 preprint EN other-oa arXiv (Cornell University) 2017-01-01

We present the IR Anthology, a corpus of information retrieval publications accessible via a metadata browser and a full-text search engine. Following the example of the well-known ACL Anthology, the IR Anthology serves as a hub for researchers interested in information retrieval. Our search engine ChatNoir indexes the publications' full texts, enabling a focused search and linking users to the respective publisher's site for personal access. Listing more than 40,000 publications at the time of writing, the IR Anthology can be freely accessed at https://IR.webis.de.

10.1145/3404835.3462798 article EN Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021-07-11

This paper presents a procedure to retrieve subsets of relevant documents from large text collections for Content Analysis, e.g. in social sciences. Document retrieval for this purpose needs to take account of the fact that analysts often cannot describe their research objective with a small set of key terms, especially when dealing with theoretical or rather abstract research interests. Instead, it is much easier to define a set of paradigmatic documents which reflect topics of interest as well as targeted manner of speech. Thus, in contrast to classic...

10.48550/arxiv.1707.03217 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Terms in diachronic text corpora may exhibit a high degree of semantic dynamics that is only partially captured by the common notion of semantic change. The new measure of context volatility that we propose models the degree by which terms change context in a text collection over time. The computation of context volatility for a word relies on the significance-values of its co-occurrent terms and the corresponding co-occurrence ranks in sequential time spans. We define a baseline and present an efficient computational approach in order to overcome problems related to computational issues in the data structure. Results are...

10.5220/0006574001350143 article EN cc-by-nc-nd 2017-01-01

In terminology work, natural language processing, and digital humanities, several studies address the analysis of variations in context and meaning of terms in order to detect semantic change and the evolution of terms. We distinguish three different approaches to describe contextual variations: methods based on the analysis of patterns and linguistic clues, methods exploring the latent semantic space of single words, and methods for the analysis of topic membership. The paper presents the notion of context volatility as a new measure for detecting semantic change and applies it to key term extraction in a political science case...

10.48550/arxiv.1707.03255 preprint EN other-oa arXiv (Cornell University) 2017-01-01

We introduce small-text, an easy-to-use active learning library, which offers pool-based active learning for single- and multi-label text classification in Python. It features numerous pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow the combination of a variety of classifiers, query strategies, and stopping criteria, facilitating a quick mix and match, and enabling a rapid and convenient development of both active learning experiments and applications. With the objective of making various classifiers and query strategies...

10.48550/arxiv.2107.10314 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Christopher Schröder, Kim Bürgl, Yves Annanias, Andreas Niekler, Lydia Müller, Daniel Wiegreffe, Christian Bender, Christoph Mengs, Gerik Scheuermann, Gerhard Heyer. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.320 preprint EN cc-by 2021-01-01