NFDI4DS | UHH-SEMS - Publication Details

André Greiner-Petter

ORCID: 0000-0002-5828-5497

About

Contact & Profiles

A5029478950

Research Areas

Mathematics, Computing, and Information Processing
Natural Language Processing Techniques
Topic Modeling
Advanced Database Systems and Queries
Open Education and E-Learning
Algorithms and Data Compression
Digital Humanities and Scholarship
Scientific Computing and Data Management
Advanced Text Analysis Techniques
Semantic Web and Ontologies
Handwritten Text Recognition Techniques
Computational Physics and Python Applications
Educational Technology and Assessment
Speech Recognition and Synthesis
Biomedical Text Mining and Ontologies
Wikis in Education and Collaboration
Computational and Text Analysis Methods
Educational Assessment and Pedagogy
Research Data Management Practices
Distributed and Parallel Computing Systems
Data Quality and Management
Academic integrity and plagiarism
Intelligent Tutoring Systems and Adaptive Learning
Advanced Data Storage Technologies

University of Göttingen
2023-2024

Stanford University
2023

University of Wuppertal
2019-2022

National Institute of Informatics
2020

University of Konstanz
2018-2019

Technische Universität Berlin
2017

Math-word embedding in math search and semantic extraction

OPENALEX - Publications

André Greiner-Petter Abdou Youssef Terry Ruas Bruce R. Miller Moritz Schubotz and 2 more

Word embedding, which represents individual words with semantically fixed-length vectors, has made it possible to successfully apply deep learning to natural language processing tasks such as semantic role-modeling, question answering, and machine translation. As math text consists of text as well as expressions that similarly exhibit linear correlation and contextual characteristics, word embedding techniques can also be applied to math documents. However, while mathematics is a precise and accurate...

10.1007/s11192-020-03502-9 article EN cc-by Scientometrics 2020-06-09

Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

Moritz Schubotz André Greiner-Petter Philipp Scharpf Norman Meuschke Howard S. Cohl and 1 more

Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the encoded information requires machine-readable formats that can represent both the presentation and the content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for representation...

10.1145/3197026.3197058 preprint EN 2018-05-23

Discovering Mathematical Objects of Interest—A Study of Mathematical Notations

André Greiner-Petter Moritz Schubotz F. Müller Corinna Breitinger Howard S. Cohl and 2 more

Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access arXiv (2.5B objects) and the reviewing service for pure and applied mathematics zbMATH (61M objects). Our study lays a foundation for future research projects on such corpora. Further,...

10.1145/3366423.3380218 preprint EN 2020-04-20

SKT5SciSumm -- A Hybrid Generative Approach for Multi-Document Scientific Summarization

Huy Quoc To Hung-Nghiep Tran André Greiner-Petter Felix Beierle Akiko Aizawa

Summarization for scientific text has shown significant benefits for both the research community and human society. Given the fact that the nature of scientific text is distinctive and the input of the multi-document summarization task is substantially long, the task requires sufficient embedding generation and text truncation without losing important information. To tackle these issues, in this paper, we propose SKT5SciSumm - a hybrid framework for multi-document scientific summarization (MDSS). We leverage the Sentence-Transformer version of Scientific Paper Embeddings using Citation-Informed...

10.48550/arxiv.2402.17311 preprint EN arXiv (Cornell University) 2024-02-27

Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems

André Greiner-Petter Moritz Schubotz Howard S. Cohl Béla Gipp

Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. Our goal is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as our DPS. Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a...

10.1108/ajim-08-2018-0185 article EN Aslib Journal of Information Management 2019-05-20

Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange

Ankit Satpute Noah Gießing André Greiner-Petter Moritz Schubotz Olaf Teschke and 2 more

10.1145/3626772.3657945 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

TEIMMA: The First Content Reuse Annotator for Text, Images, and Math

Ankit Satpute André Greiner-Petter Moritz Schubotz Norman Meuschke Akiko Aizawa and 2 more

This demo paper presents the first tool to annotate the reuse of text, images, and mathematical formulae in a document pair: TEIMMA. Annotating content reuse is particularly useful to develop plagiarism detection algorithms. Real-world content reuse is often obfuscated, which makes it challenging to identify such cases. TEIMMA allows entering the obfuscation type to enable novel classifications for confirmed cases of plagiarism. It enables recording different reuse types in HTML and supports users by visualizing the content reuse in a document pair using similarity methods for text and math.

10.1109/jcdl57899.2023.00056 article EN 2023-06-01

Do the Math: Making Mathematics in Wikipedia Computable

André Greiner-Petter Moritz Schubotz Corinna Breitinger Philipp Scharpf Akiko Aizawa and 1 more

Wikipedia combines the power of AI solutions and human reviewers to safeguard article quality. Quality control objectives include detecting malicious edits, fixing typos, and spotting inconsistent formatting. However, no automated quality control mechanisms currently exist for mathematical formulae. Spell checkers are widely used to highlight textual errors, yet no equivalent tool exists to detect algebraically incorrect formulae. Our paper addresses this shortcoming by making formulae computable. We present a method that...

10.1109/tpami.2022.3195261 article EN cc-by IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-08-02

Neural Machine Translation for Mathematical Formulae

Felix Petersen Moritz Schubotz André Greiner-Petter Béla Gipp

We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages. Compared to neural machine translation on natural language, mathematical formulae have a much smaller vocabulary and much longer sequences of symbols, while their translation requires extreme precision to satisfy information needs. In this work, we perform the tasks of translating from LaTeX to Mathematica as well as from LaTeX to semantic LaTeX. While recurrent, recursive, and transformer networks struggle with preserving all contained information,...

10.18653/v1/2023.acl-long.645 article EN cc-by 2023-01-01

Mathematical Formulae in Wikimedia Projects 2020

Moritz Schubotz André Greiner-Petter Norman Meuschke Olaf Teschke Béla Gipp

This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae. We describe how we have supported the transition from rendering formulae as coarse-grained PNG images in 2001 to providing modern, semantically enriched, language-independent MathML in 2020. Additionally, we describe our plans to further improve the accessibility and discoverability of mathematical knowledge in Wikimedia projects.

10.1145/3383583.3398557 preprint EN 2020-08-01

The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection

Tomáš Horych Christian W. Mandl Terry Ruas André Greiner-Petter Béla Gipp and 2 more

High annotation costs from hiring or crowdsourcing complicate the creation of large, high-quality datasets needed for training reliable text classifiers. Recent research suggests using Large Language Models (LLMs) to automate the annotation process, reducing these costs while maintaining data quality. LLMs have shown promising results in annotating downstream tasks like hate speech detection and political framing. Building on the success in these areas, this study investigates whether LLMs are viable for the complex task of media bias detection and whether a...

10.48550/arxiv.2411.11081 preprint EN arXiv (Cornell University) 2024-11-17

Taxonomy of Mathematical Plagiarism

Ankit Satpute André Greiner-Petter Noah Gießing Isabel Beckenbach Moritz Schubotz and 3 more

Plagiarism is a pressing concern, even more so with the availability of large language models. Existing plagiarism detection systems reliably find copied and moderately reworded text but fail for idea plagiarism, especially in mathematical science, which heavily uses formal notation. We make two contributions. First, we establish a taxonomy of content reuse by annotating 122 potentially plagiarised scientific document pairs. Second, we analyze the best-performing approaches to detect similarity on the newly...

10.1007/978-3-031-56066-8_2 preprint EN arXiv (Cornell University) 2024-01-30

Multi-Task Media-Bias Analysis Generalization for Pre-Trained Identification of Expressions

Tomáš Horych Martin Wessel Jan Philip Wahle Terry Ruas Jerome Waßmuth and 4 more

Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, the first large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable pre-training at scale, we present the Large Bias Mixture (LBM), a compilation of 59 bias-related tasks. MAGPIE outperforms previous approaches in media bias detection on the Bias Annotation By Experts (BABE) dataset, with a relative...

10.48550/arxiv.2403.07910 preprint EN arXiv (Cornell University) 2024-02-26

Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange

Ankit Satpute Noah Gießing André Greiner-Petter Moritz Schubotz Olaf Teschke and 2 more

Large Language Models (LLMs) have demonstrated exceptional capabilities in various natural language tasks, often achieving performances that surpass those of humans. Despite these advancements, the domain of mathematics presents a distinctive challenge, primarily due to its specialized structure and the precision it demands. In this study, we adopted a two-step approach for investigating the proficiency of LLMs in answering mathematical questions. First, we employ the most effective LLMs, as identified by their...

10.48550/arxiv.2404.00344 preprint EN arXiv (Cornell University) 2024-03-30

Why Machines Cannot Learn Mathematics, Yet

André Greiner-Petter Terry Ruas Moritz Schubotz Akiko Aizawa William I. Grosky and 1 more

Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed by less accurate and imprecise descriptions, contributing to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in ML, it seems canonical to apply ML...

10.48550/arxiv.1905.08359 preprint EN other-oa arXiv (Cornell University) 2019-01-01
