NFDI4DS | UHH-SEMS - Publication Details

Moritz Schubotz

ORCID: 0000-0001-7141-4997


About

Contact & Profiles

OpenAlex: A5038664667

Research Areas

Mathematics, Computing, and Information Processing
Natural Language Processing Techniques
Topic Modeling
Semantic Web and Ontologies
Scientific Computing and Data Management
Open Education and E-Learning
Advanced Database Systems and Queries
Research Data Management Practices
Academic integrity and plagiarism
Advanced Text Analysis Techniques
Wikis in Education and Collaboration
Digital Humanities and Scholarship
Algorithms and Data Compression
Advanced Data Storage Technologies
Peer-to-Peer Network Technologies
Blockchain Technology Applications and Security
Educational Technology and Assessment
Distributed and Parallel Computing Systems
Caching and Content Delivery
Machine Learning and Data Classification
Intelligent Tutoring Systems and Adaptive Learning
Data Mining Algorithms and Applications
Big Data and Business Intelligence
Misinformation and Its Impacts
Data Quality and Management

FIZ Karlsruhe – Leibniz Institute for Information Infrastructure
2019-2024

University of Wuppertal
2018-2023

University of Göttingen
2021-2023

Stanford University
2023

University of Konstanz
2017-2022

Technische Informationsbibliothek (TIB)
2021

University of Michigan
2021

National Institute of Informatics
2018

Technische Universität Berlin
2011-2016

Moritz Klinik
2014

Design and evaluation of IPFS

OPENALEX - Publications

Dennis Trautwein, Aravindh Raman, Gareth Tyson, Ignacio Castro, W Scott, and 3 more

Recent years have witnessed growing consolidation of web operations. For example, the majority of web traffic now originates from a few organizations, and even micro-websites often choose to host on large pre-existing cloud infrastructures. In response to this, the "Decentralized Web" attempts to distribute ownership and operation of web services more evenly. This paper describes the design and implementation of the largest and most widely used Decentralized Web platform, the InterPlanetary File System (IPFS), an open-source,...

10.1145/3544216.3544232 preprint EN 2022-08-11
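IPFS addresses data by its content rather than by its location. A minimal sketch of content addressing, assuming plain SHA-256 hex digests as addresses and an in-memory dictionary as the store. Real IPFS content identifiers (CIDs) wrap the hash in self-describing multihash/multicodec encodings and split large files into a Merkle DAG, none of which is modeled here; the class name `ContentStore` is hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: each blob is keyed by its SHA-256 digest.

    Simplified illustration only; this is not the IPFS CID format.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # content always maps to the same address (built-in deduplication).
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # A receiver can verify integrity by re-hashing what it got.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

Because the address is a function of the bytes, any peer holding the same content can serve it, and the requester can check it without trusting that peer.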

Semantification of Identifiers in Mathematics for Better Math Information Retrieval

OPENALEX - Publications

Moritz Schubotz, А. В. Григорьев, Marcus Leich, Howard S. Cohl, Norman Meuschke, and 3 more

Mathematical formulae are essential in science, but face challenges of ambiguity due to the use of a small number of identifiers to represent an immense number of concepts. Corresponding to word sense disambiguation in Natural Language Processing, we disambiguate mathematical identifiers. By regarding formulae and natural text as one monolithic information source, we are able to extract the semantics of identifiers in a process we term Mathematical Language Processing (MLP). As scientific communities tend to establish standard (identifier) notations, we use the document domain to infer the actual...

10.1145/2911451.2911503 article EN Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval 2016-07-07

HyPlag

OPENALEX - Publications

Norman Meuschke, Vincent Stange, Moritz Schubotz, Béla Gipp

Current plagiarism detection systems reliably find instances of copied and moderately altered text, but often fail to detect strong paraphrases, translations, and the reuse of non-textual content and ideas. To improve upon the detection capabilities for such concealed plagiarism in academic publications, we make four contributions: i) We present the first approach that combines the analysis of mathematical expressions, images, citations, and text. ii) We describe the implementation of this hybrid approach in the research prototype HyPlag. iii) We present novel visualization...

10.1145/3209978.3210177 article EN 2018-06-27

Math-word embedding in math search and semantic extraction

OPENALEX - Publications

André Greiner-Petter, Abdou Youssef, Terry Ruas, Bruce R. Miller, Moritz Schubotz, and 2 more

Word embedding, which represents individual words with semantically fixed-length vectors, has made it possible to successfully apply deep learning to natural language processing tasks such as semantic role-modeling, question answering, and machine translation. As math text consists of natural text as well as math expressions that similarly exhibit linear correlation and contextual characteristics, word embedding techniques can also be applied to math documents. However, while mathematics is a precise and accurate...

10.1007/s11192-020-03502-9 article EN cc-by Scientometrics 2020-06-09

Evaluating Link-based Recommendations for Wikipedia

OPENALEX - Publications

Malte Schwarzer, Moritz Schubotz, Norman Meuschke, Corinna Breitinger, Volker Markl, and 1 more

Literature recommender systems support users in filtering the vast and increasing number of documents in digital libraries and on the Web. For academic literature, research has proven the ability of citation-based document similarity measures, such as Co-Citation (CoCit) or Co-Citation Proximity Analysis (CPA), to improve recommendation quality. In this paper, we report on the first large-scale investigation of the performance of the CPA approach in generating literature recommendations for Wikipedia, which is fundamentally different from...

10.1145/2910896.2910908 article EN 2016-06-10
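Co-Citation (CoCit) treats two documents as related when the same document cites both; Co-Citation Proximity Analysis (CPA) additionally rewards pairs whose citations (or, on Wikipedia, links) appear close together in the citing text. The sketch below illustrates both ideas under stated assumptions: each citing document is a hypothetical list of `(cited_id, position)` tuples, and the proximity weight `1 / (1 + distance)` is an illustrative choice, not the weighting function evaluated in the paper.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_scores(citing_docs, proximity=False):
    """Score pairs of cited documents by co-citation.

    citing_docs: list of citing documents, each a list of (cited_id, position)
    tuples; position is an assumed offset of the citation in the citing text.
    proximity=False: plain co-citation count (one per citing document).
    proximity=True: each citing document adds 1 / (1 + d), where d is the
    smallest distance between the two citations in that document.
    """
    scores = defaultdict(float)
    for citations in citing_docs:
        best = {}  # pair -> smallest distance within this citing document
        for (a, pos_a), (b, pos_b) in combinations(citations, 2):
            if a == b:
                continue  # repeated citation of the same document
            pair = tuple(sorted((a, b)))
            dist = abs(pos_a - pos_b)
            best[pair] = min(dist, best.get(pair, dist))
        for pair, dist in best.items():
            # Each citing document contributes at most once per pair.
            scores[pair] += 1.0 / (1 + dist) if proximity else 1.0
    return dict(scores)
```

For example, with `[[("A", 10), ("B", 12), ("C", 300)], [("A", 5), ("C", 6)]]`, plain co-citation ranks the pair A/C highest (co-cited twice), while the proximity variant also gives substantial weight to A/B because those two citations sit two positions apart.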

Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

OPENALEX - Publications

Moritz Schubotz, André Greiner-Petter, Philipp Scharpf, Norman Meuschke, Howard S. Cohl, and 1 more

Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and the content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for representation...

10.1145/3197026.3197058 preprint EN 2018-05-23

Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations

OPENALEX - Publications

Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael H. Kramer, Béla Gipp

Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas, is an open problem. In this paper, we extend our prior research on analyzing mathematical content and citations. Both are promising approaches for improving the detection of concealed academic plagiarism, primarily in Science, Technology,...

10.1109/jcdl.2019.00026 preprint EN 2019-06-01

Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles

OPENALEX - Publications

Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Béla Gipp

Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish what relationship makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph Vectors, BERT, and XLNet, under different configurations (e.g., sequence length, vector concatenation scheme), including...

10.1145/3383583.3398525 article EN 2020-08-01

Analyzing Mathematical Content to Detect Academic Plagiarism

OPENALEX - Publications

Norman Meuschke, Moritz Schubotz, Felix Hamborg, Tomáš Skopal, Béla Gipp

This paper presents, to our knowledge, the first study on analyzing mathematical expressions to detect academic plagiarism. We make the following contributions. First, we investigate confirmed cases of plagiarism to categorize the similarities of mathematical content commonly found in plagiarized publications. From this investigation, we derive possible feature selection and feature comparison strategies for developing math-based detection approaches and a ground truth for our experiments. Second, we create a test collection by embedding the confirmed cases of plagiarism into...

10.1145/3132847.3133144 article EN 2017-11-06

Challenges of Mathematical Information Retrieval in the NTCIR-11 Math Wikipedia Task

OPENALEX - Publications

Moritz Schubotz, Abdou Youssef, Volker Markl, Howard S. Cohl

Mathematical Information Retrieval concerns retrieving information related to a particular mathematical concept. The NTCIR-11 Math Task develops an evaluation test collection for document section retrieval of scientific articles based on human-generated topics. Those topics involve a combination of formula patterns and keywords. In addition, the optional Wikipedia Task provides a test collection for retrieval of individual mathematical formulae from Wikipedia based on search topics that contain exactly one formula pattern. We developed a framework for automatic query generation and immediate...

10.1145/2766462.2767787 article EN Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval 2015-08-04

AnnoMathTeX - a formula identifier annotation recommender system for STEM documents

OPENALEX - Publications

Philipp Scharpf, Ian Mackerracher, Moritz Schubotz, Joeran Beel, Corinna Breitinger, and 1 more

Documents from science, technology, engineering and mathematics (STEM) often contain a large number of mathematical formulae alongside text. Semantic search, recommender, and question answering systems require the occurring formula constants and variables (identifiers) to be disambiguated. We present the first implementation of a recommender system that enables and accelerates formula annotation by displaying the most likely candidates for identifier names from four different sources (arXiv, Wikipedia, Wikidata, or the surrounding...

10.1145/3298689.3347042 article EN 2019-09-10

Discovering Mathematical Objects of Interest—A Study of Mathematical Notations

OPENALEX - Publications

André Greiner-Petter, Moritz Schubotz, F. Müller, Corinna Breitinger, Howard S. Cohl, and 2 more

Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access arXiv (2.5B mathematical objects) and the mathematical reviewing service for pure and applied mathematics zbMATH (61M mathematical objects). Our study lays a foundation for future research projects on these corpora. Further,...

10.1145/3366423.3380218 preprint EN 2020-04-20

Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language

OPENALEX - Publications

Philipp Scharpf, Moritz Schubotz, Abdou Youssef, Felix Hamborg, Norman Meuschke, and 1 more

In this paper, we show how selecting and combining encodings of natural and mathematical language affect the classification and clustering of documents with mathematical content. We demonstrate this by using sets of documents, sections, and abstracts from the arXiv preprint server that are labeled by their subject class (mathematics, computer science, physics, etc.) to compare different encodings of text and formulae and to evaluate the performance and runtimes of selected classification and clustering algorithms. Our encodings achieve classification accuracies up to 82.8% and cluster purities up to 69.4% (number of clusters equals...

10.1145/3383583.3398529 preprint EN 2020-08-01
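Cluster purity, one of the two figures quoted in the abstract above, assigns each cluster to its most frequent gold class and reports the fraction of items that belong to that majority class: purity = (1/N) · Σ_k max_j |ω_k ∩ c_j|, where ω_k are the clusters and c_j the classes. A minimal sketch, assuming flat lists of cluster assignments and subject-class labels; the function name is hypothetical.

```python
from collections import Counter


def cluster_purity(cluster_ids, class_labels):
    """Purity = (1/N) * sum over clusters of the count of its most common class.

    cluster_ids[i] is the cluster assigned to item i; class_labels[i] is its
    gold label (e.g., an arXiv subject class such as "math" or "cs").
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("purity is undefined for an empty clustering")
    members = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members.setdefault(cluster, Counter())[label] += 1
    # For each cluster, count only the items of its majority class.
    majority_total = sum(c.most_common(1)[0][1] for c in members.values())
    return majority_total / len(cluster_ids)
```

Purity never decreases as clusters are split (one item per cluster yields a purity of 1), so it is only comparable between runs with the same number of clusters.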

Mathematical Language Processing Project

OPENALEX - Publications

Robert Pagael, Moritz Schubotz

In natural language, words and phrases themselves imply the semantics. In contrast, the meaning of identifiers in mathematical formulae is undefined. Thus, scientists must study the context to decode the meaning. The Mathematical Language Processing (MLP) project aims to support that process. In this paper, we compare two approaches to discover identifier-definition tuples. First, we use a simple pattern matching approach. Second, we present the MLP approach that uses part-of-speech tag-based distances as well as sentence positions...

10.48550/arxiv.1407.0167 preprint EN other-oa arXiv (Cornell University) 2014-01-01
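The simple pattern-matching baseline mentioned in the MLP abstract above can be sketched as a few regular expressions over the sentence surrounding a formula. This is an illustration only: the verb list and the definition boundary (next comma, semicolon, period, or "and") are assumptions rather than the patterns used in the paper, identifiers are assumed to be given as plain-text tokens, and `find_definitions` is a hypothetical name.

```python
import re

# Hypothetical linking verbs; the project's actual pattern set is not reproduced here.
VERBS = r"(?:is|denotes|represents|stands for)"


def find_definitions(sentence, identifiers):
    """Return (identifier, definition) tuples found by surface patterns.

    Matches '<identifier> <verb> [the|a|an] <definition>', where the
    definition runs up to the next comma, semicolon, period, 'and', or
    the end of the sentence.
    """
    tuples = []
    for ident in identifiers:
        pattern = (
            rf"(?<!\w){re.escape(ident)}(?!\w)\s+{VERBS}\s+(?:(?:the|a|an)\s+)?"
            r"(.+?)(?=\s+and\s|[,;.]|$)"
        )
        for match in re.finditer(pattern, sentence):
            tuples.append((ident, match.group(1).strip()))
    return tuples
```

Applied to "where E denotes the energy, m is the mass and c is the speed of light." with identifiers `["E", "m", "c"]`, this sketch pairs each identifier with "energy", "mass", and "speed of light". Such patterns are precise but miss definitions phrased any other way, which is why a comparison against the part-of-speech-based MLP approach is informative.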

Extraction of Main Event Descriptors from News Articles by Answering the Journalistic Five W and One H Questions

OPENALEX - Publications

Felix Hamborg, Corinna Breitinger, Moritz Schubotz, Soeren Lachnit, Béla Gipp

The identification and extraction of the events that news articles report on is a commonly performed task in the analysis workflow of various projects that analyze news articles. However, due to the lack of universally usable and publicly available methods for news articles, many researchers must redundantly implement event extraction methods to be used within their projects. Answers to the journalistic five W and one H questions (5W1H) describe the main event of a news story, i.e., who did what, when, where, why, and how. We propose Giveme5W1H, an open-source system that uses...

10.1145/3197026.3203899 article EN 2018-05-23

Analyzing Semantic Concept Patterns to Detect Academic Plagiarism

OPENALEX - Publications

Norman Meuschke, Nicolas Siebeck, Moritz Schubotz, Béla Gipp

Detecting academic plagiarism is a pressing problem, e.g., for educational and research institutions, funding agencies, and publishers. Existing detection systems reliably identify copied text, or near copies of text, but often fail to detect disguised forms of plagiarism, such as paraphrases, translations, and idea plagiarism. We present Semantic Concept Pattern Analysis, an approach that performs an integrated analysis of semantic text relatedness and structural text similarity. Using 25 officially retracted academic plagiarism cases, we...

10.1145/3127526.3127535 article EN 2017-12-15
