- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Advanced Text Analysis Techniques
- Semantic Web and Ontologies
- Wikis in Education and Collaboration
- Hate Speech and Cyberbullying Detection
- Sentiment Analysis and Opinion Mining
- Speech Recognition and Synthesis
- Handwritten Text Recognition Techniques
- Speech and Dialogue Systems
- Web Data Mining and Analysis
- Biomedical Text Mining and Ontologies
- Text and Document Classification Technologies
- Software Engineering Research
- Image Processing and 3D Reconstruction
- Intelligent Tutoring Systems and Adaptive Learning
- Educational Technology and Assessment
- Digital Communication and Language
- Authorship Attribution and Profiling
- Second Language Acquisition and Learning
- Adversarial Robustness in Machine Learning
- Music and Audio Processing
- Educational Assessment and Pedagogy
- Hand Gesture Recognition Systems
University of Hagen
2022-2025
Innovation Cluster (Canada)
2022
University of Duisburg-Essen
2014-2021
Dartmouth College
2021
University of Stuttgart
2021
Stockholm University
2021
Uppsala University
2021
East Stroudsburg University
2021
DIPF | Leibniz Institute for Research and Information in Education
2012-2016
Educational Testing Service
2015
Neural approaches to automated essay scoring have recently shown state-of-the-art performance. The task typically involves a broad notion of writing quality that encompasses content, grammar, organization, and conventions. This differs from the short answer content scoring task, which focuses on content accuracy. The inputs to neural models – ngrams and embeddings – are arguably well-suited to evaluate content in these tasks. We investigate how several basic neural approaches similar to those used for essay scoring perform on short answer scoring. We show that neural architectures can outperform...
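The kind of model this points to can be illustrated with a minimal sketch: token embeddings are mean-pooled and fed to a linear layer that regresses a score. The data, vocabulary size, and score range below are invented placeholders, not the configurations evaluated in the paper.

```python
import torch
import torch.nn as nn

class PooledEmbeddingScorer(nn.Module):
    """Embeds token ids, mean-pools them, and regresses a single score."""
    def __init__(self, vocab_size: int, embed_dim: int = 50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.out = nn.Linear(embed_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        vecs = self.embed(token_ids)                      # (batch, seq, dim)
        mask = (token_ids != 0).unsqueeze(-1).float()     # ignore padding (id 0)
        pooled = (vecs * mask).sum(1) / mask.sum(1).clamp(min=1)
        return self.out(pooled).squeeze(-1)               # predicted score

# Toy usage with made-up token ids and gold scores
model = PooledEmbeddingScorer(vocab_size=1000)
answers = torch.randint(1, 1000, (4, 12))                 # 4 answers, 12 tokens each
gold = torch.tensor([0.0, 1.0, 2.0, 1.0])
loss = nn.MSELoss()(model(answers), gold)
loss.backward()
```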
In this article, we present a comprehensive study aimed at computing the semantic relatedness of word pairs. We analyze the performance of a large number of measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this is the first study to systematically analyze these properties,...
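One of the evaluation tasks mentioned above (computing relatedness scores and comparing them with human judgments) is commonly operationalized as a rank correlation. The sketch below shows this setup with an invented word list and a deliberately trivial stand-in measure; it is not one of the measures studied in the article.

```python
from scipy.stats import spearmanr

# Made-up human relatedness judgments for a few word pairs
gold = {("car", "automobile"): 3.9,
        ("coast", "shore"): 3.6,
        ("noon", "string"): 0.1}

def toy_measure(w1: str, w2: str) -> float:
    """Trivial stand-in for a knowledge-based relatedness measure."""
    return len(set(w1) & set(w2)) / max(len(set(w1) | set(w2)), 1)

pairs = list(gold)
human = [gold[p] for p in pairs]
system = [toy_measure(*p) for p in pairs]
rho, _ = spearmanr(human, system)
print(f"Spearman correlation with human judgments: {rho:.2f}")
```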
Automated scoring of student essays is increasingly used to reduce manual grading effort. State-of-the-art approaches use supervised machine learning, which makes it complicated to transfer a system trained on one task to another. We investigate which currently used features are task-independent and evaluate their transferability on English and German datasets. We find that, by using our task-independent feature set, models transfer better between tasks. We also find that the transfer works even between tasks that are not of the same type.
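The transfer setting can be sketched as follows: extract features that do not depend on the prompt (here only a few invented surface statistics, far simpler than the feature set in the paper), train on one task, and predict on another.

```python
from sklearn.linear_model import Ridge

def surface_features(text: str) -> list[float]:
    """Prompt-independent surface statistics (illustrative only)."""
    tokens = text.split()
    return [len(tokens),
            len(set(tokens)) / max(len(tokens), 1),             # type-token ratio
            sum(len(t) for t in tokens) / max(len(tokens), 1)]  # mean word length

# Invented mini-datasets for two different scoring tasks
task_a = [("short answer one", 1.0),
          ("a much longer and more varied essay style answer text", 3.0)]
task_b = [("another brief reply", 1.0),
          ("a detailed well developed and carefully argued response text", 3.0)]

model = Ridge().fit([surface_features(t) for t, _ in task_a],
                    [score for _, score in task_a])
print("Cross-task predictions:", model.predict([surface_features(t) for t, _ in task_b]))
```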
Language proficiency tests are used to evaluate and compare the progress of language learners. We present an approach for automatic difficulty prediction of C-tests that performs on par with human experts. On the basis of a detailed analysis of newly collected data, we develop a model of C-test difficulty introducing four dimensions: solution difficulty, candidate ambiguity, inter-gap dependency, and paragraph difficulty. We show that cues from all four dimensions contribute to C-test difficulty.
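A hedged sketch of how such a model could combine the four dimensions: each gap is described by one invented feature per dimension, and a linear model is fit against observed error rates. The numbers are placeholders, not the collected data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One row per gap: [solution difficulty, candidate ambiguity,
#                   inter-gap dependency, paragraph difficulty]
X = np.array([[0.2, 0.1, 0.0, 0.3],
              [0.7, 0.5, 0.4, 0.6],
              [0.5, 0.9, 0.2, 0.6]])
y = np.array([0.15, 0.70, 0.55])   # observed error rate per gap (made up)

model = LinearRegression().fit(X, y)
print("Contribution of each dimension:", model.coef_)
print("Predicted difficulty of a new gap:", model.predict([[0.4, 0.3, 0.1, 0.5]]))
```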
We present DKPro TC, a framework for supervised learning experiments on textual data. The main goal of DKPro TC is to enable researchers to focus on the actual research task behind the problem and let the framework handle the rest. It enables rapid prototyping by relying on an easy-to-use workflow engine and standardized document preprocessing based on the Apache Unstructured Information Management Architecture (Ferrucci and Lally, 2004). It ships with standard feature extraction modules, while at the same time allowing the user to add customized...
This paper introduces MultiGEC, a dataset for multilingual Grammatical Error Correction (GEC) in twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. MultiGEC distinguishes itself from previous GEC datasets in that it covers several underrepresented languages, which we argue should be included in the resources used to train models for Natural Language Processing tasks which, as GEC itself, have implications for Learner...
Semantic relatedness is a special form of linguistic distance between words. Evaluating semantic relatedness measures is usually performed by comparison with human judgments. Previous test datasets had been created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, the datasets cover different types of lexical-semantic relations and contain domain-specific words from naturally occurring texts.
Automatic content scoring is an important application in the area of automatic educational assessment. Short texts written by learners are scored based on their content, while spelling and grammar mistakes are usually ignored. The difficulty of automatically scoring such texts varies with the variance within the learner answers. In this paper, we first discuss factors that influence variance in learner answers, so that practitioners can better estimate if automatic scoring might be applicable to their usage scenario. We then compare the two main paradigms of content scoring: (i)...
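One simple way to make the notion of variance within learner answers concrete is the mean pairwise similarity of the answer set: the lower it is, the more varied the answers. The sketch below uses TF-IDF vectors over invented answers; the paper's discussion of variance is broader than this single number.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

answers = ["the water evaporates",
           "water turns into vapor",
           "it gets hot",
           "molecules escape as gas"]

X = TfidfVectorizer().fit_transform(answers)
sim = cosine_similarity(X)

# Average similarity over distinct pairs; lower values indicate more variance.
n = len(answers)
mean_similarity = (sim.sum() - n) / (n * (n - 1))
print(f"Mean pairwise answer similarity: {mean_similarity:.2f}")
```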
Automated short answer scoring is increasingly used to give students timely feedback about their learning progress. Building scoring models comes with high costs, as state-of-the-art methods using supervised learning require large amounts of hand-annotated data. We analyze the potential of recently proposed methods for semi-supervised scoring based on clustering. We find that all examined approaches (centroids, clusters, selected pure clusters) are mainly effective for very short answers and do not generalize well to several-sentence responses.
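The clustering idea can be sketched as follows: cluster the answers, have a human label only the answer closest to each centroid, and propagate that label to the rest of the cluster. The vectorization and the toy answers below are simplified placeholders.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

answers = ["photosynthesis uses light",
           "plants use light energy",
           "because it is green",
           "the leaf is green"]
gold = [1, 1, 0, 0]   # pretend a human scores only the centroid answers

X = TfidfVectorizer().fit_transform(answers)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Index of the answer closest to each centroid, then label whole clusters
closest = km.transform(X).argmin(axis=0)
cluster_label = {c: gold[i] for c, i in enumerate(closest)}
predicted = [cluster_label[c] for c in km.labels_]
print(predicted)
```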
Automatically generating challenging distractors for multiple-choice gap-fill items is still an unsolved problem. We propose to employ context-sensitive lexical inference rules in order to generate distractors that are semantically similar to the gap target word in some sense, but not in the particular sense induced by the gap context. We hypothesize that such distractors should be particularly hard to distinguish from the correct answer. We focus on verbs, as they are especially difficult to master for language learners, and find our approach quite effective. In...
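The selection criterion can be illustrated with a toy example: candidate distractors are verbs related to the target in general, minus those that a context-sensitive rule still licenses in the given gap context. Both word lists below are invented for illustration, not output of the actual inference rules.

```python
target = "run"
item = "She decided to ___ a small bakery in her home town."

# Verbs related to "run" irrespective of context (hypothetical list)
related = ["operate", "manage", "sprint", "jog", "flee"]

# Verbs that context-sensitive rules still accept in this business sense;
# they are too close to the correct answer to work as distractors.
valid_in_context = {"operate", "manage"}

distractors = [v for v in related if v not in valid_in_context]
print(item)
print("Distractors:", distractors)   # similar to "run", but wrong sense here
```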
Lexical recognition tests are widely used to assess vocabulary knowledge. We investigate the role that diacritics play in designing an Arabic lexical recognition test. We compare a non-diacritized and a diacritized test in a user study and find that they are largely comparable in their ability to measure proficiency. However, we argue that the diacritized test is better suited to control difficulty by allowing for a wider range of nonwords and a more targeted selection of word forms.