NFDI4DS | UHH-SEMS - Publication Details

Yevgeni Berzak

ORCID: 0000-0003-4474-1727

About

Contact & Profiles

A5023679744

Research Areas

Natural Language Processing Techniques
Text Readability and Simplification
Topic Modeling
Second Language Acquisition and Learning
Multimodal Machine Learning Applications
Gaze Tracking and Assistive Technology
Educational Methods and Media Use
Reading and Literacy Development
Speech and dialogue systems
Semantic Web and Ontologies
Neurobiology of Language and Bilingualism
Digital Accessibility for Disabilities
Human Pose and Action Recognition
Advanced Image and Video Retrieval Techniques
Language, Metaphor, and Cognition
Video Analysis and Summarization
Technology-Enhanced Education Studies
Intelligent Tutoring Systems and Adaptive Learning
Educational Strategies and Epistemologies
Legal and Constitutional Studies
Language and cultural evolution
Handwritten Text Recognition Techniques
Educational Technology and Assessment
Subtitles and Audiovisual Media
Law, Economics, and Judicial Systems

Technion – Israel Institute of Technology
2022-2025

Institute of Cognitive and Brain Sciences
2019-2023

Cornell University
2022

Massachusetts Institute of Technology
2014-2022

PRG S&Tech (South Korea)
2021

University of Cambridge
2021

Saarland University
2011

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

OPENALEX - Publications

Edoardo Maria Ponti Helen O’Horan Yevgeni Berzak Ivan Vulić Roi Reichart and 3 more

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this...

10.1162/coli_a_00357 article EN cc-by-nc-nd Computational Linguistics 2019-06-25

Universal Dependencies for Learner English

OPENALEX - Publications

Yevgeni Berzak Jessica Kenney Carolyn Spadine Jing Xian Wang Lucia L.C. Lam and 3 more

Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, Boris Katz. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.

10.18653/v1/p16-1070 article EN cc-by Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016-01-01

Survey on the Use of Typological Information in Natural Language Processing

OPENALEX - Publications

Helen O’Horan Yevgeni Berzak Ivan Vulić Roi Reichart Anna Korhonen

In recent years linguistic typology, which classifies the world's languages according to their functional and structural properties, has been widely used to support multilingual NLP. While the growing importance of typological information in supporting multilingual tasks has been recognised, no systematic survey of existing typological resources and their use in NLP has been published. This paper provides such a survey as well as discussion which we hope will both inform and inspire future work in the area.

10.48550/arxiv.1610.03349 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Anchoring and Agreement in Syntactic Annotations

OPENALEX - Publications

Yevgeni Berzak Yan Huang Andrei Barbu Anna Korhonen Boris Katz

We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well known cognitive bias in human decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on a standard approach to the creation of syntactic resources, where syntactic annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower quality of annotations in comparison with human-based annotations....

10.18653/v1/d16-1239 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading

OPENALEX - Publications

Yevgeni Berzak Chie Nakamura Amelia Smith Emily Weng Boris Katz and 2 more

We present CELER (Corpus of Eye Movements in L1 and L2 English Reading), a broad coverage eye-tracking corpus for English. CELER comprises over 320,000 words, and eye-tracking data from 365 participants. Sixty-nine participants are L1 (first language) speakers, and 296 are L2 (second language) speakers from a wide range of English proficiency levels and five different native language backgrounds. As such, CELER has an order of magnitude more L2 participants than any currently available eye movements dataset with L2 readers. Each participant in CELER reads 156 newswire sentences from the Wall Street...

10.1162/opmi_a_00054 article EN cc-by Open Mind 2022-01-01

The Effect of Text Simplification on Reading Fluency and Reading Comprehension in L1 English Speakers

OPENALEX - Publications

K Klein Omer Shubi Sergey Frenkel Yevgeni Berzak

Text simplification is a common practice for making texts easier to read and understand. To which extent does it achieve these goals, and which participant and text characteristics drive the benefits? In this work, we use eye tracking to address these questions for the first time for the population of adult native (L1) English speakers. We find that 42% of readers exhibit reading facilitation effects, while only 2% improve in comprehension accuracy. We further observe that fluency benefits are larger for slower and less experienced readers, more...

10.31234/osf.io/dhk8c_v1 preprint EN 2025-02-12

Surprisal Takes It All: Eye Tracking Based Cognitive Evaluation of Text Readability Measures

OPENALEX - Publications

K Klein Sergey Frenkel Omer Shubi Yevgeni Berzak

Text readability measures are widely used in many real-world scenarios and in NLP. These measures have primarily been developed by predicting reading comprehension outcomes, while largely neglecting what is perhaps the core aspect of a readable text: reading ease. In this work, we propose a new eye tracking based methodology for evaluating readability measures, which focuses on their ability to account for reading facilitation effects in text simplification, as well as for reading ease more broadly. Using this approach, we find that existing readability formulas are moderate...

10.48550/arxiv.2502.11150 preprint EN arXiv (Cornell University) 2025-02-16

Déjà Vu? Decoding Repeated Reading from Eye Movements

OPENALEX - Publications

Yoav Meiri Omer Shubi Cfir Avraham Hadar Ariel Kreisberg Nitzav Yevgeni Berzak

Be it your favorite novel, a newswire article, a cooking recipe or an academic paper -- in many daily situations we read the same text more than once. In this work, we ask whether it is possible to automatically determine whether the reader has previously encountered a text based on their eye movement patterns. We introduce two variants of this task and address them with considerable success using both feature-based and neural models. We further introduce a general strategy for enhancing these models with machine generated simulations of eye movements...

10.48550/arxiv.2502.11061 preprint EN arXiv (Cornell University) 2025-02-16

OneStop: A 360-Participant English Eye Tracking Dataset with Different Reading Regimes

OPENALEX - Publications

Yevgeni Berzak Jonathan Malmaud Omer Shubi Yoav Meiri Ella Lion and 1 more

We present OneStop Eye Movements, a large-scale corpus of eye movements in reading, in which native speakers read newswire texts in English and answer reading comprehension questions. OneStop has 152 hours of passage eye movement recordings from 360 participants for 2.6 million word tokens, more data than all the existing public broad coverage eye tracking datasets combined. The data was collected with extensively piloted materials comprising 486 questions and auxiliary text annotations geared towards behavioral analyses...

10.31234/osf.io/kgxv5_v1 preprint EN 2025-02-24

OneStop: A 360-Participant English Eye Tracking Dataset with Different Reading Regimes

OPENALEX - Publications

Yevgeni Berzak Jonathan Malmaud Omer Shubi Yoav Meiri Ella Lion and 1 more

10.31234/osf.io/kgxv5_v2 preprint EN 2025-05-06

The Effect of Text Simplification on Reading Fluency and Reading Comprehension in L1 English Speakers

OPENALEX - Publications

K Klein Omer Shubi Sergey Frenkel Yevgeni Berzak

Text simplification is a common practice for making texts easier to read and understand. To which extent does it achieve these goals, and which participant and text characteristics drive the benefits? In this work, we use eye tracking to address these questions for the first time for the population of adult native (L1) English speakers. We find that 42% of readers exhibit reading facilitation effects, while only 2% improve in comprehension accuracy. We further observe that fluency benefits are larger for slower and less experienced readers, more...

10.31234/osf.io/dhk8c_v2 preprint EN 2025-05-22

MultiplEYE: Creating a multilingual eye-tracking-while-reading corpus

OPENALEX - Publications

Deborah N. Jakobi Maja Stegenwallner-Schütz Nora Hollenstein Cui Ding Ramunė Kasperė and 70 more

10.1145/3715669.3726843 article EN 2025-05-24

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

OPENALEX - Publications

Yevgeni Berzak Andrei Barbu Daniel Harari Boris Katz Shimon Ullman

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by...

10.18653/v1/d15-1172 article EN cc-by Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

Bridging Information-Seeking Human Gaze and Machine Reading Comprehension

OPENALEX - Publications

Jonathan Malmaud Roger Levy Yevgeni Berzak

In this work, we analyze how human gaze during reading comprehension is conditioned on the given question, and whether this signal can be beneficial for machine reading comprehension. To this end, we collect a new eye-tracking dataset with a large number of participants engaging in a multiple choice reading comprehension task. Our analysis of this data reveals increased fixation times over the parts of the text that are most relevant for answering the question. Motivated by this finding, we propose making automated reading comprehension more human-like by mimicking human information-seeking behavior. We...

10.18653/v1/2020.conll-1.11 article EN cc-by 2020-01-01

STARC: Structured Annotations for Reading Comprehension

OPENALEX - Publications

Yevgeni Berzak Jonathan Malmaud Roger Levy

We present STARC (Structured Annotations for Reading Comprehension), a new annotation framework for assessing reading comprehension with multiple choice questions. Our framework introduces a principled structure for the answer choices and ties them to textual span annotations. The framework is implemented in OneStopQA, a new high-quality dataset for the evaluation and analysis of reading comprehension in English. We use this dataset to demonstrate that STARC can be leveraged for a key new application for the development of SAT-like reading comprehension materials: automatic annotation quality probing via span ablation experiments. We further...

10.18653/v1/2020.acl-main.507 article EN cc-by 2020-01-01

Assessing Language Proficiency from Eye Movements in Reading

OPENALEX - Publications

Yevgeni Berzak Boris Katz Roger Levy

Yevgeni Berzak, Boris Katz, Roger Levy. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

10.18653/v1/n18-1180 article EN cc-by 2018-01-01

Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distributions in ESL

OPENALEX - Publications

Yevgeni Berzak Roi Reichart Boris Katz

This work examines the impact of crosslinguistic transfer on grammatical errors in English as Second Language (ESL) texts. Using a computational framework that formalizes the theory of Contrastive Analysis (CA), we demonstrate that language specific error distributions in ESL writing can be predicted from the typological properties of the native language and their relation to the typology of English. Our typology driven model enables us to obtain accurate estimates of such distributions without access to any ESL data for the target languages. Furthermore, we present a strategy...

10.18653/v1/k15-1010 article EN cc-by 2015-01-01

Predicting Native Language from Gaze

OPENALEX - Publications

Yevgeni Berzak Chie Nakamura Suzanne Flynn Boris Katz

A fundamental question in language learning concerns the role of a speaker's first language in second language acquisition. We present a novel methodology for studying this question: analysis of eye-movement patterns in second language reading of free-form text. Using this methodology, we demonstrate for the first time that the native language of English learners can be predicted from their gaze fixations when reading English. We provide an analysis of classifier uncertainty and learned features, which indicates that differences in English reading are likely to be rooted in linguistic divergences across native languages. The presented...

10.18653/v1/p17-1050 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

Reconstructing Native Language Typology from Foreign Language Usage

OPENALEX - Publications

Yevgeni Berzak Roi Reichart Boris Katz

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

10.3115/v1/w14-1603 article EN 2014-01-01

Grounding language acquisition by training semantic parsers using captioned videos

OPENALEX - Publications

Candace Ross Andrei Barbu Yevgeni Berzak Battushig Myanganbayar Boris Katz

We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children, where they observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. For this task, we...

10.18653/v1/d18-1285 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

Classifying Syntactic Errors in Learner Language

OPENALEX - Publications

Leshem Choshen Dmitry Nikolaev Yevgeni Berzak Omri Abend

We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence. The methodology builds on the established Universal Dependencies syntactic representation scheme, and provides complementary information to other error-classification systems. Unlike existing error classification methods, our method is applicable across languages, which we showcase by producing a detailed picture of syntactic errors in learner English and learner Russian. We further demonstrate the utility of the methodology for analyzing...

10.18653/v1/2020.conll-1.7 article EN cc-by 2020-01-01

Predicting Text Readability from Scrolling Interactions

OPENALEX - Publications

Sian Gooding Yevgeni Berzak Tony Wing Chung Mak Matt Sharifi

Judging the readability of text has many important applications, for instance when performing text simplification or when sourcing reading material for language learners. In this paper, we present a 518 participant study which investigates how scrolling behaviour relates to the readability of English texts. We make our dataset publicly available and show that (1) there are statistically significant differences in the way readers interact with text depending on the text level, (2) such measures can be used to predict the readability of text, and (3) the background of a reader...

10.18653/v1/2021.conll-1.30 article EN cc-by 2021-01-01

Fine-Grained Prediction of Reading Comprehension from Eye Movements

OPENALEX - Publications

Omer Shubi Yoav Meiri Cfir Avraham Hadar Yevgeni Berzak

10.18653/v1/2024.emnlp-main.198 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01
