Yevgeni Berzak

ORCID: 0000-0003-4474-1727
Research Areas
  • Natural Language Processing Techniques
  • Text Readability and Simplification
  • Topic Modeling
  • Second Language Acquisition and Learning
  • Multimodal Machine Learning Applications
  • Gaze Tracking and Assistive Technology
  • Educational Methods and Media Use
  • Reading and Literacy Development
  • Speech and dialogue systems
  • Semantic Web and Ontologies
  • Neurobiology of Language and Bilingualism
  • Digital Accessibility for Disabilities
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Language, Metaphor, and Cognition
  • Video Analysis and Summarization
  • Technology-Enhanced Education Studies
  • Intelligent Tutoring Systems and Adaptive Learning
  • Educational Strategies and Epistemologies
  • Legal and Constitutional Studies
  • Language and cultural evolution
  • Handwritten Text Recognition Techniques
  • Educational Technology and Assessment
  • Subtitles and Audiovisual Media
  • Law, Economics, and Judicial Systems

Technion – Israel Institute of Technology
2022-2025

Institute of Cognitive and Brain Sciences
2019-2023

Cornell University
2022

Massachusetts Institute of Technology
2014-2022

PRG S&Tech (South Korea)
2021

University of Cambridge
2021

Saarland University
2011

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this...

10.1162/coli_a_00357 article EN cc-by-nc-nd Computational Linguistics 2019-06-25

Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, Boris Katz. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.

10.18653/v1/p16-1070 article EN cc-by Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016-01-01

In recent years linguistic typology, which classifies the world's languages according to their functional and structural properties, has been widely used to support multilingual NLP. While the growing importance of typological information in supporting multilingual tasks has been recognised, no systematic survey of existing typological resources and their use in NLP has been published. This paper provides such a survey as well as discussion which we hope will both inform and inspire future work in the area.

10.48550/arxiv.1610.03349 preprint EN other-oa arXiv (Cornell University) 2016-01-01

We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well known cognitive bias in human decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on a standard approach to the creation of syntactic resources, where syntactic annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower quality of annotations in comparison with human-based annotations....

10.18653/v1/d16-1239 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

We present CELER (Corpus of Eye Movements in L1 and L2 English Reading), a broad coverage eye-tracking corpus for English. CELER comprises over 320,000 words, and eye-tracking data from 365 participants. Sixty-nine participants are L1 (first language) speakers, and 296 are L2 (second language) speakers from a wide range of English proficiency levels and five different native language backgrounds. As such, CELER has an order of magnitude more L2 participants than any currently available eye movements dataset with L2 readers. Each participant in CELER reads 156 newswire sentences from the Wall Street...

10.1162/opmi_a_00054 article EN cc-by Open Mind 2022-01-01

Text simplification is a common practice for making texts easier to read and easier to understand. To which extent does it achieve these goals, and which participant and text characteristics drive simplification benefits? In this work, we use eye tracking to address these questions for the first time for the population of adult native (L1) English speakers. We find that 42% of the readers exhibit reading facilitation effects, while only 2% improve reading comprehension accuracy. We further observe that reading fluency benefits are larger for slower and less experienced readers, more...

10.31234/osf.io/dhk8c_v1 preprint EN 2025-02-12

Text readability measures are widely used in many real-world scenarios and in NLP. These measures have primarily been developed by predicting reading comprehension outcomes, while largely neglecting what is perhaps the core aspect of a readable text: reading ease. In this work, we propose a new eye tracking based methodology for evaluating readability measures, which focuses on their ability to account for reading facilitation effects in text simplification, as well as for text reading ease more broadly. Using this approach, we find that existing readability formulas are moderate...

10.48550/arxiv.2502.11150 preprint EN arXiv (Cornell University) 2025-02-16

Be it your favorite novel, a newswire article, a cooking recipe or an academic paper -- in many daily situations we read the same text more than once. In this work, we ask whether it is possible to automatically determine whether the reader has previously encountered a text based on their eye movement patterns. We introduce two variants of this task and address them with considerable success using both feature-based and neural models. We further introduce a general strategy for enhancing these models with machine generated simulations of eye movements...

10.48550/arxiv.2502.11061 preprint EN arXiv (Cornell University) 2025-02-16

We present OneStop Eye Movements, a large-scale corpus of eye movements in reading, in which native speakers read newswire texts in English and answer reading comprehension questions. OneStop has 152 hours of passage reading eye movement recordings from 360 participants for 2.6 million word tokens, more data than all the existing public broad coverage eye tracking datasets combined. The data was collected over extensively piloted materials comprising 486 reading comprehension questions and auxiliary text annotations geared towards behavioral analyses...

10.31234/osf.io/kgxv5_v1 preprint EN 2025-02-24

We present OneStop Eye Movements, a large-scale corpus of eye movements in reading, in which native speakers read newswire texts in English and answer reading comprehension questions. OneStop has 152 hours of passage reading eye movement recordings from 360 participants for 2.6 million word tokens, more data than all the existing public broad coverage eye tracking datasets combined. The data was collected over extensively piloted materials comprising 486 reading comprehension questions and auxiliary text annotations geared towards behavioral analyses...

10.31234/osf.io/kgxv5_v2 preprint EN 2025-05-06

Text simplification is a common practice for making texts easier to read and easier to understand. To which extent does it achieve these goals, and which participant and text characteristics drive simplification benefits? In this work, we use eye tracking to address these questions for the first time for the population of adult native (L1) English speakers. We find that 42% of the readers exhibit reading facilitation effects, while only 2% improve reading comprehension accuracy. We further observe that reading fluency benefits are larger for slower and less experienced readers, more...

10.31234/osf.io/dhk8c_v2 preprint EN 2025-05-22

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by...

10.18653/v1/d15-1172 article EN cc-by Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

In this work, we analyze how human gaze during reading comprehension is conditioned on the given reading comprehension question, and whether this signal can be beneficial for machine reading comprehension. To this end, we collect a new eye-tracking dataset with a large number of participants engaging in a multiple choice reading comprehension task. Our analysis of this data reveals increased fixation times over the parts of the text that are most relevant for answering the question. Motivated by this finding, we propose making automated reading comprehension more human-like by mimicking human information-seeking reading behavior. We...

10.18653/v1/2020.conll-1.11 article EN cc-by 2020-01-01

We present STARC (Structured Annotations for Reading Comprehension), a new annotation framework for assessing reading comprehension with multiple choice questions. Our framework introduces a principled structure for the answer choices and ties them to textual span annotations. The framework is implemented in OneStopQA, a new high-quality dataset for evaluation and analysis of reading comprehension in English. We use this dataset to demonstrate that STARC can be leveraged for a key new application for the development of SAT-like reading comprehension materials: automatic annotation quality probing via span ablation experiments. We further...

10.18653/v1/2020.acl-main.507 article EN cc-by 2020-01-01

Yevgeni Berzak, Boris Katz, Roger Levy. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

10.18653/v1/n18-1180 article EN cc-by 2018-01-01

This work examines the impact of crosslinguistic transfer on grammatical errors in English as Second Language (ESL) texts. Using a computational framework that formalizes the theory of Contrastive Analysis (CA), we demonstrate that language specific error distributions in ESL writing can be predicted from the typological properties of the native language and their relation to the typology of English. Our typology driven model enables us to obtain accurate estimates of such distributions without access to any ESL data for the target languages. Furthermore, we present a strategy...

10.18653/v1/k15-1010 article EN cc-by 2015-01-01

A fundamental question in language learning concerns the role of a speaker's first language in second language acquisition. We present a novel methodology for studying this question: analysis of eye-movement patterns in second language reading of free-form text. Using this methodology, we demonstrate for the first time that the native language of English learners can be predicted from their gaze fixations when reading English. We provide analysis of classifier uncertainty and learned features, which indicates that differences in English reading are likely to be rooted in linguistic divergences across native languages. The presented...

10.18653/v1/p17-1050 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

10.3115/v1/w14-1603 article EN 2014-01-01

We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children where they observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. For this task, we...

10.18653/v1/d18-1285 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence. The methodology builds on the established Universal Dependencies syntactic representation scheme, and provides complementary information to other error-classification systems. Unlike existing error classification methods, our method is applicable across languages, which we showcase by producing a detailed picture of syntactic errors in learner English and learner Russian. We further demonstrate the utility of the methodology for analyzing...

10.18653/v1/2020.conll-1.7 article EN cc-by 2020-01-01

Judging the readability of text has many important applications, for instance when performing text simplification or when sourcing reading material for language learners. In this paper, we present a 518 participant study which investigates how scrolling behaviour relates to the readability of English texts. We make our dataset publicly available and show that (1) there are statistically significant differences in the way readers interact with text depending on the text level, (2) such measures can be used to predict the readability of text, and (3) the background of a reader...

10.18653/v1/2021.conll-1.30 article EN cc-by 2021-01-01

10.18653/v1/2024.emnlp-main.198 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01