- Natural Language Processing Techniques
- Text Readability and Simplification
- Topic Modeling
- Second Language Acquisition and Learning
- Multimodal Machine Learning Applications
- Gaze Tracking and Assistive Technology
- Educational Methods and Media Use
- Reading and Literacy Development
- Speech and dialogue systems
- Semantic Web and Ontologies
- Neurobiology of Language and Bilingualism
- Digital Accessibility for Disabilities
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Language, Metaphor, and Cognition
- Video Analysis and Summarization
- Technology-Enhanced Education Studies
- Intelligent Tutoring Systems and Adaptive Learning
- Educational Strategies and Epistemologies
- Legal and Constitutional Studies
- Language and cultural evolution
- Handwritten Text Recognition Techniques
- Educational Technology and Assessment
- Subtitles and Audiovisual Media
- Law, Economics, and Judicial Systems
Technion – Israel Institute of Technology
2022-2025
Institute of Cognitive and Brain Sciences
2019-2023
Cornell University
2022
Massachusetts Institute of Technology
2014-2022
PRG S&Tech (South Korea)
2021
University of Cambridge
2021
Saarland University
2011
Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from a lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that, to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this...
Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, Boris Katz. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.
In recent years linguistic typology, which classifies the world's languages according to their functional and structural properties, has been widely used to support multilingual NLP. While the growing importance of typological information in supporting multilingual tasks has been recognised, no systematic survey of existing typological resources and their use in NLP has been published. This paper provides such a survey, as well as a discussion which we hope will both inform and inspire future work in the area.
We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well known cognitive bias in human decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on a standard approach to the creation of syntactic resources, where syntactic annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower annotation quality in comparison with human-based annotations...
We present CELER (Corpus of Eye Movements in L1 and L2 English Reading), a broad coverage eye-tracking corpus for English. CELER comprises over 320,000 words and eye-tracking data from 365 participants. Sixty-nine participants are L1 (first language) speakers, and 296 are L2 (second language) speakers from a wide range of English proficiency levels and five different native language backgrounds. As such, CELER has an order of magnitude more L2 participants than any currently available eye movements dataset with L2 readers. Each participant in CELER reads 156 newswire sentences from the Wall Street...
Text simplification is a common practice for making texts easier to read and understand. To what extent does it achieve these goals, and which participant and text characteristics drive simplification benefits? In this work, we use eye tracking to address these questions for the first time for the population of adult native (L1) English speakers. We find that 42% of the readers exhibit reading facilitation effects, while only 2% improve their reading comprehension accuracy. We further observe that reading fluency benefits are larger for slower and less experienced readers, more...
Text readability measures are widely used in many real-world scenarios and in NLP. These measures have primarily been developed by predicting reading comprehension outcomes, while largely neglecting what is perhaps the core aspect of a readable text: reading ease. In this work, we propose a new eye tracking based methodology for evaluating readability measures, which focuses on their ability to account for reading facilitation effects in text simplification, as well as for text reading ease more broadly. Using this approach, we find that existing readability formulas are moderate...
Be it your favorite novel, a newswire article, a cooking recipe or an academic paper, in many daily situations we read the same text more than once. In this work, we ask whether it is possible to automatically determine whether the reader has previously encountered a text based on their eye movement patterns. We introduce two variants of this task and address them with considerable success using both feature-based and neural models. We further introduce a general strategy for enhancing these models with machine generated simulations of eye movements...
We present OneStop Eye Movements, a large-scale corpus of eye movements in reading, in which native English speakers read newswire texts in English and answer reading comprehension questions. OneStop has 152 hours of passage reading eye movement recordings from 360 participants for 2.6 million word tokens, more eye movement data than all the existing public broad coverage eye-tracking datasets combined. The corpus was collected using extensively piloted materials comprising 486 reading comprehension questions and auxiliary text annotations geared towards behavioral analyses...
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations of each sentence. We address this task by...
In this work, we analyze how human gaze during reading comprehension is conditioned on the given reading comprehension question, and whether this signal can be beneficial for machine reading comprehension. To this end, we collect a new eye-tracking dataset with a large number of participants engaging in a multiple choice reading comprehension task. Our analysis of this data reveals increased fixation times over parts of the text that are most relevant for answering the question. Motivated by this finding, we propose making automated reading comprehension more human-like by mimicking human information-seeking reading behavior. We...
We present STARC (Structured Annotations for Reading Comprehension), a new annotation framework for assessing reading comprehension with multiple choice questions. Our framework introduces a principled structure for the answer choices and ties them to textual span annotations. The framework is implemented in OneStopQA, a new high-quality dataset for evaluation and analysis of reading comprehension in English. We use this dataset to demonstrate that STARC can be leveraged for a key new application in the development of SAT-like reading comprehension materials: automatic annotation quality probing via span ablation experiments. We further...
Yevgeni Berzak, Boris Katz, Roger Levy. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
This work examines the impact of crosslinguistic transfer on grammatical errors in English as Second Language (ESL) texts. Using a computational framework that formalizes the theory of Contrastive Analysis (CA), we demonstrate that language specific error distributions in ESL writing can be predicted from the typological properties of the native language and their relation to the typology of English. Our typology driven model enables us to obtain accurate estimates of such distributions without access to any ESL data for the target languages. Furthermore, we present a strategy...
A fundamental question in language learning concerns the role of a speaker's first language in second language acquisition. We present a novel methodology for studying this question: analysis of eye-movement patterns in second language reading of free-form text. Using this methodology, we demonstrate for the first time that the native language of English learners can be predicted from their gaze fixations when reading English. We provide an analysis of classifier uncertainty and learned features, which indicates that differences in English reading are likely to be rooted in linguistic divergences across native languages. The presented...
This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children, who observe their environment and listen to speakers. The parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. For this task, we...
We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence. The methodology builds on the established Universal Dependencies syntactic representation scheme, and provides information complementary to other error-classification systems. Unlike existing error classification methods, our method is applicable across languages, which we showcase by producing a detailed picture of syntactic errors in learner English and learner Russian. We further demonstrate the utility of the methodology for analyzing...
Judging the readability of text has many important applications, for instance when performing text simplification or when sourcing reading material for language learners. In this paper, we present a 518 participant study which investigates how scrolling behaviour relates to the readability of English texts. We make our dataset publicly available and show that (1) there are statistically significant differences in the way readers interact with text depending on the text level, (2) such measures can be used to predict the readability of text, and (3) the background of the reader...