- Text Readability and Simplification
- Neurobiology of Language and Bilingualism
- Natural Language Processing Techniques
- Reading and Literacy Development
- Second Language Acquisition and Learning
- Speech and dialogue systems
- Categorization, perception, and language
- Topic Modeling
- Lexicography and Language Studies
- Digital Communication and Language
- Child and Animal Learning Development
- Language and cultural evolution
- Authorship Attribution and Profiling
- Cognitive Abilities and Testing
- Technology Adoption and User Behaviour
- Language Development and Disorders
- Aging and Gerontology Research
- Open Source Software Innovations
- Advanced Text Analysis Techniques
- Linguistics, Language Diversity, and Identity
- Knowledge Management and Sharing
- Migration, Policy, and Dickens Studies
- Language and Culture
- Computational and Text Analysis Methods
- Multisensory perception and integration
McMaster University
2021
Brock University
2021
Tilburg University
2021
Ghent University Hospital
2013-2021
Ghent University
2014-2020
Institute of Psychology
2018-2020
Jagiellonian University
2018-2020
We present word frequencies based on subtitles of British television programmes. We show that the SUBTLEX-UK frequencies explain more of the variance in the lexical decision times of the British Lexicon Project than the British National Corpus and SUBTLEX-US frequencies. In addition to the word form frequencies, we also present measures of contextual diversity, part-of-speech-specific frequencies, frequencies in children's programmes, and bigram frequencies, giving researchers of British English access to the full range of norms recently made available for other languages. Finally, we introduce a new measure of word frequency, the Zipf scale,...
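The Zipf scale mentioned above expresses a word's frequency as the base-10 logarithm of its occurrences per billion words, which places most words on an intuitive scale from about 1 (very rare) to about 7 (the most frequent function words). A minimal sketch, assuming raw counts and a known corpus size (the function name and example numbers are illustrative, not taken from the SUBTLEX-UK release):

```python
from math import log10

def zipf(count: int, corpus_size: int) -> float:
    """Zipf value = log10 of the word's frequency per billion words.

    Equivalent to log10(frequency per million) + 3, so a word at
    1 per million scores 3, and 100 per million scores 5.
    """
    return log10(count / corpus_size * 1e9)

# A word occurring 1,000 times in a 100-million-word corpus has a
# frequency of 10 per million, i.e. a Zipf value of 4.0.
print(zipf(1_000, 100_000_000))  # 4.0
```

Published versions of the scale sometimes add smoothing for words with zero or very low counts; the bare formula above is the core idea.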
Based on an analysis of the literature and a large-scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, a person learns 6,000 extra lemmas, or about one new lemma every 2 days. The knowledge of the words can be as shallow as knowing that they exist. In addition, people learn tens of thousands of inflected...
The word frequency effect refers to the observation that high-frequency words are processed more efficiently than low-frequency words. Although the effect was first described over 80 years ago, only in recent years has it been investigated in detail. It has become clear that considerable quality differences exist between frequency estimates and that we need a new standardized measure that does not mislead users. Research also points to a consistent individual difference in the effect, meaning that it will be present at different frequency ranges for people with different degrees of language...
We use the results of a large online experiment on word knowledge in Dutch to investigate the variables influencing vocabulary size in the population and to examine the effect of word prevalence (the percentage of a population knowing a word) as a measure of word occurrence. Nearly 300,000 participants were presented with about 70 word stimuli (selected from a list of 53,000 words) in an adapted lexical decision task. We identify age, education, and multilingualism as the most important factors influencing vocabulary size. The results suggest that the accumulation of vocabulary throughout life and across multiple languages...
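Word prevalence starts from the raw proportion of respondents who report knowing a word; in published norms this proportion is typically probit-transformed so that words known by nearly everyone are still spread apart rather than compressed near 100%. A minimal sketch under that assumption (the function name and example counts are illustrative):

```python
from statistics import NormalDist

def prevalence(n_known: int, n_tested: int) -> float:
    """Probit-transformed proportion of respondents who know the word.

    The raw percentage is bounded at 0-100 and saturates near the
    extremes; the inverse normal CDF spreads it onto an open scale.
    """
    p = n_known / n_tested
    return NormalDist().inv_cdf(p)

# A word known by half the sample sits at 0; one known by 97.5%
# of 1,000 respondents scores about 1.96.
print(prevalence(500, 1000))            # 0.0
print(round(prevalence(975, 1000), 2))  # 1.96
```

In practice a correction is needed for words known by all or none of the respondents, since the probit of 0 or 1 is undefined.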
Keuleers, Stevens, Mandera, and Brysbaert (2015) presented a new variable, word prevalence, defined as word knowledge in the population. Some words are known to more people than others. This is particularly true for low-frequency words (e.g., screenshot vs. scourage). In the present study, we examined the impact of the measure by collecting lexical decision times for 30,000 Dutch lemmas of various lengths (the Dutch Lexicon Project 2). Word prevalence had the second highest correlation with lexical decision times (after word frequency): Words known by everyone...
Subjective ratings for age of acquisition, concreteness, affective valence, and many other variables are an important element of psycholinguistic research. However, even for well-studied languages, such ratings usually cover just a small part of the vocabulary. A possible solution involves using corpora to build a semantic similarity space and applying machine learning techniques to extrapolate existing ratings to previously unrated words. We conduct a systematic comparison of two extrapolation techniques: k-nearest neighbours and random...
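The k-nearest-neighbours approach mentioned above predicts a rating for an unrated word from the ratings of the words closest to it in the semantic similarity space. A minimal sketch with toy 2-dimensional vectors and made-up concreteness-style ratings (everything here is illustrative; real spaces have hundreds of dimensions, and the papers weigh various choices of k and similarity metric):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def knn_extrapolate(target_vec, rated, k=3):
    """Predict a rating for an unrated word as the mean rating of its
    k most cosine-similar rated neighbours."""
    neighbours = sorted(rated.items(),
                        key=lambda item: cosine(target_vec, item[1][0]),
                        reverse=True)[:k]
    return sum(rating for _, (_vec, rating) in neighbours) / k

# Toy semantic space: two concrete words, two abstract ones.
rated = {
    "dog":   ([0.9, 0.1], 2.1),
    "cat":   ([0.8, 0.2], 2.3),
    "idea":  ([0.1, 0.9], 6.0),
    "dream": ([0.2, 0.8], 5.5),
}
# A vector near "dog" and "cat" inherits their low (concrete) ratings.
print(knn_extrapolate([0.85, 0.15], rated, k=2))  # mean of 2.1 and 2.3
```

The random-forest alternative compared in the study instead treats the vector dimensions as features and learns the mapping from vectors to ratings directly.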
To have more information about the English words known by second language (L2) speakers, we ran a large-scale crowdsourcing vocabulary test, which yielded 17 million useful responses. It provided us with a list of 445 words known to nearly all participants. The list was compared with various existing lists advised for inclusion in the first stages of L2 teaching. The data also yielded a ranking of 61,000 terms by degree and speed of word recognition, which correlated r = .85 with a similar ranking based on native speakers. The L2 speakers in our study were relatively better at...
We present a new dataset of English word recognition times for a total of 62 thousand words, called the English Crowdsourcing Project. The data were collected via an internet vocabulary test in which more than one million people participated. The dataset is limited to native speakers. Participants were asked to indicate which words they knew. Their response times were registered, although at no point were participants asked to respond as fast as possible. Still, the times correlate around .75 with those of the English Lexicon Project for the shared words. Also, the results of virtual experiments show that...
We present a new database of Dutch word recognition times for a total of 54 thousand words, called the Dutch Crowdsourcing Project. The data were collected with an internet vocabulary test. The database is limited to native speakers. Participants were asked to indicate which words they knew. Their response times were registered, even though participants were not asked to respond as fast as possible. Still, the times correlate around .7 with those of the Dutch Lexicon Projects for the shared words. Also, the results of virtual experiments show that the new times are a valid addition to the Lexicon Projects. This not only means we have...
This study presents a Polish semantic priming dataset and similarity ratings for word pairs obtained with native speakers, as well as a range of semantic spaces. The stimuli include strongly related, weakly related, and semantically unrelated pairs. A rating study (Experiment 1) confirmed that the three conditions differed in relatedness. A lexical decision task with a carefully matched subset of the stimuli (Experiment 2) revealed strong priming effects for strongly related pairs, whereas weakly related pairs showed a smaller but still significant effect relative to unrelated pairs. The datasets from both experiments and those...