- Natural Language Processing Techniques
- Language and cultural evolution
- Linguistic Variation and Morphology
- Linguistics and Cultural Studies
- Lexicography and Language Studies
- Syntax, Semantics, Linguistic Variation
- Historical Linguistics and Language Studies
- Authorship Attribution and Profiling
- Multilingual Education and Policy
- Linguistics, Language Diversity, and Identity
- Topic Modeling
- Pacific and Southeast Asian Studies
- Spanish Linguistics and Language Studies
- Physics and Engineering Research Articles
- Linguistic Studies and Language Acquisition
- Language, Linguistics, Cultural Analysis
- Linguistics and language evolution
- Language, Discourse, Communication Strategies
- Algorithms and Data Compression
- Phonetics and Phonology Research
- Image Retrieval and Classification Techniques
- Radiomics and Machine Learning in Medical Imaging
- African history and culture analysis
- Speech and dialogue systems
- Australian Indigenous Culture and History
Uppsala University
2017-2024
Max Planck Institute for Psycholinguistics
2013-2023
University of Zurich
2022
Max Planck Society
2011-2020
Humboldt-Universität zu Berlin
2020
Australian National University
2018-2020
Max Planck Institute for the Science of Human History
2015-2018
Université Claude Bernard Lyon 1
2018
Centre National de la Recherche Scientifique
2018
Max Planck Institute for Evolutionary Anthropology
2010-2014
It is widely assumed that one of the fundamental properties of spoken language is the arbitrary relation between sound and meaning. Some exceptions in the form of nonarbitrary associations have been documented in linguistics, cognitive science, and anthropology, but these studies only involved small subsets of the 6,000+ languages spoken in the world today. By analyzing word lists covering nearly two-thirds of the world's languages, we demonstrate that a considerable proportion of 100 basic vocabulary items carry strong associations with specific kinds of human...
While global patterns of human genetic diversity are increasingly well characterized, the diversity of human languages remains less systematically described. Here, we outline the Grambank database. With over 400,000 data points and 2400 languages, Grambank is the largest comparative grammatical database available. The comprehensiveness of Grambank allows us to quantify the relative effects of genealogical inheritance and geographic proximity on the structural diversity of the world's languages, evaluate constraints on linguistic diversity, and identify the world's most unusual languages. An analysis...
This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from glottochronology in four major respects: (1) it is automated and thus more objective, (2) it applies a uniform analytical approach to a single database of worldwide languages, (3) it is based on lexical similarity as determined from Levenshtein (edit) distances rather than...
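The Levenshtein-based similarity mentioned in this abstract can be illustrated with a minimal sketch. It is not the ASJP consortium's actual procedure: the function names, the normalization by the longer word, and the averaging over shared meanings are assumptions made for illustration, and the word lists are expected to be plain dictionaries from a meaning label to a transcribed form.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_list_distance(list_a: dict, list_b: dict) -> float:
    """Average normalized distance over the meanings both word lists share."""
    shared = sorted(set(list_a) & set(list_b))
    if not shared:
        raise ValueError("the two word lists share no meanings")
    total = sum(normalized_distance(list_a[m], list_b[m]) for m in shared)
    return total / len(shared)
```

A lower mean distance between two word lists indicates greater lexical similarity. How such a score would be converted into a divergence time is not covered by this sketch.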
This article surveys work on Unsupervised Learning of Morphology. We define Unsupervised Learning of Morphology as the problem of inducing a description (of some kind, even if only morpheme-segmentation) of how orthographic words are built up given only raw text data of a language. We briefly go through the history and motivation of this problem. Next, over 200 items of work are listed with a brief characterization, and the most important ideas in the field are critically discussed. We summarize the achievements so far and give pointers for future developments.
What would your ideas about language evolution be if there was only one language left on earth? Fortunately, our investigation need not be that impoverished. In the present article, we survey the state of knowledge regarding the kinds of language found among humans, the language inventory, population sizes, time depth, grammatical variation, and other relevant issues that a theory of language evolution should minimally take into account.
The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable for comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation...
This discussion note reviews the responses of the linguistics profession to the grave issues of language endangerment identified a quarter of a century ago in the journal Language by Krauss, Hale, England, Craig, and others (Hale et al. 1992). Two and a half decades of worldwide research not only have given us a much more accurate picture of the number, phylogeny, and typological variety of the world's languages, but they have also seen the development of a wide range of new approaches, conceptual and technological, to the problem of documenting them. We review these...
This paper presents a precise definition of numeral classifiers, steps to identify a numeral classifier language, and a database of 3,338 languages, of which 723 languages have been identified as having a numeral classifier system. The database, named World Atlas of Classifier Languages (WACL), has been systematically constructed over the last 10 years via a manual survey of relevant literature and also an automatic scan of digitized grammars followed by manual checking. The open-access release of WACL is thus a significant contribution to linguistic research...
In this paper, we seek to draw attention to Malayo-Polynesian languages outside of the Oceanic subgroup with innovative bases and complex numerals involving various additive, subtractive, and multiplicative procedures. We highlight the fact that the number of languages showing such innovations is more than previously recognized in the literature. Finally, we observe that the concentration of numeral innovations in the region of eastern Indonesia suggests Papuan influence, either through contact or substrate. However, we also note that sociocultural factors, in the form...
Human history is written in both our genes and our languages. The extent to which our biological and linguistic histories are congruent has been the subject of considerable debate, with clear examples of both matches and mismatches. To disentangle the patterns of demographic and cultural transmission, we need a global systematic assessment of matches and mismatches. Here, we assemble a genomic database (GeLaTo, or Genes and Languages Together) specifically curated to investigate genetic and linguistic diversity worldwide. We find that most populations in GeLaTo that speak languages of the same...
One attempt at explaining why some language families are large (while others are small) is the hypothesis that the families that are now large became large because their ancestral speakers had a technological advantage, most often agriculture. Variants of this idea are referred to as the Language Farming Dispersal Hypothesis. Previously, detailed language family studies have uncovered various supporting examples and counterexamples to this idea. In the present paper I weigh the evidence from ALL attested language families. For each family, I use the number of member languages...
Problems with, and alternatives to, the tree model in historical linguistics
Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog (https://glottolog.org). In this paper, we summarize the motivation and history behind the system of glottocodes and describe the principles and practices of data curation, technical infrastructure and update/version-tracking systematics. Since our understanding of the target domain – the dialects, languages and language families of the entire world – is continually evolving, changes and updates are relatively common. The resulting data is assessed in...
While the notion of the ‘area’ or ‘Sprachbund’ has a long history in linguistics, with geographically-defined regions frequently cited as a useful means to explain typological distributions, the problem of delimiting areas has not been well addressed. Lists of general-purpose, largely independent ‘macro-areas’ (typically continent size) have been proposed as a step to rule out contact as an explanation for various large-scale linguistic phenomena. This squib points to some problems with the currently widely-used predetermined areas, those...
This paper shows how it is possible to count languages vs. dialects if, for every pair of varieties, we are given whether they are mutually intelligible or not. The method is to divide the varieties into a minimum number of internally mutually intelligible groups where each group counts as one language. Expressed in terms of graphs (as in discrete mathematics), the method is even easier understood as: applying graph-colouring to the graph over the varieties with the intelligibility interrelationships as edges. Graph colouring is already a mathematically well-understood...
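The grouping described in this abstract can be sketched as a small exact graph-colouring search. This is an illustration under stated assumptions, not the paper's implementation: the names `count_languages` and `intelligible_pairs` are hypothetical, and the sketch assumes the colouring is run on the graph whose edges join varieties that are not mutually intelligible, so that varieties sharing a colour form an internally mutually intelligible group. The backtracking search is exponential in the worst case and is meant only for small inputs.

```python
from itertools import combinations


def count_languages(varieties, intelligible_pairs):
    """Return (k, assignment): the minimum number of groups and one grouping.

    `intelligible_pairs` lists the pairs of varieties that are mutually
    intelligible; every other pair is treated as not intelligible.
    """
    varieties = list(varieties)
    ok = {frozenset(pair) for pair in intelligible_pairs}

    # Conflict graph (assumption): an edge joins two varieties that are NOT
    # mutually intelligible, so they may never share a group.
    conflicts = {v: set() for v in varieties}
    for a, b in combinations(varieties, 2):
        if frozenset((a, b)) not in ok:
            conflicts[a].add(b)
            conflicts[b].add(a)

    def colour(index, k, assignment):
        """Try to colour varieties[index:] with k colours by backtracking."""
        if index == len(varieties):
            return True
        v = varieties[index]
        used = {assignment[u] for u in conflicts[v] if u in assignment}
        for c in range(k):
            if c not in used:
                assignment[v] = c
                if colour(index + 1, k, assignment):
                    return True
                del assignment[v]
        return False

    # The smallest k that admits a valid colouring is the language count.
    for k in range(1, len(varieties) + 1):
        assignment = {}
        if colour(0, k, assignment):
            return k, assignment
    return 0, {}  # no varieties given
```

For a dialect chain in which A and B, and B and C, are mutually intelligible but A and C are not, `count_languages("ABC", [("A", "B"), ("B", "C")])` yields two groups. The count is fixed, but the grouping is not unique, since B could be placed with either A or C.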