NFDI4DS | UHH-SEMS - Publication Details

Johann‐Mattis List

ORCID: 0000-0003-2133-8919

OpenAlex ID: A5012676548

Research Areas

Natural Language Processing Techniques
Language and cultural evolution
Linguistic Variation and Morphology
Linguistics and Cultural Studies
China's Ethnic Minorities and Relations
Linguistics, Language Diversity, and Identity
Phonetics and Phonology Research
Topic Modeling
Linguistics and language evolution
Lexicography and Language Studies
Semantic Web and Ontologies
Digital Humanities and Scholarship
Authorship Attribution and Profiling
Linguistics and terminology studies
Spanish Linguistics and Language Studies
Historical Linguistics and Language Studies
Language, Linguistics, Cultural Analysis
Speech Recognition and Synthesis
Multilingual Education and Policy
Syntax, Semantics, Linguistic Variation
Chinese history and philosophy
Computational and Text Analysis Methods
Hearing Impairment and Communication
Scientific Computing and Data Management
Neurobiology of Language and Bilingualism

Affiliations

Max Planck Institute for Evolutionary Anthropology
2021-2025

University of Passau
2023-2025

Aristotle University of Thessaloniki
2023

Trinity College Dublin
2023

Kobe City University of Foreign Studies
2023

Macquarie University
2022-2023

The University of Texas at Austin
2022-2023

University of Hawaiʻi at Mānoa
2022-2023

University of Colorado System
2022-2023

University of Colorado Boulder
2022-2023

Publications

Emotion semantics show both cultural variation and universal structure

OPENALEX - Publications

Joshua Conrad Jackson, Joseph Watts, Teague R. Henry, Johann‐Mattis List, Robert Forkel, and 4 more

Many human languages have words for emotions such as "anger" and "fear," yet it is not clear whether these emotions have similar meanings across languages, or why their meanings might vary. We estimate emotion semantics across a sample of 2474 spoken languages using "colexification," a phenomenon in which languages name semantically related concepts with the same word. Analyses show significant variation in networks of emotion concept colexification, which is predicted by the geographic proximity of language families. We also find evidence of universal structure in emotion colexification...

DOI: 10.1126/science.aaw8160 | article | EN | Science | 2019-12-20

Dated language phylogenies shed light on the ancestry of Sino-Tibetan

Laurent Sagart, Guillaume Jacques, Yunfan Lai, Robin Ryder, Valentin Thouzeau, and 2 more

Significance: Given its size and geographical extension, Sino-Tibetan is of the highest importance for understanding the prehistory of East Asia and neighboring language families. Based on a dataset of 50 languages, we infer phylogenies that date the origin of the family to around 7200 B.P., linking it with the late Cishan and early Yangshao cultures.

DOI: 10.1073/pnas.1817972116 | article | EN | cc-by-nc-nd | Proceedings of the National Academy of Sciences | 2019-05-06

From Text to Thought: How Analyzing Language Can Advance Psychological Science

Joshua Conrad Jackson, Joseph Watts, Johann‐Mattis List, Curtis Puryear, Ryan Drabble, and 1 more

Humans have been using language for millennia but have only just begun to scratch the surface of what natural language can reveal about the mind. Here we propose that language offers a unique window into psychology. After briefly summarizing the legacy of language analyses in psychological science, we show how methodological advances have made these analyses more feasible and insightful than ever before. In particular, we describe how two forms of language analysis, natural-language processing and comparative linguistics, are contributing to how we understand topics as diverse...

DOI: 10.1177/17456916211004899 | article | EN | cc-by | Perspectives on Psychological Science | 2021-10-04

Automated Dating of the World’s Language Families Based on Lexical Similarity

Eric W. Holman, Cecil H. Brown, Søren Wichmann, André Müller, Viveka Velupillai, and 10 more

This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from glottochronology in four major respects: (1) it is automated and thus more objective, (2) it applies a uniform analytical approach to a single database of worldwide languages, (3) it is based on lexical similarity as determined by Levenshtein (edit) distances rather than...

DOI: 10.1086/662127 | article | EN | Current Anthropology | 2011-11-30

The Potential of Automatic Word Comparison for Historical Linguistics

Johann‐Mattis List, Simon J. Greenhill, Russell D. Gray

The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data, which can later be enhanced by experts. In this way, computational approaches can take care of repetitive and schematic tasks, leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related...

DOI: 10.1371/journal.pone.0170046 | article | EN | cc-by | PLoS ONE | 2017-01-27

The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies

Christoph Rzymski, Tiago Tresoldi, Simon J. Greenhill, Mei-Shin Wu, Nathanael E. Schweikhard, and 22 more

Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. Here we present CLICS, a Database of Cross-Linguistic Colexifications. CLICS tackles interdisciplinary research questions about colexification...

DOI: 10.1038/s41597-019-0341-x | article | EN | cc-by | Scientific Data | 2020-01-13

Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics

Robert Forkel, Johann‐Mattis List, Simon J. Greenhill, Christoph Rzymski, Sebastian Bank, and 5 more

The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of these data are provided in a large variety of formats and are therefore not amenable to comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation...

DOI: 10.1038/sdata.2018.205 | article | EN | cc-by | Scientific Data | 2018-10-16

Lexibank, a public repository of standardized wordlists with computed phonological and lexical features

Johann‐Mattis List, Robert Forkel, Simon J. Greenhill, Christoph Rzymski, Johannes Englisch, and 1 more

The past decades have seen substantial growth in digital data on the world's languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and cognition. Unfortunately, most published datasets lack standardization, which makes their comparison difficult. Here, we present a new approach to increase the comparability of lexical data. We designed workflows for the computer-assisted lifting...

DOI: 10.1038/s41597-022-01432-0 | article | EN | cc-by | Scientific Data | 2022-06-16

From Isolates to Families: Using Neural Networks for Automated Language Affiliation

Frederic Blum, Steffen Herbold, Johann‐Mattis List

In historical linguistics, the affiliation of languages to a common language family is traditionally carried out using a complex workflow that relies on manually comparing individual languages. Large-scale standardized collections of multilingual wordlists and grammatical structures might help to improve this workflow and open new avenues for developing automated workflows. Here, we present neural network models that use lexical data from a worldwide sample of more than 1,000 languages with known affiliations to classify languages into...

DOI: 10.48550/arxiv.2502.11688 | preprint | EN | arXiv (Cornell University) | 2025-02-17

Networks uncover hidden lexical borrowing in Indo-European language evolution

Shijulal Nelson‐Sathi, Johann‐Mattis List, Hans Geisler, Heiner Fangerau, Russell D. Gray, and 2 more

Language evolution is traditionally described in terms of family trees with ancestral languages splitting into descendant languages. However, it has long been recognized that language evolution also entails horizontal components, most commonly through lexical borrowing. For example, the English language was heavily influenced by Old Norse and French; eight per cent of its basic vocabulary is borrowed. Borrowing is a distinctly non-tree-like process, akin to gene transfer in genome evolution, that cannot be recovered...

DOI: 10.1098/rspb.2010.1917 | article | EN | cc-by | Proceedings of the Royal Society B: Biological Sciences | 2010-11-24

CLICS2: An improved database of cross-linguistic colexifications assembling lexical data with the help of cross-linguistic data formats

Johann‐Mattis List, Simon J. Greenhill, Cormac Anderson, Thomas U. Mayer, Tiago Tresoldi, and 1 more

The Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. In its current form, it has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, ranging from studies on semantic change to patterns of conceptualization and linguistic paleontology. But CLICS has also been criticized for obvious shortcomings, ranging from the underlying dataset, which still contains many errors, up...

DOI: 10.1515/lingty-2018-0010 | article | EN | cc-by-nc-nd | Linguistic Typology | 2018-08-21

A recent northern origin for the Uto-Aztecan family

Simon J. Greenhill, Hannah J. Haynie, Robert Ross, Angela M. Chira, Johann‐Mattis List, and 3 more

The Uto-Aztecan language family is one of the largest language families in the Americas. However, there has been considerable debate about its origin and how it spread. Here we use Bayesian phylogenetic methods to analyze lexical data from thirty-four Uto-Aztecan varieties and two Kiowa-Tanoan languages. We infer the age of Proto-Uto-Aztecan to be around 4,100 years (3,258–5,025 years) and identify the most likely homeland as near what is now Southern California. We reconstruct the probable subsistence strategy of the ancestral society and find no causal or...

DOI: 10.1353/lan.0.0276 | article | EN | Language | 2023-01-01

A cross-linguistic database of phonetic transcription systems

Cormac Anderson, Tiago Tresoldi, Thiago Costa Chacon, Anne‐Maria Fehn, Mary Walworth, and 2 more

Contrary to what non-practitioners might expect, the systems of phonetic notation used by linguists are highly idiosyncratic. Not only do various linguistic subfields disagree on the specific symbols they use to denote the speech sounds of languages, but considerable variation can also be found in large databases of sound inventories. Inspired by recent efforts to link cross-linguistic data with the help of reference catalogues (Glottolog, Concepticon) across different resources, we present initial efforts to link different phonetic notation systems to a catalogue...

DOI: 10.2478/yplm-2018-0002 | article | EN | cc-by-nc-nd | Yearbook of the Poznan Linguistic Meeting | 2018-12-01

Using lexical language models to detect borrowings in monolingual wordlists

John E. Miller, Tiago Tresoldi, Roberto Zariquiey, César Beltrán, Natalia Morozova, and 1 more

Lexical borrowing, the transfer of words from one language to another, is one of the most frequent processes in language evolution. In order to detect borrowings, linguists make use of various strategies, combining evidence from different sources. Despite the increasing popularity of computational approaches in comparative linguistics, automated approaches to lexical borrowing detection are still in their infancy, disregarding many aspects of the evidence that is routinely considered by human experts. Examples of such evidence are phonological and phonotactic clues, which are especially...

DOI: 10.1371/journal.pone.0242709 | article | EN | cc-by | PLoS ONE | 2020-12-09

Using Sequence Similarity Networks to Identify Partial Cognates in Multilingual Wordlists

Johann‐Mattis List, Philippe Lopez, Éric Bapteste

DOI: 10.18653/v1/p16-2097 | article | EN | 2016-01-01

A Web-Based Interactive Tool for Creating, Inspecting, Editing, and Publishing Etymological Datasets

Johann‐Mattis List

The paper presents the Etymological DICtionary ediTOR (EDICTOR), a free, interactive, web-based tool designed to aid historical linguists in creating, editing, analysing, and publishing etymological datasets. EDICTOR offers interactive solutions for important tasks in historical linguistics, including facilitated input and segmentation of phonetic transcriptions, quantitative and qualitative analyses of morphological data, enhanced interfaces for cognate class assignment and multiple word alignment, and automated evaluation...

DOI: 10.18653/v1/e17-3003 | article | EN | cc-by | 2017-01-01

Improved computational models of sound change shed light on the history of the Tukanoan languages

Thiago Costa Chacon, Johann‐Mattis List

There has been much debate regarding the internal history of the Tukanoan languages during the last four decades, with different classification proposals being based on lexical and phonological data. Here, we present a new classification of the Tukanoan language family based on an improved approach which infers phylogenetic trees from proposed sound change patterns. In contrast to traditional methods of manual identification of shared innovations by experts, our method identifies valid innovations within a parsimony...

DOI: 10.31826/jlr-2016-133-404 | article | EN | Voprosy âzykovogo rodstva | 2016-01-01

Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?

Taraka Rama, Johann‐Mattis List, Johannes Wahle, Gerhard Jäger

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018.

DOI: 10.18653/v1/n18-2063 | article | EN | cc-by | 2018-01-01

Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists

Gerhard Jäger, Johann‐Mattis List, Pavel Sofroniev

Most current approaches in phylogenetic linguistics require as input multilingual word lists partitioned into sets of etymologically related words (cognates). Cognate identification is so far done manually by experts, which is time consuming and as yet only available for a small number of well-studied language families. Automating this step will greatly expand the empirical scope of phylogenetic methods in linguistics, as raw wordlists (in phonetic transcription) are much easier to obtain than wordlists in which cognate sets have been fully...

DOI: 10.18653/v1/e17-1113 | article | EN | cc-by | 2017-01-01

Challenges of annotation and analysis in computer-assisted language comparison: A case study on Burmish languages

Nathan W. Hill, Johann‐Mattis List

The use of computational methods in comparative linguistics is growing in popularity. The increasing deployment of such methods draws into focus those areas in which they remain inadequate as well as those where classical approaches to language comparison are non-transparent and inconsistent. In this paper we illustrate specific challenges which both kinds of approaches encounter when studying South-East Asian languages. With the help of data from the Burmish language family, we point to challenges resulting from missing annotation standards and insufficient methods for analysis, and show how to tackle...

DOI: 10.1515/yplm-2017-0003 | article | EN | cc-by-nc-nd | Yearbook of the Poznan Linguistic Meeting | 2017-09-13
