NFDI4DS | UHH-SEMS - Publication Details

Harald Hammarström

ORCID: 0000-0003-0120-6396


About

Contact & Profiles

A5007975524

Research Areas

Natural Language Processing Techniques
Language and cultural evolution
Linguistic Variation and Morphology
Linguistics and Cultural Studies
Lexicography and Language Studies
Syntax, Semantics, Linguistic Variation
Historical Linguistics and Language Studies
Authorship Attribution and Profiling
Multilingual Education and Policy
Linguistics, Language Diversity, and Identity
Topic Modeling
Pacific and Southeast Asian Studies
Spanish Linguistics and Language Studies
Physics and Engineering Research Articles
Linguistic Studies and Language Acquisition
Language, Linguistics, Cultural Analysis
Linguistics and language evolution
Language, Discourse, Communication Strategies
Algorithms and Data Compression
Phonetics and Phonology Research
Image Retrieval and Classification Techniques
Radiomics and Machine Learning in Medical Imaging
African history and culture analysis
Speech and dialogue systems
Australian Indigenous Culture and History

Uppsala University
2017-2024

Max Planck Institute for Psycholinguistics
2013-2023

University of Zurich
2022

Max Planck Society
2011-2020

Humboldt-Universität zu Berlin
2020

Australian National University
2018-2020

Max Planck Institute for the Science of Human History
2015-2018

Université Claude Bernard Lyon 1
2018

Centre National de la Recherche Scientifique
2018

Max Planck Institute for Evolutionary Anthropology
2010-2014

Sound–meaning association biases evidenced across thousands of languages

OPENALEX - Publications

Damián E. Blasí, Søren Wichmann, Harald Hammarström, Peter F. Stadler, Morten H. Christiansen

It is widely assumed that one of the fundamental properties of spoken language is the arbitrary relation between sound and meaning. Some exceptions in the form of nonarbitrary associations have been documented in linguistics, cognitive science, and anthropology, but these studies only involved small subsets of the 6,000+ languages in the world today. By analyzing word lists covering nearly two-thirds of the world's languages, we demonstrate that a considerable proportion of 100 basic vocabulary items carry strong associations with specific kinds of human...

10.1073/pnas.1605782113 article EN Proceedings of the National Academy of Sciences 2016-09-12

Grambank reveals the importance of genealogical constraints on linguistic diversity and highlights the impact of language loss


Hedvig Skirgård, Hannah J. Haynie, Damián E. Blasí, Harald Hammarström, Jeremy Collins, and 95 more

While global patterns of human genetic diversity are increasingly well characterized, the diversity of human languages remains less systematically described. Here, we outline the Grambank database. With over 400,000 data points and 2400 languages, Grambank is the largest comparative grammatical database available. The comprehensiveness of Grambank allows us to quantify the relative effects of genealogical inheritance and geographic proximity on the structural diversity of the world's languages, evaluate constraints on linguistic diversity, and identify the world's most unusual languages. An analysis...

10.1126/sciadv.adg6175 article EN cc-by-nc Science Advances 2023-04-19

Automated Dating of the World’s Language Families Based on Lexical Similarity


Eric W. Holman, Cecil H. Brown, Søren Wichmann, André Müller, Viveka Velupillai, and 10 more

This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from glottochronology in four major respects: (1) it is automated and thus more objective, (2) it applies a uniform analytical approach to a single database of worldwide languages, (3) it is based on lexical similarity as determined from Levenshtein (edit) distances rather than...

10.1086/662127 article EN Current Anthropology 2011-11-30
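
The abstract above names Levenshtein (edit) distance as the basis of the lexical similarity measure. The following is a minimal sketch of that distance only, not the ASJP implementation: the truncated abstract does not show how distances are turned into dates, and the length normalization in `normalized_distance` (dividing by the longer word) is an assumption made here for illustration. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length.
    Assumption for this sketch: 0.0 means identical, 1.0 means
    no shared positions; two empty strings count as identical."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

Normalizing by length keeps long and short word pairs comparable, which matters when similarity is averaged over a word list.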

Unsupervised Learning of Morphology


Harald Hammarström, Lars Borin

This article surveys work on Unsupervised Learning of Morphology. We define Unsupervised Learning of Morphology as the problem of inducing a description (of some kind, even if only morpheme-segmentation) of how orthographic words are built up given only raw text data of a language. We briefly go through the history and motivation of this problem. Next, over 200 items of work are listed with a brief characterization, and the most important ideas in the field are critically discussed. We summarize the achievements so far and give pointers for future developments.

10.1162/coli_a_00050 article EN cc-by-nc-nd Computational Linguistics 2011-04-05

Linguistic diversity and language evolution


Harald Hammarström

What would your ideas about language evolution be if there was only one language left on earth? Fortunately, our investigation need not be that impoverished. In the present article, we survey the state of knowledge regarding the kinds of language found among humans, the language inventory, population sizes, time depth, grammatical variation, and other relevant issues that a theory of language evolution should minimally take into account.

10.1093/jole/lzw002 article EN Journal of Language Evolution 2016-01-01

Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics


Robert Forkel, Johann‐Mattis List, Simon J. Greenhill, Christoph Rzymski, Sebastian Bank, and 5 more

The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable to comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation...

10.1038/sdata.2018.205 article EN cc-by Scientific Data 2018-10-16

Language documentation twenty-five years on


Frank Seifart, Nicholas Evans, Harald Hammarström, Stephen C. Levinson

This discussion note reviews responses of the linguistics profession to the grave issues of language endangerment identified a quarter of a century ago in the journal Language by Krauss, Hale, England, Craig, and others (Hale et al. 1992). Two and a half decades of worldwide research not only have given us a much more accurate picture of the number, phylogeny, and typological variety of the world's languages, but they have also seen the development of a wide range of new approaches, conceptual and technological, to the problem of documenting them. We review these...

10.1353/lan.2018.0070 article EN Language 2018-01-01

Defining numeral classifiers and identifying classifier languages of the world


One‐Soon Her, Harald Hammarström, Marc Allassonnière‐Tang

This paper presents a precise definition of numeral classifiers, steps to identify a numeral classifier language, and a database of 3,338 languages, of which 723 languages have been identified as having a numeral classifier system. The database, named World Atlas of Classifier Languages (WACL), has been systematically constructed over the last 10 years via a manual survey of relevant literature and also an automatic scan of digitized grammars followed by manual checking. The open-access release of WACL is thus a significant contribution to linguistic research...

10.1515/lingvan-2022-0006 article EN cc-by Linguistics Vanguard 2022-11-01

Innovative Numerals in Malayo-Polynesian Languages outside of Oceania


Antoinette Schapper, Harald Hammarström

In this paper, we seek to draw attention to Malayo-Polynesian languages outside of the Oceanic subgroup with innovative bases and complex numerals involving various additive, subtractive, and multiplicative procedures. We highlight the fact that the number of languages showing such innovations is more than previously recognized in the literature. Finally, we observe that the concentration of complex numeral innovations in the region of eastern Indonesia suggests Papuan influence, either through contact or substrate. However, we also note that sociocultural factors, in the form...

10.1353/ol.2013.0023 article EN Oceanic Linguistics 2013-01-01

A global analysis of matches and mismatches between human genetic and linguistic histories


Chiara Barbieri, Damián E. Blasí, Epifanía Arango-Isaza, Alexandros G. Sotiropoulos, Harald Hammarström, and 6 more

Human history is written in both our genes and our languages. The extent to which our biological and linguistic histories are congruent has been the subject of considerable debate, with clear examples of both matches and mismatches. To disentangle the patterns of demographic and cultural transmission, we need a global systematic assessment of matches and mismatches. Here, we assemble a genomic database (GeLaTo, or Genes and Languages Together) specifically curated to investigate genetic and linguistic diversity worldwide. We find that most populations in GeLaTo speak languages of the same...

10.1073/pnas.2122084119 article EN cc-by Proceedings of the National Academy of Sciences 2022-11-18

A full-scale test of the language farming dispersal hypothesis


Harald Hammarström

One attempt at explaining why some language families are large (while others are small) is the hypothesis that the families that are now large became large because their ancestral speakers had a technological advantage, most often agriculture. Variants of this idea are referred to as the Language Farming Dispersal Hypothesis. Previously, detailed language family studies have uncovered various supporting examples and counterexamples to this idea. In the present paper I weigh the evidence from ALL attested language families. For each family, I use the number of member languages...

10.1075/dia.27.2.02ham article EN Diachronica 2010-10-11

Problems with, and alternatives to, the tree model in historical linguistics


Siva Kalyan, Alexandre François, Harald Hammarström


10.1075/jhl.00005.kal article EN Journal of Historical Linguistics 2019-07-02

Glottocodes: Identifiers linking families, languages and dialects to comprehensive reference information


Robert Forkel, Harald Hammarström

Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog (https://glottolog.org). In this paper, we summarize the motivation and history behind the system of glottocodes and describe the principles and practices of data curation, technical infrastructure and update/version-tracking systematics. Since our understanding of the target domain – the dialects, languages and language families of the entire world – is continually evolving, changes and updates are relatively common. The resulting assessed in...

10.3233/sw-212843 article EN other-oa Semantic Web 2022-01-14

Some Principles on the Use of Macro-Areas in Typological Comparison


Harald Hammarström, Mark Donohue

While the notion of ‘area’ or ‘Sprachbund’ has a long history in linguistics, with geographically-defined regions frequently cited as a useful means to explain typological distributions, the problem of delimiting areas has not been well addressed. Lists of general-purpose, largely independent ‘macro-areas’ (typically continent size) have been proposed as a step to rule out contact as an explanation for various large-scale linguistic phenomena. This squib points out some problems in some of the currently widely-used predetermined areas, those...

10.1163/22105832-00401001 article EN Language Dynamics and Change 2014-01-01

Counting Languages in Dialect Continua Using the Criterion of Mutual Intelligibility


Harald Hammarström

This paper shows how it is possible to count languages vs. dialects if, for every pair of varieties, we are given whether they are mutually intelligible or not. The method is to divide the varieties into a minimum number of internally mutually intelligible groups where each group counts as one language. Expressed in terms of graphs (as in discrete mathematics), this is even easier understood as: applying graph-colouring to the graph over varieties with intelligibility interrelationships as edges. Graph colouring is already a mathematically well-understood...

10.1080/09296170701794278 article EN Journal of Quantitative Linguistics 2008-01-17
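
The counting method in the abstract above can be illustrated with a small sketch. This is not the paper's code; it is a brute-force illustration under stated assumptions: the function name `count_languages` and the input format are hypothetical, and the graph is built with an edge between every pair of varieties that are NOT mutually intelligible, so that each colour class is a group whose members are all mutually intelligible. The search is exponential in the worst case and is only meant for small inputs.

```python
from itertools import combinations


def count_languages(varieties, intelligible_pairs):
    """Return (number_of_languages, assignment), where assignment maps
    each variety to a group index and every group is internally
    mutually intelligible. Exact search, suitable for small inputs."""
    intelligible = {frozenset(pair) for pair in intelligible_pairs}

    # Conflict graph (assumption of this sketch): an edge joins two
    # varieties that are not mutually intelligible, so they may never
    # share a group (colour).
    conflicts = {v: set() for v in varieties}
    for a, b in combinations(varieties, 2):
        if frozenset((a, b)) not in intelligible:
            conflicts[a].add(b)
            conflicts[b].add(a)

    def colour(index, assignment, k):
        # Backtracking: try to colour varieties[index:] using k colours.
        if index == len(varieties):
            return True
        variety = varieties[index]
        for c in range(k):
            if all(assignment.get(n) != c for n in conflicts[variety]):
                assignment[variety] = c
                if colour(index + 1, assignment, k):
                    return True
                del assignment[variety]
        return False

    # The smallest k that admits a valid colouring is the language count.
    for k in range(1, len(varieties) + 1):
        assignment = {}
        if colour(0, assignment, k):
            return k, assignment
    return 0, {}
```

For a chain of hypothetical varieties where A and B are mutually intelligible, B and C are too, but A and C are not, `count_languages(["A", "B", "C"], [("A", "B"), ("B", "C")])` yields two languages, since A and C cannot share a group.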
