Towards a Formal Genealogical Classification of the Lezgian Languages (North Caucasus): Testing Various Phylogenetic Methods on Lexical Data

UPGMA Similarity (geometry) Levenshtein distance
DOI: 10.1371/journal.pone.0116950 Publication Date: 2015-02-26T18:52:33Z
ABSTRACT
A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have subsequently been analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), neighbor joining (NJ), unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), and unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: the traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). For several reasons (first of all, the high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for the appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as with some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.
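The similarity-based, distance-based route described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the consonant-class table `TOY_CLASSES`, the helper names, and the toy distance matrix for the lects "A" to "D" are all hypothetical and exist only to show how a Levenshtein distance over consonant classes and a plain UPGMA clustering fit together.

```python
from itertools import combinations

# Hypothetical, deliberately tiny consonant-class table (an assumption for
# illustration only; it is not the class inventory used in the paper).
TOY_CLASSES = {"p": "P", "b": "P", "f": "P", "t": "T", "d": "T",
               "k": "K", "g": "K", "s": "S", "z": "S",
               "m": "M", "n": "N", "r": "R", "l": "R"}


def to_classes(word):
    """Map a word to its consonant-class skeleton; unlisted symbols are dropped."""
    return "".join(TOY_CLASSES[ch] for ch in word if ch in TOY_CLASSES)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def upgma(labels, dist):
    """Cluster labels with UPGMA and return the topology as nested tuples.

    dist maps frozenset({x, y}) to the distance between leaves x and y.
    Branch lengths are omitted to keep the sketch short.
    """
    sizes = {label: 1 for label in labels}  # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical pairwise distances between four unnamed lects.
toy_dist = {frozenset(("A", "B")): 2, frozenset(("C", "D")): 4,
            frozenset(("A", "C")): 8, frozenset(("A", "D")): 8,
            frozenset(("B", "C")): 8, frozenset(("B", "D")): 8}
tree = upgma(["A", "B", "C", "D"], toy_dist)
```

In this toy setup, two word forms whose consonant-class skeletons coincide have distance zero and would count as similar, and the per-item results would then be aggregated into a lect-by-lect distance matrix before clustering. How the paper aggregates and thresholds these distances is not stated in the abstract, so that step is left out here.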