Principal component analysis of incomplete data – A simple solution to an old problem

Biplot, Imputation (statistics), Data Matrix, Biological data, Multiple correspondence analysis
DOI: 10.1016/j.ecoinf.2021.101235 Publication Date: 2021-01-23T08:33:54Z
ABSTRACT
A long-standing problem in biological data analysis is the unintentional absence of values for some observations or variables, preventing the use of standard multivariate exploratory methods, such as principal component analysis (PCA). Solutions include deleting parts of the data, by which information is lost; data imputation, which is always arbitrary; and restriction of the analysis to either the variables or the observations, thereby losing the advantages of biplot diagrams. We describe a minor modification of eigenanalysis-based PCA in which correlations or covariances are calculated using different numbers of observations for each pair of variables, and the resulting eigenvalues and eigenvectors are used to calculate component scores such that missing values are skipped. This procedure avoids artificial data imputation, exhausts all information from the data, and allows the preparation of biplots for the simultaneous display of the ordination of variables and observations. The use of the modified PCA, called InDaPCA (PCA of Incomplete Data), is demonstrated on actual biological examples: leaf functional traits of plants, functional traits of invertebrates, cranial morphometry of crocodiles, and fish hybridization data, with biologically meaningful results. Our study suggests that it is not the percentage of missing entries in the data matrix that matters; the success of InDaPCA is mostly affected by the minimum number of observations available for comparing a given pair of variables. In the present study, interpretation of results in the space of the first two components was not hindered, however.
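The procedure outlined in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function names `pairwise_corr` and `indapca_sketch` are hypothetical, and several details are assumptions made for illustration only, namely that each correlation is computed from the rows observed in both variables, that variables are standardized with their available values, and that "skipping" a missing value means it contributes nothing to the component score.

```python
import numpy as np

def pairwise_corr(X):
    """Correlation matrix in which each pair of variables uses only
    the observations (rows) where both values are present."""
    p = X.shape[1]
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            # The number of rows in `both` differs from pair to pair.
            R[j, k] = R[k, j] = np.corrcoef(X[both, j], X[both, k])[0, 1]
    return R

def indapca_sketch(X, n_components=2):
    """Eigenanalysis of the pairwise correlation matrix, with scores
    computed so that missing entries are skipped (assumed: zero contribution)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable using only its available values (assumption).
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)
    R = pairwise_corr(X)
    eigvals, eigvecs = np.linalg.eigh(R)  # R is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Missing standardized values are replaced by 0 so they add nothing
    # to the weighted sum that defines each component score.
    scores = np.where(np.isnan(Z), 0.0, Z) @ eigvecs
    return eigvals, eigvecs, scores

# Small synthetic example (made-up data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
X[0, 1] = np.nan
X[5, 3] = np.nan
eigvals, eigvecs, scores = indapca_sketch(X)
print(scores.shape)
```

A correlation matrix assembled pair by pair is not guaranteed to be positive semidefinite, so small negative eigenvalues can appear. The fewer observations a pair of variables shares, the less reliable its correlation, which is consistent with the abstract's remark that the minimum number of shared observations matters more than the overall percentage of missing entries.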
REFERENCES (28)
CITATIONS (31)