Dimension reduction in principal component analysis for trees

Methodology (stat.ME); FOS: Computer and information sciences; 0101 mathematics; 01 natural sciences; Statistics - Methodology
DOI: 10.1016/j.csda.2013.12.007 Publication Date: 2014-01-10T11:02:33Z
ABSTRACT
The statistical analysis of tree-structured data is a new topic in statistics with a wide range of applications. Some Principal Component Analysis (PCA) ideas were previously developed for binary tree spaces. In this study, we extend these ideas to the more general space of rooted and labeled trees. We redefine concepts such as tree-line and forward principal component tree-line for this more general space, and generalize the optimal algorithm that finds them. We then develop an analog of the classical dimension reduction technique in PCA for the tree space. To do this, we define the components that carry the least amount of variation in a tree data set, called backward principal components, and present an optimal algorithm to find them. Furthermore, we investigate the relationship of these to the forward principal components, and prove a path-independency property between the forward and backward techniques. We apply our methods to a brain artery data set of 98 subjects. Using our techniques, we investigate how aging affects the brain artery structure of males and females. We also analyze a data set on the organizational structure of a large US company and explore the structural differences across different types of departments within the company.
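The abstract does not spell out its definitions, so the following is only a minimal sketch of the forward step under stated assumptions. It assumes each tree is a set of node labels drawn from a common support tree and that distance is the size of the symmetric difference. It also assumes a tree-line starts at the tree common to all data trees and grows by one node at a time, each new node being a child of the previously added one. Under those assumptions, the first forward principal component tree-line is the downward path that maximizes the total number of data trees containing its nodes. The names `forward_tree_line`, `project`, and the toy data are hypothetical.

```python
from collections import Counter

def forward_tree_line(trees, parent):
    """Return (start_tree, path, total_weight) for the first forward tree-line.

    trees  : list of sets of node labels (assumed representation)
    parent : dict mapping each node label to its parent label (root -> None)
    """
    start = set.intersection(*trees)   # assumed starting tree: nodes shared by all trees
    support = set.union(*trees)        # every node seen in the data
    # weight of a node = number of data trees that contain it
    weight = Counter(v for t in trees for v in t if v not in start)

    children = {}
    for v in support:
        p = parent.get(v)
        if p is not None:
            children.setdefault(p, []).append(v)

    def best_path(v):
        # heaviest downward path beginning at v (ties broken by label order)
        options = [best_path(c) for c in sorted(children.get(v, []))]
        tail_w, tail = max(options, key=lambda o: o[0], default=(0, []))
        return weight[v] + tail_w, [v] + tail

    # a tree-line must begin with a node attached directly to the start tree
    candidates = [best_path(v) for v in sorted(support - start)
                  if parent.get(v) in start]
    total, path = max(candidates, key=lambda o: o[0], default=(0, []))
    return start, path, total

def project(tree, start, path):
    """Closest tree on the tree-line: keep adding path nodes while the tree has them."""
    proj = set(start)
    for v in path:
        if v not in tree:
            break
        proj.add(v)
    return proj

# Toy data (hypothetical): four trees sharing the root.
parent = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a", "e": "b"}
trees = [
    {"root", "a", "c"},
    {"root", "a", "c", "d"},
    {"root", "b"},
    {"root", "a", "b", "e"},
]
start, path, total = forward_tree_line(trees, parent)
print(start, path, total)
print(project(trees[3], start, path))
```

In this toy example the path through `a` and `c` is selected because those nodes appear in the most data trees. The backward components described in the abstract would instead target the directions carrying the least variation, and the sketch does not attempt them.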