- Psychometric Methodologies and Testing
- Advanced Statistical Modeling Techniques
- Mental Health Research Topics
- Neural Networks and Applications
- Gene expression and cancer classification
- Statistical Methods and Inference
- Data Mining Algorithms and Applications
- Data Analysis with R
- Advanced Statistical Methods and Models
- Bayesian Modeling and Causal Inference
- Statistical Methods and Bayesian Inference
- Bioinformatics and Genomic Networks
- Cognitive Abilities and Testing
- Genetic and phenotypic traits in livestock
- Explainable Artificial Intelligence (XAI)
- Imbalanced Data Classification Techniques
- Statistical Methods in Clinical Trials
- Evolutionary Algorithms and Applications
- Bayesian Methods and Mixture Models
- Multi-Criteria Decision Making
- Forest ecology and management
- Statistics Education and Methodologies
- Sociology and Education Studies
- Genetic Associations and Epidemiology
- Optimal Experimental Design Methods
- University of Zurich, 2016-2025
- University of Basel, 2023
- Ludwig-Maximilians-Universität München, 2005-2021
- Indiana University, 2021
- University of Washington, 2021
- University of Missouri, 2020
- Universität Innsbruck, 2011-2016
- University Hospital Heidelberg, 2011
- Zimmer Biomet (Germany), 2011
- Heidelberg University, 2011
Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational...
Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables. We identify two mechanisms responsible for this finding: (i) a preference for the selection of correlated predictors in the tree building process and (ii) an additional...
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured because of time or...
Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results. In the case when both predictor correlation was present and predictors were associated with the outcome (HA), the unconditional RF VIM attributed a higher share of importance to correlated predictors,...
The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge the performance of the VIMs has not yet been examined in the case of strongly unbalanced response classes. In this paper we...
Random forest based variable importance measures have become popular tools for assessing the contributions of the predictor variables in a fitted random forest. In this article we reconsider a frequently used variable importance measure, the Conditional Permutation Importance (CPI). We argue and illustrate that the CPI corresponds to a more partial quantification of variable importance and suggest several improvements in its methodology and implementation that enhance its practical value. In addition, we introduce the threshold value in the CPI algorithm as a parameter that can make the CPI more partial or...
In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists' gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination. To maximize transparency regarding the process by which analytic choices are made, a...
Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., the reference and the focal group) is necessary. In the Rasch model, therefore, the same linear restriction is imposed in both groups. Items in the restriction are termed "anchor items". Ideally, these items are DIF-free to avoid artificially augmented false alarm rates. However, the question of how anchor...
The high prevalence of depression in a growing aging population represents a critical public health issue. It is unclear how social, health, cognitive, and functional variables rank as risk/protective factors for depression among older adults and whether there are conspicuous differences among men and women. We used random forest analysis (RFA), a machine learning method, to compare 56 risk/protective factors for depression in a large representative sample of European older adults (N = 67,603; ages 45-105y; 56.1% women; 18 countries) from the Survey of Health, Ageing and Retirement...
In recent years, machine learning methods have become increasingly popular prediction methods in psychology. At the same time, psychological researchers are typically not only interested in making predictions about the dependent variable, but also in learning which predictor variables are relevant, how they influence the dependent variable, and which predictors interact with each other. However, most machine learning methods are not directly interpretable. Interpretation techniques that support researchers in describing how the machine learning technique came to its prediction may be a means to this end. We present a variety of...
The use of random forests is increasingly common in genetic association studies. The variable importance measure (VIM) that is automatically calculated as a by-product of the algorithm is often used to rank polymorphisms with respect to their ability to predict the investigated phenotype. Here, we investigate a characteristic of this methodology that may be considered an important pitfall, namely that common variants are systematically favoured by the widely used Gini VIM. As a consequence, researchers may overlook rare variants that contribute to the missing...
For the last eight years, microarray-based class prediction has been the subject of numerous publications in medicine, bioinformatics and statistics journals. However, in many articles, the assessment of classification accuracy is carried out using suboptimal procedures and is not paid much attention. In this paper, we carefully review various statistical aspects of classifier evaluation and validation from a practical point of view. The main topics addressed are accuracy measures, error rate estimation procedures, variable...
Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance, e.g., in genetics and bioinformatics. We highlight both advantages and limitations of different variable importance scores and associated testing procedures, especially in the context of correlated predictor variables. For the test of Breiman and Cutler (2008), we investigate the statistical properties and find that the power of the test depends on the sample size and...
Among saproxylic beetles, many early colonizers prefer particular host species. Ranking of preferred hosts of local beetle communities is critical for effective dead-wood management in forests, but is rarely done because experiments with numerous tree species are labour and cost intensive. We analysed the host preference on logs of 13 tree species in relation to management (unmanaged and managed beech stands, and conifer plantations on natural beech sites) in three regions in Germany during the most critical period of host specificity, that is, the first two years after...
The preference scaling of a group of subjects may not be homogeneous, but different groups of subjects with certain characteristics may show different preference scalings, each of which can be derived from paired comparisons by means of the Bradley-Terry model. Usually, either different models are fit in predefined subsets of the sample or the effects of subject covariates are explicitly specified in a parametric model. In both cases, categorical covariates can be employed directly to distinguish between different groups, while numeric covariates are typically discretized prior to modeling. Here, a semiparametric approach for...
In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively assessed in the present manuscript. The focus of our work is on class prediction based on high-dimensional data (e.g. microarray data), since such analyses are particularly exposed to this...
Stability is a major requirement to draw reliable conclusions when interpreting results from supervised statistical learning. In this article, we present a general framework for assessing and comparing the stability of results, which can be used in real-world statistical learning applications as well as in simulation and benchmark studies. We use the framework to show that stability is a property of both the algorithm and the data-generating process. In particular, we demonstrate that unstable algorithms (such as recursive partitioning) can produce stable results when the functional form...