Carolin Strobl

ORCID: 0000-0003-0952-3230
Research Areas
  • Psychometric Methodologies and Testing
  • Advanced Statistical Modeling Techniques
  • Mental Health Research Topics
  • Neural Networks and Applications
  • Gene expression and cancer classification
  • Statistical Methods and Inference
  • Data Mining Algorithms and Applications
  • Data Analysis with R
  • Advanced Statistical Methods and Models
  • Bayesian Modeling and Causal Inference
  • Statistical Methods and Bayesian Inference
  • Bioinformatics and Genomic Networks
  • Cognitive Abilities and Testing
  • Genetic and phenotypic traits in livestock
  • Explainable Artificial Intelligence (XAI)
  • Imbalanced Data Classification Techniques
  • Statistical Methods in Clinical Trials
  • Evolutionary Algorithms and Applications
  • Bayesian Methods and Mixture Models
  • Multi-Criteria Decision Making
  • Forest ecology and management
  • Statistics Education and Methodologies
  • Sociology and Education Studies
  • Genetic Associations and Epidemiology
  • Optimal Experimental Design Methods

University of Zurich
2016-2025

University of Basel
2023

Ludwig-Maximilians-Universität München
2005-2021

Indiana University
2021

University of Washington
2021

University of Missouri
2020

Universität Innsbruck
2011-2016

University Hospital Heidelberg
2011

Zimmer Biomet (Germany)
2011

Heidelberg University
2011

Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational...

10.1186/1471-2105-8-25 article EN cc-by BMC Bioinformatics 2007-01-25

Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables. We identify two mechanisms responsible for this finding: (i) a preference for the selection of correlated predictors in the tree building process and (ii) an additional...

10.1186/1471-2105-9-307 article EN cc-by BMC Bioinformatics 2008-07-11

Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured because of time or...

10.1037/a0016973 article EN Psychological Methods 2009-12-01

Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results. In the case when both predictor correlation was present and predictors were associated with the outcome (HA), the unconditional RF VIM attributed a higher share of importance to correlated predictors,...

10.1186/1471-2105-11-110 article EN cc-by BMC Bioinformatics 2010-02-27

The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge the performance of the VIMs has not yet been examined in the case of strongly unbalanced response classes. In this paper we...

10.1186/1471-2105-14-119 article EN cc-by BMC Bioinformatics 2013-04-05

10.32614/rj-2009-013 article The R Journal 2009-01-01

Random forest based variable importance measures have become popular tools for assessing the contributions of the predictor variables in a fitted random forest. In this article we reconsider a frequently used variable importance measure, the Conditional Permutation Importance (CPI). We argue and illustrate that the CPI corresponds to a more partial quantification of variable importance and suggest several improvements in its methodology and implementation that enhance its practical value. In addition, we introduce the threshold value in the CPI algorithm as a parameter that can make the CPI more partial or...

10.1186/s12859-020-03622-2 article EN cc-by BMC Bioinformatics 2020-07-14
Martin Schweinsberg, Michael B. Feldman, Nicola Staub, Olmo R. van den Akker, Robbie C. M. van Aert, Marcel A. L. M. van Assen, Yang Liu, Tim Althoff, Jeffrey Heer, Alex Kale, Zainab Mohamed, Hashem Amireh, Vaishali Venkatesh Prasad, Abraham Bernstein, Emily V. Robinson, Kaisa Snellman, S. Amy Sommer, Sarah M. G. Otner, David Robinson, Nikhil Madan, Raphael Silberzahn, Pavel Goldstein, Warren Tierney, Toshio Murase, Benjamin Mandl, Domenico Viganola, Carolin Strobl, Catherine Schaumans, Stijn Kelchtermans, Chan Naseeb, S. Mason Garrison, Tal Yarkoni, C.S. Richard Chan, Prestone Adie, Paulius Alaburda, Casper J. Albers, Sara Alspaugh, Jeff Alstott, Andrew A. Nelson, Eduardo Ariño de la Rubia, Arzi Adbi, Štěpán Bahník, Jason Min Baik, Laura Winther Balling, Sachin Banker, David A. A. Baranger, Dale J. Barr, Brenda A. Barros-Rivera, Matt Bauer, Blaise Manga Enuh, Lisa Boelen, Katerina Bohle Carbonell, Robert A. Briers, Oliver Burkhard, Miguel-Angel Canela, Laura Castrillo, Timothy Catlett, Olivia Chen, Michael Clark, Brent Cohn, Alex Coppock, Natàlia Cugueró-Escofet, Paul Curran, Wilson Cyrus-Lai, David Dai, Giulio Valentino Dalla Riva, Henrik Danielsson, Rosaria de F.S.M. Russo, Niko de Silva, Curdin Derungs, Frank Dondelinger, Carolina Duarte de Souza, Blessing Dube, Marina Dubova, Ben Mark Dunn, Peter A. Edelsbrunner, Sara Finley, Nick C. Fox, Timo Gnambs, Yuanyuan Gong, Erin Grand, Brandon Greenawalt, Han Dan, Paul H. P. Hanel, Antony B. Hong, David D. Hood, Justin Hsueh, Lilian Huang, Kent Ngan‐Cheung Hui, Keith A. Hultman, Azka Javaid, Lily J. Jiang, Jonathan Jong, Jash Kamdar, David Kane, Gregor Kappler, Erikson Kaszubowski, Christopher Kavanagh, Madian Khabsa, Bennett Kleinberg

In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists' gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination. To maximize transparency regarding the process by which analytic choices are made, a...

10.1016/j.obhdp.2021.02.003 article EN cc-by Organizational Behavior and Human Decision Processes 2021-06-17

Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model, therefore, the same linear restriction is imposed in both groups. Items in the restriction are termed the "anchor items". Ideally, these items are DIF-free to avoid artificially augmented false alarm rates. However, the question of how DIF-free anchor...

10.1177/0013164414529792 article EN Educational and Psychological Measurement 2014-04-21

The high prevalence of depression in a growing aging population represents a critical public health issue. It is unclear how social, health, cognitive, and functional variables rank as risk/protective factors for depression among older adults and whether there are conspicuous differences among men and women. We used random forest analysis (RFA), a machine learning method, to compare 56 risk/protective factors for depression in a large representative sample of European older adults (N = 67,603; ages 45-105y; 56.1% women; 18 countries) from the Survey of Health, Ageing and Retirement...

10.1016/j.lanepe.2022.100391 article EN cc-by-nc-nd The Lancet Regional Health - Europe 2022-04-29

In recent years, machine learning methods have become increasingly popular prediction methods in psychology. At the same time, psychological researchers are typically not only interested in making predictions about the dependent variable, but also in learning which predictor variables are relevant, how they influence the dependent variable, and which predictors interact with each other. However, most machine learning methods are not directly interpretable. Interpretation techniques that support researchers in describing how the machine learning technique came to its prediction may be a means to this end. We present a variety of...

10.1037/met0000560 article EN Psychological Methods 2023-05-25

The use of random forests is increasingly common in genetic association studies. The variable importance measure (VIM) that is automatically calculated as a by-product of the algorithm is often used to rank polymorphisms with respect to their ability to predict the investigated phenotype. Here, we investigate a characteristic of this methodology that may be considered an important pitfall, namely that common variants are systematically favoured by the widely used Gini VIM. As a consequence, researchers may overlook rare variants that contribute to the missing...

10.1093/bib/bbr053 article EN Briefings in Bioinformatics 2011-09-10

For the last eight years, microarray-based class prediction has been the subject of numerous publications in medicine, bioinformatics and statistics journals. However, in many articles, the assessment of classification accuracy is carried out using suboptimal procedures and is not paid much attention. In this paper, we carefully review various statistical aspects of classifier evaluation and validation from a practical point of view. The main topics addressed are accuracy measures, error rate estimation procedures, variable...

10.4137/cin.s408 article EN cc-by-nc Cancer Informatics 2008-01-01

Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance, e.g., in genetics and bioinformatics. We highlight both advantages and limitations of different variable importance scores and associated testing procedures, especially in the context of correlated predictor variables. For the test of Breiman and Cutler (2008), we investigate the statistical properties and find that the power depends on the sample size and...

10.5282/ubm/epub.2111 article EN 2008-01-30

Summary Among saproxylic beetles, many early colonizers prefer particular host species. Ranking of preferred hosts local beetle communities is critical for effective dead‐wood management in forests, but rarely done because experiments with numerous tree species are labour and cost intensive. We analysed the preference on logs 13 relation to (unmanaged managed beech stands, conifer plantations natural sites) three regions Germany during most period specificity, that first two years after...

10.1111/1365-2664.12421 article EN Journal of Applied Ecology 2015-03-06

The preference scaling of a group of subjects may not be homogeneous, but different groups of subjects with certain characteristics may show different preference scalings, each of which can be derived from paired comparisons by means of the Bradley-Terry model. Usually, either different models are fit in predefined subsets of the sample or the effects of subject covariates are explicitly specified in a parametric model. In both cases, categorical covariates can be employed directly to distinguish between different groups, while numeric covariates are typically discretized prior to modeling. Here, a semiparametric approach for...

10.3102/1076998609359791 article EN Journal of Educational and Behavioral Statistics 2011-04-01

In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively assessed in the present manuscript. The focus of our work is on class prediction based on high-dimensional data (e.g. microarray data), since such analyses are particularly exposed to this...

10.1186/1471-2288-9-85 article EN cc-by BMC Medical Research Methodology 2009-12-01

Stability is a major requirement to draw reliable conclusions when interpreting results from supervised statistical learning. In this article, we present a general framework for assessing and comparing the stability of results, which can be used in real-world statistical learning applications as well as in simulation and benchmark studies. We use the framework to show that stability is a property of both the algorithm and the data-generating process. In particular, we demonstrate that unstable algorithms (such as recursive partitioning) can produce stable results when the functional form...

10.1080/10618600.2018.1473779 article EN Journal of Computational and Graphical Statistics 2018-05-18