- Statistical Methods and Inference
- Statistical Methods and Bayesian Inference
- Statistical Methods in Clinical Trials
- Advanced Bandit Algorithms Research
- Advanced Causal Inference Techniques
- Advanced Statistical Process Monitoring
- Machine Learning and Algorithms
- Probability and Risk Models
- Bayesian Methods and Mixture Models
- Sports Analytics and Performance
- Simulation Techniques and Applications
- Numerical Methods and Algorithms
- Mobile Crowdsensing and Crowdsourcing
- Privacy-Preserving Technologies in Data
- Economic and Environmental Valuation
- Internet Traffic Analysis and Secure E-voting
- Blood Pressure and Hypertension Studies
- Multi-Criteria Decision Making
- Probability and Statistical Research
- Machine Learning in Healthcare
- Efficiency Analysis Using DEA
- Advanced Statistical Methods and Models
- Health Systems, Economic Evaluations, Quality of Life
- Reinforcement Learning in Robotics
- Statistical Mechanics and Entropy
Carnegie Mellon University
2020-2024
University of Waterloo
2018
Abstract. We derive confidence intervals (CIs) and confidence sequences (CSs) for the classical problem of estimating a bounded mean. Our approach generalizes and improves on the celebrated Chernoff method, yielding the best closed-form "empirical-Bernstein" CSs and CIs (converging exactly to the oracle Bernstein width) as well as non-closed-form "betting" CSs and CIs. Our method combines new composite nonnegative (super)martingales with Ville's maximal inequality, with strong connections to testing by betting and the method of mixtures. We also show how these ideas...
Background. Nursing notes have not been widely used in prediction models for clinical outcomes, despite containing rich information. Advances in natural language processing have made it possible to extract information from large scale unstructured data like nursing notes. This study extracted the sentiment (impressions and attitudes) of nurses, and examined how sentiment relates to 30-day mortality and survival. Methods. This study applied a sentiment analysis algorithm to MIMIC-III, a public intensive care unit (ICU) database. A multiple...
We present nonasymptotic concentration inequalities for sums of independent and identically distributed random variables that yield the asymptotic strong Gaussian approximations of Komlós, Major, and Tusnády (KMT) [1975,1976]. The constants appearing in our inequalities are either universal or explicit, and thus as corollaries, they imply distribution-uniform generalizations of the aforementioned KMT approximations. In particular, it is shown that uniform integrability of a random variable's $q^{\text{th}}$ moment is both necessary...
We consider the problem of sequential hypothesis testing by betting. For a general class of composite testing problems -- which include bounded mean testing, equal mean testing for random tuples, and some key ingredients of two-sample and independence testing as special cases -- we show that any $e$-process satisfying a certain sublinear regret bound is adaptively, asymptotically, and almost surely log-optimal for a composite alternative. This is a strong notion of optimality that has not previously been established for the aforementioned problems and we provide explicit test...
A/B tests are the gold standard for evaluating digital experiences on the web. However, traditional "fixed-horizon" statistical methods are often incompatible with the needs of modern industry practitioners as they do not permit continuous monitoring of experiments. Frequent evaluation of fixed-horizon tests ("peeking") leads to inflated type-I error and can result in erroneous conclusions. We have released an experimentation service on the Adobe Experience Platform based on anytime-valid confidence sequences, allowing continuous monitoring of the A/B test...
Many practical tasks involve sampling sequentially without replacement (WoR) from a finite population of size $N$, in an attempt to estimate some parameter $\theta^\star$. Accurately quantifying uncertainty throughout this process is a nontrivial task, but is necessary because it often determines when we stop collecting samples and confidently report a result. We present a suite of tools for designing confidence sequences (CS) for $\theta^\star$. A CS is a sequence of confidence sets $(C_n)_{n=1}^N$, that shrink in size, and all contain $\theta^\star$...
This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds, that can be seen as a generalization and improvement of the celebrated Chernoff method. At its heart, it is based on a class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another...
Introduction. The EQ-5D-5L valuation protocol contains both time tradeoff (TTO) tasks and discrete choice experiments (DCE), raising the question of how to best use these in creating a value set. The hybrid model, which combines TTO and DCE data, has emerged as a commonly used approach. However, this model assumes independence among responses from the same individual, a linear relationship between TTO and DCE utilities, and, in many implementations, homoscedastic residuals. The aims of this study are to examine alternatives to these assumptions...
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to actions $A_t$ in an attempt to maximize stochastic rewards $R_t$. This adaptivity raises interesting but hard statistical inference questions, especially counterfactual ones: for example, it is often of interest to estimate the properties of a hypothetical policy that is different from the logging policy that was used...
We revisit the question of whether the strong law of large numbers (SLLN) holds uniformly in a rich family of distributions, culminating in a distribution-uniform generalization of the Marcinkiewicz-Zygmund SLLN. These results can be viewed as extensions of Chung's SLLN to random variables with uniformly integrable $q^\text{th}$ absolute central moments for $0 < q < 2;\ q \neq 1$. Furthermore, we show that uniform integrability of the $q^\text{th}$ moment is both sufficient and necessary for the SLLN to hold uniformly at the rate $n^{1/q - 1}$. The proofs centrally rely on distribution-uniform analogues...
Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under weak assumptions and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals, adding to the literature on confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time -- which provide valid inference at arbitrary stopping times and incur...
This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP). Given bounded observations $(X_1, \dots, X_n)$ with mean $\mu^\star$ that are privatized into $(Z_1, \dots, Z_n)$, we present confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star$ when only given access to the privatized data. To achieve this, we introduce a nonparametric and sequentially interactive generalization of Warner's famous ``randomized...
Are asymptotic confidence sequences and anytime $p$-values uniformly valid for a nontrivial class of distributions $\mathcal{P}$? We give a positive answer to this question by deriving distribution-uniform anytime-valid inference procedures. Historically, anytime-valid methods -- including confidence sequences, anytime $p$-values, and sequential hypothesis tests that enable inference at stopping times -- have been justified nonasymptotically. Nevertheless, asymptotic inference procedures such as those based on the central limit theorem occupy an important part...
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to actions $A_t$ in an attempt to maximize stochastic rewards $R_t$. This adaptivity raises interesting but hard statistical inference questions, especially counterfactual ones: for example, it is often of interest to estimate the properties of a hypothetical policy that is different from the logging policy that was...
Accurately determining the outcome of an election is a complex task with many potential sources of error, ranging from software glitches in voting machines to procedural lapses to outright fraud. Risk-limiting audits (RLA) are statistically principled "incremental" hand counts that provide statistical assurance that reported outcomes accurately reflect the validly cast votes. We present a suite of tools for conducting RLAs using confidence sequences -- sequences of confidence sets which uniformly capture an electoral parameter of interest...