- Statistical Methods and Inference
- Statistical Methods in Clinical Trials
- Machine Learning and Algorithms
- Advanced Bandit Algorithms Research
- Advanced Statistical Methods and Models
- Advanced Statistical Process Monitoring
- Statistical Methods and Bayesian Inference
- Bayesian Methods and Mixture Models
- Advanced Causal Inference Techniques
- Machine Learning and Data Classification
- Markov Chains and Monte Carlo Methods
- Bayesian Modeling and Causal Inference
- Stochastic Gradient Optimization Techniques
- Sparse and Compressive Sensing Techniques
- Gaussian Processes and Bayesian Inference
- Adversarial Robustness in Machine Learning
- Gene Expression and Cancer Classification
- Random Matrices and Applications
- Anomaly Detection Techniques and Applications
- Probability and Risk Models
- Imbalanced Data Classification Techniques
- Optimal Experimental Design Methods
- Distributed Sensor Networks and Detection Algorithms
- Sports Analytics and Performance
- Auction Theory and Applications
Carnegie Mellon University
2016-2025
Google (United States)
2023-2024
University of Waterloo
2021
University of California, Berkeley
2015-2020
University of Chicago
2019
Amazon (Germany)
2019
Berkeley College
2015
Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very...
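As an illustration of the kind of test surveyed here, below is a minimal permutation test using the univariate Wasserstein distance as the statistic. This is only a sketch: it uses SciPy's plain `wasserstein_distance`, not the entropically smoothed variants the survey connects to other tests.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_perm_test(x, y, n_perm=1000, seed=0):
    """Two-sample test with the 1-D Wasserstein distance as the statistic,
    calibrated by a permutation null (no parametric assumptions)."""
    rng = np.random.default_rng(seed)
    observed = wasserstein_distance(x, y)
    pooled = np.concatenate([x, y])
    n = len(x)
    null_stats = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)
        null_stats[i] = wasserstein_distance(perm[:n], perm[n:])
    # p-value: fraction of permuted statistics at least as large as observed
    return (1 + np.sum(null_stats >= observed)) / (n_perm + 1)

# Example: N(0,1) vs N(0.5,1)
rng = np.random.default_rng(1)
print(wasserstein_perm_test(rng.normal(0, 1, 200), rng.normal(0.5, 1, 200)))
```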
Story understanding involves many perceptual and cognitive subprocesses, from perceiving individual words, to parsing sentences, to understanding the relationships among story characters. We present an integrated computational model of reading that incorporates these and additional subprocesses, simultaneously discovering their fMRI signatures. Our model predicts the fMRI activity associated with arbitrary text passages, well enough to distinguish which of two story segments is being read with 74% accuracy. This approach is the first to simultaneously track diverse subprocesses...
The Kaczmarz and Gauss--Seidel methods both solve a linear system $\boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{y}$ by iteratively refining the solution estimate. Recent interest in these methods has been sparked by a proof of Strohmer and Vershynin, which shows that the randomized Kaczmarz method converges linearly in expectation to the solution. Leventhal and Lewis then proved a similar result for the randomized Gauss--Seidel algorithm. However, the behavior of both methods depends heavily on whether the system is underdetermined or overdetermined, and on whether it is consistent or not. Here we provide a unified...
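For concreteness, here is a minimal sketch of the Strohmer--Vershynin randomized Kaczmarz iteration referenced above: sample a row with probability proportional to its squared norm, then project the iterate onto that row's hyperplane. The convergence theory is in the cited works; this only shows the mechanics.

```python
import numpy as np

def randomized_kaczmarz(X, y, n_iter=5000, seed=0):
    """Randomized Kaczmarz: sample row i with probability ~ ||x_i||^2,
    then project beta onto the hyperplane x_i^T beta = y_i."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    row_norms2 = np.einsum('ij,ij->i', X, X)
    probs = row_norms2 / row_norms2.sum()
    beta = np.zeros(n)
    for _ in range(n_iter):
        i = rng.choice(m, p=probs)
        beta += (y[i] - X[i] @ beta) / row_norms2[i] * X[i]
    return beta

# Overdetermined consistent system: converges to the unique solution
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
beta_true = rng.normal(size=10)
print(np.linalg.norm(randomized_kaczmarz(X, X @ beta_true) - beta_true))
```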
This paper introduces the jackknife+, a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with width determined by the quantiles of the leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data...
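A compact sketch of the jackknife+ interval as described, assuming a generic scikit-learn-style regressor; `LinearRegression` is just a placeholder choice of base model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def jackknife_plus(X, y, x_test, alpha=0.1, model_cls=LinearRegression):
    """Jackknife+ prediction interval: uses leave-one-out residuals AND
    leave-one-out predictions at the test point, rather than one interval
    centered at the full-model prediction."""
    n = len(y)
    lo, hi = [], []
    for i in range(n):
        mask = np.arange(n) != i
        model = model_cls().fit(X[mask], y[mask])
        resid = abs(y[i] - model.predict(X[i:i+1])[0])   # leave-one-out residual
        pred = model.predict(x_test.reshape(1, -1))[0]   # leave-one-out prediction at x_test
        lo.append(pred - resid)
        hi.append(pred + resid)
    k = int(np.ceil((1 - alpha) * (n + 1)))              # assumes k <= n, i.e. n not too small
    return np.sort(lo)[n - k], np.sort(hi)[k - 1]
```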
Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, these assumptions are often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such settings, we might want to use a nonsymmetric algorithm that treats recent observations...
We derive confidence intervals (CIs) and confidence sequences (CSs) for the classical problem of estimating a bounded mean. Our approach generalizes and improves on the celebrated Chernoff method, yielding the best closed-form "empirical-Bernstein" CSs and CIs (converging exactly to the oracle Bernstein width) as well as non-closed-form "betting" CSs and CIs. Our method combines new composite nonnegative (super)martingales with Ville's maximal inequality, with strong connections to testing by betting and the method of mixtures. We also show how these ideas...
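The following is a simplified betting-style confidence sequence in the spirit of this paper, with an ad hoc truncated bet size. The paper's tuned constructions are tighter, so treat this purely as a sketch of the capital-process mechanism: for each candidate mean m, wealth grows when the data contradict m, and Ville's inequality makes the exclusion rule valid uniformly over time.

```python
import numpy as np

def hedged_betting_cs(xs, alpha=0.05, grid_size=1000):
    """Betting confidence sequence for the mean of [0,1]-bounded data (sketch).
    For each candidate mean m, track capital betting 'up' and 'down'; m stays
    in the set while the hedged capital is below 1/alpha."""
    m = np.linspace(0.001, 0.999, grid_size)   # candidate means
    cap_up = np.ones(grid_size)                # wealth betting that mean > m
    cap_dn = np.ones(grid_size)                # wealth betting that mean < m
    mu, ssq, t = 0.5, 0.25, 0                  # running mean / variance estimates
    for x in xs:
        t += 1
        lam = np.sqrt(2 * np.log(2 / alpha) / (ssq * t))       # predictable bet size
        cap_up *= 1 + np.minimum(lam, 0.5 / m) * (x - m)       # truncation keeps wealth >= 0
        cap_dn *= 1 - np.minimum(lam, 0.5 / (1 - m)) * (x - m)
        mu += (x - mu) / (t + 1)               # update estimates AFTER betting (predictability)
        ssq += ((x - mu) ** 2 - ssq) / (t + 1)
    keep = 0.5 * (cap_up + cap_dn) < 1 / alpha
    if not keep.any():
        return float('nan'), float('nan')      # numerically empty; rare for honest data
    return m[keep].min(), m[keep].max()

print(hedged_betting_cs(np.random.default_rng(0).beta(2, 5, 500)))
```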
This paper is about two related decision theoretic problems: nonparametric two-sample testing and independence testing. There is a belief that recently proposed solutions, based on kernels and distances between pairs of points, behave well in high-dimensional settings. We identify different sources of misconception that give rise to the above belief. Specifically, we differentiate the hardness of estimating the test statistics from the hardness of testing whether these statistics are zero or not, and we explicitly discuss a notion of "fair" alternative hypotheses...
Significance: Most statistical methods rely on certain mathematical conditions, known as regularity assumptions, to ensure their validity. Without these conditions, quantities like P values and confidence intervals might not be valid. In this paper we give a surprisingly simple method for producing statistical significance statements without any regularity conditions. The resulting hypothesis tests can be used for any parametric model and for several nonparametric models.
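This appears to describe universal inference via the split likelihood ratio test; assuming so, here is a toy sketch for a Gaussian point null. The key fact is that the split likelihood ratio U has expectation at most one under the null, so 1/U is a valid p-value by Markov's inequality, with no regularity conditions.

```python
import numpy as np
from scipy.stats import norm

def split_lrt_pvalue_mu0(x, mu0=0.0, seed=0):
    """Split LRT sketch for H0: mu = mu0 in a N(mu, 1) model: estimate mu on
    one half of the data, evaluate the likelihood ratio on the other half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    d1, d0 = x[idx[: len(x) // 2]], x[idx[len(x) // 2:]]
    mu_hat = d1.mean()                           # any estimator on half 1
    # log likelihood ratio evaluated on the held-out half 0
    log_u = norm.logpdf(d0, mu_hat, 1).sum() - norm.logpdf(d0, mu0, 1).sum()
    return min(1.0, np.exp(-log_u))              # valid p-value: reject if <= alpha

x = np.random.default_rng(3).normal(0.4, 1, 100)
print(split_lrt_pvalue_mu0(x))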
We propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning with generative adversarial networks (GANs), in which a model attempts to generate realistic samples and a discriminator attempts to tell these apart from data samples. In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples or on their features...
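For reference, the unbiased estimator of the squared MMD with a Gaussian kernel, the statistic whose test power the paper proposes to optimize (e.g. over the bandwidth or learned features). `X` and `Y` are assumed to be (n, d) and (m, d) arrays.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimator of squared MMD with a Gaussian kernel."""
    def gram(A, B):
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * bandwidth**2))
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    n, m = len(X), len(Y)
    np.fill_diagonal(Kxx, 0)   # drop i == j terms for unbiasedness
    np.fill_diagonal(Kyy, 0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()
```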
We consider the problem of distribution-free predictive inference, with the goal of producing coverage guarantees that hold conditionally rather than marginally. Existing methods such as conformal prediction offer marginal coverage guarantees, where coverage holds on average over all possible test points, but this is not sufficient for many practical applications in which we would like to know that our predictions are valid for a given individual, not merely on average over a population. On the other hand, exact conditional inference is known to be...
A confidence sequence is a sequence of confidence intervals that is uniformly valid over an unbounded time horizon. Our work develops confidence sequences whose widths go to zero, with nonasymptotic coverage guarantees under nonparametric conditions. We draw connections between the Cram\'er-Chernoff method for exponential concentration, the law of the iterated logarithm (LIL), and the sequential probability ratio test: our confidence sequences are time-uniform extensions of the first; they provide tight, nonasymptotic characterizations of the second; and they generalize the third to nonparametric settings, including...
E-values have gained attention as potential alternatives to p-values as measures of uncertainty, significance and evidence. In brief, e-values are realized by random variables with expectation at most one under the null; examples include betting scores, (point null) Bayes factors, likelihood ratios and stopped supermartingales. We design a natural analogue of the Benjamini-Hochberg (BH) procedure for false discovery rate (FDR) control that utilizes e-values, called the e-BH procedure, and compare it...
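The e-BH procedure itself is short enough to state in code: sort the e-values in decreasing order, find the largest k whose k-th largest e-value is at least n/(alpha*k), and reject the hypotheses with the k largest e-values. A minimal sketch:

```python
import numpy as np

def e_bh(e_values, alpha=0.1):
    """e-BH: reject the k largest e-values, for the largest k with
    e_(k) >= n / (alpha * k). FDR control holds under arbitrary dependence."""
    e = np.asarray(e_values, dtype=float)
    n = len(e)
    order = np.argsort(-e)                  # indices sorted by decreasing e-value
    ks = np.arange(1, n + 1)
    ok = e[order] >= n / (alpha * ks)
    if not ok.any():
        return np.array([], dtype=int)
    k = ks[ok].max()                        # largest k passing the threshold
    return np.sort(order[:k])               # indices of rejected hypotheses

print(e_bh([50, 30, 1.2, 0.8, 20], alpha=0.1))   # rejects indices 0, 1, 4
```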
Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty (e-processes for testing and confidence sequences for estimation) that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative martingales starting at one. Since a test martingale is the wealth process of a player in a betting game, SAVI centrally employs game-theoretic intuition...
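A toy illustration of the test-martingale idea, hypothetical choices of null and alternative: for H0 "the coin has bias 0.5", the running likelihood ratio is a nonnegative martingale with initial wealth one under H0, so rejecting whenever wealth reaches 1/alpha is valid at any stopping time.

```python
import numpy as np

def coin_test_martingale(flips, p_alt=0.7, p_null=0.5):
    """Wealth process M_t = running likelihood ratio (betting on p_alt).
    Under H0 (bias = p_null) each factor has mean 1, so M_t is a test
    martingale; by Ville's inequality P(sup_t M_t >= 1/alpha) <= alpha."""
    flips = np.asarray(flips)
    lr = np.where(flips == 1, p_alt / p_null, (1 - p_alt) / (1 - p_null))
    return np.cumprod(lr)   # monitor continuously, stop whenever you like

wealth = coin_test_martingale(np.random.default_rng(4).binomial(1, 0.7, 100))
print(wealth[-1], (wealth >= 20).any())   # threshold 1/alpha = 20 for alpha = 0.05
```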
We develop a class of exponential bounds for the probability that a martingale sequence crosses a time-dependent linear threshold. Our key insight is that it is both natural and fruitful to formulate exponential concentration inequalities in this way. We illustrate this point by presenting a single assumption and theorem that together unify and strengthen many tail bounds for martingales, including classical inequalities (1960–80) by Bernstein, Bennett, Hoeffding, and Freedman; contemporary inequalities (1980–2000) by Shorack and Wellner, Pinelis, Blackwell, van de Geer, and de la Peña; and several...
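One standard sub-Gaussian instance of the line-crossing formulation (stated here for illustration, not the paper's most general result): suppose $\exp(\lambda S_t - \lambda^2 V_t / 2)$ is a nonnegative supermartingale for each fixed $\lambda > 0$, where $S_t$ is the martingale and $V_t$ an accompanying variance process. Choosing $\lambda = 2b$ and applying Ville's maximal inequality gives, for any $a, b > 0$,

$$\mathbb{P}\left(\exists\, t \ge 1 : S_t \ge a + b\,V_t\right) \;\le\; \exp(-2ab),$$

since on that event $\lambda S_t - \lambda^2 V_t/2 \ge 2ab$. Fixed-time Hoeffding-type bounds are recovered by optimizing $a$ and $b$ for a single target time.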
We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known, or can in practice be estimated accurately with access to a large set of unlabeled data (test covariate points). Our weighted extension also applies more generally, to settings in which the data satisfies a certain notion of weighted exchangeability. We discuss other...
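A sketch of the weighted quantile at the heart of this extension, assuming a split-conformal setup with precomputed nonconformity scores and known (or estimated) likelihood-ratio weights:

```python
import numpy as np

def weighted_conformal_quantile(scores_cal, w_cal, w_test, alpha=0.1):
    """Weighted split conformal under covariate shift: reweight calibration
    scores by likelihood ratios w(x) = dP_test/dP_train and return the
    weighted (1 - alpha) quantile that defines the prediction set."""
    w = np.concatenate([w_cal, [w_test]])
    p = w / w.sum()                              # normalized weights
    s = np.concatenate([scores_cal, [np.inf]])   # test point contributes +inf
    order = np.argsort(s)
    cum = np.cumsum(p[order])
    # smallest score whose cumulative weight reaches 1 - alpha
    return s[order][np.searchsorted(cum, 1 - alpha)]

# Prediction set at x_test: {y : score(x_test, y) <= returned quantile}
```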
When data analysts train a classifier and check if its accuracy is significantly different from chance, they are implicitly performing a two-sample test. We investigate the statistical properties of this flexible approach in the high-dimensional setting. We prove two results that hold for all classifiers in any dimension: if the classifier's true error remains $\epsilon$-better than chance for some $\epsilon > 0$ as $d, n \to \infty$, then (a) the permutation-based test is consistent (has power approaching one), and (b) a computationally...
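The classifier two-sample test in its simplest form, a sketch using a logistic-regression base classifier (any classifier could be substituted):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def classifier_two_sample_test(X1, X2, n_perm=200, seed=0):
    """Label the two samples 0/1, estimate classification accuracy by
    cross-validation, and calibrate against a permutation null (shuffled
    labels). Small p-values indicate the distributions differ."""
    rng = np.random.default_rng(seed)
    X = np.vstack([X1, X2])
    y = np.concatenate([np.zeros(len(X1)), np.ones(len(X2))])
    def cv_acc(labels):
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X, labels, cv=5).mean()
    observed = cv_acc(y)
    null = np.array([cv_acc(rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)
```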
We propose confidence sequences (sequences of intervals which are valid uniformly over time) for quantiles of any distribution over a complete, fully-ordered set, based on a stream of i.i.d. observations. We give methods both for tracking a fixed quantile and for tracking all quantiles simultaneously. Specifically, we provide explicit expressions with small constants for intervals whose widths shrink at the fastest possible $\sqrt{t^{-1} \log\log t}$ rate, along with a non-asymptotic concentration inequality for the empirical distribution function that holds uniformly over time at the same rate. The latter...
This paper presents a fast and robust algorithm for trend filtering, a recently developed nonparametric regression tool. It has been shown that, for estimating functions whose derivatives are of bounded variation, trend filtering achieves the minimax optimal error rate, while other popular methods like smoothing splines and kernels do not. Standing in the way of more widespread practical adoption, however, is a lack of scalable and numerically stable algorithms for fitting trend filtering estimates. We present a highly efficient, specialized ADMM routine...
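To show what the optimization problem looks like, here is a plain textbook ADMM for linear trend filtering, minimizing $\frac{1}{2}\|y - \beta\|^2 + \lambda \|D\beta\|_1$ with $D$ the second-difference operator. This is not the paper's specialized routine (which is faster and more stable); it is only a small demo of the splitting.

```python
import numpy as np

def trend_filter_admm(y, lam=10.0, rho=1.0, n_iter=200):
    """Generic ADMM sketch for linear (piecewise-linear) trend filtering."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)          # (n-2) x n second differences
    A_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)   # fine for a small demo
    z = np.zeros(n - 2)
    u = np.zeros(n - 2)
    for _ in range(n_iter):
        beta = A_inv @ (y + rho * D.T @ (z - u))             # quadratic subproblem
        Db = D @ beta
        z = np.sign(Db + u) * np.maximum(np.abs(Db + u) - lam / rho, 0)  # soft-threshold
        u += Db - z                                           # dual update
    return beta
```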
In many practical applications of multiple testing, there are natural ways to partition the hypotheses into groups using the structural, spatial or temporal relatedness of the hypotheses, and this prior knowledge is not used in the classical Benjamini–Hochberg procedure for controlling the false discovery rate (FDR). When one can define (possibly several) such partitions, it may be desirable to control the group FDR simultaneously for all partitions (as special cases, the ‘finest’ partition divides the n hypotheses into n groups of one hypothesis each,...
The Kaczmarz and Gauss--Seidel methods aim to solve an $m \times n$ linear system $X{\beta} = {y}$ by iteratively refining the solution estimate; the former uses random rows of $X$ to update ${\beta}$ given the corresponding equations, while the latter uses random columns of $X$ to update the corresponding coordinates in ${\beta}$. Recent work analyzed these algorithms in a parallel comparison for overcomplete and undercomplete systems, showing convergence to the ordinary least squares (OLS) solution and the minimum Euclidean norm solution, respectively. This paper considers the natural...
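Complementing the row-based Kaczmarz sketch above, here is the column-based randomized Gauss--Seidel update: sample a column, then exactly minimize the residual over that single coordinate. Again a mechanics-only sketch; the comparative convergence theory is the subject of the paper.

```python
import numpy as np

def randomized_gauss_seidel(X, y, n_iter=5000, seed=0):
    """Randomized Gauss-Seidel: sample column j with probability ~ ||X_j||^2,
    update coordinate beta_j by exact minimization of ||y - X beta||^2.
    For overdetermined systems this converges to the least-squares solution."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    col_norms2 = np.einsum('ij,ij->j', X, X)
    probs = col_norms2 / col_norms2.sum()
    beta = np.zeros(n)
    r = y.astype(float).copy()         # maintain residual r = y - X beta
    for _ in range(n_iter):
        j = rng.choice(n, p=probs)
        delta = X[:, j] @ r / col_norms2[j]
        beta[j] += delta
        r -= delta * X[:, j]
    return beta
```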
There is a significant literature on methods for incorporating prior knowledge into multiple testing procedures so as to improve their power and precision. Some common forms of prior knowledge include (a) beliefs about which hypotheses are null, modeled by nonuniform weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) multiple arbitrary partitions of the hypotheses into (possibly overlapping) groups; and (d) knowledge of independence, positive dependence or arbitrary dependence between hypotheses or groups, suggesting the use of more aggressive or conservative...
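As one simple instance of form (a), the classical weighted Benjamini-Hochberg procedure applies ordinary BH to the reweighted p-values $p_i / w_i$; this is a standard baseline, not the unified p-filter framework developed in the paper itself.

```python
import numpy as np

def weighted_bh(pvals, weights, alpha=0.1):
    """Weighted BH: hypotheses believed non-null get weight > 1. Assumes the
    weights average to one; returns indices of rejected hypotheses."""
    p = np.asarray(pvals) / np.asarray(weights)
    n = len(p)
    order = np.argsort(p)
    ok = p[order] <= alpha * np.arange(1, n + 1) / n   # BH step-up condition
    if not ok.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(ok)[0]) + 1                  # largest passing index
    return np.sort(order[:k])
```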