- Statistical Methods and Inference
- Privacy-Preserving Technologies in Data
- Advanced Statistical Methods and Models
- Statistical Methods and Bayesian Inference
- Bayesian Methods and Mixture Models
- Machine Learning and Algorithms
- Advanced Causal Inference Techniques
- Probability and Risk Models
- Markov Chains and Monte Carlo Methods
- Neural Networks and Applications
- Complex Systems and Time Series Analysis
- Sparse and Compressive Sensing Techniques
- Cryptography and Data Security
- Sensory Analysis and Statistical Methods
- Bayesian Modeling and Causal Inference
- Statistical Methods in Clinical Trials
- Data-Driven Disease Surveillance
- Privacy, Security, and Data Protection
- Peripheral Artery Disease Management
- Face and Expression Recognition
- Machine Learning and Data Classification
- Stochastic Gradient Optimization Techniques
- Venous Thromboembolism Diagnosis and Management
- Random Matrices and Applications
- Mathematical and Theoretical Epidemiology and Ecology Models
University of Warwick
2020-2024
University of Edinburgh
2020
University of Cambridge
2017-2020
Centre de Recherche en Économie et Statistique
2019
University of Wisconsin–Madison
2018
Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed...
We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values may be obtained by simulation in the case where an approximation to one marginal is available or by permuting the data otherwise. This facilitates size guarantees, and we...
We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high dimensional random vector Z that may contain confounding factors. The test permutes entries of X non-uniformly, to respect the existing dependence between X and Z and thus to account for the presence of these confounders. Like the conditional randomization test of Candès and co-workers in 2018, our test relies on the availability of an approximation to the distribution of X|Z; whereas their test uses this estimate to draw new X-values, we use...
This study determined whether in vivo positron emission tomography (PET) of arterial inflammation (18F-fluorodeoxyglucose [18F-FDG]) or microcalcification (18F-sodium fluoride [18F-NaF]) could predict restenosis following PTA. Restenosis following lower limb percutaneous transluminal angioplasty (PTA) is common, unpredictable, and challenging to treat. Currently, it is impossible to predict which patient will suffer from restenosis following angioplasty. In this prospective observational cohort study, 50 patients with symptomatic...
We study the problem of independence testing given independent and identically distributed pairs taking values in a σ-finite, separable measure space. Defining a natural measure of dependence D(f) as the squared L²-distance between a joint density f and the product of its marginals, we first show that there is no valid test of independence that is uniformly consistent against alternatives of the form {f : D(f) ≥ ρ²}. We therefore restrict attention to alternatives that impose additional Sobolev-type smoothness constraints, and define a permutation test based on a basis expansion...
It is of soaring demand to develop statistical analysis tools that are robust against contamination as well as preserving individual data owners' privacy. In spite of the fact that both topics host a rich body of literature, to the best of our knowledge, we are the first to systematically study the connections between the optimality under Huber's contamination model and the local differential privacy (LDP) constraints. In this paper, we start with a general minimax lower bound result, which disentangles the costs of being robust against Huber contamination and preserving LDP. We further study four concrete...
We derive a new asymptotic expansion for the global excess risk of a local-$k$-nearest neighbour classifier, where the choice of $k$ may depend upon the test point. This expansion elucidates conditions under which the dominant contribution to the excess risk comes from the decision boundary of the optimal Bayes classifier, but we also show that if these conditions are not satisfied, then the dominant contribution may arise from the tails of the marginal distribution of the features. Moreover, we prove that, provided the $d$-dimensional marginal distribution of the features has a finite $\rho$th moment for some $\rho > 4$ (as well as other regularity conditions),...
Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis. This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, that allow us to propose MCAR tests that are consistent against all detectable alternatives. We define an incompatibility index as...
We present the U-statistic permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson's χ²-test of independence, or the G-test, are typically used for this task, but we argue that these tests have serious deficiencies, both in terms of their inability to control the size of the test, and their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only...
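As a rough illustration of how a permutation test controls size on contingency-table data, the sketch below permutes one variable's labels and recomputes a test statistic. It is a minimal sketch under stated assumptions: it uses Pearson's χ² statistic as a stand-in, not the U-statistic of the USP test, and the function names `chi2_stat` and `permutation_pvalue` are hypothetical.

```python
import numpy as np

def chi2_stat(x, y, n_x, n_y):
    """Pearson chi-squared statistic for integer-coded categories x and y."""
    table = np.zeros((n_x, n_y))
    np.add.at(table, (x, y), 1)  # build the contingency table of counts
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    mask = expected > 0  # skip cells whose row or column is empty
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value for independence of x and y (assumed stand-in statistic)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    n_x, n_y = int(x.max()) + 1, int(y.max()) + 1
    observed = chi2_stat(x, y, n_x, n_y)
    # Permuting y breaks any dependence while keeping both marginals fixed
    exceed = sum(
        chi2_stat(x, rng.permutation(y), n_x, n_y) >= observed
        for _ in range(n_perm)
    )
    # The +1 terms count the observed data as one of the permutations
    return (1 + exceed) / (1 + n_perm)
```

Because the observed statistic is counted among the permutations, the p-value is never zero, and under independence the test rejects with probability at most the nominal level for any sample size.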
Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^d$. A...
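For orientation, the basic (unweighted) Kozachenko-Leonenko estimator that the abstract builds on can be sketched as follows. This is only a minimal sketch, not the weighted estimator studied in the paper: it assumes continuous data with no ties, uses brute-force distances, and the function name `kl_entropy` is hypothetical.

```python
import math
import numpy as np

def kl_entropy(x, k=1):
    """Unweighted Kozachenko-Leonenko entropy estimate, in nats."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a flat array as n points in dimension 1
    n, d = x.shape
    # Brute-force pairwise Euclidean distances (O(n^2) memory)
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    # Distance from each point to its k-th nearest neighbour
    rho_k = np.partition(dist, k - 1, axis=1)[:, k - 1]
    # Log-volume of the unit ball in R^d: pi^(d/2) / Gamma(1 + d/2)
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(1 + d / 2)
    # Digamma at a positive integer: psi(k) = -gamma + sum_{j<k} 1/j
    digamma_k = -np.euler_gamma + sum(1.0 / j for j in range(1, k))
    return (d * float(np.mean(np.log(rho_k)))
            + log_unit_ball + math.log(n - 1) - digamma_k)
```

For a standard normal sample in one dimension the estimate should be close to the true entropy $\tfrac{1}{2}\log(2\pi e)$.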
We find separation rates for testing multinomial or more general discrete distributions under the constraint of local differential privacy. We construct efficient randomized algorithms and test procedures, in both the case where only non-interactive privacy mechanisms are allowed and also in the case where all sequentially interactive privacy mechanisms are allowed. The separation rates are faster in the latter case. We prove general information theoretical bounds that allow us to establish the optimality of our algorithms among all pairs of privacy mechanisms and test procedures, in most usual cases. Considered examples include testing uniform,...
We consider the estimation of two-sample integral functionals, of the type that occur naturally, for example, when the object of interest is a divergence between unknown probability densities. Our first main result is that, in wide generality, a weighted nearest neighbour estimator is efficient, in the sense of achieving the local asymptotic minimax lower bound. Moreover, we also prove a corresponding central limit theorem, which facilitates the construction of asymptotically valid confidence intervals for the functional, having asymptotically minimal width....
We consider the binary classification problem in a setup that preserves the privacy of the original sample. We provide a privacy mechanism that is locally differentially private and then construct a classifier based on the private sample that is universally consistent in Euclidean spaces. Under stronger assumptions, we establish the minimax rates of convergence of the excess risk and see that they are slower than in the case when the original sample is available.
In this paper we revisit the classical problem of nonparametric regression, but impose local differential privacy constraints. Under such constraints, the raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$, taking values in $\mathbb{R}^d \times \mathbb{R}$, cannot be directly observed, and all estimators are functions of the randomised output from a suitable privacy mechanism. The statistician is free to choose the form of the privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of each feature vector $X_i$ and to the value of its response variable $Y_i$. Based on this randomised data, we design...
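The idea of adding Laplace noise to satisfy local differential privacy can be illustrated with the standard scalar Laplace mechanism. The sketch below is an assumption-laden simplification, not the mechanism of the paper: it privatises only a bounded scalar (for example a clipped response), omits the discretisation of the feature vector, and the function name `privatise` and the clipping bound are hypothetical.

```python
import numpy as np

def privatise(values, epsilon, bound=1.0, rng=None):
    """Release each value with epsilon-LDP via clipping plus Laplace noise."""
    rng = np.random.default_rng() if rng is None else rng
    # Clip so that any two possible inputs differ by at most 2 * bound
    clipped = np.clip(np.asarray(values, dtype=float), -bound, bound)
    # Laplace scale = sensitivity / epsilon, with sensitivity 2 * bound
    scale = 2.0 * bound / epsilon
    return clipped + rng.laplace(loc=0.0, scale=scale, size=clipped.shape)
```

Each individual perturbs their own value before it leaves their hands, so the statistician only ever sees the noisy output. Averages of the privatised values remain unbiased for the average of the clipped values, at the cost of extra variance of order $1/(n\epsilon^2)$.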
We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values, which may be obtained from simulation (in the case where one marginal is known) or resampling, guarantee that the test has nominal size, and we provide local power...
We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high dimensional random vector Z that may contain confounding factors. The test permutes entries of X non-uniformly, to respect the existing dependence between X and Z and thus to account for the presence of these confounders. Like the conditional randomization test of Candès and co-workers in 2018, our test relies on the availability of an approximation to the distribution of X|Z; whereas their test uses this estimate to draw new X-values, we use this approximation to design...
Journal Article: Discussion of ‘Multi-scale Fisher’s independence test for multivariate dependence’. T. B. Berrett, Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K. tom.berrett@warwick.ac.uk https://orcid.org/0000-0002-2005-110X Biometrika, Volume 109, Issue 3, September 2022, Pages 589–592, https://doi.org/10.1093/biomet/asac023 Published: 24 August 2022. Editorial decision: 08 April.
We study the problem of testing whether the missing values of a potentially high-dimensional dataset are Missing Completely at Random (MCAR). We relax the problem of testing MCAR to the problem of testing the compatibility of a sequence of covariance matrices, motivated by the fact that this procedure is feasible when the dimension grows with the sample size. Tests of compatibility can be used to test the feasibility of positive semi-definite matrix completion problems with noisy observations, and thus our results may be of independent interest. Our first contributions are to define a natural measure of the incompatibility...