Thomas B. Berrett

ORCID: 0000-0002-2005-110X
Research Areas
  • Statistical Methods and Inference
  • Privacy-Preserving Technologies in Data
  • Advanced Statistical Methods and Models
  • Statistical Methods and Bayesian Inference
  • Bayesian Methods and Mixture Models
  • Machine Learning and Algorithms
  • Advanced Causal Inference Techniques
  • Probability and Risk Models
  • Markov Chains and Monte Carlo Methods
  • Neural Networks and Applications
  • Complex Systems and Time Series Analysis
  • Sparse and Compressive Sensing Techniques
  • Cryptography and Data Security
  • Sensory Analysis and Statistical Methods
  • Bayesian Modeling and Causal Inference
  • Statistical Methods in Clinical Trials
  • Data-Driven Disease Surveillance
  • Privacy, Security, and Data Protection
  • Peripheral Artery Disease Management
  • Face and Expression Recognition
  • Machine Learning and Data Classification
  • Stochastic Gradient Optimization Techniques
  • Venous Thromboembolism Diagnosis and Management
  • Random Matrices and Applications
  • Mathematical and Theoretical Epidemiology and Ecology Models

University of Warwick
2020-2024

University of Edinburgh
2020

University of Cambridge
2017-2020

Centre de Recherche en Économie et Statistique
2019

University of Wisconsin–Madison
2018

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed...

10.1214/18-aos1688 article EN The Annals of Statistics 2018-11-30
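
As an illustration of the nearest-neighbour idea in the abstract above, here is a minimal sketch of the basic (unweighted) Kozachenko-Leonenko entropy estimator. It is not the weighted estimator studied in the paper; the function name `kl_entropy`, the brute-force distance computation and the default `k` are assumptions made for this example only.

```python
import math
import numpy as np

def kl_entropy(x, k=1):
    """Unweighted Kozachenko-Leonenko entropy estimate (in nats).

    x : array of shape (n,) or (n, d); k : which nearest neighbour to use.
    Hypothetical helper for illustration, not the paper's weighted estimator.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # Brute-force pairwise Euclidean distances (fine for small n).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbours
    # Distance from each point to its k-th nearest neighbour.
    rho = np.partition(dist, k - 1, axis=1)[:, k - 1]
    # Log-volume of the unit ball in R^d.
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(1 + d / 2)
    # Digamma at a positive integer: psi(k) = -gamma + sum_{j<k} 1/j.
    psi_k = -np.euler_gamma + sum(1.0 / j for j in range(1, k))
    return float(np.mean(d * np.log(rho) + log_vd + math.log(n - 1) - psi_k))
```

For a standard normal sample in one dimension the estimate should be close to the true entropy, (1/2) log(2πe).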

Summary: We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values may be obtained by simulation in the case where an approximation to one marginal is available or by permuting the data otherwise. This facilitates size guarantees, and we...

10.1093/biomet/asz024 article EN Biometrika 2019-04-15
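
The permutation route to critical values mentioned in the abstract above can be sketched generically. The code below is an assumption-laden illustration: `permutation_pvalue` and `abs_correlation` are hypothetical names, and the absolute sample correlation is a simple stand-in for the mutual information statistic used in the paper.

```python
import numpy as np

def permutation_pvalue(x, y, statistic, n_perm=199, seed=0):
    """Permutation p-value for independence of paired samples x and y.

    Large values of statistic(x, y) are taken as evidence of dependence.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling y breaks any dependence on x but keeps both marginals.
        y_perm = y[rng.permutation(len(y))]
        if statistic(x, y_perm) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value valid.
    return (1 + exceed) / (1 + n_perm)

def abs_correlation(x, y):
    """Stand-in dependence statistic (not the paper's mutual information)."""
    return abs(np.corrcoef(x, y)[0, 1])
```

Rejecting when the returned p-value is at most the nominal level gives a test whose size does not exceed that level for any sample size, which is the appeal of permutation calibration.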

Summary: We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high dimensional random vector Z that may contain confounding factors. The test permutes entries of X non-uniformly, to respect the existing dependence between X and Z and thus to account for the presence of these confounders. Like the conditional randomization test of Candès and co-workers in 2018, our test relies on the availability of an approximation to the distribution of X|Z; whereas their test uses this estimate to draw new X-values, we use...

10.1111/rssb.12340 article EN cc-by Journal of the Royal Statistical Society Series B (Statistical Methodology) 2019-10-21

This study determined whether in vivo positron emission tomography (PET) of arterial inflammation (18F-fluorodeoxyglucose [18F-FDG]) or microcalcification (18F-sodium fluoride [18F-NaF]) could predict restenosis following PTA. Restenosis following lower limb percutaneous transluminal angioplasty (PTA) is common, unpredictable, and challenging to treat. Currently, it is impossible to predict which patient will suffer from restenosis following angioplasty. In this prospective observational cohort study, 50 patients with symptomatic...

10.1016/j.jcmg.2019.03.031 article EN cc-by JACC. Cardiovascular imaging 2019-06-12

We study the problem of independence testing given independent and identically distributed pairs taking values in a σ-finite, separable measure space. Defining a natural measure of dependence D(f) as the squared L²-distance between a joint density f and the product of its marginals, we first show that there is no valid test of independence that is uniformly consistent against alternatives of the form {f : D(f) ≥ ρ²}. We therefore restrict attention to alternatives that impose additional Sobolev-type smoothness constraints, and define a permutation test based on a basis expansion...

10.1214/20-aos2041 article EN The Annals of Statistics 2021-10-01

It is of soaring demand to develop statistical analysis tools that are robust against contamination as well as preserving individual data owners' privacy. In spite of the fact that both topics host a rich body of literature, to the best of our knowledge, we are the first to systematically study the connections between the optimality under Huber's contamination model and the local differential privacy (LDP) constraints. In this paper, we start with a general minimax lower bound result, which disentangles the costs of being robust against Huber contamination and preserving LDP. We further study four concrete...

10.1214/23-aos2267 article EN The Annals of Statistics 2023-04-01

We derive a new asymptotic expansion for the global excess risk of a local-$k$-nearest neighbour classifier, where the choice of $k$ may depend upon the test point. This expansion elucidates conditions under which the dominant contribution to the excess risk comes from the decision boundary of the optimal Bayes classifier, but we also show that if these conditions are not satisfied, then the dominant contribution may arise from the tails of the marginal distribution of the features. Moreover, we prove that, provided the $d$-dimensional marginal distribution of the features has a finite $\rho$th moment for some $\rho > 4$ (as well as other regularity conditions),...

10.1214/19-aos1868 article EN The Annals of Statistics 2020-06-01

Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis. This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, that allow us to propose MCAR tests that are consistent against all detectable alternatives. We define an incompatibility index as...

10.1214/23-aos2326 article EN The Annals of Statistics 2023-10-01

We present the U-statistic permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson's χ²-test of independence, or the G-test, are typically used for this task, but we argue that these tests have serious deficiencies, both in terms of their inability to control the size of the test, and their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only...

10.1098/rspa.2021.0549 article EN cc-by Proceedings of the Royal Society A Mathematical Physical and Engineering Sciences 2021-12-01

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^d$. A...

10.48550/arxiv.1606.00304 preprint EN other-oa arXiv (Cornell University) 2016-01-01

We find separation rates for testing multinomial or more general discrete distributions under the constraint of local differential privacy. We construct efficient randomized algorithms and test procedures, in both the case where only non-interactive privacy mechanisms are allowed and also in the case where all sequentially interactive privacy mechanisms are allowed. The separation rates are faster in the latter case. We prove general information theoretical bounds that allow us to establish the optimality of our algorithms among all pairs of privacy mechanisms and test procedures, in most usual cases. Considered examples include testing uniform,...

10.48550/arxiv.2005.12601 preprint EN other-oa arXiv (Cornell University) 2020-01-01

We consider the estimation of two-sample integral functionals, of the type that occur naturally, for example, when the object of interest is a divergence between unknown probability densities. Our first main result is that, in wide generality, a weighted nearest neighbour estimator is efficient, in the sense of achieving the local asymptotic minimax lower bound. Moreover, we also prove a corresponding central limit theorem, which facilitates the construction of asymptotically valid confidence intervals for the functional, having asymptotically minimal width....

10.1214/23-aos2265 article EN The Annals of Statistics 2023-04-01

We consider the binary classification problem in a setup that preserves the privacy of the original sample. We provide a privacy mechanism that is locally differentially private and then construct a classifier based on the private sample that is universally consistent in Euclidean spaces. Under stronger assumptions, we establish the minimax rates of convergence of the excess risk and see that they are slower than in the case when the original sample is available.

10.48550/arxiv.1912.04629 preprint EN other-oa arXiv (Cornell University) 2019-01-01

In this paper we revisit the classical problem of nonparametric regression, but impose local differential privacy constraints. Under such constraints, the raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$, taking values in $\mathbb{R}^d \times \mathbb{R}$, cannot be directly observed, and all estimators are functions of the randomised output from a suitable privacy mechanism. The statistician is free to choose the form of the privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of a feature vector $X_i$ and to the value of its response variable $Y_i$. Based on this randomised data, we design...

10.1214/21-ejs1845 article EN cc-by Electronic Journal of Statistics 2021-01-01
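
The mechanism described in the abstract above, Laplace noise added to a discretised feature location, can be sketched in a simplified form. The scalar feature in [0, 1), the equal-width bins, the noise scale 2/ε and the name `privatise_location` are all assumptions for this illustration; the paper's mechanism also privatises the response and may differ in detail.

```python
import numpy as np

def privatise_location(x, n_bins, epsilon, rng):
    """Release a noisy one-hot encoding of the bin containing x.

    Assumes a scalar feature x in [0, 1) and n_bins equal-width bins.
    Two one-hot vectors differ by at most 2 in L1 norm, so Laplace noise
    with scale 2 / epsilon makes the output epsilon-locally private.
    """
    bin_index = min(int(x * n_bins), n_bins - 1)
    one_hot = np.zeros(n_bins)
    one_hot[bin_index] = 1.0
    noise = rng.laplace(loc=0.0, scale=2.0 / epsilon, size=n_bins)
    return one_hot + noise
```

Because the noise has mean zero, averaging privatised vectors over many individuals recovers the bin frequencies, at the cost of variance that grows as ε shrinks.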

We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values, which may be obtained from simulation (in the case where one marginal is known) or resampling, guarantee that the test has nominal size, and we provide local power...

10.48550/arxiv.1711.06642 preprint EN other-oa arXiv (Cornell University) 2017-01-01

We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high dimensional random vector Z that may contain confounding factors. The test permutes entries of X non-uniformly, to respect the existing dependence between X and Z and thus to account for the presence of these confounders. Like the conditional randomization test of Candès and co-workers in 2018, our test relies on the availability of an approximation to the distribution of X|Z; whereas their test uses this estimate to draw new X-values, we use this approximation to design...

10.17863/cam.43806 article EN Journal of the Royal Statistical Society Series A (Statistics in Society) 2019-09-11

Discussion of ‘Multi-scale Fisher’s independence test for multivariate dependence’. T B Berrett, Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K. tom.berrett@warwick.ac.uk https://orcid.org/0000-0002-2005-110X Biometrika, Volume 109, Issue 3, September 2022, Pages 589–592, https://doi.org/10.1093/biomet/asac023 Published: 24 August 2022. Editorial decision: 08 April.

10.1093/biomet/asac023 article EN Biometrika 2022-04-14

We study the problem of testing whether the missing values of a potentially high-dimensional dataset are Missing Completely at Random (MCAR). We relax the problem of testing MCAR to the problem of testing the compatibility of a sequence of covariance matrices, motivated by the fact that this procedure is feasible when the dimension grows with the sample size. Tests of compatibility can be used to test the feasibility of positive semi-definite matrix completion problems with noisy observations, and thus our results may be of independent interest. Our first contributions are to define a natural measure of the incompatibility...

10.48550/arxiv.2401.05256 preprint EN cc-by arXiv (Cornell University) 2024-01-01