NFDI4DS | UHH-SEMS - Publication Details

Thomas B. Berrett

ORCID: 0000-0002-2005-110X

About

Contact & Profiles

A5057729785

Research Areas

Statistical Methods and Inference
Privacy-Preserving Technologies in Data
Advanced Statistical Methods and Models
Statistical Methods and Bayesian Inference
Bayesian Methods and Mixture Models
Machine Learning and Algorithms
Advanced Causal Inference Techniques
Probability and Risk Models
Markov Chains and Monte Carlo Methods
Neural Networks and Applications
Complex Systems and Time Series Analysis
Sparse and Compressive Sensing Techniques
Cryptography and Data Security
Sensory Analysis and Statistical Methods
Bayesian Modeling and Causal Inference
Statistical Methods in Clinical Trials
Data-Driven Disease Surveillance
Privacy, Security, and Data Protection
Peripheral Artery Disease Management
Face and Expression Recognition
Machine Learning and Data Classification
Stochastic Gradient Optimization Techniques
Venous Thromboembolism Diagnosis and Management
Random Matrices and Applications
Mathematical and Theoretical Epidemiology and Ecology Models

University of Warwick
2020-2024

University of Edinburgh
2020

University of Cambridge
2017-2020

Centre de Recherche en Économie et Statistique
2019

University of Wisconsin–Madison
2018

Efficient multivariate entropy estimation via $k$-nearest neighbour distances

OPENALEX - Publications

Thomas B. Berrett, Richard J. Samworth, Ming Yuan

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed...

10.1214/18-aos1688 article EN The Annals of Statistics 2018-11-30

Nonparametric independence testing via mutual information

Thomas B. Berrett, Richard J. Samworth

We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values may be obtained by simulation in the case where an approximation to one marginal is available or by permuting the data otherwise. This facilitates size guarantees, and we...

10.1093/biomet/asz024 article EN Biometrika 2019-04-15

The Conditional Permutation Test for Independence While Controlling for Confounders

Thomas B. Berrett, Yi Wang, Rina Foygel Barber, Richard J. Samworth

We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high dimensional random vector Z that may contain confounding factors. The test permutes entries of X non-uniformly, to respect the existing dependence between X and Z and thus to account for the presence of these confounders. Like the conditional randomization test of Candès and co-workers in 2018, our test relies on the availability of an approximation to the distribution of X|Z; whereas their test uses this estimate to draw new X-values, we use...

10.1111/rssb.12340 article EN cc-by Journal of the Royal Statistical Society Series B (Statistical Methodology) 2019-10-21

Vascular Positron Emission Tomography and Restenosis in Symptomatic Peripheral Arterial Disease

Mohammed M. Chowdhury, Jason M. Tarkin, Mazen Albaghdadi, Nicholas R. Evans, Elizabeth Le, and 10 more

This study determined whether in vivo positron emission tomography (PET) of arterial inflammation (18F-fluorodeoxyglucose [18F-FDG]) or microcalcification (18F-sodium fluoride [18F-NaF]) could predict restenosis following PTA. Restenosis following lower limb percutaneous transluminal angioplasty (PTA) is common, unpredictable, and challenging to treat. Currently, it is impossible to predict which patient will suffer from restenosis following angioplasty. In this prospective observational cohort study, 50 patients with symptomatic...

10.1016/j.jcmg.2019.03.031 article EN cc-by JACC. Cardiovascular imaging 2019-06-12

Optimal rates for independence testing via U-statistic permutation tests

Thomas B. Berrett, Ioannis Kontoyiannis, Richard J. Samworth

We study the problem of independence testing given independent and identically distributed pairs taking values in a σ-finite, separable measure space. Defining a natural measure of dependence $D(f)$ as the squared $L^2$-distance between a joint density $f$ and the product of its marginals, we first show that there is no valid test of independence that is uniformly consistent against alternatives of the form $\{f : D(f) \geq \rho^2\}$. We therefore restrict attention to alternatives that impose additional Sobolev-type smoothness constraints, and define a permutation test based on a basis expansion...

10.1214/20-aos2041 article EN The Annals of Statistics 2021-10-01

On robustness and local differential privacy

Mengchu Li, Thomas B. Berrett, Yi Yu

It is of soaring demand to develop statistical analysis tools that are robust against contamination as well as preserving individual data owners' privacy. In spite of the fact that both topics host a rich body of literature, to the best of our knowledge, we are the first to systematically study the connections between the optimality under Huber's contamination model and the local differential privacy (LDP) constraints. In this paper, we start with a general minimax lower bound result, which disentangles the costs of being robust against Huber contamination and preserving LDP. We further study four concrete...

10.1214/23-aos2267 article EN The Annals of Statistics 2023-04-01

Local nearest neighbour classification with applications to semi-supervised learning

Timothy I. Cannings, Thomas B. Berrett, Richard J. Samworth

We derive a new asymptotic expansion for the global excess risk of a local-$k$-nearest neighbour classifier, where the choice of $k$ may depend upon the test point. This expansion elucidates conditions under which the dominant contribution to the excess risk comes from the decision boundary of the optimal Bayes classifier, but we also show that if these conditions are not satisfied, then the dominant contribution may arise from the tails of the marginal distribution of the features. Moreover, we prove that, provided the $d$-dimensional marginal distribution of the features has a finite $\rho$th moment for some $\rho > 4$ (as well as other regularity conditions),...

10.1214/19-aos1868 article EN The Annals of Statistics 2020-06-01

Site and Burden of Lower Limb Atherosclerosis Predicts Long-term Mortality in a Cohort of Patients With Peripheral Arterial Disease

Paul Jie Wen Tern, Izabela Kujawiak, Pratyasha Saha, Thomas B. Berrett, Mohammed M. Chowdhury, and 1 more

10.1016/j.ejvs.2018.07.020 article EN publisher-specific-oa European Journal of Vascular and Endovascular Surgery 2018-10-01

Optimal nonparametric testing of Missing Completely At Random and its connections to compatibility

Thomas B. Berrett, Richard J. Samworth

Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis. This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, that allow us to propose MCAR tests that are consistent against all detectable alternatives. We define an incompatibility index as...

10.1214/23-aos2326 article EN The Annals of Statistics 2023-10-01

USP: an independence test that improves on Pearson’s chi-squared and the G-test

Thomas B. Berrett, Richard J. Samworth

We present the U-statistic permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson's $\chi^2$-test of independence, or the G-test, are typically used for this task, but we argue that these tests have serious deficiencies, both in terms of their inability to control the size of the test, and their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only...

10.1098/rspa.2021.0549 article EN cc-by Proceedings of the Royal Society A Mathematical Physical and Engineering Sciences 2021-12-01

Efficient multivariate entropy estimation via $k$-nearest neighbour distances

Thomas B. Berrett, Richard J. Samworth, Ming Yuan

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^d$. A...

10.48550/arxiv.1606.00304 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms

Thomas B. Berrett, Cristina Butucea

We find separation rates for testing multinomial or more general discrete distributions under the constraint of local differential privacy. We construct efficient randomized algorithms and test procedures, in both the case where only non-interactive privacy mechanisms are allowed and also in the case where all sequentially interactive privacy mechanisms are allowed. The separation rates are faster in the latter case. We prove general information theoretical bounds that allow us to establish the optimality of our algorithms among all pairs of privacy mechanisms and test procedures, in most usual cases. Considered examples include testing uniform,...

10.48550/arxiv.2005.12601 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Efficient functional estimation and the super-oracle phenomenon

Thomas B. Berrett, Richard J. Samworth

We consider the estimation of two-sample integral functionals, of the type that occur naturally, for example, when the object of interest is a divergence between unknown probability densities. Our first main result is that, in wide generality, a weighted nearest neighbour estimator is efficient, in the sense of achieving the local asymptotic minimax lower bound. Moreover, we also prove a corresponding central limit theorem, which facilitates the construction of asymptotically valid confidence intervals for the functional, having asymptotically minimal width....

10.1214/23-aos2265 article EN The Annals of Statistics 2023-04-01

Classification under local differential privacy

Thomas B. Berrett, Cristina Butucea

We consider the binary classification problem in a setup that preserves the privacy of the original sample. We provide a privacy mechanism that is locally differentially private and then construct a classifier based on the private sample that is universally consistent in Euclidean spaces. Under stronger assumptions, we establish the minimax rates of convergence of the excess risk and see that they are slower than in the case when the original sample is available.

10.48550/arxiv.1912.04629 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Strongly universally consistent nonparametric regression and classification with privatised data

Thomas B. Berrett, László Györfi, Harro Walk

In this paper we revisit the classical problem of nonparametric regression, but impose local differential privacy constraints. Under such constraints, the raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$, taking values in $\mathbb{R}^d \times \mathbb{R}$, cannot be directly observed, and all estimators are functions of the randomised output from a suitable privacy mechanism. The statistician is free to choose the form of the privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of a feature vector $X_i$ and to the value of its response variable $Y_i$. Based on this randomised data, we design...

10.1214/21-ejs1845 article EN cc-by Electronic Journal of Statistics 2021-01-01

Nonparametric independence testing via mutual information

Thomas B. Berrett, Richard J. Samworth

We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values, which may be obtained from simulation (in the case where one marginal is known) or resampling, guarantee that the test has nominal size, and we provide local power...

10.48550/arxiv.1711.06642 preprint EN other-oa arXiv (Cornell University) 2017-01-01

The conditional permutation test for independence while controlling for confounders

Thomas B. Berrett, Yi Wang, Rina Foygel Barber, Richard J. Samworth

We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high dimensional random vector Z that may contain confounding factors. The test permutes entries of X non-uniformly, to respect the existing dependence between X and Z and thus to account for the presence of these confounders. Like the conditional randomization test of Candes and co-workers in 2018, our test relies on the availability of an approximation to the distribution of X|Z; whereas their test uses this estimate to draw new X-values, we use this approximation to design...

10.17863/cam.43806 article EN Journal of the Royal Statistical Society Series A (Statistics in Society) 2019-09-11

Discussion of ‘Multi-scale Fisher’s independence test for multivariate dependence’

Thomas B. Berrett

T. B. Berrett, Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K. (tom.berrett@warwick.ac.uk). Biometrika, Volume 109, Issue 3, September 2022, Pages 589–592, https://doi.org/10.1093/biomet/asac023. Published: 24 August 2022. Editorial decision: 08 April.

10.1093/biomet/asac023 article EN Biometrika 2022-04-14

Tests of Missing Completely At Random based on sample covariance matrices

Alberto Bordino, Thomas B. Berrett

We study the problem of testing whether the missing values of a potentially high-dimensional dataset are Missing Completely at Random (MCAR). We relax the problem of testing MCAR to the problem of testing the compatibility of a sequence of covariance matrices, motivated by the fact that this procedure is feasible when the dimension grows with the sample size. Tests of compatibility can be used to test the feasibility of positive semi-definite matrix completion problems with noisy observations, and thus our results may be of independent interest. Our first contributions are to define a natural measure of the incompatibility...

10.48550/arxiv.2401.05256 preprint EN cc-by arXiv (Cornell University) 2024-01-01