Richard J. Samworth

ORCID: 0000-0003-2426-4679
Research Areas
  • Statistical Methods and Inference
  • Bayesian Methods and Mixture Models
  • Advanced Statistical Methods and Models
  • Statistical Methods and Bayesian Inference
  • Sparse and Compressive Sensing Techniques
  • Machine Learning and Algorithms
  • Markov Chains and Monte Carlo Methods
  • Blind Source Separation Techniques
  • Random Matrices and Applications
  • Neural Networks and Applications
  • Face and Expression Recognition
  • Advanced Statistical Process Monitoring
  • Bayesian Modeling and Causal Inference
  • Statistical Methods in Clinical Trials
  • Advanced Causal Inference Techniques
  • Control Systems and Identification
  • SARS-CoV-2 and COVID-19 Research
  • Gaussian Processes and Bayesian Inference
  • Gene expression and cancer classification
  • Domain Adaptation and Few-Shot Learning
  • Complex Systems and Time Series Analysis
  • Statistical and numerical algorithms
  • Machine Learning and Data Classification
  • Multiple Myeloma Research and Treatments
  • Sensory Analysis and Statistical Methods

University of Cambridge
2015-2024

University of Sheffield
2024

University of Edinburgh
2020

University of Chicago
2019

University of Wisconsin–Madison
2018

University of Michigan
2018

Columbia University
2018

Sungshin Women's University
2018

Statistical Service
2018

California Institute of Technology
2016

A useful variant of the Davis–Kahan theorem for statisticians. Y. Yu, T. Wang and R. J. Samworth, Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, U.K. (y.yu@statslab.cam.ac.uk, t.wang@statslab.cam.ac.uk, r.samworth@statslab.cam.ac.uk). Biometrika, Volume 102, Issue 2, June 2015, Pages 315–323, https://doi.org/10.1093/biomet/asv008. Published: 28 April 2015.

10.1093/biomet/asv008 article EN Biometrika 2015-04-28

We derive an asymptotic expansion for the excess risk (regret) of a weighted nearest-neighbour classifier. This allows us to find the asymptotically optimal vector of nonnegative weights, which has a rather simple form. We show that the ratio of the regret of this classifier to that of an unweighted k-nearest neighbour classifier depends only on the dimension d of the feature vectors, and not on the underlying populations. The improvement is greatest when d=4, but thereafter decreases as $d\rightarrow\infty$. The popular bagged nearest neighbour classifier can also be regarded...

10.1214/12-aos1049 article EN other-oa The Annals of Statistics 2012-10-01
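
The following minimal Python/NumPy sketch makes the object of study concrete: a weighted nearest-neighbour classifier for a single test point. It assumes binary labels in {0, 1}, Euclidean distance, and a user-supplied nonnegative weight vector that sums to one. The names `weighted_nn_predict` and `knn_weights` are hypothetical, and the sketch does not reproduce the asymptotically optimal weights derived in the paper. The unweighted k-nearest neighbour classifier is the special case of uniform weights 1/k on the k nearest points.

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, x, weights):
    """Weighted nearest-neighbour classification of a single point x.

    Assumptions (illustrative only): labels are 0 or 1, distances are
    Euclidean, and weights[i] >= 0 is the weight of the (i+1)th nearest
    neighbour, with the weights summing to 1.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    weights = np.asarray(weights, dtype=float)
    # Euclidean distances from x to every training point
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    # Training indices from nearest to farthest (stable sort for ties)
    order = np.argsort(dists, kind="stable")[: len(weights)]
    # Weighted vote for class 1; predict 1 if it reaches one half
    score = np.sum(weights * (y_train[order] == 1))
    return int(score >= 0.5)

def knn_weights(k):
    """Uniform weights 1/k: recovers the unweighted k-nearest-neighbour rule."""
    return np.full(k, 1.0 / k)
```

Take training points 0, 1, 2 (class 0) and 10, 11 (class 1) on the real line. The uniform 3-nearest-neighbour rule assigns 1.5 to class 0 and 10.4 to class 1. Putting all the weight on the single nearest neighbour gives the 1-nearest-neighbour rule.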

Stability Selection was recently introduced by Meinshausen and Bühlmann (2010) as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. We introduce a variant, called Complementary Pairs Stability Selection (CPSS), and derive bounds both on the expected number of variables included by CPSS that have low selection probability under the original procedure, and on the expected number of high selection probability variables that are excluded. These bounds require no (e.g. exchangeability) assumptions on the underlying...

10.1111/j.1467-9868.2011.01034.x article EN Journal of the Royal Statistical Society Series B (Statistical Methodology) 2012-06-21
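
The complementary pairs idea works as follows. Each iteration splits the sample into two disjoint halves, the base procedure is run on both halves, and variables are ranked by how often they are selected. The Python sketch below is illustrative only. The base selector `top_q_correlation` (keep the q variables with the largest absolute marginal correlation with the response), the number of pairs `B` and the threshold `tau` are all assumptions. The error bounds from the paper are not computed.

```python
import numpy as np

def top_q_correlation(X, y, q):
    """Hypothetical base selector: the q columns of X with largest |corr| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Absolute sample correlation of each column with the response
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(scores)[::-1][:q]

def cpss(X, y, select, B=50, tau=0.6, seed=0):
    """Complementary pairs subsampling around an arbitrary selection procedure.

    In each of B iterations the indices are split into two disjoint halves of
    size floor(n/2) and `select` is applied to each half. Returns the selection
    proportions over the 2B subsamples and the variables whose proportion is
    at least tau.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = n // 2
    counts = np.zeros(p)
    for _ in range(B):
        perm = rng.permutation(n)
        for half in (perm[:m], perm[m:2 * m]):
            chosen = select(X[half], y[half])
            counts[chosen] += 1
    proportions = counts / (2 * B)
    return proportions, np.flatnonzero(proportions >= tau)
```

A typical call is `cpss(X, y, lambda Xs, ys: top_q_correlation(Xs, ys, 3), B=50, tau=0.9)`. Any function that maps a data subsample to selected column indices can be substituted as the base procedure.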

Change points are a very common feature of ‘big data’ that arrive in the form of a data stream. We study high dimensional time series in which, at certain points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates in order to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular...

10.1111/rssb.12243 article EN cc-by Journal of the Royal Statistical Society Series B (Statistical Methodology) 2017-08-11
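
The Python sketch below illustrates the two-stage structure for a single change point: first project the data, then run a univariate change point search. It is a deliberate simplification and not inspect itself. Here the projection direction is the leading left singular vector of the CUSUM-transformed data, whereas inspect obtains its direction from the solution of a sparsity-inducing convex relaxation. The function names are hypothetical.

```python
import numpy as np

def cusum(X):
    """CUSUM transformation of a p x n matrix, returning a p x (n-1) matrix.

    Column t (1-based) contrasts the means of X[:, :t] and X[:, t:],
    scaled by sqrt(t * (n - t) / n).
    """
    X = np.asarray(X, dtype=float)
    p, n = X.shape
    t = np.arange(1, n)
    csum = np.cumsum(X, axis=1)[:, :-1]        # sums of the first t columns
    total = X.sum(axis=1, keepdims=True)
    left_mean = csum / t
    right_mean = (total - csum) / (n - t)
    return np.sqrt(t * (n - t) / n) * (right_mean - left_mean)

def projected_changepoint(X):
    """Single change point estimate via a projection direction (simplified).

    Simplification: uses the leading left singular vector of the CUSUM
    matrix itself, not of a sparse convex relaxation as in inspect.
    Returns (t, v): the change is estimated to occur after t observations.
    """
    X = np.asarray(X, dtype=float)
    U, _, _ = np.linalg.svd(cusum(X), full_matrices=False)
    v = U[:, 0]                                 # estimated projection direction
    projected = v @ X                           # univariate series of length n
    stat = np.abs(cusum(projected[None, :])[0])
    return int(np.argmax(stat)) + 1, v
```

For example, suppose 5 of 50 co-ordinates shift in mean by one unit after 100 of 200 observations. The estimated direction should then concentrate on those 5 co-ordinates, and the estimated change point should lie close to 100.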

The kth-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of k, and by the absence of techniques for empirical choice of k. In the present paper we detail the way in which the value of k determines the misclassification error. We consider two models, Poisson and Binomial, for the training samples. Under the first model, data are recorded in a stream and are “assigned” to...

10.1214/07-aos537 article EN The Annals of Statistics 2008-10-01
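
One generic, data-driven way to choose k is leave-one-out cross-validation of the misclassification error. It is shown here only for illustration and is not necessarily the technique analysed in the paper. The Python sketch below assumes labels in {0, 1}, Euclidean distance and odd candidate values of k, so that majority votes cannot tie. The names `loo_error` and `choose_k` are hypothetical.

```python
import numpy as np

def loo_error(X, y, k):
    """Leave-one-out misclassification rate of the k-NN rule (labels 0/1, odd k)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared distances; each point is excluded from its own neighbours
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Majority vote among the k nearest other points
    predictions = (y[neighbours].mean(axis=1) > 0.5).astype(int)
    return float(np.mean(predictions != y))

def choose_k(X, y, candidates):
    """Return the candidate k with smallest leave-one-out error (first on ties)."""
    errors = [loo_error(X, y, k) for k in candidates]
    return candidates[int(np.argmin(errors))], errors
```

Consider two well-separated clusters of three points each. Here `choose_k(X, y, [1, 3, 5])` returns k = 1 with leave-one-out errors 0, 0 and 1, because with k = 5 every point's neighbourhood is dominated by the other cluster.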

Let $X_1,\ldots,X_n$ be independent and identically distributed random vectors with a (Lebesgue) density $f$. We first prove that, with probability 1, there is a unique log-concave maximum likelihood estimator $\hat{f}_n$ of $f$. The use of this estimator is attractive because, unlike kernel density estimation, the method is fully automatic, with no smoothing parameters to choose. Although the existence proof is non-constructive, we can reformulate the issue of computing $\hat{f}_n$ in terms of a non-differentiable convex optimization problem, and thus combine techniques...

10.1111/j.1467-9868.2010.00753.x article EN Journal of the Royal Statistical Society Series B (Statistical Methodology) 2010-10-12
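
The convex reformulation mentioned above can be summarised as follows. This is our reading of the construction rather than a quotation from the paper, and the notation ($\bar{h}_y$, $\sigma$, $C_n$) is introduced here for illustration. For $y = (y_1, \ldots, y_n)$, let $\bar{h}_y$ be the least concave function on $\mathbb{R}^d$ with $\bar{h}_y(X_i) \geq y_i$ for every $i$, taken to be $-\infty$ outside the convex hull $C_n$ of the data.

```latex
\sigma(y) = -\frac{1}{n}\sum_{i=1}^{n} y_i + \int_{C_n} \exp\{\bar{h}_y(x)\}\,dx,
\qquad
y^{*} = \operatorname*{arg\,min}_{y \in \mathbb{R}^n} \sigma(y),
\qquad
\hat{f}_n = \exp(\bar{h}_{y^{*}}).
```

Adding a constant $c$ to every $y_i$ changes $\sigma$ to $-c - \frac{1}{n}\sum_i y_i + e^{c}\int_{C_n} \exp\{\bar{h}_y(x)\}\,dx$. This is minimised in $c$ exactly when $\exp(\bar{h}_{y+c})$ integrates to one, so the minimiser yields a density without an explicit normalisation constraint. The function $\sigma$ is convex but not differentiable, which is why non-smooth convex optimisation techniques are needed.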

In the North Atlantic Ocean, the flow of North Atlantic Deep Water (NADW), and its ancient counterpart Northern Component Water (NCW), across the Greenland‐Scotland Ridge (GSR) is thought to have played an important role in ocean circulation. Over the last 60 Ma, the Iceland Plume has dynamically supported the area which encompasses the GSR. Consequently, the bathymetry of the GSR has varied with time due to a combination of lithospheric plate cooling and fluctuations in temperature and buoyancy within the underlying convecting mantle. Here, we reassess the importance...

10.1029/2005gc001085 article EN Geochemistry Geophysics Geosystems 2006-06-01

In recent years, sparse principal component analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper, we show that,...

10.1214/15-aos1369 article EN other-oa The Annals of Statistics 2016-09-12

We introduce a very general method for high dimensional classification, based on a careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower dimensional space. In one special case that we study in detail, the random projections are divided into disjoint groups, and within each group we select the projection yielding the smallest estimate of the test error. Our random-projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final...

10.1111/rssb.12228 article EN cc-by Journal of the Royal Statistical Society Series B (Statistical Methodology) 2017-06-30
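
The special case described above can be sketched in Python as follows. Several choices here are assumptions made for illustration:
- a 1-nearest-neighbour base classifier;
- leave-one-out estimates of the test error;
- projections with orthonormal columns, obtained from the QR decomposition of a Gaussian matrix;
- a fixed voting threshold `alpha` in place of the paper's data-driven one.

All function names are hypothetical.

```python
import numpy as np

def one_nn_loo_error(Z, y):
    """Leave-one-out error of the 1-nearest-neighbour classifier on projected data Z."""
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return float(np.mean(y[np.argmin(d2, axis=1)] != y))

def one_nn_predict(Z_train, y_train, Z_test):
    """1-nearest-neighbour predictions for the rows of Z_test."""
    d2 = np.sum((Z_test[:, None, :] - Z_train[None, :, :]) ** 2, axis=-1)
    return y_train[np.argmin(d2, axis=1)]

def random_projection(p, d, rng):
    """A p x d matrix with orthonormal columns (QR of a Gaussian matrix)."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    return Q

def rp_ensemble_predict(X, y, X_test, d=2, B1=50, B2=20, alpha=0.5, seed=0):
    """Random-projection ensemble with a 1-NN base classifier (labels 0/1).

    For each of B1 groups, draw B2 projections and keep the one with the
    smallest leave-one-out error; predict class 1 when the proportion of
    selected projections voting for class 1 exceeds the threshold alpha.
    """
    rng = np.random.default_rng(seed)
    X, X_test, y = np.asarray(X, float), np.asarray(X_test, float), np.asarray(y)
    votes = np.zeros(len(X_test))
    for _ in range(B1):
        candidates = [random_projection(X.shape[1], d, rng) for _ in range(B2)]
        errors = [one_nn_loo_error(X @ A, y) for A in candidates]
        best = candidates[int(np.argmin(errors))]
        votes += one_nn_predict(X @ best, y, X_test @ best)
    return (votes / B1 > alpha).astype(int)
```

Any base classifier exposing a fit-and-predict step and an error estimate could replace the 1-NN functions here. Grouping the projections means that each vote comes from a projection that performed well within its group.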

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ identically distributed...

10.1214/18-aos1688 article EN The Annals of Statistics 2018-11-30
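
For reference, the basic (unweighted) Kozachenko–Leonenko-type estimator based on k-th nearest neighbour distances is
$\hat{H}_n = \frac{1}{n}\sum_{i=1}^n \log\{\rho_{(k),i}^d V_d (n-1) e^{-\Psi(k)}\}$.
Here $\rho_{(k),i}$ is the distance from $X_i$ to its k-th nearest neighbour, $V_d = \pi^{d/2}/\Gamma(1+d/2)$ is the volume of the unit ball in $\mathbb{R}^d$, and $\Psi$ is the digamma function. The Python sketch below implements this single-k form only. The weighted averages over k studied in the paper, and their weights, are not reproduced. Because it computes all pairwise distances by brute force, it suits only moderate n.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(k):
    """Digamma function at a positive integer k: -gamma + sum_{j<k} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))

def kl_entropy(X, k=1):
    """Kozachenko-Leonenko-type entropy estimate (in nats), single k.

    X has shape (n, d) (a 1-D array is treated as d = 1). Uses the distance
    from each point to its kth nearest neighbour among the other n - 1 points.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, d = X.shape
    dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)
    rho = np.sort(dist, axis=1)[:, k - 1]                       # kth NN distance
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(1 + d / 2)  # log V_d
    return float(d * np.mean(np.log(rho)) + log_vd + math.log(n - 1) - digamma_int(k))
```

For a sample of 1000 standard normal observations, the estimate should be close to the true entropy $\frac{1}{2}\log(2\pi e) \approx 1.419$ nats.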

Purpose: Current diagnostic tests for diffuse large B-cell lymphoma use the updated WHO criteria based on biologic, morphologic, and clinical heterogeneity. We propose a refined classification system based on subset-specific B-cell–associated gene signatures (BAGS) in the normal B-cell hierarchy, hypothesizing that it can provide new biologic insight and prognostic value. Patients and Methods: We combined fluorescence-activated cell sorting, gene expression profiling, and statistical modeling to generate BAGS for naive, centrocyte,...

10.1200/jco.2014.57.7080 article EN Journal of Clinical Oncology 2015-03-24

We consider the variable selection problem, which seeks to identify important variables influencing a response $Y$ out of many candidate features $X_{1},\ldots,X_{p}$. We wish to do so while offering finite-sample guarantees about the fraction of false positives, that is, selected features $X_{j}$ that in fact have no effect on $Y$ after the other features are known. When the number of features $p$ is large (perhaps even larger than the sample size $n$), and we have no prior knowledge regarding the type of dependence between $Y$ and $X$, the model-X knockoffs framework nonetheless...

10.1214/19-aos1852 article EN The Annals of Statistics 2020-06-01
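
In the model-X knockoffs framework, knockoff copies are first generated and a statistic $W_j$ is computed for each feature, with large positive values suggesting a relevant feature. Variables are then selected using a data-dependent threshold. The Python sketch below implements only the knockoff+ threshold of Barber and Candès. It does not construct the knockoffs or the statistics $W_j$, and `knockoff_plus_select` is a hypothetical name.

```python
import numpy as np

def knockoff_plus_select(W, q):
    """Knockoff+ selection from feature statistics W at target FDR level q.

    Returns the indices j with W[j] >= T, where T is the smallest candidate
    threshold t (taken from the nonzero |W_j|) satisfying
        (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns an empty selection if no such t exists.
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        # Estimated false discovery proportion at threshold t
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)
```

For example, with $W = (5, 4, 3, 2, 1, -0.5)$ and $q = 0.2$, the threshold is $t = 1$ and the first five features are selected. At $q = 0.1$, no threshold qualifies and nothing is selected.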

We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently developed efficient entropy estimators derived from nearest neighbour distances. The critical values of the proposed test may be obtained by simulation in the case where an approximation to one marginal is available, or by permuting the data otherwise. This facilitates size guarantees, and we...

10.1093/biomet/asz024 article EN Biometrika 2019-04-15
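
A simplified version of this idea, using the permutation route to critical values, can be sketched in Python. The mutual information is estimated through $\hat{I} = \hat{H}(X) + \hat{H}(Y) - \hat{H}(X, Y)$ using a single-k Kozachenko–Leonenko-type entropy estimator. The paper instead uses efficient weighted entropy estimators, so this sketch is an illustration under simplifying assumptions rather than the proposed test. The function names and default parameters (`k = 3`, `B = 99` permutations) are hypothetical.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329

def kl_entropy(Z, k=3):
    """Kozachenko-Leonenko-type entropy estimate from an (n, d) sample."""
    n, d = Z.shape
    dist = np.sqrt(np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)
    rho = np.sort(dist, axis=1)[:, k - 1]                    # kth NN distance
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(1 + d / 2)
    psi_k = -EULER_GAMMA + sum(1.0 / j for j in range(1, k))  # digamma(k)
    return float(d * np.mean(np.log(rho)) + log_vd + math.log(n - 1) - psi_k)

def mutual_information(X, Y, k=3):
    """Estimate I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return kl_entropy(X, k) + kl_entropy(Y, k) - kl_entropy(np.hstack([X, Y]), k)

def mi_permutation_test(X, Y, B=99, k=3, seed=0):
    """Permutation p-value (1 + #{b: I_b >= I}) / (B + 1) for independence."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float).reshape(len(X), -1)
    Y = np.asarray(Y, float).reshape(len(Y), -1)
    observed = mutual_information(X, Y, k)
    # Permuting the rows of Y breaks any dependence but keeps both marginals
    null = [mutual_information(X, Y[rng.permutation(len(Y))], k) for _ in range(B)]
    return (1 + sum(s >= observed for s in null)) / (B + 1)
```

Counting the observed statistic in both the numerator and the denominator keeps the permutation p-value valid under independence. The smallest attainable value is 1/(B + 1).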

We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high dimensional random vector Z that may contain confounding factors. The test permutes entries of X non-uniformly, so as to respect the existing dependence between X and Z and thus account for the presence of these confounders. Like the conditional randomization test of Candès and co-workers in 2018, our test relies on the availability of an approximation to the distribution of X|Z; whereas their test uses this estimate to draw X-values, we use...

10.1111/rssb.12340 article EN cc-by Journal of the Royal Statistical Society Series B (Statistical Methodology) 2019-10-21

The BNT162b2 mRNA COVID-19 vaccine (Pfizer-BioNTech) is being utilised internationally for mass vaccination. Evidence of single-dose protection against symptomatic disease has encouraged some countries to opt for delayed booster doses of BNT162b2, but the effect of this strategy on rates of asymptomatic SARS-CoV-2 infection remains unknown. We previously demonstrated frequent pauci- and asymptomatic SARS-CoV-2 infection amongst healthcare workers (HCWs) during the UK’s first wave of the pandemic, using a comprehensive PCR-based HCW screening...

10.7554/elife.68808 article EN cc-by eLife 2021-04-08
Dinesh Aggarwal, Ben Warne, Aminu S. Jahun, William L. Hamilton, Thomas Fieldman, Louis du Plessis, Verity Hill, Beth Blane, Emmeline Watkins, Elizabeth Wright, Grant Hall, Catherine Ludden, Richard Myers, Myra Hosmillo, Yasmin Chaudhry, Malte L. Pinckert, Iliana Georgana, Rhys Izuagbe, Danielle Leek, Olisaeloka Nsonwu, G. Hughes, Simon Packer, Andrew J. Page, Marina Metaxaki, Stewart Fuller, Gillian Weale, Jon Holgate, Christopher Brown, Alexandra L. Orton, Julie A. Douthwaite, Steve Rees, Christopher Brown, Roger Clark, Daniel R. Jones, Fred Kuenzi, Jennifer Rankin, Ian D. Waddell, Patrick H. Maxwell, Nicholas J. Matheson, Chris Abell, Vickie Braithwaite, Craig Brierley, Jon Crowcroft, Aastha Dahal, Kathryn Faulkner, Michael Glover, Ian Goodfellow, Jane Greatorex, Laura P. James, Paul J. Lehner, Ian Leslie, Kathleen Liddell, Ben Margolis, Sally Morgan, Linda Sheridan, Sally Valletta, Anna Vignoles, Martin Vinnell, Mark R. Wills, Sarah Hilborne, Sarah Berry, Mahin Bagheri Kahkeshi, Dawn Hancock, Jennifer Winster, Jessica Enright, Richard J. Samworth, Vijay Samtani, Gabriela Ahmadi‐Assalemi, Tom Feather, Robin Goodall, Steve Hoensch, Dean Johnson, Martin Hunt, Nick Mathieson, К Е Никитина, Zara Sheldrake, Martin Keen, Aris Sato, David J. Connor, Jonathan Tolhurst, Jack Williman, Victoria Hollamby, Sinead Jordan, Tania Fatseas, Peter C. Taylor, Christine Georgiou, Michelle Caspersz, Claire McNulty, Richard Davies, Rebecca Clarke, Darius Danaei, Rory Dyer, Rob Glew, Oliver Lambson, Karen DiValerio Gibbs, Barbara Mozdzen, Gabor Raub, Asako Radecki, Phil White, Robert C. Hughes

Understanding SARS-CoV-2 transmission in higher education settings is important to limit spread between students, and into at-risk populations. In this study, we sequenced 482 isolates from the University of Cambridge from 5 October to 6 December 2020. We perform a detailed phylogenetic comparison with 972 isolates from the surrounding community, complemented with epidemiological and contact tracing data, to determine the dynamics of transmission. We observe limited viral introductions into the university; the majority of student cases were linked to a single genetic...

10.1038/s41467-021-27942-w article EN cc-by Nature Communications 2022-02-08

Objective: To determine current UK medical students’ career intentions after graduation and on completing the Foundation Programme (FP), and to ascertain the motivations behind these intentions. Design: Cross-sectional, mixed-methods survey of medical students, using a non-random sampling method. Setting: All 44 medical schools recognised by the General Medical Council. Participants: Medical students were eligible to participate. The study sample consisted of 10 486 participants, approximately 25.50% of the student population. Outcome measures...

10.1136/bmjopen-2023-075598 article EN cc-by-nc BMJ Open 2023-08-01

We present theoretical properties of the log-concave maximum likelihood estimator of a density based on an independent and identically distributed sample in $\mathbb{R}^d$. Our study covers both the case where the true underlying density is log-concave, and the case where this model is misspecified. We begin by showing that for a sequence of log-concave densities, convergence in distribution implies much stronger types of convergence – in particular, it implies convergence in Hellinger distance and even in certain exponentially weighted total variation norms. In our main result, we prove the existence and uniqueness...

10.1214/09-ejs505 article EN cc-by Electronic Journal of Statistics 2010-01-01

We study the approximation of arbitrary distributions $P$ on $d$-dimensional space by distributions with a log-concave density. Approximation means minimizing a Kullback–Leibler-type functional. We show that such an approximation exists if and only if $P$ has finite first moments and is not supported by some hyperplane. Furthermore, we show that this approximation depends continuously on $P$ with respect to Mallows distance $D_1(\cdot,\cdot)$. This result implies consistency of the maximum likelihood estimator of a log-concave density under fairly general conditions. It also allows us to prove...

10.1214/10-aos853 article EN The Annals of Statistics 2011-03-09

The estimation of a log-concave density on $\mathbb{R}^{d}$ represents a central problem in the area of nonparametric inference under shape constraints. In this paper, we study the performance of log-concave density estimators with respect to global loss functions, and adopt a minimax approach. We first show that no statistical procedure based on a sample of size $n$ can estimate a log-concave density with respect to the squared Hellinger loss function with supremum risk smaller than order $n^{-4/5}$ when $d=1$, and order $n^{-2/(d+1)}$ when $d\geq2$. In particular, this reveals a sense in which, when $d\geq3$, log-concave density estimation is...

10.1214/16-aos1480 article EN other-oa The Annals of Statistics 2016-11-23

We study generalized additive models, with shape restrictions (e.g. monotonicity, convexity and concavity) imposed on each component of the prediction function. We show that this framework facilitates a non-parametric estimator of each component, obtained by maximizing the likelihood. The procedure is free of tuning parameters and under mild conditions is proved to be uniformly consistent on compact intervals. More generally, our methodology can be applied to index models. Here again, the procedure can be justified on theoretical grounds...

10.1111/rssb.12137 article EN cc-by Journal of the Royal Statistical Society Series B (Statistical Methodology) 2015-10-26

We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^{d}$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-\min\{2/(d+2),1/d\}}$ in empirical $L_{2}$ loss, up to polylogarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence...

10.1214/18-aos1753 article EN The Annals of Statistics 2019-08-03
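
When $d = 1$ the estimator above reduces to classical isotonic regression, which the pool adjacent violators algorithm (PAVA) computes exactly. The Python sketch below covers only this one-dimensional case. Computing the least squares estimator over coordinatewise increasing functions for $d \geq 2$ requires a different solver, which is not shown. The name `pava` is ours.

```python
def pava(y, w=None):
    """Least squares nondecreasing fit to the sequence y (pool adjacent violators).

    Optional positive weights w default to 1. Returns the fitted values.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand each block's mean back to one fitted value per observation
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For instance, `pava([1, 3, 2, 4])` pools the violating pair into its mean and returns `[1.0, 2.5, 2.5, 4.0]`.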

In recent years, log-concave density estimation via maximum likelihood has emerged as a fascinating alternative to traditional nonparametric smoothing techniques, such as kernel density estimation, which require the choice of one or more bandwidths. The purpose of this article is to describe some of the properties of the class of log-concave densities on $\mathbb{R}^{d}$ that make it so attractive from a statistical perspective, and to outline the latest methodological, theoretical and computational advances in the area.

10.1214/18-sts666 article EN Statistical Science 2018-11-01