- Statistical Methods and Inference
- Neural Networks and Applications
- Financial Risk and Volatility Modeling
- Bayesian Methods and Mixture Models
- Stochastic Processes and Financial Applications
- Statistical Methods and Bayesian Inference
- Stochastic Gradient Optimization Techniques
- Advanced Statistical Methods and Models
- Sparse and Compressive Sensing Techniques
- Face and Expression Recognition
- Statistical Mechanics and Entropy
- Image and Signal Denoising Methods
- Gaussian Processes and Bayesian Inference
- Advanced Statistical Process Monitoring
- Machine Learning and Algorithms
- Markov Chains and Monte Carlo Methods
- Complex Systems and Time Series Analysis
- Numerical Methods in Inverse Problems
- Monetary Policy and Economic Impact
- Aortic Aneurysm Repair Treatments
- Model Reduction and Neural Networks
- Machine Learning in Materials Science
- Probabilistic and Robust Engineering Design
- Medical Image Segmentation Techniques
- Bayesian Modeling and Causal Inference
University of Twente
2019-2025
Leiden University
2014-2021
University of Göttingen
2009-2019
University of Canterbury
2019
King's College London
2019
École Nationale de la Statistique et de l'Administration Économique
2014
Centre de Recherche en Économie et Statistique
2014
Vrije Universiteit Amsterdam
2013
Gesellschaft für Mathematik und Datenverarbeitung
2010
Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network...
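As a minimal illustration of the network class in this abstract (not the paper's estimator), the following numpy sketch builds a deep ReLU network whose weight matrices are randomly zeroed out, counting the sparsity as the number of non-zero parameters; the widths and the sparsity level are assumptions of the example.

```python
# Sketch of a sparsely connected deep ReLU network; illustrative only.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, weights, biases):
    """Forward pass: ReLU hidden layers, linear output (regression)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]

rng = np.random.default_rng(0)
d, widths = 4, [4, 8, 8, 1]          # input dimension and layer widths (assumed)
weights, biases = [], []
for m_in, m_out in zip([d] + widths[:-1], widths):
    W = rng.normal(size=(m_out, m_in))
    W *= rng.random(W.shape) < 0.3   # zero out ~70% of entries: sparsity
    weights.append(W)
    biases.append(np.zeros(m_out))

sparsity = sum(int(np.count_nonzero(W)) for W in weights)
print("non-zero network parameters s =", sparsity)
print("f(x) =", forward(rng.normal(size=d), weights, biases))
```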
Deep neural networks (DNNs) generate much richer function spaces than shallow networks. Since the function spaces induced by shallow networks have several approximation theoretic drawbacks, this explains, however, not necessarily the success of deep networks. In this article we take another route by comparing the expressive power of DNNs with ReLU activation to linear spline methods. We show that MARS (multivariate adaptive regression splines) is improper learnable in the sense that for any given function that can be expressed as a MARS function with $M$ parameters there exists a multilayer...
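The comparison rests on the fact that a MARS hinge basis function $\max(0,\pm(x_j - t))$ is exactly a single ReLU neuron with a one-hot weight vector; a small sketch of this correspondence (the index and knot below are arbitrary):

```python
# A MARS hinge max(0, x_j - t) equals one ReLU neuron relu(w.x + b)
# with w = e_j and b = -t; this verifies the identity numerically.
import numpy as np

def hinge(x, j, t, sign=+1):
    return np.maximum(sign * (x[..., j] - t), 0.0)

def relu_neuron(x, w, b):
    return np.maximum(x @ w + b, 0.0)

x = np.random.default_rng(1).normal(size=(5, 3))
j, t = 1, 0.2                 # arbitrary coordinate and knot
w = np.zeros(3); w[j] = 1.0   # one-hot weight vector
print(np.allclose(hinge(x, j, t), relu_neuron(x, w, -t)))  # True
```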
We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It is also shown to select the correct sparse model, or at least the coefficients that are significantly different from zero. The asymptotic shape of the posterior distribution is characterized and employed...
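A minimal sketch of a draw from such a prior, assuming a Laplace slab and illustrative values for the mixing weight and slab scale:

```python
# Spike-and-slab prior draw for a p-dimensional coefficient vector:
# point mass at zero with prob 1-w, Laplace slab with prob w.
# The values of w and the slab scale are assumptions of this sketch.
import numpy as np

rng = np.random.default_rng(2)
p, w, lam = 200, 0.05, 1.0
active = rng.random(p) < w                                 # spike/slab indicators
theta = np.where(active, rng.laplace(scale=1.0 / lam, size=p), 0.0)
print("non-zero coordinates:", int(active.sum()))
```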
There is a longstanding debate whether the Kolmogorov–Arnold representation theorem can explain the use of more than one hidden layer in neural networks. The theorem decomposes a multivariate function into an interior and an outer function and therefore has indeed a similar structure as a network with two hidden layers. But there are distinctive differences. One of the main obstacles is that the outer function depends on the represented function and can be wildly varying even if the represented function is smooth. We derive modifications of the representation that transfer smoothness properties of the represented function to the outer function, so that it can be well approximated by ReLU networks. It appears...
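For reference, the classical representation referred to here states that every continuous $f:[0,1]^d\to\mathbb{R}$ can be written as

$$f(x_1,\dots,x_d)=\sum_{q=0}^{2d}\Phi_q\Big(\sum_{p=1}^{d}\phi_{q,p}(x_p)\Big),$$

with continuous univariate inner functions $\phi_{q,p}$ and outer functions $\Phi_q$; the inner sums correspond to a first hidden layer and the $\Phi_q$ to a second.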
We investigate the problem of deriving posterior concentration rates under different loss functions in nonparametric Bayes. We first provide a lower bound on posterior coverages of shrinking neighbourhoods that relates the metric or loss under which the neighbourhood is considered, and an intrinsic pre-metric linked to frequentist separation rates. In the Gaussian white noise model, we construct feasible priors based on a spike-and-slab procedure reminiscent of wavelet thresholding that achieve adaptive contraction rates under both the $L^2$ and $L^{\infty}$ metrics when...
The first Bayesian results for the sparse normal means problem were proven for spike-and-slab priors. However, these priors are less convenient from a computational point of view. In the meanwhile, a large number of continuous shrinkage priors has been proposed. Many of these priors can be written as a scale mixture of normals, which makes them particularly easy to implement. We propose general conditions on the prior on the local variance in scale mixtures of normals such that posterior contraction at the minimax rate is assured. The conditions require tails at least as heavy...
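For illustration, the horseshoe prior is one such scale mixture of normals: half-Cauchy local scales $\lambda_i$ and $\theta_i\mid\lambda_i\sim N(0,\lambda_i^2\tau^2)$. A sketch, with the global scale $\tau$ chosen arbitrarily:

```python
# Horseshoe prior as a scale mixture of normals: half-Cauchy local
# variances; the global scale tau is an assumption of this illustration.
import numpy as np

rng = np.random.default_rng(3)
n, tau = 1000, 0.1
lam = np.abs(rng.standard_cauchy(n))   # half-Cauchy local scales
theta = rng.normal(scale=lam * tau)    # theta_i | lam_i ~ N(0, lam_i^2 tau^2)
print("fraction |theta| < 0.01:", float(np.mean(np.abs(theta) < 0.01)))
```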
Whereas recovery of the manifold from data is a well-studied topic, approximation rates for functions defined on manifolds are less known. In this work, we study the regression problem with inputs on a $d^*$-dimensional manifold that is embedded into a space with potentially much larger ambient dimension. It is shown that sparsely connected deep ReLU networks can approximate a Hölder function with smoothness index $\beta$ up to error $\varepsilon$ using of the order of $\varepsilon^{-d^*/\beta}\log(1/\varepsilon)$ many non-zero network parameters. As an application, we derive...
We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is a test for local monotonicity on all scales simultaneously. We investigate the moderately ill-posed setting, where the Fourier transform of the error density in the deconvolution model is of polynomial decay. For the testing, we consider a calibration motivated by the modulus of continuity of Brownian motion. We study the performance of our results from both a theoretical and a simulation based point of view. A...
The ordinal pattern of a fixed number of consecutive values in a time series is the spatial ordering of these values. Counting how often a specific pattern occurs provides important insights into the properties of the series. In this work, we prove asymptotic normality of the relative pattern frequency for time series with linear increments. Moreover, we apply this result to detect changes in the distribution.
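A short sketch of the basic counting step, mapping each window of $m$ consecutive values to the permutation that sorts it and tabulating relative frequencies (the series below is a toy example):

```python
# Counting ordinal patterns of length m: each window of m consecutive
# values is mapped to the permutation that sorts it.
from collections import Counter
import numpy as np

def ordinal_patterns(x, m=3):
    return Counter(tuple(np.argsort(x[i:i + m])) for i in range(len(x) - m + 1))

rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=1000))   # toy series with i.i.d. increments
counts = ordinal_patterns(x, m=3)
n_windows = sum(counts.values())
for pattern, c in sorted(counts.items()):
    print(pattern, round(c / n_windows, 3))  # relative pattern frequencies
```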
For classification problems, trained deep neural networks return probabilities of class memberships. In this work we study the convergence of the learned probabilities to the true conditional class probabilities. More specifically, we consider sparse ReLU network reconstructions minimizing the cross-entropy loss in the multiclass setup. Interesting phenomena occur when the class membership probabilities are close to zero. Convergence rates are derived that depend on the near-zero behaviour via a margin-type condition.
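For context, a minimal sketch of how a network output is turned into class probabilities via softmax and scored by the cross-entropy loss (the logits below are arbitrary):

```python
# Softmax maps network outputs to a probability vector; cross-entropy
# penalizes low probability assigned to the observed class label.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def cross_entropy(p_hat, label):
    return -np.log(p_hat[label])

p_hat = softmax(np.array([2.0, -1.0, 0.5]))   # arbitrary logits
print("p_hat =", p_hat.round(3), " loss for label 0:", cross_entropy(p_hat, 0))
```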
Convergence properties of empirical risk minimizers can be conveniently expressed in terms of the associated population risk. To derive bounds for the performance of the estimator under covariate shift, however, pointwise convergence rates are required. Under weak assumptions on the design distribution, it is shown that least squares estimators (LSE) over 1-Lipschitz functions are also minimax rate optimal with respect to a weighted uniform norm, where the weighting accounts in a natural way for the non-uniformity of the design distribution...
We consider the models $Y_{i,n}=\int_0^{i/n}\sigma(s)\,dW_s+\tau(i/n)\epsilon_{i,n}$ and $\tilde Y_{i,n}=\sigma(i/n)W_{i/n}+\tau(i/n)\epsilon_{i,n}$, $i=1,\dots,n$, where $(W_t)_{t\in[0,1]}$ denotes a standard Brownian motion and the $\epsilon_{i,n}$ are centered i.i.d. random variables with $E(\epsilon_{i,n}^2)=1$ and finite fourth moment. Furthermore, $\sigma$ and $\tau$ are unknown deterministic functions, and $(W_t)_{t\in[0,1]}$ and $(\epsilon_{1,n},\dots,\epsilon_{n,n})$ are assumed to be independent processes. Based on a spectral decomposition of the covariance structures we derive series estimators for $\sigma^2$ and $\tau^2$ and investigate their rate of convergence of the MISE in dependence of the smoothness. To...
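A simplified simulation sketch, not the paper's spectral series estimator: with constant $\sigma$, squared increments have expectation about $\sigma^2/n+2\tau^2$, so for large $n$ half the mean squared increment approximates the noise variance $\tau^2$:

```python
# Toy illustration of the noisy high-frequency model: the microstructure
# noise tau dominates squared increments, E[(Y_i - Y_{i-1})^2] ~ sigma^2/n
# + 2 tau^2. All parameter values are assumptions of this sketch.
import numpy as np

rng = np.random.default_rng(5)
n, sigma, tau = 100_000, 1.0, 0.01
X = np.cumsum(sigma * rng.normal(size=n) / np.sqrt(n))  # int sigma dW, sigma constant
Y = X + tau * rng.normal(size=n)                        # observations with noise
tau2_hat = 0.5 * np.mean(np.diff(Y) ** 2)               # small sigma^2/(2n) bias remains
print("tau^2 =", tau**2, " estimate =", tau2_hat)
```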
We study nonparametric estimation of the diffusion coefficient from discrete observations, when the observations are corrupted by additive noise. Such problems have been developed over the last ten years in several fields of application, in particular for the modelling of high-frequency financial data, though mostly from a parametric or semiparametric point of view. This work concerns estimation of the (possibly stochastic) diffusion coefficient trajectory in a relatively general framework...
The random coefficients model is an extension of the linear regression model that allows for unobserved heterogeneity in the population by modeling the regression coefficients as random variables. Given data from this model, the statistical challenge is to recover information about the joint density of the random coefficients, which is a multivariate and ill-posed problem. Because of the curse of dimensionality and the ill-posedness, nonparametric estimation of the joint density is difficult and suffers from slow convergence rates. Larger features, such as an increase of the density along some direction or a well-accentuated mode, can, however, be...
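A sketch of data generation from the model $Y=B_0+B^\top X$ with coefficients drawn anew for each observation; the normal coefficient density below is purely for illustration:

```python
# Random coefficients model: each observation gets its own draw of the
# intercept and slopes from an (unknown) joint density, here normal.
import numpy as np

rng = np.random.default_rng(8)
n, d = 1000, 2
X = rng.normal(size=(n, d))
b0 = rng.normal(1.0, 0.5, size=n)               # random intercepts
b = rng.normal([0.5, -1.0], 0.3, size=(n, d))   # random slopes
Y = b0 + np.einsum("ij,ij->i", b, X)            # row-wise inner products
print("first rows of (Y, X):\n", np.c_[Y[:3], X[:3]])
```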
It is well-known that density estimation on the unit interval is asymptotically equivalent to a Gaussian white noise experiment, provided the densities have Hölder smoothness larger than $1/2$ and are uniformly bounded away from zero. We derive matching lower and constructive upper bounds for the Le Cam deficiencies between these experiments, with explicit dependence on both the sample size and the size of the densities in the parameter space. As a consequence, we obtain sharp conditions on how small the densities can be for asymptotic equivalence to hold. The related case...
We study a class of statistical inverse problems with nonlinear pointwise operators motivated by concrete statistical applications. A two-step procedure is proposed, where the first step smoothes the data and inverts the nonlinearity. This reduces the initial problem to a linear inverse problem with deterministic noise, which is then solved in a second step. The noise reduction step is based on wavelet thresholding and is shown to be minimax optimal (up to logarithmic factors) in a function-dependent sense. Our analysis relies on a modified notion of Hölder smoothness scales that...
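A sketch of the wavelet-thresholding idea behind the noise reduction step, using a hand-rolled one-level Haar transform and soft thresholding; the paper's procedure and threshold choice are more involved:

```python
# One-level Haar transform plus soft thresholding of detail coefficients;
# the signal, noise level, and threshold are assumptions of this sketch.
import numpy as np

def haar(x):                        # one decomposition level (len(x) even)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def inv_haar(a, d):                 # exact inverse of haar()
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft(d, t):                     # soft thresholding
    return np.sign(d) * np.maximum(np.abs(d) - t, 0.0)

rng = np.random.default_rng(6)
n = 1024
signal = np.sin(2 * np.pi * np.arange(n) / n)
noisy = signal + 0.2 * rng.normal(size=n)
a, d = haar(noisy)
denoised = inv_haar(a, soft(d, t=0.2 * np.sqrt(2 * np.log(n))))
print("MSE noisy   :", np.mean((noisy - signal) ** 2))
print("MSE denoised:", np.mean((denoised - signal) ** 2))
```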
It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to which extent the bias-variance trade-off is unavoidable and allows to quantify the loss of performance for methods that do not obey it. The approach is based...
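For context, the trade-off refers to the standard risk decomposition, stated here for a real-valued parameter $\theta$ and estimator $\hat\theta$:

$$\mathbb{E}\big[(\hat\theta-\theta)^2\big]=\big(\mathbb{E}[\hat\theta]-\theta\big)^2+\operatorname{Var}(\hat\theta),$$

so a lower bound on $\operatorname{Var}(\hat\theta)$, uniform over all estimators whose bias stays below a prespecified level, lower-bounds the attainable risk.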
Given data from a Poisson point process with intensity $(x,y)\mapsto n\mathbf{1}(f(x)\leq y)$, frequentist properties for the Bayesian reconstruction of the support boundary function $f$ are derived. We mainly study compound Poisson process priors with fixed intensity, proving that the posterior contracts with nearly optimal rate for monotone boundaries and adapts to Hölder smooth boundaries. We then derive a limiting shape result for the prior on a function space with increasing parameter dimension. It is shown that the marginal posterior of the mean functional performs an automatic bias...
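A simulation sketch of the observation model: a rate-$n$ Poisson process on the unit square, thinned to the region above the boundary, has exactly the stated intensity (the boundary $f$ below is an arbitrary monotone example):

```python
# Support-boundary model: Poisson point process on [0,1]^2 with intensity
# n * 1(f(x) <= y), simulated by thinning a rate-n process on the square.
import numpy as np

rng = np.random.default_rng(9)
n = 500
f = lambda x: 0.3 + 0.4 * x        # assumed monotone boundary, for illustration
N = rng.poisson(n)                 # number of points of the rate-n process
x, y = rng.random(N), rng.random(N)
keep = y >= f(x)                   # keep only points above the boundary
print("observed points above the boundary:", int(keep.sum()))
```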
Recently, significant progress has been made regarding the statistical understanding of artificial neural networks (ANNs). ANNs are motivated by the functioning of the brain, but differ in several crucial aspects. In particular, the locality of the updating rule for the connection parameters in biological neural networks (BNNs) makes it biologically implausible that learning in the brain is based on gradient descent. In this work, we look at learning in the brain as a method for supervised learning. The main contribution is to relate the local updating rule of the connection parameters in BNNs to a zero-order optimization method.
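For illustration, a toy zero-order scheme of the kind the comparison refers to: gradients are replaced by two-point loss evaluations along random directions (the step sizes and the objective are assumptions of this sketch, not the paper's specific scheme):

```python
# Zero-order (derivative-free) optimization: estimate a descent direction
# from two loss evaluations along a random perturbation u.
import numpy as np

def zero_order_step(theta, loss, lr=0.1, h=1e-3, rng=np.random.default_rng(7)):
    u = rng.normal(size=theta.shape)                       # random direction
    g = (loss(theta + h * u) - loss(theta - h * u)) / (2 * h) * u
    return theta - lr * g                                  # no gradients used

loss = lambda th: np.sum((th - 1.0) ** 2)                  # toy quadratic objective
theta = np.zeros(5)
for _ in range(500):
    theta = zero_order_step(theta, loss)
print("theta ~", theta.round(2))                           # close to the minimizer 1
```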