- Stochastic Gradient Optimization Techniques
- Matrix Theory and Algorithms
- Advanced Optimization Algorithms Research
- Bayesian Modeling and Causal Inference
- Data Stream Mining Techniques
- Machine Learning and Data Classification
- Machine Learning and Algorithms
- Anomaly Detection Techniques and Applications
- Cloud Computing and Resource Management
- Mathematical Approximation and Integration
- Advanced Neural Network Applications
- Advanced Statistical Process Monitoring
- Sparse and Compressive Sensing Techniques
- Distributed and Parallel Computing Systems
- Distributed Control Multi-Agent Systems
- Numerical Methods in Inverse Problems
- Mathematical Functions and Polynomials
- Adversarial Robustness in Machine Learning
- Computer Graphics and Visualization Techniques
- Nonlinear Dynamics and Pattern Formation
- Medical Image Segmentation Techniques
- Iterative Methods for Nonlinear Equations
- Advanced Mathematical Modeling in Engineering
- Neural Networks and Reservoir Computing
- IoT and Edge/Fog Computing
École Polytechnique Fédérale de Lausanne
2019-2024
Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly compared with SGD. These issues arise because of the biased nature of the sign operator. We then show that using error-feedback, i.e. incorporating the error made by the compression operator into the next step,...
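For readers who want the mechanics, here is a minimal sketch of the error-feedback idea described above. The function name, step size, and scaling are illustrative, not the paper's exact algorithm: the residual left over by the sign compressor is carried in a memory term and folded back into the next step.

```python
import numpy as np

def ef_signsgd_step(x, grad, memory, lr=0.1):
    """One illustrative error-feedback step with sign compression.

    x      : current iterate (numpy array)
    grad   : stochastic gradient at x
    memory : accumulated compression error from previous steps
    """
    corrected = lr * grad + memory                        # fold residual error back in
    scale = np.linalg.norm(corrected, 1) / corrected.size
    update = scale * np.sign(corrected)                   # scaled sign compression
    new_memory = corrected - update                       # store what compression lost
    return x - update, new_memory
```

In use, `memory` starts at zero and is carried across iterations, so the bias introduced by the sign operator is corrected over time rather than accumulating.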
Convergence guarantees for optimization over bounded-rank matrices are delicate to obtain because the feasible set is a non-smooth and non-convex algebraic variety. Existing techniques include projected gradient descent, fixed-rank optimization (over the maximal-rank stratum), and the LR parameterization. They all lack either global guarantees (the ability to accumulate only at critical points) or fast local convergence (e.g., if the limit has non-maximal rank). We seek algorithms that enjoy both. Khrulkov and Oseledets [2018]...
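For concreteness, the feasible set and the LR parameterization mentioned above can be written as follows (notation mine, not quoted from the paper):

```latex
% Bounded-rank feasible set: a non-smooth, non-convex algebraic variety
\mathbb{R}^{m \times n}_{\le r} = \{ X \in \mathbb{R}^{m \times n} : \operatorname{rank}(X) \le r \}

% LR parameterization: optimize over the factors instead of over X itself
\min_{L \in \mathbb{R}^{m \times r},\, R \in \mathbb{R}^{n \times r}} f(L R^{\top})
```

The non-smoothness arises exactly at matrices of rank strictly less than $r$, which is why methods tailored to the maximal-rank stratum can lose their fast local rates when the limit point drops rank.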
Optimization algorithms can see their local convergence rates deteriorate when the Hessian at the optimum is singular. These singularities are inescapable when the optima are non-isolated. Yet, under the right circumstances, several algorithms preserve their favorable rates even when optima form a continuum (e.g., due to over-parameterization). This has been explained under various structural assumptions, including the Polyak--{\L}ojasiewicz inequality, Quadratic Growth and the Error Bound. We show that, for cost functions which are twice continuously...
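The three conditions named above have standard local forms; with $S$ the set of minimizers, $f^{*}$ the minimal value, and $\mu > 0$, they read:

```latex
% Polyak--Lojasiewicz (PL) inequality:
\tfrac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)

% Quadratic Growth (QG):
f(x) - f^{*} \;\ge\; \tfrac{\mu}{2}\,\operatorname{dist}(x, S)^{2}

% Error Bound (EB):
\|\nabla f(x)\| \;\ge\; \mu\,\operatorname{dist}(x, S)
```

Each is required to hold only in a neighborhood of $S$, which is what makes them compatible with minima that form a continuum.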
We propose a simple yet effective policy for the predictive auto-scaling of horizontally scalable applications running in cloud environments, where compute resources can only be added with a delay, and where the deployment throughput is limited. Our policy uses a probabilistic forecast of the workload to make scaling decisions dependent on the risk aversion of the application owner. We show in our experiments using real-world and synthetic data that this policy compares favorably with mathematically more sophisticated approaches as well as with benchmark policies.
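A minimal sketch of what such a quantile-based decision rule could look like, assuming the probabilistic forecast is exposed as a quantile function; the interface and the rule are illustrative assumptions, not the paper's exact policy:

```python
import math

def scale_decision(forecast_quantile, capacity_per_node, risk_aversion=0.95):
    """Pick a node count covering forecast demand with high probability (sketch).

    forecast_quantile : callable q -> predicted workload at quantile q over the
                        provisioning-delay horizon (hypothetical interface)
    risk_aversion     : higher values provision against rarer workload spikes
    """
    demand = forecast_quantile(risk_aversion)         # pessimistic workload estimate
    return math.ceil(demand / capacity_per_node)      # nodes needed to serve it
```

Because resources arrive with a delay, the forecast must look ahead over that delay horizon; the risk-aversion parameter directly trades over-provisioning cost against the probability of under-capacity.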
We consider the dynamics of $n$ points on a sphere in $\mathbb{R}^d$ ($d \geq 2$) which attract each other according to a function $\varphi$ of their inner products. When $\varphi$ is linear ($\varphi(t) = t$), the points converge to a common value (i.e., synchronize) in various connectivity scenarios: this is part of classical work on Kuramoto oscillator networks. When $\varphi$ is exponential ($\varphi(t) = e^{\beta t}$), these dynamics correspond to a limit of how idealized transformers process data, as described by Geshkovski et al. (2024). Accordingly, they ask whether...
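One standard way to write such projected dynamics on the sphere is the following (my reconstruction; the averaging and all-to-all coupling are assumptions, not quoted from the paper):

```latex
% Interaction dynamics on the unit sphere, x_i in S^{d-1}:
\dot{x}_i \;=\; \bigl(I - x_i x_i^{\top}\bigr)\, \frac{1}{n} \sum_{j=1}^{n} \varphi\bigl(\langle x_i, x_j \rangle\bigr)\, x_j

% The projector I - x_i x_i^T keeps each trajectory on the sphere;
% phi(t) = t recovers Kuramoto-type coupling, phi(t) = e^{beta t} the transformer limit.
```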
Trust-region methods (TR) can converge quadratically to minima where the Hessian is positive definite. However, if the minima are not isolated, then the Hessian there cannot be positive definite. The weaker Polyak–Łojasiewicz (PŁ) condition is compatible with non-isolated minima, and it is enough for many algorithms to preserve good local behavior. Yet, TR with an exact subproblem solver lacks even basic features such as a capture theorem under PŁ. In practice, a popular inexact subproblem solver is the truncated conjugate gradient method (tCG). Empirically,...
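For reference, the subproblem that the exact and inexact solvers tackle at each iterate $x_k$ is the standard one:

```latex
% Trust-region subproblem with model Hessian H_k and radius Delta_k:
\min_{s \in \mathbb{R}^{n}} \; m_k(s) = f(x_k) + \nabla f(x_k)^{\top} s + \tfrac{1}{2}\, s^{\top} H_k s
\quad \text{subject to } \|s\| \le \Delta_k

% tCG runs conjugate gradients on H_k s = -grad f(x_k), truncating when the
% iterate hits the boundary ||s|| = Delta_k or negative curvature is detected.
```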
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large-scale feature selection problems. Identifying the knockoff distribution requires solving a semidefinite program for which we derive several efficient methods. One handles generic covariance matrices and has complexity scaling as $\mathcal{O}(p^3)$, where $p$ is the ambient dimension, while another assumes a rank-$k$ factor model on the covariance matrix to reduce this bound...
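The semidefinite program in question is, in the classical formulation from the model-X knockoffs literature (for a standardized covariance $\Sigma$; stated here for context, not quoted from this paper):

```latex
% SDP defining the knockoff decorrelation vector s:
\min_{s \in \mathbb{R}^{p}} \; \sum_{j=1}^{p} |1 - s_j|
\quad \text{subject to} \quad s_j \ge 0, \qquad \operatorname{diag}(s) \preceq 2\Sigma

% Larger s_j decorrelates feature j from its knockoff copy, increasing power,
% while the constraint diag(s) <= 2*Sigma keeps the joint distribution valid.
```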
This article proposes novel rules for false discovery rate control (FDRC) geared towards online anomaly detection in time series. Online FDRC rules allow one to control the properties of a sequence of statistical tests. In the context of anomaly detection, the null hypothesis is that an observation is normal and the alternative is that it is anomalous. FDRC rules allow users to target a lower bound on precision in unsupervised settings. The methods proposed in this article overcome shortcomings of previous rules, in particular ensuring that power remains high even when anomalies are exceedingly rare (typical...
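The link between FDR control and the precision target mentioned above is direct; with $V$ the number of false alarms and $R$ the total number of detections:

```latex
% False discovery rate over a sequence of tests:
\mathrm{FDR} \;=\; \mathbb{E}\!\left[\frac{V}{\max(R, 1)}\right]

% Controlling FDR at level q therefore guarantees an expected precision
% of at least 1 - q among the observations flagged as anomalous.
```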