- Machine Learning and Algorithms
- Privacy-Preserving Technologies in Data
- Complexity and Algorithms in Graphs
- Stochastic Gradient Optimization Techniques
- Algorithms and Data Compression
- Machine Learning and Data Classification
- Cryptography and Data Security
- Imbalanced Data Classification Techniques
- Bayesian Modeling and Causal Inference
- Optimization and Search Problems
- Statistical Methods and Inference
- Sparse and Compressive Sensing Techniques
- Adversarial Robustness in Machine Learning
- Internet Traffic Analysis and Secure E-voting
- Advanced Bandit Algorithms Research
- Domain Adaptation and Few-Shot Learning
- Neural Networks and Applications
- Computability, Logic, AI Algorithms
- Distributed Sensor Networks and Detection Algorithms
- Evolutionary Algorithms and Applications
- Semigroups and Automata Theory
- Advanced Causal Inference Techniques
- Multi-Criteria Decision Making
- Constraint Satisfaction and Optimization
- Error Correcting Code Techniques
Apple (United States)
2021-2024
Fukushima Prefectural Culture Center
2024
Apple (United Kingdom)
2023-2024
Google (United States)
2018-2021
Apple (Israel)
2020-2021
IBM (United States)
2005-2019
Berkeley College
2018
University of California, Berkeley
2018
University of Birmingham
2018
IBM Research - Almaden
2008-2017
Testing hypotheses privately. Large data sets offer a vast scope for testing already-formulated ideas and exploring new ones. Unfortunately, researchers who attempt to do both on the same data set run the risk of making false discoveries, even when testing and exploration are carried out on distinct subsets of the data. Based on ideas drawn from differential privacy, Dwork et al. now provide a theoretical solution. Ideas are tested against aggregate information, whereas individual data components remain confidential. Preserving that privacy...
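A minimal sketch of the underlying idea of answering queries about a dataset only through noise-perturbed aggregates, so that no single record is exposed. This is an illustration of the differential-privacy mechanism family the paper builds on, not the paper's exact procedure; `noisy_mean` and its parameters are illustrative.

```python
import numpy as np

def noisy_mean(values, epsilon, rng=None):
    """Answer an aggregate query with Laplace noise calibrated to the
    sensitivity of the mean (values assumed to lie in [0, 1])."""
    rng = rng or np.random.default_rng()
    sensitivity = 1.0 / len(values)   # changing one record moves the mean by at most this
    noise = rng.laplace(scale=sensitivity / epsilon)
    return float(np.mean(values)) + noise

# A hypothesis is evaluated only against the noisy aggregate, never against raw records.
holdout_scores = np.random.default_rng(0).random(10_000)   # e.g. per-record score of a candidate model
print(noisy_mean(holdout_scores, epsilon=0.5))
```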
A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused...
Marching along the DARPA SyNAPSE roadmap, IBM unveils a trilogy of innovations towards the TrueNorth cognitive computing system inspired by the brain's function and efficiency. Judiciously balancing the dual objectives of functional capability and implementation/operational cost, we develop a simple, digital, reconfigurable, versatile spiking neuron model that supports one-to-one equivalence between hardware and simulation and is implementable using only 1272 ASIC gates. Starting with the classic leaky integrate-and-fire...
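As a point of reference for the "classic leaky integrate-and-fire" starting point mentioned above, here is a toy discrete-time LIF update. It is not the TrueNorth neuron's parameterization; `leak`, `weight`, and `threshold` are illustrative values.

```python
def lif_step(v, spike_in, leak=0.9, weight=0.5, threshold=1.0, reset=0.0):
    """One discrete-time leaky integrate-and-fire update: leak the membrane
    potential, integrate the weighted input spike, and emit a spike (with
    reset) when the threshold is crossed."""
    v = leak * v + weight * spike_in
    if v >= threshold:
        return reset, 1   # spike emitted
    return v, 0

v, out_spikes = 0.0, []
for s in [1, 0, 1, 1, 0, 1, 1]:   # toy input spike train
    v, out = lif_step(v, s)
    out_spikes.append(out)
print(out_spikes)
```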
State-of-the-art results on image recognition tasks are achieved using over-parameterized learning algorithms that (nearly) perfectly fit the training set and are known to fit well even random labels. This tendency to memorize seemingly useless training data labels is not explained by existing theoretical analyses. Memorization of the training data also presents significant privacy risks when the data contains sensitive personal information, and thus it is important to understand whether such memorization is necessary for accurate learning.
Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to a user's value. More fundamentally---by building on anonymity of the reports---we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model...
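For readers unfamiliar with the LDP model referenced above, here is the standard one-bit randomized-response local randomizer with a debiased aggregate estimate. This is only a baseline illustration of local randomization, not the paper's polylogarithmic-cost algorithm for repeated collection; all names and parameters are illustrative.

```python
import math, random

def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (e^eps + 1), otherwise
    flip it; this satisfies epsilon-local differential privacy."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_true else 1 - bit

def debiased_mean(reports, epsilon):
    """Unbiased estimate of the population mean from randomized reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    raw = sum(reports) / len(reports)
    return (raw - (1 - p)) / (2 * p - 1)

bits = [1] * 700 + [0] * 300
reports = [randomized_response(b, epsilon=1.0) for b in bits]
print(debiased_mean(reports, epsilon=1.0))   # close to 0.7
```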
We prove the following strong hardness result for learning: Given a distribution of labeled examples from the hypercube such that there exists a monomial consistent with a $(1-\epsilon)$ fraction of the examples, it is NP-hard to find a halfspace that is correct on a $(1/2+\epsilon)$ fraction of the examples, for arbitrary constants $\epsilon>0$. In learning theory terms, weak agnostic learning of monomials is hard, even if one is allowed to output a hypothesis from the much bigger concept class of halfspaces. This hardness result subsumes a long line of previous results, including two recent hardness results for the proper learning of monomials and halfspaces. As an...
We introduce a framework for proving lower bounds on computational problems over distributions against algorithms that can be implemented using access to a statistical query oracle. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than direct access to samples. Most natural algorithms of interest in theory and in practice, for example, moments-based methods, local search, standard iterative methods for convex...
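A toy simulation of the access model the lower bounds are stated for: a statistical query oracle that returns the expectation of a bounded function up to a tolerance, here emulated by an empirical average over fresh samples. The function and distribution are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def statistical_query(sample_fn, query_fn, tolerance, rng=None):
    """Simulate an SQ oracle: return an estimate of E[query_fn(x)] for x drawn
    from the distribution, accurate to roughly the requested tolerance
    (query_fn is assumed to take values in [0, 1])."""
    rng = rng or np.random.default_rng()
    n = int(4.0 / tolerance ** 2)   # enough samples for a tolerance-accurate average w.h.p.
    xs = [sample_fn(rng) for _ in range(n)]
    return float(np.mean([query_fn(x) for x in xs]))

# Example query: the first moment of the first coordinate of a uniform {0,1}^3 distribution.
sample = lambda rng: rng.integers(0, 2, size=3)
print(statistical_query(sample, lambda x: x[0], tolerance=0.05))   # close to 0.5
```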
Deep learning algorithms are well-known to have a propensity for fitting the training data very well and often fit even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman (2019) proposes a theoretical explanation for this phenomenon based on a combination of two insights. First, natural image and data distributions are (informally) known to be long-tailed, that is, to have a significant fraction of rare and atypical examples....
Recent work of Erlingsson, Feldman, Mironov, Raghunathan, Talwar, and Thakurta [1] demonstrates that random shuffling amplifies differential privacy guarantees of locally randomized data. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously [2] and has led to significant interest in the shuffle model of privacy [3], [1]. We give a characterization of the privacy guarantee...
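A minimal sketch of the shuffle-model pipeline the amplification result is about: each user applies a local randomizer, and an anonymizing shuffler permutes the reports before the analyzer sees them. The randomizer below is plain randomized response; it illustrates the data flow only, not the paper's bounds.

```python
import math, random

def randomized_response(bit, eps, rng=random):
    """eps-LDP one-bit local randomizer (same idea as the earlier sketch)."""
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if rng.random() < p else 1 - bit

def shuffle_model_round(bits, eps0, rng=random):
    """Each user locally randomizes with an eps0-LDP randomizer, then the
    shuffler permutes the reports; the analyzer only sees the anonymized
    multiset, which is what the amplification argument reasons about."""
    reports = [randomized_response(b, eps0, rng) for b in bits]
    rng.shuffle(reports)
    return reports

print(sum(shuffle_model_round([1] * 60 + [0] * 40, eps0=0.5)))
```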
We address well-studied problems concerning the learnability of parities and halfspaces in the presence of classification noise. Learning of parities under the uniform distribution with random classification noise, also called the noisy parity problem, is a famous open problem in computational learning. We reduce a number of basic problems regarding learning under the uniform distribution to learning of noisy parities. We show that under the uniform distribution, learning parities with adversarial noise reduces to learning parities with random noise. Together with the parity learning algorithm of Blum et al. (2003), this gives the first nontrivial results for these problems: learning DNF expressions reduces to learning noisy parities of just a logarithmic number of variables, and learning k-juntas reduces to learning noisy parities of k variables. These reductions...
We introduce a framework for proving lower bounds on computational problems over distributions, based on a class of algorithms called statistical algorithms. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the distribution, rather than direct access to samples. Most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC...
Many commonly used learning algorithms work by iteratively updating an intermediate solution using one or a few data points in each iteration. Analysis of differential privacy for such algorithms often involves ensuring privacy of each step and then reasoning about the cumulative privacy cost of the algorithm. This is enabled by composition theorems for differential privacy that allow releasing all the intermediate results. In this work, we demonstrate that for contractive iterations, not releasing the intermediate results strongly amplifies the privacy guarantees. We describe several applications of this new analysis technique to...
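To make the setting concrete, here is a toy noisy projected iteration of the kind the amplification-by-iteration analysis applies to: each step touches one data point, adds Gaussian noise, and projects onto a bounded set (a contraction), and only the final iterate is released. The loss, step size, and noise scale are illustrative assumptions, not the paper's prescription.

```python
import numpy as np

def noisy_projected_sgd(data, lr=0.1, sigma=0.5, radius=1.0, rng=None):
    """Contractive noisy iteration: per-example noisy gradient step followed
    by projection onto a Euclidean ball; intermediate iterates stay hidden."""
    rng = rng or np.random.default_rng()
    w = np.zeros(data.shape[1])
    for x in data:
        grad = w - x   # gradient of the toy convex loss 0.5 * ||w - x||^2
        w = w - lr * grad + rng.normal(scale=lr * sigma, size=w.shape)
        norm = np.linalg.norm(w)
        if norm > radius:          # projection onto the ball is a contraction
            w = w * (radius / norm)
    return w                       # only the last iterate is released

data = np.random.default_rng(1).normal(size=(200, 5))
print(noisy_projected_sgd(data))
```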
We study differentially private (DP) algorithms for stochastic convex optimization: the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. A recent work of Bassily et al. (2019) has established the optimal bound on the excess population loss achievable given n samples. Unfortunately, their algorithm achieving this bound is relatively inefficient: it requires O(min{n^{3/2}, n^{5/2}/d}) gradient computations, where d is the dimension of the optimization problem.
We study the learnability of several fundamental concept classes in the agnostic learning framework of [D. Haussler, Inform. and Comput., 100 (1992), pp. 78–150] and [M. Kearns, R. Schapire, and L. Sellie, Machine Learning, 17 (1994), pp. 115–141]. We show that under the uniform distribution, agnostically learning parities reduces to learning parities with random classification noise, commonly referred to as the noisy parity problem. Together with the parity learning algorithm of [A. Blum, A. Kalai, and H. Wasserman, J. ACM, 50 (2003), pp. 506–519], this gives the first nontrivial algorithm for...
Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and the generalization of individual analysis procedures. Yet the practice of data analysis is an inherently interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on the basis of obtained results, and datasets are shared and reused. An investigation of this gap has recently been initiated by the authors in (Dwork et al., 2014), where we focused...
Valiant has recently introduced a framework for analyzing the capabilities and the limitations of the evolutionary process of random change guided by selection. In his framework, the process of acquiring complex functionality is viewed as a substantially restricted form of PAC learning of an unknown function from a certain set of functions. Valiant showed that classes of functions evolvable in his model are also learnable in the statistical query (SQ) model of Kearns and asked whether the converse is true.
The problem of identifying a planted assignment given a random k-SAT formula consistent with the assignment exhibits a large algorithmic gap: while the planted solution can always be identified given a formula with O(n log n) clauses, there are distributions over clauses for which the best known efficient algorithms require n^{k/2} clauses. We propose and study a unified model for planted k-SAT, which captures well-known special cases. An instance is described by a planted assignment σ and a distribution on clauses with k literals. We define its distribution complexity as the largest r such that the distribution is not r-wise independent (1 ≤ r ≤ k for any distribution with a planted assignment).
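A small sketch of the simplest member of this family, assuming clauses are drawn uniformly among those satisfied by the hidden assignment; the paper's model covers a much broader class of clause distributions, and the function and parameter names here are illustrative.

```python
import random

def planted_ksat_instance(n, m, k=3, rng=random):
    """Generate a toy planted k-SAT instance: fix a hidden assignment sigma,
    then draw m clauses uniformly among those satisfied by sigma.
    A literal is (var, sign); sign=True denotes the positive literal."""
    sigma = [rng.random() < 0.5 for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        vars_ = rng.sample(range(n), k)
        clause = [(v, rng.random() < 0.5) for v in vars_]
        # keep only clauses consistent with the planted assignment sigma
        if any(sigma[v] == sign for v, sign in clause):
            clauses.append(clause)
    return sigma, clauses

sigma, clauses = planted_ksat_instance(n=20, m=100)
print(len(clauses), clauses[0])
```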
Stochastic convex optimization, where the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research, and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases of interest we derive nearly matching upper and lower bounds on the estimation...
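A toy illustration of the kind of implementation the abstract alludes to: a gradient-descent step in which the expected gradient is obtained through statistical-query-style estimates (here emulated by empirical averages over fresh samples) rather than individual samples. The objective, sampler, and parameters are illustrative assumptions.

```python
import numpy as np

def sq_gradient_step(w, sample_fn, grad_fn, lr=0.1, tol=0.05, rng=None):
    """One gradient step where the expected gradient is estimated to
    tolerance tol, standing in for answers from an SQ oracle."""
    rng = rng or np.random.default_rng()
    n = int(4.0 / tol ** 2)
    xs = [sample_fn(rng) for _ in range(n)]
    grad_est = np.mean([grad_fn(w, x) for x in xs], axis=0)   # conceptually one SQ per coordinate
    return w - lr * grad_est

# Toy stochastic objective: E_x 0.5 * ||w - x||^2, whose expected gradient is w - E[x].
sample = lambda rng: rng.normal(loc=1.0, size=3)
grad = lambda w, x: w - x
w = np.zeros(3)
for _ in range(50):
    w = sq_gradient_step(w, sample, grad)
print(w)   # approaches the distribution mean, roughly [1, 1, 1]
```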
Algorithmic stability is a classical approach to understanding and analyzing the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability has been established in a landmark paper of Bousquet and Elisseeff (2002), albeit at the expense of an additional $\sqrt{n}$ factor in the bound. Specifically, their bound on the estimation error of any $\gamma$-uniformly stable learning algorithm on $n$ samples and range in $[0,1]$ is $O(\gamma\sqrt{n \log(1/\delta)} + ...$