Jiantao Jiao

ORCID: 0000-0003-3766-8031
Research Areas
  • Machine Learning and Algorithms
  • Statistical Methods and Inference
  • Advanced Bandit Algorithms Research
  • Reinforcement Learning in Robotics
  • Distributed Sensor Networks and Detection Algorithms
  • Adversarial Robustness in Machine Learning
  • Bayesian Modeling and Causal Inference
  • Wireless Communication Security Techniques
  • Markov Chains and Monte Carlo Methods
  • Bayesian Methods and Mixture Models
  • Statistical Mechanics and Entropy
  • Anomaly Detection Techniques and Applications
  • Advanced Statistical Methods and Models
  • Algorithms and Data Compression
  • Topic Modeling
  • Statistical Methods and Bayesian Inference
  • Blind Source Separation Techniques
  • Control Systems and Identification
  • Stochastic processes and financial applications
  • Cellular Automata and Applications
  • Smart Grid Energy Management
  • Neural Networks and Applications
  • Optimization and Search Problems
  • Sparse and Compressive Sensing Techniques
  • Natural Language Processing Techniques

University of California, Berkeley
2010-2024

New York University
2022

Stanford University
2013-2018

Tsinghua University
2010-2012

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and the boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible uniform...

10.48550/arxiv.1901.08573 preprint EN other-oa arXiv (Cornell University) 2019-01-01
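The decomposition stated in the abstract can be written out explicitly; in notation adapted from the abstract (with $f$ the classifier, $Y\in\{\pm1\}$ the label, and $\epsilon$ the perturbation radius; the surrogate-loss upper bound itself is in the paper):

```latex
% Robust error = natural error + boundary error
\mathcal{R}_{\mathrm{rob}}(f)
  = \underbrace{\Pr\{f(X)Y \le 0\}}_{\mathcal{R}_{\mathrm{nat}}(f)}
  + \underbrace{\Pr\{f(X)Y > 0,\ \exists\, X' \in \mathbb{B}(X,\epsilon):\ f(X')Y \le 0\}}_{\mathcal{R}_{\mathrm{bdy}}(f)}
```

The boundary term counts correctly classified points that sit within $\epsilon$ of the decision boundary; the differentiable upper bound replaces each probability with a classification-calibrated surrogate loss.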

We propose a general methodology for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions, where the support size S is unknown and may be comparable with or even much larger than the number of observations n. We treat the respective regions where the functional is nonsmooth and smooth separately. In the nonsmooth regime, we apply an unbiased estimator for the best polynomial approximation, whereas in the smooth regime we apply a bias-corrected version of the maximum likelihood...

10.1109/tit.2015.2412945 article EN IEEE Transactions on Information Theory 2015-03-13

Offline reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection. Depending on the composition of the offline dataset, two main methods are used: imitation learning, which is suitable for expert datasets, and vanilla offline RL, which often requires uniform coverage datasets. From a practical standpoint, datasets often deviate from these two extremes, and the exact data composition is usually unknown. To bridge this gap, we present a new framework, called single-policy concentrability, that smoothly...

10.1109/tit.2022.3185139 article EN IEEE Transactions on Information Theory 2022-06-22

Secure aggregation is a critical component in federated learning (FL), which enables the server to learn the aggregate model of the users without observing their local models. Conventionally, secure aggregation algorithms focus only on ensuring the privacy of individual users in a single training round. We contend that such designs can lead to significant privacy leakages over multiple training rounds, due to partial user selection/participation at each round of FL. In fact, we show that conventional random user selection strategies in FL lead to leaking users' individual models within...

10.1609/aaai.v37i8.26177 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26
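For context, the single-round baseline the abstract contrasts with is pairwise additive masking: every pair of users shares a random mask that one adds and the other subtracts, so masks cancel in the aggregate while each individual update stays hidden. A minimal sketch of that idea (not the paper's multi-round scheme; function names are ours):

```python
import random

def pairwise_masks(num_users, dim, seed=0):
    """Build per-user masks: pair (i, j) shares a random vector that user i
    adds and user j subtracts, so all masks cancel in the aggregate."""
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_users)]
    for i in range(num_users):
        for j in range(i + 1, num_users):
            shared = [rng.random() for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks

def masked_update(update, mask):
    """What a user sends: its model update plus its accumulated mask."""
    return [u + m for u, m in zip(update, mask)]
```

The server summing all masked updates recovers the true sum (masks cancel), yet any single masked update is statistically hidden; the paper's point is that repeating this across rounds with partial participation still leaks.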

Four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments. The first one is a Shannon--McMillan--Breiman type estimator, similar to those used by Verd\'u (2005) and Cai, Kulkarni, and Verd\'u (2006) for estimation of other information measures. We show almost sure and $L_1$ convergence properties of the estimator for any underlying universal probability assignment. The other three map universal probability assignments to different functionals, each exhibiting relative merits...

10.1109/tit.2013.2267934 article EN IEEE Transactions on Information Theory 2013-07-09

We consider the problem of estimating functionals of discrete distributions, and focus on a tight nonasymptotic analysis of the worst case squared error risk of widely used estimators. We apply concentration inequalities to analyze the random fluctuation of these estimators around their expectations, and the theory of approximation using positive linear operators to analyze the deviation of their expectations from the true functional, namely the \emph{bias}. We characterize the bias incurred by the Maximum Likelihood Estimator (MLE) in estimating the Shannon entropy $H(P)=\sum_{i=1}^{S}...

10.1109/tit.2017.2733537 article EN IEEE Transactions on Information Theory 2017-07-31
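The MLE analyzed above is the plug-in estimator: form the empirical distribution and evaluate the functional on it. A minimal sketch for Shannon entropy (function name is ours); by Jensen's inequality the plug-in estimate underestimates entropy in expectation, which is the bias the paper characterizes:

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Plug-in (MLE) entropy estimate: evaluate H at the empirical
    distribution p_hat(x) = count(x) / n, in nats."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

For example, a perfectly balanced sample over S symbols yields exactly log S, while undersampled distributions are systematically estimated low.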

We consider the problem of minimax estimation of the entropy of a density over Lipschitz balls. Dropping the usual assumption that the density is bounded away from zero, we obtain the rates $(n\ln n)^{-s/(s+d)}+n^{-1/2}$ for $0<s\leq 2$ for densities supported on $[0,1]^{d}$, where $s$ is the smoothness parameter and $n$ is the number of independent samples. We generalize the results to densities with unbounded support: given an Orlicz function $\Psi$ of rapid growth (such as the subexponential and sub-Gaussian classes), for densities with bounded $\Psi$-Orlicz norm the rates increase to $(n\ln n)^{-s/(s+d)}(\Psi...

10.1214/19-aos1927 article EN The Annals of Statistics 2020-12-01

We consider a natural measure of relevance: the reduction in optimal prediction risk in the presence of side information. For any given loss function, this relevance measure captures the benefit of side information for performing inference on a random variable under this loss function. When such a measure satisfies a natural data processing property, and the random variable of interest has alphabet size greater than two, we show that it is uniquely characterized by the mutual information, and the corresponding loss function coincides with the logarithmic loss. In doing so, our work provides a new...

10.1109/tit.2015.2462848 article EN IEEE Transactions on Information Theory 2015-07-30

We propose a framework to analyze and quantify the bias in adaptive data analysis. It generalizes that proposed by Russo and Zou '15, applying to measurements whose moment generating function exists, measurements with finite p-norm, and measurements in general Orlicz spaces. We introduce a new class of dependence measures which retain key properties of mutual information while more effectively quantifying the exploration bias for heavy tailed distributions. We provide examples of cases where our bounds are nearly tight in situations where the original framework of Russo and Zou '15 does not apply.

10.1109/isit.2017.8006774 article EN 2017 IEEE International Symposium on Information Theory (ISIT) 2017-06-01

We consider the problem of discrete distribution estimation under $\ell_1$ loss. We provide tight upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and the minimax risk, in regimes where the support size S may grow with the number of observations n. We show that among distributions with bounded entropy H, the asymptotic maximum risk for the empirical distribution is 2H/ln n, while the asymptotic minimax risk is H/ln n. Moreover, we show that a hard-thresholding estimator oblivious to the unknown...

10.1109/tit.2015.2478816 article EN IEEE Transactions on Information Theory 2015-09-15
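The two objects compared in the abstract are easy to state concretely: the empirical (plug-in) distribution and its $\ell_1$ loss against the truth. A minimal sketch under our own function names:

```python
from collections import Counter

def empirical(samples, support):
    """Empirical (MLE) distribution over a fixed known support."""
    n = len(samples)
    counts = Counter(samples)
    return {x: counts[x] / n for x in support}

def l1_loss(p, q):
    """l1 loss between two distributions on the same support
    (twice the total variation distance; maximum value is 2)."""
    return sum(abs(p[x] - q[x]) for x in p)
```

The paper's result concerns the worst case expectation of `l1_loss(empirical(...), P)` over distributions P, where the empirical distribution pays roughly twice the minimax-optimal price when entropy is bounded.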

We consider the problem of estimating the $L_1$ distance between two discrete probability measures $P$ and $Q$ from empirical data in a nonasymptotic and large alphabet setting. When $Q$ is known and one obtains $n$ samples from $P$, we show that for every $Q$, the minimax rate-optimal estimator with $n$ samples achieves performance comparable to that of the maximum likelihood estimator (MLE) with $n\ln n$ samples. When both distributions are unknown, we construct estimators whose worst case performance is essentially attained when $Q$ is uniform, implying that the uniform distribution is essentially the most difficult case. The \emph{effective...

10.1109/tit.2018.2846245 article EN publisher-specific-oa IEEE Transactions on Information Theory 2018-06-11

Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data, where the step-by-step thought process is explicitly outlined by text tokens. However, this results in lengthy inputs where many words support textual coherence rather than core reasoning information, and processing these inputs consumes substantial computation resources. In this work, we propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens generated by a VQ-VAE,...

10.48550/arxiv.2502.03275 preprint EN arXiv (Cornell University) 2025-02-05

"Socrates is human. All humans are mortal. Therefore, Socrates is mortal." This classical example demonstrates two-hop reasoning, where a conclusion logically follows from two connected premises. While transformer-based Large Language Models (LLMs) can make such two-hop inferences, they tend to collapse to random guessing when faced with distracting premises. To understand the underlying mechanism, we train a three-layer transformer on synthetic two-hop reasoning tasks. The training dynamics show two stages: a slow learning phase, where the 3-layer transformer performs...

10.48550/arxiv.2502.13913 preprint EN arXiv (Cornell University) 2025-02-19

New methods are needed for the nondestructive measurement of tooth demineralization and remineralization to monitor the progression of incipient caries lesions (tooth decay) for effective nonsurgical intervention, and to evaluate the performance of anticaries treatments such as chemical treatments or laser irradiation. Studies have shown that optical coherence tomography (OCT) has great potential to fulfill this role, since it can be used to measure the depth and severity of early lesions with an axial resolution exceeding 10 µm, and it is easy to apply in vivo...

10.1109/jstqe.2009.2033610 article EN IEEE Journal of Selected Topics in Quantum Electronics 2010-04-08

Generative Adversarial Networks (GANs) are a thriving unsupervised machine learning technique that has led to significant advances in various fields such as computer vision and natural language processing, among others. However, GANs are known to be difficult to train and usually suffer from mode collapse and the discriminator winning problem. To interpret the empirical observations of GANs and design better ones, we deconstruct the study of GANs into three components and make the following contributions. Formulation: we propose a perturbation...

10.1109/tit.2020.2983698 article EN publisher-specific-oa IEEE Transactions on Information Theory 2020-03-27

We consider the problem of estimating the $L_1$ distance between two discrete probability measures P and Q from empirical data in a nonasymptotic and large alphabet setting. We construct minimax rate-optimal estimators for $L_1$(P,Q) when Q is either known or unknown, and show that the performance of the optimal estimators with n samples is essentially that of the Maximum Likelihood Estimators (MLE) with n ln n samples. Hence, we demonstrate that the effective sample...

10.1109/isit.2016.7541399 article EN 2016 IEEE International Symposium on Information Theory (ISIT) 2016-07-01

We study the minimax estimation of $\alpha$-divergences between discrete distributions for integer $\alpha\ge 1$, which include the Kullback--Leibler divergence and the $\chi^2$-divergence as special examples. Dropping the usual theoretical tricks to acquire independence, we construct the first rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials. The estimator uses a hybrid approach which solves a problem-independent linear program based on moment...

10.1109/jsait.2020.3041036 article EN publisher-specific-oa IEEE Journal on Selected Areas in Information Theory 2020-11-01

Reinforcement learning (RL) provides a theoretical framework for continuously improving an agent's behavior via trial and error. However, efficiently learning policies from scratch can be very difficult, particularly for tasks with exploration challenges. In such settings, it might be desirable to initialize RL with an existing policy, offline data, or demonstrations. However, naively performing such initialization in RL often works poorly, especially for value-based methods. In this paper, we present a meta algorithm that can use...

10.48550/arxiv.2204.02372 preprint EN other-oa arXiv (Cornell University) 2022-01-01

We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF). Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model. However, we show that when training a policy based on the learned reward model, the MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions. Additionally, we demonstrate that under the PL model, an alternative MLE that splits...

10.48550/arxiv.2301.11270 preprint EN other-oa arXiv (Cornell University) 2023-01-01
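The BTL model underlying the MLE above has a compact form: the probability that response a is preferred over response b is a logistic function of the reward difference, and the MLE minimizes the resulting negative log-likelihood over a reward class. A minimal sketch (function names are ours; the paper works with parametric reward classes rather than raw per-item rewards):

```python
import math

def btl_win_prob(r_a, r_b):
    """Bradley-Terry-Luce: P(a preferred over b) = sigmoid(r_a - r_b)."""
    return 1.0 / (1.0 + math.exp(r_b - r_a))

def btl_nll(rewards, comparisons):
    """Negative log-likelihood of observed (winner, loser) pairs under BTL;
    the MLE minimizes this objective over the reward parameters."""
    return -sum(math.log(btl_win_prob(rewards[w], rewards[l]))
                for w, l in comparisons)
```

Equal rewards give a 50/50 preference, and raising a winner's reward strictly lowers the negative log-likelihood, which is what drives the MLE toward rewards consistent with the observed comparisons.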

Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny. This paper delves into the unique security challenges posed by generative AI and outlines potential research directions for managing these risks.

10.48550/arxiv.2402.12617 preprint EN arXiv (Cornell University) 2024-02-19

Four problems related to information divergence measures defined on finite alphabets are considered. In three of the cases we consider, we illustrate a contrast which arises between the binary-alphabet and larger-alphabet settings. This is surprising in some instances, since characterizations for the larger-alphabet settings do not generalize to their binary-alphabet counterparts. Specifically, we show that $f$-divergences are not the unique decomposable divergences on binary alphabets that satisfy the data processing inequality, thereby clarifying claims that have previously...

10.1109/tit.2014.2360184 article EN IEEE Transactions on Information Theory 2014-09-25

We consider estimating the Shannon entropy of a discrete distribution P from n i.i.d. samples. Recently, Jiao, Venkat, Han, and Weissman (JVHW), and Wu and Yang constructed approximation theoretic estimators that achieve the minimax $L_2$ rates in estimating entropy. Their estimators are consistent given n ≫ S/ln S samples, where S is the support size, and this is the best possible sample complexity. In contrast, the Maximum Likelihood Estimator...

10.1109/isit.2015.7282680 article EN 2015 IEEE International Symposium on Information Theory (ISIT) 2015-06-01

Robust statistics traditionally focuses on outliers, or perturbations in total variation distance. However, a dataset could be corrupted in many other ways, such as systematic measurement errors and missing covariates. We generalize the robust statistics approach to consider perturbations under any Wasserstein distance, and show that robust estimation is possible whenever a distribution's population statistics are robust under a certain family of friendly perturbations. This generalizes a property called resilience previously employed in the special case of mean estimation with...

10.48550/arxiv.1909.08755 preprint EN other-oa arXiv (Cornell University) 2019-01-01