Hengshuai Yao

ORCID: 0000-0003-1258-1845
Research Areas
  • Reinforcement Learning in Robotics
  • Advanced Bandit Algorithms Research
  • Age of Information Optimization
  • Advanced Neural Network Applications
  • Neural Networks and Applications
  • Domain Adaptation and Few-Shot Learning
  • Autonomous Vehicle Technology and Safety
  • Adversarial Robustness in Machine Learning
  • Adaptive Dynamic Programming Control
  • Evolutionary Algorithms and Applications
  • Explainable Artificial Intelligence (XAI)
  • Machine Learning and Algorithms
  • Artificial Intelligence in Games
  • Machine Learning and Data Classification
  • Optimization and Search Problems
  • Simulation Techniques and Applications
  • Model Reduction and Neural Networks
  • Advanced Control Systems Optimization
  • Human Pose and Action Recognition
  • Reservoir Engineering and Simulation Methods
  • Stochastic Gradient Optimization Techniques
  • Sparse and Compressive Sensing Techniques
  • Robotic Path Planning Algorithms
  • Multimodal Machine Learning Applications
  • Advanced Memory and Neural Computing

University of Alberta
2009-2023

Huawei Technologies (Canada)
2018-2021

Huawei Technologies (France)
2019-2021

Huawei Technologies (China)
2019-2020

Huawei Technologies (United Kingdom)
2019

City University of Hong Kong
2008-2009

Tsinghua University
2006

Autonomous driving has achieved significant milestones in research and development over the last two decades. There is increasing interest in this field, as the deployment of autonomous vehicles (AVs) promises safer and more ecologically friendly transportation systems. With rapid progress in computationally powerful artificial intelligence (AI) techniques, AVs can sense their environment with high precision, make safe real-time decisions, and operate reliably without human intervention. However, intelligent...

10.1109/access.2024.3431437 article EN cc-by-nc-nd IEEE Access 2024-01-01

Autonomous driving has achieved significant milestones in research and development over the last decade. There is increasing interest in this field, as the deployment of self-operating vehicles promises safer and more ecologically friendly transportation systems. With the rise of computationally powerful artificial intelligence (AI) techniques, autonomous vehicles can sense their environment with high precision, make safe real-time decisions, and operate reliably without human intervention. However, intelligent decision-making...

10.48550/arxiv.2112.11561 preprint EN cc-by arXiv (Cornell University) 2021-01-01

The rapid growth of research in explainable artificial intelligence (XAI) follows on two substantial developments. First, the enormous application success of modern machine learning methods, especially deep and reinforcement learning, has created high expectations for industrial, commercial, and social value. Second, there is an emerging and growing concern about creating ethical and trusted AI systems, including compliance with regulatory principles to ensure transparency and trust. These two threads have created a kind of “perfect storm”...

10.3390/make3040045 article EN cc-by Machine Learning and Knowledge Extraction 2021-11-18

Significant progress has been made recently in developing few-shot object segmentation methods. Learning has been shown to be successful in few-shot segmentation settings, using pixel-level, scribble, and bounding-box supervision. This paper takes another approach, i.e., only requiring image-level labels for segmentation. We propose a novel multi-modal interaction module that utilizes a co-attention mechanism with both visual and word embeddings. Our model using image-level labels achieves a 4.8% improvement over a previously proposed method. It also outperforms...

10.24963/ijcai.2020/120 article EN 2020-07-01

In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. On Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain across 49 games in cumulative rewards over QR-DQN, with big...

10.48550/arxiv.1905.06125 preprint EN other-oa arXiv (Cornell University) 2019-01-01
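The two-component exploration scheme described in the abstract above can be sketched as follows. The bonus form, the decay schedule, and the hyperparameter `c` are illustrative assumptions, not the paper's exact formulas:

```python
import numpy as np

def exploration_bonus(quantiles, t, c=50.0):
    """Sketch of an exploration bonus from the upper quantiles of a learned
    value distribution, with a decaying schedule.
    quantiles: array of shape (n_actions, n_quantiles), sorted per action.
    t: current timestep, used to decay the bonus over time.
    c: bonus scale (an assumed hyperparameter, not from the paper)."""
    n = quantiles.shape[1]
    upper = quantiles[:, n // 2:]      # keep only the upper half of quantiles
    # spread of the upper quantiles: large when the return distribution is uncertain
    spread = upper.std(axis=1)
    decay = c / np.sqrt(t + 1)         # decaying schedule suppresses the bonus over time
    return decay * spread

def act(mean_values, quantiles, t):
    """Pick the action maximizing mean value plus the decaying bonus."""
    return int(np.argmax(mean_values + exploration_bonus(quantiles, t)))
```

Early in training the bonus dominates and drives the agent toward actions with uncertain upside; as `t` grows, action selection reverts to the mean value estimate.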

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. It is not an optimization problem in its usual formulation, so when using function approximation there is no optimal policy. We substantiate these claims, then go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches, such as maximizing average reward...

10.48550/arxiv.1910.02140 preprint EN other-oa arXiv (Cornell University) 2019-01-01
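As a concrete instance of the average-reward alternative mentioned above, tabular differential TD learning can be sketched as below; the function name and step-sizes are illustrative assumptions:

```python
import numpy as np

def differential_td(transitions, n_states, alpha=0.1, eta=0.1):
    """Minimal sketch of tabular average-reward (differential) TD learning,
    one rigorous alternative to discounting in continuing tasks.
    transitions: iterable of (s, r, s2) tuples from a continuing task."""
    v = np.zeros(n_states)   # differential value estimates
    r_bar = 0.0              # running estimate of the average reward
    for s, r, s2 in transitions:
        delta = r - r_bar + v[s2] - v[s]   # TD error with no discount factor
        r_bar += eta * delta               # update average-reward estimate
        v[s] += alpha * delta              # update differential values
    return v, r_bar
```

On a two-state loop with rewards 0 and 2, `r_bar` converges to the average reward 1 and `v` encodes relative (differential) state values.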

We present the first provably convergent two-timescale off-policy actor-critic algorithm (COF-PAC) with function approximation. Key to COF-PAC is the introduction of a new critic, the emphasis critic, which is trained via Gradient Emphasis Learning (GEM), a novel combination of key ideas from Temporal Difference Learning and Emphatic Learning. With the help of the emphasis critic and the canonical value critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.

10.48550/arxiv.1911.04384 preprint EN other-oa arXiv (Cornell University) 2019-01-01

We consider the problem of mapless collision-avoidance navigation, where humans are present, using 2D laser scans. Our proposed method uses ego-safety to measure collision risk from the robot's perspective and social-safety to measure the impact of the robot's actions on surrounding pedestrians. Specifically, the social-safety part predicts the intrusion of an action into the interaction area with humans. We train the policy using reinforcement learning in a simple simulator and directly evaluate the learned policy in Gazebo and real robot tests. Experiments show the policy can be smoothly transferred to different...

10.1109/icra40945.2020.9197148 preprint EN 2020-05-01

The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space. Do better policies exist outside of this space? By using a novel surrogate objective that employs the sigmoid function (which provides an interesting way of exploration), we found the answer is "YES", and the better policies are in fact located very far from the clipped space. We show that PPO has insufficient "off-policyness", according to an off-policy metric called DEON. Our method explores a much larger policy space than PPO, and it maximizes the Conservative...

10.1609/aaai.v37i6.25864 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL). In QUOTA, decision making is based on quantiles of a value distribution, not only the mean. QUOTA provides a new dimension for exploration via the use of both optimism and pessimism of the value distribution. We demonstrate the performance advantage of QUOTA in challenging video games and physical robot simulators.

10.1609/aaai.v33i01.33015797 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning. In ACE, we use an ensemble (i.e., multiple actors) to search for the global maxima of the critic. Besides the ensemble perspective, we also formulate ACE in the option framework by extending the option-critic architecture with intra-option policies, revealing a relationship between ensembles and options. Furthermore, we perform a look-ahead tree search with those actors and a learned value prediction model, resulting in a refined value estimation. We...

10.1609/aaai.v33i01.33015789 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a probability kernel. We show that the new framework captures many existing abstractions, including factored linear action models as a special case. The relation between these models and previous works is also discussed. We use the general theory for bounding the suboptimality of policies derived from pseudo-MDPs. Specializing the framework, we recover previous results. We give a least-squares approach and a constrained optimization approach for learning the model as...

10.1109/adprl.2014.7010633 article EN 2014-12-01

In deep neural networks, the cross-entropy loss function is commonly used for classification. Minimizing cross-entropy is equivalent to maximizing likelihood under assumptions of uniform feature and class distributions. It belongs to the generative training criteria, which do not directly discriminate the correct class from the competing classes. We propose a discriminative loss based on the negative log likelihood ratio between the correct and competing classes, which significantly outperforms cross-entropy on the CIFAR-10 image classification task.

10.48550/arxiv.1804.10690 preprint EN other-oa arXiv (Cornell University) 2018-01-01
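The contrast between cross-entropy and a negative log likelihood ratio loss can be illustrated as follows; the exact loss form in the paper may differ, so treat this as an assumed variant:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Standard cross-entropy: -log p(correct class)."""
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def log_likelihood_ratio_loss(logits, labels, eps=1e-12):
    """Assumed discriminative loss: negative log ratio between the correct
    class probability and the summed probability of competing classes."""
    p = softmax(logits)
    correct = p[np.arange(len(labels)), labels]
    competing = 1.0 - correct          # total mass on competing classes
    return -np.log(correct / (competing + eps)).mean()
```

Unlike cross-entropy, which only pushes up the correct-class likelihood, the ratio loss explicitly rewards separating the correct class from the combined competing mass, and keeps decreasing as that separation grows.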

In this paper we consider the problem of finding a good policy given some batch data. We propose a new approach, LAM-API, that first builds a so-called linear action model (LAM) from the data and then uses the learned model and the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the policy evaluation step of the algorithm is to use the least-squares temporal difference (LSTD) learning algorithm. Empirical results on three benchmark problems show that a particular instance of LAM-API performs competitively as compared with...

10.1609/aaai.v26i1.8319 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-09-20

This paper extends many of the recent popular policy evaluation algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE), and a variant of incremental LSTD (iLSTD). The basis of this extension is a preconditioning technique that solves a stochastic model equation. The paper also studies three significant issues of the new framework: it presents a rule by which the step-size can be computed online, provides an iterative way to apply preconditioning, and reduces the complexity of related algorithms to near that of temporal difference (TD) learning.

10.1145/1390156.1390308 article EN 2008-01-01
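The preconditioning view of policy evaluation described above can be sketched on a tabular chain; the function name, the choice of preconditioners, and the step-size are illustrative assumptions:

```python
import numpy as np

def preconditioned_pe(Phi, Phi_next, rewards, gamma=0.9, alpha=0.1, steps=2000):
    """Sketch of policy evaluation as solving the model equation A w = b,
    where A = Phi^T (Phi - gamma * Phi_next) and b = Phi^T r.
    Different preconditioners C in the update w += C^{-1} (b - A w) recover
    different algorithms: C = A gives an LSTD-like direct solve, and C = I
    with a small step-size gives a TD-like iteration. Illustrative only."""
    A = Phi.T @ (Phi - gamma * Phi_next)
    b = Phi.T @ rewards
    # LSTD-like: solve the model equation directly (preconditioner C = A)
    w_lstd = np.linalg.solve(A, b)
    # TD-like: identity preconditioner, iterate with a small step-size
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        w += alpha * (b - A @ w)
    return w_lstd, w
```

On a deterministic three-state cycle with tabular features, both routes converge to the same fixed point, which satisfies the Bellman equation v = r + gamma * P v.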

Learning an effective representation for high-dimensional data is a challenging problem in reinforcement learning (RL). Deep reinforcement learning (DRL) such as deep Q networks (DQN) achieves remarkable success in computer games through representations deeply encoded by convolutional networks. In this paper, we propose a simple yet very effective method to be used with DRL algorithms. Our key insight is that features learned by DRL algorithms are highly correlated, which interferes with learning. By adding a regularized loss that penalizes the correlation in latent features (with only slight...

10.48550/arxiv.1903.07765 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search control, the mechanism used to generate the states and actions from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function. This has the effect of propagating value from high-value regions and preemptively updating value estimates of states that the agent is likely to visit next. We derive...

10.24963/ijcai.2019/445 preprint EN 2019-07-28
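The search-control idea above can be sketched in a discrete setting; this greedy neighbor-based climb is an assumed stand-in for the paper's hill climbing on the value estimate, and the names are hypothetical:

```python
import numpy as np

def hc_search_control(v, neighbors, s0, n_steps=10):
    """Sketch of hill-climbing search control for Dyna: starting from s0,
    repeatedly move to the neighboring state with the highest current value
    estimate, and return the trajectory as the states from which the agent
    queries its model for planning updates.
    neighbors: dict mapping a state to its candidate successor states."""
    traj = [s0]
    s = s0
    for _ in range(n_steps):
        cands = neighbors[s]
        s = max(cands, key=lambda x: v[x])   # greedy hill-climbing step
        if v[s] <= v[traj[-1]]:
            break                            # stop at a local maximum
        traj.append(s)
    return traj
```

The returned trajectory biases simulated updates toward high-value regions, so value estimates along likely future paths are refreshed before the agent actually visits them.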

Recent developments in Neural Architecture Search (NAS) resort to training the supernet of a predefined search space with weight sharing to speed up architecture evaluation. These include random search schemes, as well as various schemes based on optimization or reinforcement learning, in particular policy gradient, that aim to optimize a parametric architecture distribution and the shared model weights simultaneously. In this paper, we focus on efficiently exploring the important region of a neural architecture space with reinforcement learning. We propose a Deep Deterministic...

10.1109/access.2021.3101975 article EN cc-by IEEE Access 2021-01-01