Hengshuai Yao

ORCID: 0000-0003-1258-1845
Research Areas
  • Reinforcement Learning in Robotics
  • Advanced Bandit Algorithms Research
  • Age of Information Optimization
  • Advanced Neural Network Applications
  • Neural Networks and Applications
  • Domain Adaptation and Few-Shot Learning
  • Autonomous Vehicle Technology and Safety
  • Adversarial Robustness in Machine Learning
  • Adaptive Dynamic Programming Control
  • Evolutionary Algorithms and Applications
  • Explainable Artificial Intelligence (XAI)
  • Machine Learning and Algorithms
  • Artificial Intelligence in Games
  • Machine Learning and Data Classification
  • Optimization and Search Problems
  • Simulation Techniques and Applications
  • Model Reduction and Neural Networks
  • Advanced Control Systems Optimization
  • Human Pose and Action Recognition
  • Reservoir Engineering and Simulation Methods
  • Stochastic Gradient Optimization Techniques
  • Sparse and Compressive Sensing Techniques
  • Robotic Path Planning Algorithms
  • Multimodal Machine Learning Applications
  • Advanced Memory and Neural Computing

University of Alberta
2009-2023

Huawei Technologies (Canada)
2018-2021

Huawei Technologies (France)
2019-2021

Huawei Technologies (China)
2019-2020

Huawei Technologies (United Kingdom)
2019

City University of Hong Kong
2008-2009

Tsinghua University
2006

Autonomous driving has achieved significant milestones in research and development over the last two decades. There is increasing interest in this field, as the deployment of autonomous vehicles (AVs) promises safer and more ecologically friendly transportation systems. With rapid progress in computationally powerful artificial intelligence (AI) techniques, AVs can sense their environment with high precision, make safe real-time decisions, and operate reliably without human intervention. However, intelligent...

10.1109/access.2024.3431437 article EN cc-by-nc-nd IEEE Access 2024-01-01

Autonomous driving has achieved significant milestones in research and development over the last decade. There is increasing interest in this field, as the deployment of self-operating vehicles promises safer and more ecologically friendly transportation systems. With the rise of computationally powerful artificial intelligence (AI) techniques, autonomous vehicles can sense their environment with high precision, make safe real-time decisions, and operate reliably without human intervention. However, intelligent decision-making...

10.48550/arxiv.2112.11561 preprint EN cc-by arXiv (Cornell University) 2021-01-01

The rapid growth of research in explainable artificial intelligence (XAI) follows on two substantial developments. First, the enormous application success of modern machine learning methods, especially deep and reinforcement learning, has created high expectations for industrial, commercial, and social value. Second, there is an emerging and growing concern about creating ethical and trusted AI systems, including compliance with regulatory principles to ensure transparency and trust. These two threads have created a kind of “perfect storm”...

10.3390/make3040045 article EN cc-by Machine Learning and Knowledge Extraction 2021-11-18

Significant progress has been made recently in developing few-shot object segmentation methods. Learning has been shown to be successful in few-shot segmentation settings, using pixel-level, scribble, and bounding-box supervision. This paper takes another approach, i.e., only requiring image-level labels for segmentation. We propose a novel multi-modal interaction module that utilizes a co-attention mechanism with both visual and word embeddings. Our model using image-level labels achieves a 4.8% improvement over a previously proposed method. It also outperforms...

10.24963/ijcai.2020/120 article EN 2020-07-01

In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. On Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain across 49 games in cumulative rewards over QR-DQN, with big...

10.48550/arxiv.1905.06125 preprint EN other-oa arXiv (Cornell University) 2019-01-01
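The two-component exploration scheme described in the abstract above can be sketched as follows. The bonus form, the decay schedule, and the hyperparameter `c` are illustrative assumptions, not the paper's exact formulas:

```python
import numpy as np

def exploration_bonus(quantiles, t, c=50.0):
    """Sketch of an exploration bonus from the upper quantiles of a learned
    value distribution, with a decaying schedule.
    quantiles: array of shape (n_actions, n_quantiles), sorted per action.
    t: current timestep, used to decay the bonus over time.
    c: bonus scale (an assumed hyperparameter, not from the paper)."""
    n = quantiles.shape[1]
    upper = quantiles[:, n // 2:]      # keep only the upper half of quantiles
    # spread of the upper quantiles: large when the return distribution is uncertain
    spread = upper.std(axis=1)
    decay = c / np.sqrt(t + 1)         # decaying schedule suppresses the bonus over time
    return decay * spread

def act(mean_values, quantiles, t):
    """Pick the action maximizing mean value plus the decaying bonus."""
    return int(np.argmax(mean_values + exploration_bonus(quantiles, t)))
```

Early in training the bonus dominates and drives the agent toward actions with uncertain upside; as `t` grows, action selection reverts to the mean value estimate.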

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. It is not an optimization problem in its usual formulation, so when using function approximation there is no optimal policy. We substantiate these claims, then go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches, such as maximizing average reward...

10.48550/arxiv.1910.02140 preprint EN other-oa arXiv (Cornell University) 2019-01-01
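As a concrete instance of the average-reward alternative mentioned above, tabular differential TD learning can be sketched as below; the function name and step-sizes are illustrative assumptions:

```python
import numpy as np

def differential_td(transitions, n_states, alpha=0.1, eta=0.1):
    """Minimal sketch of tabular average-reward (differential) TD learning,
    one rigorous alternative to discounting in continuing tasks.
    transitions: iterable of (s, r, s2) tuples from a continuing task."""
    v = np.zeros(n_states)   # differential value estimates
    r_bar = 0.0              # running estimate of the average reward
    for s, r, s2 in transitions:
        delta = r - r_bar + v[s2] - v[s]   # TD error with no discount factor
        r_bar += eta * delta               # update average-reward estimate
        v[s] += alpha * delta              # update differential values
    return v, r_bar
```

On a two-state loop with rewards 0 and 2, `r_bar` converges to the average reward 1 and `v` encodes relative (differential) state values.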

We present the first provably convergent two-timescale off-policy actor-critic algorithm (COF-PAC) with function approximation. Key to COF-PAC is the introduction of a new critic, the emphasis critic, which is trained via Gradient Emphasis Learning (GEM), a novel combination of key ideas from Temporal Difference Learning and Emphatic Learning. With the help of the emphasis critic and the canonical value critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.

10.48550/arxiv.1911.04384 preprint EN other-oa arXiv (Cornell University) 2019-01-01

We consider the problem of mapless collision-avoidance navigation, where humans are present, using 2D laser scans. Our proposed method uses ego-safety to measure collision risk from the robot's perspective and social-safety to measure the impact of the robot's actions on surrounding pedestrians. Specifically, the social-safety part predicts the intrusion of an action into the interaction area with humans. We train the policy using reinforcement learning in a simple simulator and directly evaluate the learned policy in Gazebo and real robot tests. Experiments show the policy can be smoothly transferred to different...

10.1109/icra40945.2020.9197148 preprint EN 2020-05-01

The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space. Do better policies exist outside of this space? By using a novel surrogate objective that employs the sigmoid function (which provides an interesting way of exploration), we found the answer is "YES", and the better policies are in fact located very far from the clipped space. We show that PPO has insufficient "off-policyness", according to an off-policy metric called DEON. Our method explores a much larger policy space than PPO, and it maximizes the Conservative...

10.1609/aaai.v37i6.25864 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL). In QUOTA, decision making is based on quantiles of a value distribution, not only the mean. QUOTA provides a new dimension for exploration via the use of both optimism and pessimism of the value distribution. We demonstrate the performance advantage of QUOTA in challenging video games and physical robot simulators.

10.1609/aaai.v33i01.33015797 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning. In ACE, we use an ensemble (i.e., multiple actors) to search for the global maxima of the critic. Besides the ensemble perspective, we also formulate ACE in the option framework by extending the option-critic architecture with intra-option policies, revealing a relationship between ensembles and options. Furthermore, we perform a look-ahead tree search with those actors and a learned value prediction model, resulting in a refined value estimation. We...

10.1609/aaai.v33i01.33015789 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a probability kernel. We show that the new framework captures many existing abstractions, including factored linear action models as a special case. The relation between these models and previous works is also discussed. We use the general theory for bounding the suboptimality of policies derived from pseudo-MDPs. Specializing the framework, we recover previous results. We give a least-squares approach and a constrained optimization approach for learning the model as...

10.1109/adprl.2014.7010633 article EN 2014-12-01

In deep neural networks, the cross-entropy loss function is commonly used for classification. Minimizing cross-entropy is equivalent to maximizing likelihood under assumptions of uniform feature and class distributions. It belongs to the generative training criteria, which do not directly discriminate the correct class from the competing classes. We propose a discriminative loss based on the negative log likelihood ratio between the correct and competing classes, which significantly outperforms cross-entropy on the CIFAR-10 image classification task.

10.48550/arxiv.1804.10690 preprint EN other-oa arXiv (Cornell University) 2018-01-01
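The contrast between cross-entropy and a negative log likelihood ratio loss can be illustrated as follows; the exact loss form in the paper may differ, so treat this as an assumed variant:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Standard cross-entropy: -log p(correct class)."""
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def log_likelihood_ratio_loss(logits, labels, eps=1e-12):
    """Assumed discriminative loss: negative log ratio between the correct
    class probability and the summed probability of competing classes."""
    p = softmax(logits)
    correct = p[np.arange(len(labels)), labels]
    competing = 1.0 - correct          # total mass on competing classes
    return -np.log(correct / (competing + eps)).mean()
```

Unlike cross-entropy, which only pushes up the correct-class likelihood, the ratio loss explicitly rewards separating the correct class from the combined competing mass, and keeps decreasing as that separation grows.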

In this paper we consider the problem of finding a good policy given some batch data. We propose a new approach, LAM-API, that first builds a so-called linear action model (LAM) from the data and then uses the learned model and the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the policy evaluation step of the algorithm is to use the least-squares temporal difference (LSTD) learning algorithm. Empirical results on three benchmark problems show that a particular instance of LAM-API performs competitively as compared with...

10.1609/aaai.v26i1.8319 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-09-20

This paper extends many of the recent popular policy evaluation algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE), and a variant of incremental LSTD (iLSTD). The basis of this extension is a preconditioning technique that solves a stochastic model equation. The paper also studies three significant issues of the new framework: it presents a rule by which the step-size can be computed online, provides an iterative way to apply preconditioning, and reduces the complexity of related algorithms to near that of temporal difference (TD) learning.

10.1145/1390156.1390308 article EN 2008-01-01
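The preconditioning view of policy evaluation described above can be sketched on a tabular chain; the function name, the choice of preconditioners, and the step-size are illustrative assumptions:

```python
import numpy as np

def preconditioned_pe(Phi, Phi_next, rewards, gamma=0.9, alpha=0.1, steps=2000):
    """Sketch of policy evaluation as solving the model equation A w = b,
    where A = Phi^T (Phi - gamma * Phi_next) and b = Phi^T r.
    Different preconditioners C in the update w += C^{-1} (b - A w) recover
    different algorithms: C = A gives an LSTD-like direct solve, and C = I
    with a small step-size gives a TD-like iteration. Illustrative only."""
    A = Phi.T @ (Phi - gamma * Phi_next)
    b = Phi.T @ rewards
    # LSTD-like: solve the model equation directly (preconditioner C = A)
    w_lstd = np.linalg.solve(A, b)
    # TD-like: identity preconditioner, iterate with a small step-size
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        w += alpha * (b - A @ w)
    return w_lstd, w
```

On a deterministic three-state cycle with tabular features, both routes converge to the same fixed point, which satisfies the Bellman equation v = r + gamma * P v.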

Learning an effective representation for high-dimensional data is a challenging problem in reinforcement learning (RL). Deep reinforcement learning (DRL) such as deep Q networks (DQN) achieves remarkable success in computer games through representations deeply encoded by convolutional networks. In this paper, we propose a simple yet very effective method to be used with DRL algorithms. Our key insight is that features learned by DRL algorithms are highly correlated, which interferes with learning. By adding a regularized loss that penalizes the correlation in latent features (with only slight...

10.48550/arxiv.1903.07765 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search control, the mechanism used to generate the states and actions from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function. This has the effect of propagating value from high-value regions and preemptively updating value estimates of states that the agent is likely to visit next. We derive...

10.24963/ijcai.2019/445 preprint EN 2019-07-28
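The search-control idea above can be sketched in a discrete setting; this greedy neighbor-based climb is an assumed stand-in for the paper's hill climbing on the value estimate, and the names are hypothetical:

```python
import numpy as np

def hc_search_control(v, neighbors, s0, n_steps=10):
    """Sketch of hill-climbing search control for Dyna: starting from s0,
    repeatedly move to the neighboring state with the highest current value
    estimate, and return the trajectory as the states from which the agent
    queries its model for planning updates.
    neighbors: dict mapping a state to its candidate successor states."""
    traj = [s0]
    s = s0
    for _ in range(n_steps):
        cands = neighbors[s]
        s = max(cands, key=lambda x: v[x])   # greedy hill-climbing step
        if v[s] <= v[traj[-1]]:
            break                            # stop at a local maximum
        traj.append(s)
    return traj
```

The returned trajectory biases simulated updates toward high-value regions, so value estimates along likely future paths are refreshed before the agent actually visits them.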

Recent developments in Neural Architecture Search (NAS) resort to training the supernet of a predefined search space with weight sharing to speed up architecture evaluation. These include random search schemes, as well as various schemes based on optimization or reinforcement learning, in particular policy gradient, that aim to optimize a parametric architecture distribution and the shared model weights simultaneously. In this paper, we focus on efficiently exploring the important region of a neural architecture space with reinforcement learning. We propose a Deep Deterministic...

10.1109/access.2021.3101975 article EN cc-by IEEE Access 2021-01-01