NFDI4DS | UHH-SEMS - Publication Details

Dong Yan

ORCID: 0000-0003-4549-9469

About

Contact & Profiles

A5080233678

Research Areas

Reinforcement Learning in Robotics
Artificial Intelligence in Games
Advanced Bandit Algorithms Research
Adversarial Robustness in Machine Learning
Evolutionary Algorithms and Applications
Metaheuristic Optimization Algorithms Research
Digital Games and Media
Autonomous Vehicle Technology and Safety
Robot Manipulation and Learning
AI-based Problem Solving and Planning
Explainable Artificial Intelligence (XAI)
Power System Optimization and Stability
Software Engineering Research
Smart Grid Energy Management
Scheduling and Optimization Algorithms
Advanced Data Storage Technologies
Gambling Behavior and Treatments
Radiation Detection and Scintillator Technologies
Advanced Memory and Neural Computing
Bayesian Modeling and Causal Inference
Artificial Immune Systems Applications
Multi-Criteria Decision Making
Scientific Computing and Data Management
Sustainable Building Design and Assessment
Advanced Database Systems and Queries

Hangzhou Dianzi University
2024

Tsinghua University
2012-2023

China Electric Power Research Institute
2023

Robert Bosch (Taiwan)
2022

Wuhan Engineering Science & Technology Institute
2022

China University of Mining and Technology
2022

Guangzhou Vocational College of Science and Technology
2022

Intel (United States)
2020

Center for Information Technology
2019

Tianshou: a Highly Modularized Deep Reinforcement Learning Library

OPENALEX - Publications

Jiayi Weng Huayu Chen Dong Yan Kaichao You Alexis Duburcq and 3 more

In this paper, we present Tianshou, a highly modularized Python library for deep reinforcement learning (DRL) that uses PyTorch as its backend. Tianshou intends to be research-friendly by providing a flexible and reliable infrastructure of DRL algorithms. It supports online and offline training with more than 20 classic algorithms through a unified interface. To facilitate related research and prove Tianshou's reliability, we have released a benchmark of MuJoCo environments, covering eight state-of-the-art...

10.48550/arxiv.2107.14171 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Reward Shaping via Meta-Learning

OPENALEX - Publications

Haosheng Zou Tongzheng Ren Dong Yan Hang Su Jun Zhu

Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment in Reinforcement Learning (RL). However, designing shaping functions usually requires much expert knowledge and hand-engineering, and the difficulties are further exacerbated given multiple similar tasks to solve. In this paper, we consider reward shaping on a distribution of tasks, and propose a general meta-learning framework to automatically learn the efficient reward shaping on newly sampled tasks, assuming only shared state space but not...

10.48550/arxiv.1901.09330 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning

OPENALEX - Publications

Shihong Song Jiayi Weng Hang Su Dong Yan Haosheng Zou and 1 more

Learning rational behaviors in First-person-shooter (FPS) games is a challenging task for Reinforcement Learning (RL), with the primary difficulties of a huge action space and insufficient exploration. To address this, we propose a hierarchical agent based on combined options with intrinsic rewards to drive exploration. Specifically, we present a hierarchical model that works in a manager-worker fashion over two levels of hierarchy. The high-level manager learns a policy over options, and the low-level workers, motivated by intrinsic reward, learn to execute the options....

10.24963/ijcai.2019/482 article EN 2019-07-28

Deep reinforcement learning with credit assignment for combinatorial optimization

OPENALEX - Publications

Dong Yan Jiayi Weng Shiyu Huang Chongxuan Li Yichi Zhou and 2 more

10.1016/j.patcog.2021.108466 article EN Pattern Recognition 2021-11-27

A Novel Bio-Inspired Evolution Algorithm for Nonlinear Optimization Applied to Heterogeneous Unmanned Systems

OPENALEX - Publications

L. Gong Dong Yan Wenyan Gong Dechao Chen

10.2139/ssrn.5276236 preprint EN 2025-01-01

Learning Task-Distribution Reward Shaping with Meta-Learning

OPENALEX - Publications

Haosheng Zou Tongzheng Ren Dong Yan Hang Su Jun Zhu

Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment and accelerate Reinforcement Learning. However, designing shaping functions usually requires rich expert knowledge and hand-engineering, and the difficulties are further exacerbated given multiple tasks to solve. In this paper, we consider reward shaping on a distribution of tasks that share state spaces but not necessarily action spaces. We provide insights into optimal reward shaping, and propose a novel meta-learning framework...

10.1609/aaai.v35i12.17337 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

OPENALEX - Publications

Chengyang Ying Xinning Zhou Hang Su Dong Yan Ning Chen and 1 more

Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation. Most existing methods for safe reinforcement learning can only handle transition disturbance or observation disturbance, since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric of Value...

10.24963/ijcai.2022/510 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01
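
For readers unfamiliar with the risk measure named in the title above, the following is a minimal sketch of the standard empirical lower-tail Conditional Value-at-Risk (CVaR) of a batch of episode returns: the mean of the worst alpha-fraction of outcomes. It is not the method of the paper; the function name `empirical_cvar` and the sample returns are hypothetical and chosen only for illustration.

```python
import math


def empirical_cvar(returns, alpha):
    """Return the mean of the worst ceil(alpha * n) returns (lower-tail CVaR).

    `returns` is a non-empty sequence of episode returns and `alpha` is the
    tail fraction in (0, 1]. Both names are illustrative assumptions.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Number of worst-case samples that make up the tail (at least one).
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


# Hypothetical returns from ten evaluation episodes.
sample_returns = [10.0, 12.0, -50.0, 11.0, 9.0, 13.0, -5.0, 10.0, 12.0, 11.0]
cvar_20 = empirical_cvar(sample_returns, 0.2)  # average of the two worst episodes
```

With `alpha = 1.0` the estimator reduces to the ordinary mean return, which is one way to see CVaR as a dial between the average case and the worst case.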

Lazy-CFR: fast and near optimal regret minimization for extensive games with imperfect information

OPENALEX - Publications

Yichi Zhou Tongzheng Ren Jialian Li Dong Yan Jun Zhu

Counterfactual regret minimization (CFR) is the most popular algorithm for solving two-player zero-sum extensive games with imperfect information and achieves state-of-the-art performance in practice. However, the performance of CFR is not fully understood, since empirical results are much better than the upper bound proved in \cite{zinkevich2008regret}. Another issue is that CFR has to traverse the whole game tree in each round, which is time-consuming in large scale games. In this paper, we present a novel technique, lazy update, which can...

10.48550/arxiv.1810.04433 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Advanced graph model for tainted variable tracking

OPENALEX - Publications

Chao Ma Dong Yan Yu-Ping Wang Shi‐Min Hu

10.1007/s11432-012-4674-8 article EN Science China Information Sciences 2012-09-28

Standard and Event Cameras Fusion for Feature Tracking

OPENALEX - Publications

Dong Yan Tao Zhang

Standard cameras are frame-based sensors that capture the scene at a fixed rate. They cannot provide information between two frames and suffer from the motion blur problem in high-speed robotic vision applications. By contrast, event-based cameras are a novel type of sensors that generate asynchronous "events" if the intensity changes at a particular pixel. The data types of these sensors are fundamentally different. In this paper, we leverage the complementarity of standard and event-based cameras and propose a fusion strategy for feature tracking. Features are extracted from frames,...

10.1145/3459066.3459075 article EN 2021-02-20

Research and Application of Safe Reinforcement Learning in Power System

OPENALEX - Publications

Jian Li Xinying Wang Sheng Chen Dong Yan

Agent exploration in reinforcement learning is a necessary way for algorithms to obtain information. In order to obtain more exploratory information, some deep reinforcement learning algorithms even increase the exploration of the agents. Reinforcement learning has been successfully applied in many intelligent control fields; however, unlimited exploration may bring disastrous consequences to agents, and there are still concerns that need attention in the application of reinforcement learning in the real world, one of which is the safety issue. The safe reinforcement learning approximately enforces the constraint conditions in each policy update, thus further...

10.1109/acpee56931.2023.10135995 article EN 2022 7th Asia Conference on Power and Electrical Engineering (ACPEE) 2023-04-01

On the Reuse Bias in Off-Policy Reinforcement Learning

OPENALEX - Publications

Chengyang Ying Zhongkai Hao Xinning Zhou Hang Su Dong Yan and 1 more

Importance sampling (IS) is a popular technique in off-policy evaluation, which re-weights the return of trajectories in the replay buffer to boost sample efficiency. However, training with IS can be unstable, and previous attempts to address this issue mainly focus on analyzing the variance of IS. In this paper, we reveal that the instability is also related to a new notion of Reuse Bias: the bias in off-policy evaluation caused by the reuse of the replay buffer for evaluation and optimization. We theoretically show that the off-policy evaluation and optimization of the current policy with the data from the replay buffer result in an overestimation...

10.24963/ijcai.2023/502 article EN 2023-08-01
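
The re-weighting described in the abstract above can be illustrated with the ordinary trajectory-wise importance sampling estimator. This is a minimal sketch of the textbook estimator only; it does not implement the paper's analysis or any correction for Reuse Bias. The trajectory layout (a list of `(target_prob, behavior_prob, reward)` steps) and the function name `is_estimate` are assumptions made for this example.

```python
def is_estimate(trajectories, gamma=1.0):
    """Ordinary importance-sampling estimate of the target policy's return.

    Each trajectory is a list of (target_prob, behavior_prob, reward) steps,
    where the probabilities are those of the action actually taken under the
    target and behavior policies. This layout is a hypothetical convention.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0      # product of per-step likelihood ratios
        ret = 0.0         # discounted return of this trajectory
        discount = 1.0
        for target_prob, behavior_prob, reward in trajectory:
            weight *= target_prob / behavior_prob
            ret += discount * reward
            discount *= gamma
        total += weight * ret
    return total / len(trajectories)


# Two hypothetical one-step trajectories collected by the behavior policy.
buffer = [
    [(0.5, 0.25, 1.0)],  # likelihood ratio 2.0, return 1.0
    [(0.5, 0.5, 3.0)],   # likelihood ratio 1.0, return 3.0
]
estimate = is_estimate(buffer)
```

Because the weight is a product of ratios over the whole trajectory, a few steps with small behavior probabilities can dominate the average, which is the variance problem the abstract mentions as the focus of earlier work.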

A New Bio-Inspired Method for Nonlinear Optimization: Elephantnose Fish Foraging Algorithm

OPENALEX - Publications

Dechao Chen Chentong Shi Dong Yan Shuai Li Wenyan Gong

Traditional heuristic optimization algorithms face challenges such as getting trapped in local optima, unstable optimal solutions, and lack of adaptability when solving optimization problems. To address these issues, this paper proposes a heuristic optimization algorithm based on the foraging behavior of elephant fish populations, called the Elephant Fish Foraging Algorithm (EFFA). The EFFA simulates the foraging behavior of elephant fish populations, including Levy flights, radar detection, greedy selection, and resource competition. By employing swarm intelligence and interactive search...

10.2139/ssrn.4848594 preprint EN 2024-01-01

Boosting Deductive Reasoning with Step Signals In RLHF

OPENALEX - Publications

Jialian Li Yipin Zhang Wei Shen Y.J. Yan Jian Xie and 1 more

Logical reasoning is a crucial task for Large Language Models (LLMs), enabling them to tackle complex problems. Among reasoning tasks, multi-step reasoning poses a particular challenge. Grounded in the theory of formal logic, we have developed an automated method, Multi-step Deduction (MuseD), for deductive reasoning data. MuseD has allowed us to create training and testing datasets for multi-step reasoning. Our generation method enables control over the complexity of the generated instructions, facilitating training and evaluation of models across different difficulty...

10.48550/arxiv.2410.09528 preprint EN arXiv (Cornell University) 2024-10-12

Combining Tree Search and Action Prediction for State-of-the-Art Performance in DouDiZhu

OPENALEX - Publications

Yunsheng Zhang Dong Yan Bei Shi Haobo Fu Qiang Fu and 3 more

AlphaZero has achieved superhuman performance on various perfect-information games, such as chess, shogi and Go. However, directly applying AlphaZero to imperfect-information games (IIG) is infeasible, due to the fact that traditional MCTS methods cannot handle missing information of other players. Meanwhile, there have been several extensions of MCTS for IIGs, by implicitly or explicitly sampling a state of other players. But, due to the inability to handle private and public information well, the performance of these methods is not satisfactory. In this paper, we extend AlphaZero to multiplayer IIGs...

10.24963/ijcai.2021/470 article EN 2021-08-01

A Decision Support System for Optimal Building Cold Source Selection

OPENALEX - Publications

Qing Li Dong Yan Rong-Guang Cao Le Li Zhibin Chen

Building energy consumption is increasingly becoming a matter of global concern. A key aspect of this is the nature of building cold source systems and their effectiveness. However, choosing cold source systems for a building is a complex decision-making process. The traditional evaluation method is relatively simple, and it is difficult to comprehensively consider multiple factors and their mutual influences, and even more difficult to increase the consideration of whole life-cycle cost on this basis. Especially in China, extra-large public building projects are invested and constructed by...

10.1155/2022/5605477 article EN cc-by Shock and Vibration 2022-05-18

Bridging Reinforcement Learning and Planning to Solve Combinatorial Optimization Problems with Nested Sub-Tasks

OPENALEX - Publications

Xiaohan Shan Pengjiu Wang M. Wan Dong Yan Jialian Li and 1 more

Combinatorial Optimization (CO) problems have been intensively studied for decades with a wide range of applications. For some classic CO problems, e.g., the Traveling Salesman Problem (TSP), both traditional planning algorithms and the emerging reinforcement learning have made solid progress in recent years. However, for CO problems with nested sub-tasks, neither end-to-end reinforcement learning algorithms nor traditional evolutionary methods can obtain satisfactory strategies within a limited time and computational resources. In this paper, we propose an algorithmic...

10.26599/air.2023.9150025 article EN cc-by CAAI Artificial Intelligence Research 2023-12-01
