- Reinforcement Learning in Robotics
- Artificial Intelligence in Games
- Advanced Bandit Algorithms Research
- Adversarial Robustness in Machine Learning
- Evolutionary Algorithms and Applications
- Metaheuristic Optimization Algorithms Research
- Digital Games and Media
- Autonomous Vehicle Technology and Safety
- Robot Manipulation and Learning
- AI-based Problem Solving and Planning
- Explainable Artificial Intelligence (XAI)
- Power System Optimization and Stability
- Software Engineering Research
- Smart Grid Energy Management
- Scheduling and Optimization Algorithms
- Advanced Data Storage Technologies
- Gambling Behavior and Treatments
- Radiation Detection and Scintillator Technologies
- Advanced Memory and Neural Computing
- Bayesian Modeling and Causal Inference
- Artificial Immune Systems Applications
- Multi-Criteria Decision Making
- Scientific Computing and Data Management
- Sustainable Building Design and Assessment
- Advanced Database Systems and Queries
Hangzhou Dianzi University (2024)
Tsinghua University (2012-2023)
China Electric Power Research Institute (2023)
Robert Bosch (Taiwan) (2022)
Wuhan Engineering Science & Technology Institute (2022)
China University of Mining and Technology (2022)
Guangzhou Vocational College of Science and Technology (2022)
Intel (United States) (2020)
Center for Information Technology (2019)
In this paper, we present Tianshou, a highly modularized Python library for deep reinforcement learning (DRL) that uses PyTorch as its backend. Tianshou intends to be research-friendly by providing a flexible and reliable infrastructure of DRL algorithms. It supports online and offline training with more than 20 classic algorithms through a unified interface. To facilitate related research and prove Tianshou's reliability, we have released a benchmark on MuJoCo environments, covering eight state-of-the-art...
Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment in Reinforcement Learning (RL). However, designing shaping functions usually requires much expert knowledge and hand-engineering, and the difficulties are further exacerbated given multiple similar tasks to solve. In this paper, we consider reward shaping on a distribution of tasks, and propose a general meta-learning framework to automatically learn efficient reward shaping on newly sampled tasks, assuming only a shared state space but not...
Learning rational behaviors in first-person-shooter (FPS) games is a challenging task for Reinforcement Learning (RL), with the primary difficulties being the huge action space and insufficient exploration. To address this, we propose a hierarchical agent based on combined options with intrinsic rewards to drive exploration. Specifically, we present a model that works in a manager-worker fashion over two levels of hierarchy. The high-level manager learns a policy over options, and the low-level workers, motivated by intrinsic reward, learn to execute the options....
Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment and to accelerate Reinforcement Learning. However, designing shaping functions usually requires rich expert knowledge and hand-engineering, and the difficulties are further exacerbated given multiple tasks to solve. In this paper, we consider reward shaping on a distribution of tasks that share state spaces but not necessarily action spaces. We provide insights into optimal reward shaping, and propose a novel meta-learning framework...
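For reference, the classic hand-designed form that learned shaping functions are usually compared against is potential-based shaping, r' = r + γΦ(s') − Φ(s). The sketch below shows only that textbook form, not the meta-learned shaping described in the abstracts; the potential function `distance_potential` and the 1-D chain with a goal at position 10 are hypothetical illustrations.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    At terminal transitions Phi(s') is taken as 0, so the shaping terms
    telescope over an episode and do not change the optimal policy.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


# Hypothetical potential on a 1-D chain: states closer to position 10 score higher.
def distance_potential(s: int) -> float:
    return -abs(10 - s)


# Moving from 3 to 4 (one step toward the goal) earns a shaping bonus of +1.
print(shaped_reward(0.0, 3, 4, distance_potential, gamma=1.0))
```

A learned shaping function, as in the abstracts, would replace the hand-written potential with one inferred from a distribution of tasks.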
Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation. Most existing methods for safe reinforcement learning can handle only transition disturbance or observation disturbance, since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric of Value...
Counterfactual regret minimization (CFR) is the most popular algorithm for solving two-player zero-sum extensive games with imperfect information and achieves state-of-the-art performance in practice. However, the performance of CFR is not fully understood, since empirical results are much better than the upper bound proved in \cite{zinkevich2008regret}. Another issue is that CFR has to traverse the whole game tree in each round, which is time-consuming in large-scale games. In this paper, we present a novel technique, lazy update, which can...
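At every information set, CFR chooses its next strategy by regret matching: play each action in proportion to its positive cumulative regret. The sketch below shows regret matching alone in self-play on rock-paper-scissors, a one-shot zero-sum matrix game, rather than full CFR over a game tree; it does not implement the lazy update technique. The payoff matrix, the asymmetric starting regret, and the iteration count are illustrative choices.

```python
from typing import List

import numpy as np

# Row player's payoffs for rock-paper-scissors; the column player gets the negation.
PAYOFF = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)


def regret_matching(cumulative_regret: np.ndarray) -> np.ndarray:
    """Mix actions in proportion to positive regret; fall back to uniform."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regret), 1.0 / len(cumulative_regret))


def self_play(iterations: int = 10_000) -> List[np.ndarray]:
    # Asymmetric start so the dynamics are non-trivial (player 0 opens with rock).
    regrets = [np.array([1.0, 0.0, 0.0]), np.zeros(3)]
    strategy_sums = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for p in range(2):
            strategy_sums[p] += strategies[p]
        # Expected utility of each pure action against the opponent's current mix.
        u0 = PAYOFF @ strategies[1]
        u1 = -PAYOFF.T @ strategies[0]
        # Instantaneous regret: action utility minus utility of the current mix.
        regrets[0] += u0 - strategies[0] @ u0
        regrets[1] += u1 - strategies[1] @ u1
    # The *average* strategy, not the last iterate, approaches the equilibrium.
    return [s / iterations for s in strategy_sums]


avg = self_play()
print(np.round(avg[0], 3), np.round(avg[1], 3))  # both should be near uniform
```

The current strategies keep cycling; only the averaged strategies converge toward the uniform Nash equilibrium, which is why CFR reports the average strategy.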
Standard cameras are frame-based sensors that capture the scene at a fixed rate. They cannot provide information between two frames and suffer from the motion blur problem in high-speed robotic vision applications. By contrast, event-based cameras are a novel type of sensor that generates asynchronous "events" when the intensity at a particular pixel changes. The data types of these two sensors are fundamentally different. In this paper, we leverage the complementarity of standard and event-based cameras and propose a fusion strategy for feature tracking. Features are extracted from frames,...
Agent exploration in reinforcement learning is a necessary way for algorithms to obtain information. In order to obtain more exploratory information, some deep reinforcement learning algorithms even increase the exploration of the agents. Reinforcement learning has been successfully applied in many intelligent control fields; however, unlimited exploration may bring disastrous consequences to agents, and there are still concerns that need attention when applying reinforcement learning in the real world, one of which is the safety issue. Safe reinforcement learning approximately enforces constraint conditions in each policy update, thus further...
Importance sampling (IS) is a popular technique in off-policy evaluation, which re-weights the return of trajectories in the replay buffer to boost sample efficiency. However, training with IS can be unstable, and previous attempts to address this issue mainly focus on analyzing the variance of IS. In this paper, we reveal that the instability is also related to a new notion of Reuse Bias: the bias in off-policy evaluation caused by the reuse of the replay buffer for evaluation and optimization. We theoretically show that off-policy evaluation and optimization of the current policy with data from the replay buffer result in an overestimation...
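To make the re-weighting concrete, here is a minimal sketch of the ordinary per-trajectory IS estimator: each stored return is multiplied by the product of target-to-behavior action-probability ratios along its trajectory. The abstract does not say which IS variant the paper analyzes, so this textbook form is an assumption; `target_prob` and `behavior_prob` are hypothetical callables returning π(a|s) and μ(a|s).

```python
from typing import Callable, Hashable, Sequence, Tuple

# One step of a stored trajectory: (state, action, reward).
Step = Tuple[Hashable, int, float]
Policy = Callable[[Hashable, int], float]  # returns the probability of action in state


def is_estimate(
    trajectories: Sequence[Sequence[Step]],
    target_prob: Policy,
    behavior_prob: Policy,
    gamma: float = 1.0,
) -> float:
    """Ordinary (per-trajectory) importance-sampling value estimate."""
    weighted_returns = []
    for trajectory in trajectories:
        ratio, ret = 1.0, 0.0
        for t, (state, action, reward) in enumerate(trajectory):
            # Cumulative likelihood ratio of the target vs. behavior policy.
            ratio *= target_prob(state, action) / behavior_prob(state, action)
            ret += gamma**t * reward
        weighted_returns.append(ratio * ret)
    return sum(weighted_returns) / len(weighted_returns)


# Toy one-step bandit: behavior picks actions 0/1 uniformly; the target always picks 1.
buffer = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
estimate = is_estimate(
    buffer,
    target_prob=lambda s, a: 1.0 if a == 1 else 0.0,
    behavior_prob=lambda s, a: 0.5,
)
print(estimate)  # 1.0, the target policy's true value in this toy case
```

The estimator is unbiased for a fixed target policy, but its products of ratios can have high variance over long horizons, which is the instability the abstract refers to.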
Traditional heuristic optimization algorithms face challenges such as getting trapped in local optima, unstable optimal solutions, and lack of adaptability when solving problems. To address these issues, this paper proposes an algorithm based on the foraging behavior of elephant fish populations, called the Elephant Fish Foraging Algorithm (EFFA). The EFFA simulates this foraging behavior, including Levy flights, radar detection, greedy selection, and resource competition. By employing swarm intelligence and interactive search...
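Two of the listed ingredients, Levy flights and greedy selection, are standard metaheuristic building blocks. The sketch below combines them in a single-agent search using Mantegna's algorithm for heavy-tailed step lengths; it is not EFFA itself (radar detection, resource competition, and the population are omitted), and `levy_search`, the step scale `alpha`, and the sphere test function are illustrative assumptions.

```python
import math
from typing import Callable, Optional, Tuple

import numpy as np


def levy_step(dim: int, beta: float = 1.5, rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Draw a Levy-flight step via Mantegna's algorithm (0 < beta <= 2)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size=dim)
    v = rng.normal(0.0, 1.0, size=dim)
    # Ratio of Gaussians yields mostly small steps with occasional long jumps.
    return u / np.abs(v) ** (1 / beta)


def levy_search(
    f: Callable[[np.ndarray], float],
    x0,
    iters: int = 2000,
    alpha: float = 0.01,
    seed: int = 0,
) -> Tuple[np.ndarray, float]:
    """Hypothetical single-agent Levy-flight search with greedy selection."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(iters):
        candidate = x + alpha * levy_step(x.size, rng=rng)
        fc = f(candidate)
        if fc < fx:  # greedy selection: keep the candidate only if it improves
            x, fx = candidate, fc
    return x, fx


def sphere(x: np.ndarray) -> float:
    return float(np.sum(x**2))


best_x, best_f = levy_search(sphere, [3.0, -2.0])
print(best_x, best_f)
```

The heavy-tailed steps let the search occasionally escape a local basin, while greedy selection guarantees the objective never gets worse.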
Logical reasoning is a crucial task for Large Language Models (LLMs), enabling them to tackle complex problems. Among reasoning tasks, multi-step reasoning poses a particular challenge. Grounded in the theory of formal logic, we have developed an automated method, Multi-step Deduction (MuseD), for generating deductive reasoning data. MuseD has allowed us to create training and testing datasets for multi-step reasoning. Our generation method enables control over the complexity of the generated instructions, facilitating the evaluation of models across different difficulty...
AlphaZero has achieved superhuman performance on various perfect-information games, such as chess, shogi and Go. However, directly applying AlphaZero to imperfect-information games (IIG) is infeasible, due to the fact that traditional MCTS methods cannot handle the missing information of other players. Meanwhile, there have been several extensions of MCTS for IIGs, which implicitly or explicitly sample a state of the other players. However, because they cannot handle private and public information well, these methods are not satisfactory. In this paper, we extend AlphaZero to multiplayer IIGs...
Building energy consumption is increasingly becoming a matter of global concern. A key aspect of this is the nature of building cold source systems and their effectiveness. However, choosing a cold source system is a complex decision-making process. The traditional evaluation method is relatively simple: it has difficulty comprehensively considering multiple factors and their mutual influences, let alone additionally accounting for whole life-cycle cost on that basis. Especially in China, extra-large public projects are invested in and constructed by...
Combinatorial Optimization (CO) problems have been intensively studied for decades with a wide range of applications. For some classic CO problems, e.g., the Traveling Salesman Problem (TSP), both traditional planning algorithms and emerging reinforcement learning have made solid progress in recent years. However, for CO problems with nested sub-tasks, neither end-to-end reinforcement learning algorithms nor traditional evolutionary methods can obtain satisfactory strategies within limited time and computational resources. In this paper, we propose an algorithmic...