- Advanced Bandit Algorithms Research
- Optimization and Search Problems
- Reinforcement Learning in Robotics
- Machine Learning and Algorithms
- Auction Theory and Applications
- Smart Grid Energy Management
- Data Stream Mining Techniques
- Game Theory and Applications
- Adversarial Robustness in Machine Learning
- Clinical Nutrition and Gastroenterology
- Recommender Systems and Techniques
- Meta-analysis and systematic reviews
- Radiomics and Machine Learning in Medical Imaging
- Occupational Health and Safety Research
- Advanced Wireless Network Optimization
- Technology Adoption and User Behaviour
- Health Systems, Economic Evaluations, Quality of Life
- Stock Market Forecasting Methods
- Clostridium difficile and Clostridium perfringens research
- Risk and Safety Analysis
- Occupational Health and Performance
- BIM and Construction Integration
- Manufacturing Process and Optimization
- Advanced Causal Inference Techniques
- Seismology and Earthquake Studies
Xi'an University of Science and Technology
2022-2024
Wuhan University of Technology
2023
Beijing University of Posts and Telecommunications
2023
Tsinghua University
2018-2022
Zhengzhou University
2009-2020
In this paper, we consider the stochastic multi-armed bandit problem with adversarial corruptions, where the random rewards of the arms are partially modified by an adversary to fool the algorithm. We apply the policy gradient algorithm SAMBA to this setting, and show that it is computationally efficient and achieves a state-of-the-art $O(K\log T/\Delta) + O(C/\Delta)$ regret upper bound, where $K$ is the number of arms, $C$ is the unknown corruption level, $\Delta$ is the minimum expected reward gap between the best arm and the other ones, and $T$ is the time horizon....
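SAMBA itself is specific to the paper, but the general shape of a policy-gradient bandit learner can be sketched as follows. This is a generic softmax/REINFORCE-style sketch with arbitrary example parameters, not the SAMBA algorithm:

```python
import math
import random

def softmax_policy_gradient_bandit(means, T=20000, lr=0.1, seed=0):
    """Minimal softmax policy-gradient learner on Bernoulli arms.

    `means` are the hidden success probabilities. The learner keeps one
    preference per arm, samples arms from the softmax of the preferences,
    and nudges preferences toward arms whose sampled reward beats the
    running average reward (the baseline).
    """
    rng = random.Random(seed)
    K = len(means)
    prefs = [0.0] * K           # one preference score per arm
    baseline, n = 0.0, 0        # running average reward as baseline
    pulls = [0] * K
    for _ in range(T):
        # softmax over preferences -> sampling distribution
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        z = sum(exps)
        probs = [e / z for e in exps]
        # sample an arm and observe a Bernoulli reward
        a = rng.choices(range(K), weights=probs)[0]
        r = 1.0 if rng.random() < means[a] else 0.0
        n += 1
        baseline += (r - baseline) / n
        # REINFORCE-style update against the baseline
        for i in range(K):
            grad = (1.0 - probs[i]) if i == a else -probs[i]
            prefs[i] += lr * (r - baseline) * grad
        pulls[a] += 1
    return pulls

pulls = softmax_policy_gradient_bandit([0.2, 0.5, 0.8])
# over time, play concentrates on the arm with the highest mean
```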
In this paper, we study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We first analyze the standard TS algorithm for the general CMAB model when the outcome distributions of all base arms are independent, and obtain a distribution-dependent regret bound of $O(m\log K_{\max}\log T / \Delta_{\min})$, where $m$ is the number of base arms, $K_{\max}$ is the size of the largest super arm, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal...
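For readers unfamiliar with TS, a minimal Beta-Bernoulli Thompson sampling loop for the basic (non-combinatorial) $K$-armed case looks like this. It is an illustrative sketch with made-up parameters, not the paper's CMAB algorithm:

```python
import random

def thompson_sampling(means, T=10000, seed=1):
    """Beta-Bernoulli Thompson sampling for a K-armed bandit.

    Each arm keeps a Beta(successes+1, failures+1) posterior; every round
    we draw one sample per arm and pull the arm with the largest sample,
    then update that arm's posterior with the observed reward.
    """
    rng = random.Random(seed)
    K = len(means)
    succ = [0] * K
    fail = [0] * K
    pulls = [0] * K
    for _ in range(T):
        # one posterior sample per arm; play the argmax
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(K)]
        a = max(range(K), key=lambda i: samples[i])
        # observe a Bernoulli reward and update the posterior counts
        if rng.random() < means[a]:
            succ[a] += 1
        else:
            fail[a] += 1
        pulls[a] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.5, 0.7])
# the posterior of the best arm concentrates, so it gets most pulls
```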
Most coal mine accidents are caused by the unsafe behavior of employees. Previous studies have shown that there is a significant connection among the working environment, the psychological state of employees, and their behaviors. However, the internal biological mechanism has not been revealed. To explore the physiological alterations of workers and the underlying mechanisms that cause unsafe behaviors, the current study established a novel coal mine environment biological simulation (CEBS) model in mice. This model recreated underground workplace factors of coal mines such as...
We propose and study the known-compensation multi-arm bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for $T$ steps. In each step, one player arrives at the system. Upon arrival, the player aims to select the arm with the current best average reward and receives a stochastic reward associated with the arm. In order to incentivize players to explore other arms, the controller provides proper payment compensation to players. The objective is to maximize the total reward collected by players while minimizing the compensation. We first provide a lower bound...
We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain, and the reward of pulling an arm depends on both the pulled arm and the current state of the corresponding Markov chain. In this paper, we propose Restless-UCB, a learning policy that follows the explore-then-commit framework. We present a novel method to construct offline instances, which only requires $O(N)$ time-complexity ($N$ is the number of arms) and is exponentially better than the complexity of existing policies. We also prove that Restless-UCB achieves a regret upper...
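The explore-then-commit framework that Restless-UCB builds on can be illustrated in its simplest stochastic-bandit form. This is a sketch with arbitrary example parameters, not the restless-bandit policy itself:

```python
import random

def explore_then_commit(means, m=200, T=10000, seed=2):
    """Explore-then-commit on Bernoulli arms: pull every arm m times,
    then commit to the empirically best arm for the remaining rounds."""
    rng = random.Random(seed)
    K = len(means)
    est = [0.0] * K
    total = 0.0
    # exploration phase: m pulls per arm, building empirical means
    for i in range(K):
        for _ in range(m):
            r = 1.0 if rng.random() < means[i] else 0.0
            est[i] += r / m
            total += r
    # commit phase: play the empirical best arm for the rest of the horizon
    best = max(range(K), key=lambda i: est[i])
    for _ in range(T - m * K):
        total += 1.0 if rng.random() < means[best] else 0.0
    return best, total

best, total = explore_then_commit([0.4, 0.6, 0.9])
```

The trade-off is visible in the two phases: a larger `m` makes the committed arm more reliable but spends more rounds on suboptimal arms.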
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (which we call the reward interval) and the player successively receives partial rewards of the action, convoluted with rewards from pulling other arms. Existing results on this model require prior knowledge about the interval size as an input to their algorithms. In this paper, we propose adaptive algorithms for both the stochastic and adversarial cases, without requiring any prior information about the interval. For the stochastic case, we prove that...
Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options, such as variance. As a result, they cannot be directly applied to important real-world online decision making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model that explicitly takes option correlation into account. Specifically, in CMCB, there is a learner who sequentially chooses weight vectors over the given options and observes random feedback...
In this paper, we study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch-size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and influence maximization. Under this new...
We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional random variable and the feedback follows a general process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables. For CMAB-MT, we propose a 1-norm triggering probability-modulated smoothness condition, an optimistic...
Withdrawal Statement: The authors have withdrawn their manuscript owing to [Internal Revision]. Therefore, the authors do not wish this work to be cited as a reference for the project. If you have any questions, please contact the corresponding author.
The coal mine workplace environment is a significant factor in inducing occupational health issues, such as intestinal dysfunction in miners. However, the mechanism by which the environment induces intestinal dysfunction is still unclear. Therefore, we applied the previously constructed Coal Mine Workplace Environment Biological Simulation (CEBS) model to detect the pathological manifestations and changes in the gut microbiota of mice from the perspectives of intestinal function, tissue morphology, and cell molecules. CEBS mice showed increased fecal water content,...
We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of the contexts,...
In this paper, we study a novel episodic risk-sensitive Reinforcement Learning (RL) problem, named Iterated CVaR RL, which aims to maximize the tail of the reward-to-go at each step, and focuses on tightly controlling the risk of getting into catastrophic situations at each stage. This formulation is applicable to real-world tasks that demand strong risk avoidance throughout the decision process, such as autonomous driving, clinical treatment planning and robotics. We investigate two performance metrics under Iterated CVaR RL, i.e., Regret...
With the rapid development of the economy and the Internet, online shopping has become an indispensable part of people's lives, and college students are a main force in online shopping. Although online shopping has gradually matured, there are still many problems worth discussing. Against the background of big data, and based on a questionnaire survey of consumption among students at the International College of Zhengzhou University, this article uses basic statistical analysis methods and correspondence analysis with SPSS and EXCEL software. Use...
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's performance must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Different from previous works which consider high probability constraints on the expected reward, we focus on a constraint on the actually...
We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms. This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on options. We start with proposing two efficient algorithms, DoublerBAI and MultiSBM-Feedback. DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling setting, and achieves a bound of...
Existing methods of combinatorial pure exploration mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of upper confidence bounds within the arm set $S$ to represent the upper confidence bound of $S$, which can be much larger than the tight bound and leads to a higher complexity than necessary, since the empirical means of different arms in $S$ are independent. To deal with this challenge, we explore the idea of Thompson Sampling (TS), which uses independent random samples instead of upper confidence bounds, and design the first TS-based algorithm TS-Explore for...
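The gap between summing per-arm confidence radii and bounding the whole set can be illustrated numerically with Hoeffding-style radii. This is an illustrative calculation under simplifying independence and sub-Gaussian assumptions, not the paper's analysis:

```python
import math

def set_bound_comparison(K=100, n=1000, delta=0.05):
    """Compares two confidence radii for the sum of K independent arms'
    empirical means, each estimated from n samples in [0, 1].

    Summing per-arm radii gives a bound of order K * r, while a direct
    bound on the sum only needs order sqrt(K) * r, because independent
    estimation errors aggregate in quadrature rather than linearly.
    """
    r = math.sqrt(math.log(1 / delta) / (2 * n))  # per-arm Hoeffding radius
    sum_of_bounds = K * r                          # loose: radii add linearly
    joint_bound = math.sqrt(K) * r                 # tight: errors cancel partially
    return sum_of_bounds, joint_bound

loose, tight = set_bound_comparison()
# for K = 100 the loose bound is sqrt(K) = 10x larger than the tight one
```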
The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation. It describes the behavior of a group of agents who share a common set $F$ of facilities and take actions as subsets with $k$ facilities. In this work, we study the online formulation of congestion games, where agents participate in the game repeatedly and observe feedback with randomness. We propose CongestEXP, a decentralized algorithm that applies the classic exponential weights method. By maintaining weights on the facility level, the regret...
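The classic exponential weights update that CongestEXP builds on can be sketched in its basic full-information form. This is a generic Hedge-style sketch with illustrative losses, not CongestEXP's facility-level variant:

```python
import math
import random

def exponential_weights(loss_rounds, K, eta=0.1, seed=3):
    """Classic exponential weights (Hedge) over K actions with full
    information: each action's weight decays exponentially in its
    cumulative loss, and play is a draw from the normalized weights."""
    rng = random.Random(seed)
    weights = [1.0] * K
    played = []
    for losses in loss_rounds:       # losses: list of K values in [0, 1]
        z = sum(weights)
        probs = [w / z for w in weights]
        played.append(rng.choices(range(K), weights=probs)[0])
        # multiplicative update: penalize each action by its observed loss
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return played

# action 0 always suffers loss 1, action 1 loss 0: play concentrates on 1
rounds = [[1.0, 0.0]] * 200
plays = exponential_weights(rounds, K=2)
```

The learning rate `eta` controls how fast the weights concentrate; larger values adapt faster but react more sharply to noisy losses.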