Siwei Wang

ORCID: 0000-0003-0764-5592
Research Areas
  • Advanced Bandit Algorithms Research
  • Optimization and Search Problems
  • Reinforcement Learning in Robotics
  • Machine Learning and Algorithms
  • Auction Theory and Applications
  • Smart Grid Energy Management
  • Data Stream Mining Techniques
  • Game Theory and Applications
  • Adversarial Robustness in Machine Learning
  • Clinical Nutrition and Gastroenterology
  • Recommender Systems and Techniques
  • Meta-analysis and systematic reviews
  • Radiomics and Machine Learning in Medical Imaging
  • Occupational Health and Safety Research
  • Advanced Wireless Network Optimization
  • Technology Adoption and User Behaviour
  • Health Systems, Economic Evaluations, Quality of Life
  • Stock Market Forecasting Methods
  • Clostridium difficile and Clostridium perfringens research
  • Risk and Safety Analysis
  • Occupational Health and Performance
  • BIM and Construction Integration
  • Manufacturing Process and Optimization
  • Advanced Causal Inference Techniques
  • Seismology and Earthquake Studies

Xi'an University of Science and Technology
2022-2024

Wuhan University of Technology
2023

Beijing University of Posts and Telecommunications
2023

Tsinghua University
2018-2022

Zhengzhou University
2009-2020

In this paper, we consider the stochastic multi-armed bandits problem with adversarial corruptions, where the random rewards of the arms are partially modified by an adversary to fool the algorithm. We apply the policy gradient algorithm SAMBA to this setting, and show that it is computationally efficient and achieves a state-of-the-art $O(K\log T/\Delta) + O(C/\Delta)$ regret upper bound, where $K$ is the number of arms, $C$ is the unknown corruption level, $\Delta$ is the minimum expected reward gap between the best arm and the other ones, and $T$ is the time horizon....

10.48550/arxiv.2502.14146 preprint EN arXiv (Cornell University) 2025-02-19

In this paper, we study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We first analyze the standard TS algorithm for the general CMAB model when the outcome distributions of all the base arms are independent, and obtain a distribution-dependent regret bound of $O(m\log K_{\max}\log T / \Delta_{\min})$, where $m$ is the number of base arms, $K_{\max}$ is the size of the largest super arm, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal...

10.48550/arxiv.1803.04623 preprint EN other-oa arXiv (Cornell University) 2018-01-01
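The Thompson sampling idea in the abstract above is easiest to see on the plainest single-play Bernoulli bandit. The sketch below is a minimal illustration of that general TS template (Beta-Bernoulli posteriors, sample-then-argmax), not the paper's combinatorial algorithm; the function name and parameters are my own for illustration.

```python
import random

def thompson_sampling(arms, horizon):
    """Thompson sampling for Bernoulli bandits with Beta(1, 1) priors.

    `arms` holds the true success probabilities (unknown to the learner);
    returns the total reward collected over `horizon` rounds.
    """
    successes = [1] * len(arms)  # Beta posterior alpha parameters
    failures = [1] * len(arms)   # Beta posterior beta parameters
    total = 0
    for _ in range(horizon):
        # Draw one sample from each arm's posterior and play the argmax.
        samples = [random.betavariate(successes[i], failures[i])
                   for i in range(len(arms))]
        i = samples.index(max(samples))
        reward = 1 if random.random() < arms[i] else 0
        successes[i] += reward
        failures[i] += 1 - reward
        total += reward
    return total
```

The key contrast with UCB-style methods is that the exploration bonus is implicit: an under-sampled arm has a wide posterior and occasionally produces a large sample, which is what the independence assumption in the regret analysis above exploits.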

Most coal mine accidents are caused by the unsafe behavior of employees. Previous studies have shown that there is a significant connection among the working environment, the psychological state of employees, and their behaviors. However, the internal biological mechanism has not been revealed. To explore the physiological alterations of workers and the underlying mechanisms that cause unsafe behaviors, the current study established a novel coal mine environment biological simulation (CEBS) model in mice. This model recreated the underground workplace facts of coal mines such as...

10.3389/fnbeh.2022.896545 article EN cc-by Frontiers in Behavioral Neuroscience 2022-06-16

We propose and study the known-compensation multi-arm bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for $T$ steps. In each step, one player arrives at the system. Upon arrival, the player aims to select the arm with the current best average reward and receives a stochastic reward associated with that arm. In order to incentivize players to explore other arms, the controller provides a proper payment as compensation to the players. The objective is to maximize the total reward collected by the players while minimizing the total compensation. We first provide a lower bound...

10.48550/arxiv.1811.01715 preprint EN other-oa arXiv (Cornell University) 2018-01-01

We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain, and the reward of pulling an arm depends on both the pulled arm and the current state of the corresponding Markov chain. In this paper, we propose Restless-UCB, a learning policy that follows the explore-then-commit framework. We present a novel method to construct offline instances, which only requires $O(N)$ time-complexity ($N$ is the number of arms) and is exponentially better than the complexity of the existing policy. We also prove that Restless-UCB achieves a regret upper...

10.48550/arxiv.2011.02664 preprint EN other-oa arXiv (Cornell University) 2020-01-01
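The explore-then-commit framework mentioned in the abstract above has a simple generic shape: spend a fixed budget sampling every option, then commit to the empirical winner. The sketch below shows that template on a plain Bernoulli bandit; it is an assumed illustration of the framework, not the Restless-UCB construction itself, and the names and parameters are hypothetical.

```python
import random

def explore_then_commit(arms, horizon, explore_rounds):
    """Explore-then-commit on a Bernoulli bandit.

    Phase 1 pulls every arm `explore_rounds` times round-robin;
    phase 2 commits to the arm with the highest empirical mean.
    Returns (committed_arm, total_reward).
    """
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    total = 0.0
    t = 0
    # Exploration phase: uniform round-robin over all arms.
    for _ in range(explore_rounds):
        for i in range(n):
            r = 1.0 if random.random() < arms[i] else 0.0
            counts[i] += 1
            sums[i] += r
            total += r
            t += 1
    # Commit phase: play the empirically best arm for the rest of the horizon.
    best = max(range(n), key=lambda i: sums[i] / counts[i])
    while t < horizon:
        total += 1.0 if random.random() < arms[best] else 0.0
        t += 1
    return best, total
```

The regret trade-off is governed by `explore_rounds`: too small and the commit step may lock onto a suboptimal arm, too large and the exploration phase itself dominates the regret.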

We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (which we call the interval) and the player receives partial rewards of the action, convoluted with rewards from other arms, successively. Existing results on this model require prior knowledge about the interval size as an input to their algorithms. In this paper, we propose adaptive algorithms for both the stochastic and the adversarial cases, without requiring any prior information about the interval. For the stochastic case, we prove that...

10.1609/aaai.v35i11.17224 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options such as variance. As a result, they cannot be directly applied to important real-world online decision-making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model to explicitly take into account option correlation. Specifically, in CMCB, there is a learner who sequentially chooses weight vectors on the given options and observes random feedback...

10.48550/arxiv.2102.12090 preprint EN cc-by arXiv (Cornell University) 2021-01-01

In this paper, we study the combinatorial semi-bandits (CMAB) problem and focus on reducing the dependency on the batch-size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and online influence maximization. Under this new...

10.48550/arxiv.2208.14837 preprint EN other-oa arXiv (Cornell University) 2022-01-01

We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional random variable and the feedback follows a general triggering process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging the distinct statistical properties of multivariant random variables. For CMAB-MT, we propose a 1-norm triggering probability-modulated smoothness condition and an optimistic...

10.48550/arxiv.2406.01386 preprint EN arXiv (Cornell University) 2024-06-03

Withdrawal Statement: The authors have withdrawn their manuscript owing to [Internal Revision]. Therefore, the authors do not wish this work to be cited as a reference for the project. If you have any questions, please contact the corresponding author.

10.1101/2024.07.03.24309897 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2024-07-04

The coal mine workplace environment is a significant factor in inducing occupational health issues, such as intestinal dysfunction in miners. However, the mechanism by which it induces such dysfunction is still unclear. Therefore, we applied the previously constructed Coal Mine Workplace Environment Biological Simulation (CEBS) model to detect the pathological manifestations and changes in the gut microbiota of mice from the perspectives of intestinal function, tissue morphology, and cell molecules. CEBS mice showed increased fecal water content,...

10.3389/fmicb.2024.1453798 article EN cc-by Frontiers in Microbiology 2024-12-11

We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of the contexts,...

10.48550/arxiv.2303.17110 preprint EN other-oa arXiv (Cornell University) 2023-01-01

In this paper, we study a novel episodic risk-sensitive Reinforcement Learning (RL) problem, named Iterated CVaR RL, which aims to maximize the tail of the reward-to-go at each step, and focuses on tightly controlling the risk of getting into catastrophic situations at each stage. This formulation is applicable to real-world tasks that demand strong risk avoidance throughout the decision process, such as autonomous driving, clinical treatment planning and robotics. We investigate two performance metrics under Iterated CVaR RL, i.e., Regret...

10.48550/arxiv.2206.02678 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Abstract: With the rapid development of the social economy and the Internet, online shopping has become an indispensable part of people's lives, and college students are a main force in online shopping. Although online shopping has gradually matured, there are still many problems worth discussing. Against the background of big data, and based on a questionnaire survey on the consumption of the International College of Zhengzhou University, this article uses basic statistical analysis methods, correspondence analysis, SPSS software and EXCEL software. Use...

10.1088/1742-6596/1616/1/012009 article EN Journal of Physics Conference Series 2020-08-01

In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's performance must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Different from previous works which consider high-probability constraints on the expected reward, we focus on a constraint on the actually...

10.1609/aaai.v35i8.16891 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms. This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on multiple options. We start by proposing two efficient algorithms, DoublerBAI and MultiSBM-Feedback. DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit setting, and achieves a bound of...

10.48550/arxiv.2211.10293 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Existing methods of combinatorial pure exploration mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of the upper confidence bounds within the arm set $S$ to represent the upper confidence bound of $S$, which can be much larger than the tight upper confidence bound of $S$ and leads to a higher complexity than necessary, since the empirical means of different arms in $S$ are independent. To deal with this challenge, we explore the idea of Thompson Sampling (TS), which uses independent random samples instead of the upper confidence bounds, and design the first TS-based algorithm TS-Explore for...

10.48550/arxiv.2206.09150 preprint EN cc-by arXiv (Cornell University) 2022-01-01
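The UCB approach that the abstract above contrasts against can be sketched in its classic single-arm form (UCB1): play the arm maximizing the empirical mean plus a confidence radius. This is a minimal illustrative sketch under standard assumptions, not the combinatorial pure-exploration algorithm discussed in the paper; the function name and parameters are my own.

```python
import math
import random

def ucb1(arms, horizon):
    """UCB1 on a Bernoulli bandit: pull the arm maximizing
    empirical mean + sqrt(2 ln t / n_i).

    `arms` holds the true means (unknown to the learner); returns the
    pull counts per arm, which concentrate on the best arm over time.
    """
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    # Initialization: pull each arm once so every count is positive.
    for i in range(n):
        sums[i] += 1.0 if random.random() < arms[i] else 0.0
        counts[i] = 1
    for t in range(n, horizon):
        # Upper confidence bound for each arm at round t.
        ucb = [sums[i] / counts[i]
               + math.sqrt(2.0 * math.log(t + 1) / counts[i])
               for i in range(n)]
        i = ucb.index(max(ucb))
        sums[i] += 1.0 if random.random() < arms[i] else 0.0
        counts[i] += 1
    return counts
```

In the combinatorial setting criticized above, summing such per-arm bounds over a set $S$ inflates the set-level bound, which is the looseness that a sampling-based approach avoids.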

The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation. It describes the behavior of a group of agents who share a common set $F$ of facilities and take actions as subsets with $k$ facilities. In this work, we study the online formulation of congestion games, where the agents participate in the game repeatedly and observe feedback with randomness. We propose CongestEXP, a decentralized algorithm that applies the classic exponential weights method. By maintaining weights on the facility level, the regret...

10.48550/arxiv.2306.13673 preprint EN cc-by arXiv (Cornell University) 2023-01-01
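The classic exponential weights (Hedge) method named in the abstract above has a compact generic form: keep a weight per action and decay each weight by `exp(-eta * loss)` every round. The sketch below shows that building block in the full-information setting; it is an assumed illustration, not CongestEXP's facility-level construction, and all names are hypothetical.

```python
import math
import random

def exp_weights(loss_seq, n_actions, eta):
    """Exponential weights (Hedge) over a finite action set.

    `loss_seq` is a sequence of per-round loss vectors with entries in
    [0, 1]; each round the learner samples an action proportionally to
    its weight, then multiplies every weight by exp(-eta * loss).
    Returns the final weights normalized to a probability distribution.
    """
    weights = [1.0] * n_actions
    for losses in loss_seq:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Sample the played action (full information: all losses observed).
        _ = random.choices(range(n_actions), probs)[0]
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]
```

Maintaining weights at the facility level rather than over all $\binom{|F|}{k}$ actions is what keeps such an algorithm tractable when the action set is combinatorial.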