- Advanced Bandit Algorithms Research
- Optimization and Search Problems
- Reinforcement Learning in Robotics
- Machine Learning and Algorithms
- Auction Theory and Applications
- Smart Grid Energy Management
- Data Stream Mining Techniques
- Game Theory and Applications
- Adversarial Robustness in Machine Learning
- Clinical Nutrition and Gastroenterology
- Recommender Systems and Techniques
- Meta-analysis and systematic reviews
- Radiomics and Machine Learning in Medical Imaging
- Occupational Health and Safety Research
- Advanced Wireless Network Optimization
- Technology Adoption and User Behaviour
- Health Systems, Economic Evaluations, Quality of Life
- Stock Market Forecasting Methods
- Clostridium difficile and Clostridium perfringens research
- Risk and Safety Analysis
- Occupational Health and Performance
- BIM and Construction Integration
- Manufacturing Process and Optimization
- Advanced Causal Inference Techniques
- Seismology and Earthquake Studies
Xi'an University of Science and Technology
2022-2024
Wuhan University of Technology
2023
Beijing University of Posts and Telecommunications
2023
Tsinghua University
2018-2022
Zhengzhou University
2009-2020
In this paper, we consider the stochastic multi-armed bandit problem with adversarial corruptions, where the random rewards of the arms are partially modified by an adversary to fool the algorithm. We apply the policy gradient algorithm SAMBA to this setting, and show that it is computationally efficient and achieves a state-of-the-art $O(K\log T/\Delta) + O(C/\Delta)$ regret upper bound, where $K$ is the number of arms, $C$ is the unknown corruption level, $\Delta$ is the minimum expected reward gap between the best arm and the other ones, and $T$ is the time horizon....
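SAMBA itself is specific to the paper, but the general shape of a policy-gradient bandit learner can be sketched as follows. This is a generic softmax/REINFORCE-style sketch with arbitrary example parameters, not the SAMBA algorithm:

```python
import math
import random

def softmax_policy_gradient_bandit(means, T=20000, lr=0.1, seed=0):
    """Minimal softmax policy-gradient learner on Bernoulli arms.

    `means` are the hidden success probabilities. The learner keeps one
    preference per arm, samples arms from the softmax of the preferences,
    and nudges preferences toward arms whose sampled reward beats the
    running average reward (the baseline).
    """
    rng = random.Random(seed)
    K = len(means)
    prefs = [0.0] * K           # one preference score per arm
    baseline, n = 0.0, 0        # running average reward as baseline
    pulls = [0] * K
    for _ in range(T):
        # softmax over preferences -> sampling distribution
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        z = sum(exps)
        probs = [e / z for e in exps]
        # sample an arm and observe a Bernoulli reward
        a = rng.choices(range(K), weights=probs)[0]
        r = 1.0 if rng.random() < means[a] else 0.0
        n += 1
        baseline += (r - baseline) / n
        # REINFORCE-style update against the baseline
        for i in range(K):
            grad = (1.0 - probs[i]) if i == a else -probs[i]
            prefs[i] += lr * (r - baseline) * grad
        pulls[a] += 1
    return pulls

pulls = softmax_policy_gradient_bandit([0.2, 0.5, 0.8])
# over time, play concentrates on the arm with the highest mean
```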
In this paper, we study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We first analyze the standard TS algorithm for the general CMAB model when the outcome distributions of all base arms are independent, and obtain a distribution-dependent regret bound of $O(m\log K_{\max}\log T / \Delta_{\min})$, where $m$ is the number of base arms, $K_{\max}$ is the size of the largest super arm, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal...
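For readers unfamiliar with TS, a minimal Beta-Bernoulli Thompson sampling loop for the basic (non-combinatorial) $K$-armed case looks like this. It is an illustrative sketch with made-up parameters, not the paper's CMAB algorithm:

```python
import random

def thompson_sampling(means, T=10000, seed=1):
    """Beta-Bernoulli Thompson sampling for a K-armed bandit.

    Each arm keeps a Beta(successes+1, failures+1) posterior; every round
    we draw one sample per arm and pull the arm with the largest sample,
    then update that arm's posterior with the observed reward.
    """
    rng = random.Random(seed)
    K = len(means)
    succ = [0] * K
    fail = [0] * K
    pulls = [0] * K
    for _ in range(T):
        # one posterior sample per arm; play the argmax
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(K)]
        a = max(range(K), key=lambda i: samples[i])
        # observe a Bernoulli reward and update the posterior counts
        if rng.random() < means[a]:
            succ[a] += 1
        else:
            fail[a] += 1
        pulls[a] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.5, 0.7])
# the posterior of the best arm concentrates, so it gets most pulls
```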
Most coal mine accidents are caused by the unsafe behavior of employees. Previous studies have shown that there is a significant connection among the working environment, the psychological state of employees, and their behaviors. However, the internal biological mechanism has not been revealed. To explore the physiological alterations of workers and the underlying mechanisms that cause unsafe behaviors, the current study established a novel coal mine environment biological simulation (CEBS) model in mice. This model recreated underground workplace factors of coal mines such as...
We propose and study the known-compensation multi-arm bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for $T$ steps. In each step, one player arrives at the system. Upon arrival, the player aims to select the arm with the current best average reward and receives a stochastic reward associated with the arm. In order to incentivize players to explore other arms, the controller provides proper payment compensation to players. The objective is to maximize the total reward collected by players while minimizing the compensation. We first provide a lower bound...
We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain, and the reward of pulling an arm depends on both the pulled arm and the current state of the corresponding Markov chain. In this paper, we propose Restless-UCB, a learning policy that follows the explore-then-commit framework. We present a novel method to construct offline instances, which only requires $O(N)$ time-complexity ($N$ is the number of arms) and is exponentially better than the complexity of existing policies. We also prove that Restless-UCB achieves a regret upper...
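The explore-then-commit framework that Restless-UCB builds on can be illustrated in its simplest stochastic-bandit form. This is a sketch with arbitrary example parameters, not the restless-bandit policy itself:

```python
import random

def explore_then_commit(means, m=200, T=10000, seed=2):
    """Explore-then-commit on Bernoulli arms: pull every arm m times,
    then commit to the empirically best arm for the remaining rounds."""
    rng = random.Random(seed)
    K = len(means)
    est = [0.0] * K
    total = 0.0
    # exploration phase: m pulls per arm, building empirical means
    for i in range(K):
        for _ in range(m):
            r = 1.0 if rng.random() < means[i] else 0.0
            est[i] += r / m
            total += r
    # commit phase: play the empirical best arm for the rest of the horizon
    best = max(range(K), key=lambda i: est[i])
    for _ in range(T - m * K):
        total += 1.0 if rng.random() < means[best] else 0.0
    return best, total

best, total = explore_then_commit([0.4, 0.6, 0.9])
```

The trade-off is visible in the two phases: a larger `m` makes the committed arm more reliable but spends more rounds on suboptimal arms.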
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (which we call the reward interval) and the player successively receives partial rewards of the action, convoluted with rewards from pulling other arms. Existing results on this model require prior knowledge about the interval size as an input to their algorithms. In this paper, we propose adaptive algorithms for both the stochastic and adversarial cases, without requiring any prior information about the interval. For the stochastic case, we prove that...
Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options, such as variance. As a result, they cannot be directly applied to important real-world online decision making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model that explicitly takes option correlation into account. Specifically, in CMCB, there is a learner who sequentially chooses weight vectors over the given options and observes random feedback...
In this paper, we study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch-size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and influence maximization. Under this new...
We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional random variable and the feedback follows a general process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables. For CMAB-MT, we propose a 1-norm triggering probability-modulated smoothness condition, an optimistic...
Withdrawal Statement: The authors have withdrawn their manuscript owing to [Internal Revision]. Therefore, the authors do not wish this work to be cited as a reference for the project. If you have any questions, please contact the corresponding author.
The coal mine workplace environment is a significant factor in inducing occupational health issues, such as intestinal dysfunction in miners. However, the mechanism by which the environment induces intestinal dysfunction is still unclear. Therefore, we applied the previously constructed Coal Mine Workplace Environment Biological Simulation (CEBS) model to detect the pathological manifestations and changes in the gut microbiota of mice from the perspectives of intestinal function, tissue morphology, and cell molecules. CEBS mice showed increased fecal water content,...
We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of the contexts,...
In this paper, we study a novel episodic risk-sensitive Reinforcement Learning (RL) problem, named Iterated CVaR RL, which aims to maximize the tail of the reward-to-go at each step, and focuses on tightly controlling the risk of getting into catastrophic situations at each stage. This formulation is applicable to real-world tasks that demand strong risk avoidance throughout the decision process, such as autonomous driving, clinical treatment planning and robotics. We investigate two performance metrics under Iterated CVaR RL, i.e., Regret...
With the rapid development of the economy and the Internet, online shopping has become an indispensable part of people's lives, and college students are a main force in online shopping. Although online shopping has gradually matured, there are still many problems worth discussing. Against the background of big data, and based on a questionnaire survey of consumption among students at the International College of Zhengzhou University, this article uses basic statistical analysis methods and correspondence analysis with SPSS and EXCEL software. Use...
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's performance must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Different from previous works which consider high probability constraints on the expected reward, we focus on a constraint on the actually...
We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms. This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on options. We start with proposing two efficient algorithms, DoublerBAI and MultiSBM-Feedback. DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling setting, and achieves a bound of...
Existing methods of combinatorial pure exploration mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of upper confidence bounds within the arm set $S$ to represent the upper confidence bound of $S$, which can be much larger than the tight bound and leads to a higher complexity than necessary, since the empirical means of different arms in $S$ are independent. To deal with this challenge, we explore the idea of Thompson Sampling (TS), which uses independent random samples instead of upper confidence bounds, and design the first TS-based algorithm TS-Explore for...
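The gap between summing per-arm confidence radii and bounding the whole set can be illustrated numerically with Hoeffding-style radii. This is an illustrative calculation under simplifying independence and sub-Gaussian assumptions, not the paper's analysis:

```python
import math

def set_bound_comparison(K=100, n=1000, delta=0.05):
    """Compares two confidence radii for the sum of K independent arms'
    empirical means, each estimated from n samples in [0, 1].

    Summing per-arm radii gives a bound of order K * r, while a direct
    bound on the sum only needs order sqrt(K) * r, because independent
    estimation errors aggregate in quadrature rather than linearly.
    """
    r = math.sqrt(math.log(1 / delta) / (2 * n))  # per-arm Hoeffding radius
    sum_of_bounds = K * r                          # loose: radii add linearly
    joint_bound = math.sqrt(K) * r                 # tight: errors cancel partially
    return sum_of_bounds, joint_bound

loose, tight = set_bound_comparison()
# for K = 100 the loose bound is sqrt(K) = 10x larger than the tight one
```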
The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation. It describes the behavior of a group of agents who share a common set $F$ of facilities and take actions as subsets with $k$ facilities. In this work, we study the online formulation of congestion games, where agents participate in the game repeatedly and observe feedback with randomness. We propose CongestEXP, a decentralized algorithm that applies the classic exponential weights method. By maintaining weights on the facility level, the regret...
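The classic exponential weights update that CongestEXP builds on can be sketched in its basic full-information form. This is a generic Hedge-style sketch with illustrative losses, not CongestEXP's facility-level variant:

```python
import math
import random

def exponential_weights(loss_rounds, K, eta=0.1, seed=3):
    """Classic exponential weights (Hedge) over K actions with full
    information: each action's weight decays exponentially in its
    cumulative loss, and play is a draw from the normalized weights."""
    rng = random.Random(seed)
    weights = [1.0] * K
    played = []
    for losses in loss_rounds:       # losses: list of K values in [0, 1]
        z = sum(weights)
        probs = [w / z for w in weights]
        played.append(rng.choices(range(K), weights=probs)[0])
        # multiplicative update: penalize each action by its observed loss
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return played

# action 0 always suffers loss 1, action 1 loss 0: play concentrates on 1
rounds = [[1.0, 0.0]] * 200
plays = exponential_weights(rounds, K=2)
```

The learning rate `eta` controls how fast the weights concentrate; larger values adapt faster but react more sharply to noisy losses.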