Zheng Wen

ORCID: 0000-0001-6987-6463
Research Areas
  • Advanced Bandit Algorithms Research
  • Machine Learning and Algorithms
  • Reinforcement Learning in Robotics
  • Optimization and Search Problems
  • Auction Theory and Applications
  • Data Stream Mining Techniques
  • Smart Grid Energy Management
  • Supply Chain and Inventory Management
  • Adversarial Robustness in Machine Learning
  • Multimodal Machine Learning Applications
  • Consumer Market Behavior and Pricing
  • Gaussian Processes and Bayesian Inference
  • Domain Adaptation and Few-Shot Learning
  • Power Systems and Renewable Energy
  • Recommender Systems and Techniques
  • Mobile Crowdsensing and Crowdsourcing
  • Advanced Algorithms and Applications
  • Explainable Artificial Intelligence (XAI)
  • Advanced Control Systems Optimization
  • Game Theory and Applications
  • Topic Modeling
  • Optical measurement and interference techniques
  • Distributed Sensor Networks and Detection Algorithms
  • Transportation and Mobility Innovations
  • Generative Adversarial Networks and Image Synthesis

Jiangsu University
2023

DeepMind (United Kingdom)
2019-2023

Adobe Systems (United States)
2016-2021

Google (United States)
2019-2021

Yunnan University
2018

Northeastern University
2007-2017

CRRC (China)
2016

Statistics Belgium
2016

Shanghai University
2016

Yahoo (United States)
2014-2015

10.1561/2200000070 article EN Foundations and Trends® in Machine Learning 2018-01-01

Demand response (DR) for residential and small commercial buildings is estimated to account for as much as 65% of the total energy savings potential of DR, and previous work shows that a fully automated energy management system (EMS) is a necessary prerequisite for DR in these areas. In this paper, we propose a novel EMS formulation for DR problems in these sectors. Specifically, we formulate the EMS's rescheduling problem as a reinforcement learning (RL) problem, and argue that this RL problem can be approximately solved by decomposing it over device clusters. Compared...

10.1109/tsg.2015.2396993 article EN IEEE Transactions on Smart Grid 2015-02-16
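
The decomposition idea can be illustrated with a toy sketch: instead of one Q-function over the exponential joint action space of all devices, each device cluster keeps its own small Q-table and performs its own TD update. Everything here (the stand-in environment, cluster count, and cost function) is hypothetical and not the paper's formulation:

```python
import numpy as np

# Hypothetical sketch: decompose an EMS rescheduling RL problem over
# device clusters, learning one small Q-table per cluster instead of a
# joint Q over the exponential joint action space.
n_clusters, n_states, n_actions = 3, 24, 4   # e.g., hour-of-day state
Q = np.zeros((n_clusters, n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(state, actions):
    # Stand-in environment: rewards add across clusters (assumption).
    rewards = -np.abs(actions - state % n_actions)  # placeholder cost
    return (state + 1) % n_states, rewards

state = 0
for t in range(10_000):
    # Each cluster picks its own epsilon-greedy action.
    acts = np.array([
        rng.integers(n_actions) if rng.random() < eps
        else int(np.argmax(Q[c, state]))
        for c in range(n_clusters)
    ])
    nxt, rws = step(state, acts)
    for c in range(n_clusters):  # independent per-cluster TD updates
        td = rws[c] + gamma * Q[c, nxt].max() - Q[c, state, acts[c]]
        Q[c, state, acts[c]] += alpha * td
    state = nxt
```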

We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or epsilon-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains enjoyed by RLSVI. Further, we establish an upper bound on the expected regret of RLSVI that demonstrates near-optimality in a tabula rasa context. More...

10.48550/arxiv.1402.0635 preprint EN other-oa arXiv (Cornell University) 2014-01-01
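
A minimal sketch of the RLSVI idea under a Bayesian linear regression view: fit regularized least squares to noise-perturbed targets and sample the value parameters from the resulting Gaussian, then act greedily on the sample. The finite-horizon recursion and the constants of the actual algorithm are omitted:

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_lsq(Phi, targets, sigma=1.0, lam=1.0):
    """One RLSVI-style step (sketch): fit regularized least squares to
    noise-perturbed targets, then sample the value parameters from the
    induced Gaussian so the value estimate randomizes plausibly."""
    noisy = targets + rng.normal(0.0, sigma, size=targets.shape)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    theta_bar = np.linalg.solve(A, Phi.T @ noisy)
    cov = sigma**2 * np.linalg.inv(A)   # Gaussian N(theta_bar, sigma^2 A^-1)
    return rng.multivariate_normal(theta_bar, cov)

# Acting greedily w.r.t. Phi(s, a) @ theta_tilde each episode yields
# exploration akin to posterior sampling (illustration only).
Phi = rng.normal(size=(50, 5))          # feature matrix for (s, a) pairs
targets = Phi @ rng.normal(size=5) + rng.normal(0, 0.1, size=50)
theta_tilde = randomized_lsq(Phi, targets)
```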

A search engine usually outputs a list of $K$ web pages. The user examines this list, from the first page to the last, and chooses the first attractive page. This model of user behavior is known as the cascade model. In this paper, we propose cascading bandits, a learning variant of the cascade model where the objective is to identify the $K$ most attractive items. We formulate our problem as a stochastic combinatorial partial monitoring problem. We propose two algorithms for solving it, CascadeUCB1 and CascadeKL-UCB. We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound in...

10.48550/arxiv.1502.02763 preprint EN other-oa arXiv (Cornell University) 2015-01-01
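
A compact CascadeUCB1-style loop, assuming Bernoulli attraction probabilities and top-K ranking by UCB; the initialization and the exact confidence radius are simplified relative to the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
L, K, T = 20, 4, 5000
w = rng.uniform(0.05, 0.4, size=L)       # true attraction probabilities
pulls, wins = np.ones(L), np.zeros(L)    # simplified optimistic init

for t in range(1, T + 1):
    ucb = wins / pulls + np.sqrt(1.5 * np.log(t) / pulls)
    ranked = np.argsort(-ucb)[:K]        # K items with highest UCBs
    clicks = rng.random(K) < w[ranked]   # cascade: user scans top-down
    c = int(np.argmax(clicks)) if clicks.any() else K  # first click index
    for k in range(min(c + 1, K)):       # only items up to the click are observed
        pulls[ranked[k]] += 1
        wins[ranked[k]] += float(k == c)
```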

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving this problem, which is known to be computationally efficient, and prove $O(K L (1 / Δ) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its $n$-step regret, where $L$ is the number of ground items, $K$ is the maximum...

10.48550/arxiv.1410.0949 preprint EN other-oa arXiv (Cornell University) 2014-01-01
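
A CombUCB1-style sketch of the analyzed algorithm: compute a UCB for every ground item and let an offline oracle pick the feasible subset maximizing the optimistic sum. The top-K oracle and Gaussian observation noise below are stand-in assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
L, K, T = 30, 5, 5000
w = rng.uniform(0, 1, size=L)            # true mean item weights
pulls, total = np.ones(L), rng.random(L) # simplified init: one observation each

def oracle(scores, K):
    # Offline maximizer for the combinatorial constraint; top-K here
    # stands in for any structured feasible set (assumption).
    return np.argsort(-scores)[:K]

for t in range(1, T + 1):
    ucb = total / pulls + np.sqrt(1.5 * np.log(t) / pulls)
    A = oracle(ucb, K)                   # optimistic feasible subset
    obs = w[A] + 0.1 * rng.standard_normal(K)  # semi-bandit feedback
    pulls[A] += 1
    total[A] += obs
```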

We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies. We also prove a regret bound that establishes statistical efficiency with a tabular representation.

10.48550/arxiv.1703.07608 preprint EN other-oa arXiv (Cornell University) 2017-01-01
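
One common practical instantiation of randomized value functions is an ensemble of value estimates with one member sampled per episode; committing to that member for the whole episode gives temporally consistent, "deep" exploration. The toy environment and update rule below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(4)
M, S, A = 10, 16, 4            # ensemble size, states, actions
Q = rng.normal(size=(M, S, A)) # randomly initialized ensemble of Q-tables
alpha, gamma = 0.2, 0.9

def env_step(s, a):            # stand-in environment (assumption)
    return (s + a) % S, float((s + a) % S == S - 1)

for episode in range(200):
    m = int(rng.integers(M))   # sample one value function per episode
    s = 0
    for t in range(20):
        a = int(np.argmax(Q[m, s]))      # follow the sampled member greedily
        s2, r = env_step(s, a)
        for j in range(M):               # train each member on its own
            tgt = r + rng.normal(0, 0.1) + gamma * Q[j, s2].max()
            Q[j, s, a] += alpha * (tgt - Q[j, s, a])  # noise-perturbed target
        s = s2
```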

We address the problem of best policy identification in preference-based reinforcement learning (PbRL), where learning occurs from noisy binary preferences over trajectory pairs rather than explicit numerical rewards. This approach is useful for post-training optimization of generative AI models during multi-turn user interactions, where preference feedback can be more robust than handcrafted reward models. In this setting, learning is driven by both an offline dataset -- collected from a rater of unknown 'competence' -- and online data collected with...

10.48550/arxiv.2501.18873 preprint EN arXiv (Cornell University) 2025-01-30
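
Preference-based learning of this kind is often grounded in a Bradley-Terry-style model, where the probability that trajectory A is preferred over B is a logistic function of their reward difference. The sketch below fits a linear reward model to noisy binary preferences by logistic regression; the model choice and all names are assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 8, 500
X_a, X_b = rng.normal(size=(n, d)), rng.normal(size=(n, d))  # trajectory features
theta_true = rng.normal(size=d)

# Bradley-Terry-style assumption: P(A preferred) = sigmoid(r(A) - r(B)).
p = 1 / (1 + np.exp(-(X_a - X_b) @ theta_true))
y = (rng.random(n) < p).astype(float)   # noisy binary preference labels

theta = np.zeros(d)
for _ in range(500):                    # logistic regression, gradient ascent
    p_hat = 1 / (1 + np.exp(-(X_a - X_b) @ theta))
    theta += 0.5 * (X_a - X_b).T @ (y - p_hat) / n
```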

A matroid is a notion of independence in combinatorial optimization which is closely related to computational efficiency. In particular, it is well known that the maximum of a constrained modular function can be found greedily if and only if the constraints are associated with a matroid. In this paper, we bring together the ideas of bandits and matroids, and propose a new class of combinatorial bandits, matroid bandits. The objective in these problems is to learn how to maximize a modular function on a matroid. This function is stochastic and initially unknown. We propose a practical algorithm for solving our problem,...

10.48550/arxiv.1403.5045 preprint EN other-oa arXiv (Cornell University) 2014-01-01
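
The optimistic-greedy idea is easy to sketch: compute a UCB for every item, then run the greedy matroid maximizer on the UCBs, which is exactly optimal on a matroid. A partition matroid stands in for the general case; parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
L, T = 12, 3000
groups = np.array([i % 3 for i in range(L)])  # partition matroid (assumption)
cap = {0: 2, 1: 2, 2: 1}                      # per-group capacities
w = rng.uniform(0, 1, size=L)                 # true mean item weights
pulls, total = np.ones(L), rng.random(L)

def greedy_basis(scores):
    # Greedy is optimal on a matroid: scan items by decreasing score,
    # keep each one that preserves independence.
    taken, used = [], {g: 0 for g in cap}
    for i in np.argsort(-scores):
        g = groups[i]
        if used[g] < cap[g]:
            taken.append(i)
            used[g] += 1
    return np.array(taken)

for t in range(1, T + 1):
    ucb = total / pulls + np.sqrt(1.5 * np.log(t) / pulls)
    A = greedy_basis(ucb)                     # optimistic basis
    total[A] += w[A] + 0.1 * rng.standard_normal(len(A))
    pulls[A] += 1
```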

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both algorithms are computationally efficient as long as the offline version of the problem can be solved efficiently. We...

10.48550/arxiv.1406.7443 preprint EN other-oa arXiv (Cornell University) 2014-01-01
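
A CombLinTS-style sketch: maintain linear-regression statistics over item features, sample a parameter vector from a (simplified) Gaussian posterior, and hand the sampled item weights to the offline oracle. Top-K selection stands in for the oracle, and the noise scaling is an assumption:

```python
import numpy as np

rng = np.random.default_rng(7)
L, K, d, T = 50, 5, 8, 2000
X = rng.normal(size=(L, d)) / np.sqrt(d)  # item features
theta_true = rng.normal(size=d)

B, f = np.eye(d), np.zeros(d)             # regression sufficient statistics
for t in range(T):
    theta_s = rng.multivariate_normal(    # posterior sample (simplified
        np.linalg.solve(B, f),            # covariance, no noise scaling)
        np.linalg.inv(B))
    A = np.argsort(-(X @ theta_s))[:K]    # offline oracle on sampled weights
    obs = X[A] @ theta_true + 0.1 * rng.standard_normal(K)
    B += X[A].T @ X[A]                    # update statistics on feedback
    f += X[A].T @ obs
```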

Most recommender systems recommend a list of items. The user examines the list, from the first item to the last, and often chooses the first attractive item and does not examine the rest. This type of behavior can be modeled by the cascade model. In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend the $K$ most attractive items from a large set of $L$ candidate items. We propose two algorithms for solving this problem, which are based on the idea of linear generalization. The key idea in our solutions is that we learn a predictor of the attraction...

10.48550/arxiv.1603.05359 preprint EN other-oa arXiv (Cornell University) 2016-01-01
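
A sketch in the spirit of linear generalization for cascading bandits: score items with a LinUCB-style index over shared features, rank the top K, and update only on the examined prefix. The feature model and constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
L, K, d, T = 100, 4, 10, 3000
X = rng.normal(size=(L, d)) / np.sqrt(d)     # item features
theta_true = rng.uniform(0, 1, size=d)

M, b = np.eye(d), np.zeros(d)
for t in range(T):
    theta = np.linalg.solve(M, b)
    Minv = np.linalg.inv(M)
    conf = np.sqrt(np.einsum('id,de,ie->i', X, Minv, X))  # per-item width
    ranked = np.argsort(-(X @ theta + 0.5 * conf))[:K]    # UCB ranking
    attr = np.clip(X[ranked] @ theta_true, 0, 1)          # attraction probs
    clicks = rng.random(K) < attr
    c = int(np.argmax(clicks)) if clicks.any() else K
    for k in range(min(c + 1, K)):   # only examined items give feedback
        x = X[ranked[k]]
        M += np.outer(x, x)
        b += x * float(k == c)
```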

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where the reward distributions may change in a piecewise-stationary fashion at unknown time steps. We show that by incorporating a simple change-detection component with classic UCB algorithms to detect and adapt to changes, our so-called M-UCB algorithm can achieve a nearly optimal regret bound on the order of $O(\sqrt{MKT\log T})$...

10.48550/arxiv.1802.03692 preprint EN other-oa arXiv (Cornell University) 2018-01-01
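
A toy version of the UCB-plus-change-detection recipe: a two-window mean-shift test on each arm's recent rewards triggers a restart of the UCB statistics. M-UCB's actual test statistic, threshold, and forced-exploration schedule are simplified away:

```python
import numpy as np

rng = np.random.default_rng(9)
K, T, w = 3, 6000, 100                  # arms, horizon, detection window

def means(t):                           # piecewise-stationary Bernoulli means
    base = np.array([0.2, 0.5, 0.8])
    return base[::-1] if t > T // 2 else base

def change_detected(x, w, b=0.25):
    # Two-window mean-shift test (sketch): compare the first and second
    # halves of the arm's last w observations against a threshold b.
    if len(x) < w:
        return False
    half = w // 2
    return abs(np.mean(x[-w:-half]) - np.mean(x[-half:])) > b

hist = [[] for _ in range(K)]
pulls, total, t0 = np.ones(K), rng.random(K), 0
for t in range(1, T + 1):
    ucb = total / pulls + np.sqrt(2 * np.log(max(t - t0, 2)) / pulls)
    a = int(np.argmax(ucb))
    r = float(rng.random() < means(t)[a])
    hist[a].append(r)
    pulls[a] += 1
    total[a] += r
    if change_detected(hist[a], w):
        pulls, total, t0 = np.ones(K), rng.random(K), t  # restart after change
        hist = [[] for _ in range(K)]
```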

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, product...

10.48550/arxiv.1707.02038 preprint EN other-oa arXiv (Cornell University) 2017-01-01
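
The Bernoulli bandit is the canonical opening example for Thompson sampling: keep a Beta posterior per arm, draw one sample from each, and play the argmax. A minimal loop (variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
K, T = 5, 5000
theta = rng.uniform(0.1, 0.9, size=K)   # true success probabilities
alpha, beta = np.ones(K), np.ones(K)    # Beta(1, 1) priors per arm

for t in range(T):
    sample = rng.beta(alpha, beta)      # one draw from each posterior
    a = int(np.argmax(sample))          # act greedily on the sample
    r = float(rng.random() < theta[a])
    alpha[a] += r                       # conjugate posterior update
    beta[a] += 1 - r
```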

We propose a bandit algorithm that explores by randomizing its history of rewards. Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards. We design the pseudo rewards such that the bootstrap mean is optimistic with a sufficiently high probability. We call our algorithm Giro, which stands for garbage in, reward out. We analyze Giro in a Bernoulli bandit and derive an $O(K Δ^{-1} \log n)$ bound on its $n$-round regret, where $Δ$ is the difference between the expected rewards of the optimal and the best suboptimal arms, and $K$ is the number of arms. The main advantage of our exploration is that it can be easily...

10.48550/arxiv.1811.05154 preprint EN other-oa arXiv (Cornell University) 2018-01-01
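
A toy Bernoulli-bandit version of the Giro idea: keep each arm's reward history, append one pseudo reward of 0 and one of 1 after every pull (the "garbage"), and play the arm whose bootstrap resample has the highest mean. The seeding of the histories is a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(11)
K, T, a_pseudo = 3, 3000, 1             # arms, rounds, pseudo rewards per pull
theta = np.array([0.3, 0.5, 0.7])       # true success probabilities
hist = [[0.0, 1.0] for _ in range(K)]   # seed each arm's history (assumption)

for t in range(T):
    means = []
    for i in range(K):
        h = np.array(hist[i])
        boot = rng.choice(h, size=len(h), replace=True)  # bootstrap resample
        means.append(boot.mean())
    i = int(np.argmax(means))           # pull arm with highest bootstrap mean
    r = float(rng.random() < theta[i])
    hist[i].append(r)
    hist[i] += [0.0, 1.0] * a_pseudo    # pseudo rewards keep the mean optimistic
```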