- Advanced Bandit Algorithms Research
- Machine Learning and Algorithms
- Reinforcement Learning in Robotics
- Optimization and Search Problems
- Data Stream Mining Techniques
- Recommender Systems and Techniques
- Mobile Crowdsensing and Crowdsourcing
- Auction Theory and Applications
- Topic Modeling
- Smart Grid Energy Management
- Bayesian Modeling and Causal Inference
- Domain Adaptation and Few-Shot Learning
- Advanced Multi-Objective Optimization Algorithms
- Stochastic Gradient Optimization Techniques
- Machine Learning and Data Classification
- Face and Expression Recognition
- Consumer Market Behavior and Pricing
- Privacy-Preserving Technologies in Data
- Anomaly Detection Techniques and Applications
- Human Mobility and Location-Based Analysis
- Advanced Database Systems and Queries
- Time Series Analysis and Forecasting
- Sparse and Compressive Sensing Techniques
- Advanced Control Systems Optimization
- Complex Network Analysis Techniques
Amazon (United States)
2022-2024
Seattle University
2024
Amazon (Germany)
2021-2024
Technicolor (United States)
2011-2021
Google (United States)
2018-2021
Adobe Systems (United States)
2014-2020
Technicolor (Germany)
2007-2014
Intel (United States)
2007-2013
Emerson (United States)
2013
University of Pittsburgh
2003-2012
A search engine usually outputs a list of $K$ web pages. The user examines this list, from the first web page to the last, and chooses the first attractive page. This model of user behavior is known as the cascade model. In this paper, we propose cascading bandits, an online learning variant of the cascade model where the objective is to identify the $K$ most attractive items. We formulate our problem as a stochastic combinatorial partial monitoring problem. We propose two algorithms for solving it, CascadeUCB1 and CascadeKL-UCB. We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in...
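The CascadeUCB1 update is simple enough to sketch. Below is a minimal simulation-based sketch, assuming Bernoulli attraction weights and the UCB1 confidence radius $\sqrt{1.5 \log t / s}$; the function name and the simulated cascade user are illustrative, not the authors' code.

```python
import numpy as np

def cascade_ucb1(attraction_probs, K, n_steps, seed=0):
    """Minimal CascadeUCB1: recommend the K items with the highest UCBs,
    then update every item examined under the cascade model."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(attraction_probs, dtype=float)
    L = len(probs)
    pulls = np.zeros(L)   # observations per item
    means = np.zeros(L)   # empirical attraction probabilities
    for t in range(1, n_steps + 1):
        ucb = means + np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
        ucb[pulls == 0] = np.inf        # force initial exploration
        ranked = np.argsort(-ucb)[:K]   # K items, best UCB first
        # Simulated cascade user: clicks on the first attractive item.
        clicks = rng.random(K) < probs[ranked]
        click_pos = int(np.argmax(clicks)) if clicks.any() else K
        # Items up to and including the click were examined, so observed.
        for pos in range(min(click_pos + 1, K)):
            item = ranked[pos]
            reward = 1.0 if pos == click_pos else 0.0
            pulls[item] += 1
            means[item] += (reward - means[item]) / pulls[item]
    return means
```

The key point the sketch makes concrete is the partial feedback: only items up to the first click are examined, so only their attraction estimates are updated.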
We propose a practical methodology to protect a user's private data, when he wishes to publicly release data that is correlated with his private data, in the hope of getting some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a privacy-preserving probabilistic mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information...
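A common way to write down such a program (notation assumed here: private attribute $S$, correlated data $Y$, distorted release $\hat{Y}$, distortion budget $\Delta$) is

$$
\min_{p_{\hat{Y} \mid Y}} \; I(S; \hat{Y})
\qquad \text{subject to} \qquad
\mathbb{E}\big[ d(Y, \hat{Y}) \big] \le \Delta .
$$

Since mutual information is convex in the conditional distribution $p_{\hat{Y} \mid Y}$ for a fixed prior, and the expected distortion is linear in it, the program is convex and the privacy-preserving mapping can be computed with standard solvers.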
The probability that a user will click a search result depends both on its relevance and its position on the results page. The position-based model explains this behavior by ascribing to every item an attraction probability, and to every position an examination probability. To be clicked, a result must be both attractive and examined. The probabilities of an item-position pair being clicked thus form the entries of a rank-1 matrix. We propose the learning problem of a Bernoulli rank-1 bandit where at each step, the agent chooses a pair of row and column arms, and receives the product of their Bernoulli-distributed...
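The paper's own algorithm is not shown in this excerpt; to make the feedback structure concrete, here is a naive UCB baseline over (row, column) pairs that ignores the rank-1 structure the paper exploits. Names and constants are illustrative.

```python
import numpy as np

def rank1_ucb_baseline(u, v, n_steps, seed=0):
    """Naive UCB over (row, column) pairs in a Bernoulli rank-1 bandit:
    the reward of pair (i, j) is the product of two Bernoulli draws."""
    rng = np.random.default_rng(seed)
    K, L = len(u), len(v)
    pulls = np.zeros((K, L))
    means = np.zeros((K, L))
    for t in range(1, n_steps + 1):
        ucb = means + np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
        ucb[pulls == 0] = np.inf
        i, j = np.unravel_index(int(np.argmax(ucb)), ucb.shape)
        # Feedback: product of two independent Bernoulli draws.
        reward = float(rng.random() < u[i]) * float(rng.random() < v[j])
        pulls[i, j] += 1
        means[i, j] += (reward - means[i, j]) / pulls[i, j]
    return means
```

Treating each of the $K L$ pairs as an independent arm is exactly what the rank-1 structure lets one avoid, which is the motivation for a structured algorithm.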
This paper considers stochastic bandits with side observations, a model that accounts for both the exploration/exploitation dilemma and relationships between arms. In this setting, after pulling an arm i, the decision maker also observes the rewards of some other actions related to i. We will see that this model is suited to content recommendation in social networks, where users' reactions may be endorsed or not by their friends. We provide efficient algorithms based on upper confidence bounds (UCBs) that leverage this additional...
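A minimal sketch in the spirit of these UCB variants, assuming the side-observation graph is known and pulling arm i also reveals a reward draw for each arm in a hypothetical adjacency list `neighbors[i]`:

```python
import numpy as np

def ucb_side_observations(arm_means, neighbors, n_steps, seed=0):
    """UCB sketch with side observations: pulling arm i also reveals a
    reward draw for every arm in neighbors[i]."""
    rng = np.random.default_rng(seed)
    K = len(arm_means)
    obs = np.zeros(K)   # counts all observations, not just direct pulls
    est = np.zeros(K)
    for t in range(1, n_steps + 1):
        ucb = est + np.sqrt(2.0 * np.log(t) / np.maximum(obs, 1))
        ucb[obs == 0] = np.inf
        i = int(np.argmax(ucb))
        for j in [i] + list(neighbors[i]):   # free side observations
            r = float(rng.random() < arm_means[j])
            obs[j] += 1
            est[j] += (r - est[j]) / obs[j]
    return est
```

The design point is that confidence intervals shrink with all observations of an arm, direct or side, so well-connected arms need far fewer direct pulls.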
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its $n$-step regret, where $L$ is the number of ground items, $K$ is the maximum...
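The UCB-like algorithm here reduces to item-level UCBs plus a combinatorial oracle. A minimal sketch under Bernoulli item weights; the top-K oracle shown in the comment is only one example of a feasible-set solver.

```python
import numpy as np

def comb_ucb1(item_weights, oracle, n_steps, seed=0):
    """Combinatorial semi-bandit UCB sketch: item-level UCBs feed an
    oracle; feedback reveals the weights of the chosen items only."""
    rng = np.random.default_rng(seed)
    L = len(item_weights)
    pulls = np.zeros(L)
    means = np.zeros(L)
    for t in range(1, n_steps + 1):
        ucb = means + np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
        ucb[pulls == 0] = np.inf
        chosen = oracle(ucb)   # feasible set maximizing the sum of UCBs
        for e in chosen:       # semi-bandit feedback on chosen items
            w = float(rng.random() < item_weights[e])
            pulls[e] += 1
            means[e] += (w - means[e]) / pulls[e]
    return means

# Example oracle for a top-4 constraint:
# oracle = lambda u: np.argsort(-u)[:4]
```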
A search engine recommends to the user a list of web pages. The user examines this list, from the first page to the last, and clicks on all attractive pages until the user is satisfied. This behavior can be described by the dependent click model (DCM). We propose DCM bandits, an online learning variant of the DCM where the goal is to maximize the probability of recommending satisfactory items, such as web pages. The main challenge of our problem is that we do not observe which clicked item is satisfactory. We propose a computationally-efficient algorithm for solving our problem, dcmKL-UCB;...
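dcmKL-UCB ranks items by KL-UCB indices. Below is a small sketch of that index computation via bisection, assuming Bernoulli rewards; using a plain $\log t$ exploration budget is a simplification (some KL-UCB variants add a $\log \log t$ term).

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, iters=32):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), by bisection.
    This is the per-item optimistic index that items are ranked by."""
    if pulls == 0:
        return 1.0
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

KL-UCB indices are never larger than necessary for the confidence level, which is what yields the tighter, asymptotically matching regret constants compared to UCB1-style indices.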
We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of "best influencers" in a network while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under a stochastic semi-bandit feedback model, we propose and analyze IMLinUCB, a computationally efficient UCB-based...
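A sketch of the linear-UCB core of an IMLinUCB-style update, assuming each edge has a feature vector and its activation probability is (approximately) linear in it; the influence-maximization oracle itself is elided, and all names are illustrative.

```python
import numpy as np

def imlinucb_step(X, A, b, observed_edges, edge_rewards, alpha=1.0):
    """One linear-UCB update over edges. X[e] is the feature vector of
    edge e; (A, b) are ridge-regression statistics updated in place."""
    theta = np.linalg.solve(A, b)          # ridge estimate of edge weights
    A_inv = np.linalg.inv(A)
    # UCB on each edge's activation probability, clipped to [0, 1].
    width = np.sqrt(np.einsum('ed,df,ef->e', X, A_inv, X))
    ucb = np.clip(X @ theta + alpha * width, 0.0, 1.0)
    # ... pass `ucb` to an influence-maximization oracle to choose seeds ...
    # Semi-bandit feedback: only edges in the observed (influenced)
    # portion of the cascade update the statistics.
    for e, r in zip(observed_edges, edge_rewards):
        A += np.outer(X[e], X[e])
        b += r * X[e]
    return A, b, ucb
```

The linear generalization is what keeps the method tractable: statistics live in feature space, so the number of parameters does not grow with the number of edges.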
Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batch-mode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernel-based RL on a set of k representative states. Similarly to kernel-based RL, our solution is a fixed point of a kernelized Bellman operator and can approximate the optimal solution to an arbitrary level of granularity. Unlike kernel-based RL, our method is fast. In particular, its policies can be computed in O(n)...
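One way such an approximation can be realized is kernel regression of Bellman backups onto the k representative states. The sketch below assumes a Gaussian kernel and batch data of (s, a, r, s') tuples; it illustrates the fixed-point iteration, not necessarily the paper's exact operator.

```python
import numpy as np

def kernel_rl_on_representative_states(S_rep, samples, gamma=0.95,
                                       sigma=1.0, n_iters=200):
    """Approximate kernel-based RL: iterate a kernelized Bellman backup
    whose value function lives on k representative states."""
    k = len(S_rep)
    actions = sorted({a for _, a, _, _ in samples})
    Q = np.zeros((k, len(actions)))

    def weights(s):
        # Normalized Gaussian kernel between s and representative states.
        d2 = np.sum((S_rep - s) ** 2, axis=1)
        w = np.exp(-d2 / (2 * sigma ** 2))
        return w / w.sum()

    for _ in range(n_iters):
        Q_new = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        for s, a, r, s_next in samples:
            # Bellman target, with next-state values kernel-interpolated.
            target = r + gamma * np.max(weights(s_next) @ Q)
            w = weights(s)
            Q_new[:, actions.index(a)] += w * target
            counts[:, actions.index(a)] += w
        Q = Q_new / np.maximum(counts, 1e-12)
    return Q
```

Because values are stored only at the k representative states, a greedy action at any state needs just one kernel evaluation against those states, which is the source of the speedup over full kernel-based RL.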
We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.
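A hedged sketch of what cross-validated estimator selection can look like, assuming logged tuples of (reward, target-policy propensity, logging propensity) and using held-out inverse propensity scoring (IPS) as the validation target; the paper's actual procedure may differ in how that target is chosen.

```python
import numpy as np

def cross_validated_ope(logs, estimators, n_folds=5, seed=0):
    """Select an off-policy estimator by cross-validation: score each
    candidate by the squared gap between its training-fold estimate and
    an unbiased (high-variance) IPS estimate on the held-out fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(logs))
    folds = np.array_split(idx, n_folds)
    scores = {name: 0.0 for name in estimators}
    for f in range(n_folds):
        val = [logs[i] for i in folds[f]]
        train = [logs[i] for g, fold in enumerate(folds) if g != f
                 for i in fold]
        # IPS on the validation fold: unbiased target for the policy value.
        target = np.mean([r * pi_e / pi_b for (r, pi_e, pi_b) in val])
        for name, est in estimators.items():
            scores[name] += (est(train) - target) ** 2 / n_folds
    return min(scores, key=scores.get)
```

The trick that makes this feasible is that an unbiased validation target does not need to be low-variance; averaging the discrepancy over folds still ranks estimators sensibly.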
A matroid is a notion of independence in combinatorial optimization which is closely related to computational efficiency. In particular, it is well known that the maximum of a constrained modular function can be found greedily if and only if the constraints are associated with a matroid. In this paper, we bring together the ideas of bandits and matroids, and propose a new class of combinatorial bandits, matroid bandits. The objective in these problems is to learn how to maximize a modular function on a matroid. This function is stochastic and initially unknown. We propose a practical algorithm for solving our problem,...
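Because greedy maximization is exact on matroids, the optimistic algorithm can simply run greedy on UCBs. A minimal per-round sketch, with a uniform matroid as the example independence oracle; names are illustrative.

```python
import numpy as np

def optimistic_matroid_step(means, pulls, t, is_independent):
    """One round of optimistic matroid maximization: greedily add the
    highest-UCB item that keeps the chosen set independent."""
    ucb = means + np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
    ucb[pulls == 0] = np.inf
    chosen = []
    for e in np.argsort(-ucb):   # greedy is optimal on matroids
        if is_independent(chosen + [int(e)]):
            chosen.append(int(e))
    return chosen

# Example: a uniform matroid of rank 3 (any set of <= 3 items is independent).
# is_independent = lambda S: len(S) <= 3
```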
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both are computationally efficient as long as the offline version of the problem can be solved efficiently. We...
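A CombLinTS-style sketch, assuming item weights are (approximately) linear in known features and a Gaussian posterior over the weight vector; `oracle` and `feedback` are hypothetical callables standing in for the offline solver and the environment.

```python
import numpy as np

def comb_lin_ts(X, oracle, feedback, n_steps, lam=1.0, sigma=1.0, seed=0):
    """Combinatorial linear Thompson sampling sketch: sample a weight
    vector from the posterior, score items linearly, and let an offline
    combinatorial oracle pick the feasible set."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = lam * np.eye(d)   # posterior precision (up to sigma scaling)
    b = np.zeros(d)
    for _ in range(n_steps):
        cov = np.linalg.inv(A)
        theta = rng.multivariate_normal(cov @ b, sigma ** 2 * cov)
        chosen = oracle(X @ theta)        # any efficient offline solver
        for e, w in feedback(chosen):     # semi-bandit: observed weights
            A += np.outer(X[e], X[e]) / sigma ** 2
            b += w * X[e] / sigma ** 2
    return np.linalg.solve(A, b)
```

As the abstract notes, the only problem-specific component is the offline oracle; the learning layer is oblivious to the combinatorial structure.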
Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algorithms are not guaranteed to be statistically efficient in our problem because the number of recommended lists can grow exponentially with their length. To overcome this challenge, we use...
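One concrete instance of a click-model-based estimator, assuming a position-based model with known examination probabilities: logged clicks are debiased by inverse examination probability to recover per-item attraction, which then scores a new list. This is a simplified stand-in for the paper's estimators, not the paper's method.

```python
def pbm_offline_clicks(logs, exam, new_list):
    """Estimate the expected clicks of `new_list` under a position-based
    model from logged (item, position, click) triples; `exam[p]` is the
    known examination probability of position p."""
    num, den = {}, {}
    for item, pos, click in logs:
        # Dividing by the examination probability debiases the position:
        # E[click / exam[pos]] equals the item's attraction probability.
        num[item] = num.get(item, 0.0) + click / exam[pos]
        den[item] = den.get(item, 0.0) + 1.0
    attraction = {i: num[i] / den[i] for i in num}
    return sum(attraction.get(item, 0.0) * exam[pos]
               for pos, item in enumerate(new_list))
```

Factoring clicks into per-item and per-position effects is what avoids the exponential blow-up over whole lists: statistics are pooled across all lists that contain an item.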
Most recommender systems recommend a list of items. The user examines the list, from the first item to the last, and often chooses the first attractive item and does not examine the rest. This type of behavior can be modeled by the cascade model. In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend the $K$ most attractive items from a large set of $L$ candidate items. We propose two algorithms for solving this problem, which are based on the idea of linear generalization. The key idea in our solutions is to learn a predictor of the attraction...
Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where the reward distributions may change in a piecewise-stationary fashion at unknown time steps. We show that by incorporating a simple change-detection component with classic UCB algorithms to detect and adapt to changes, our so-called M-UCB algorithm can achieve a nearly optimal regret bound on the order of $O(\sqrt{MKT\log T})$...
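A sketch of the M-UCB recipe: run UCB, keep a sliding window of recent rewards per arm, and restart all statistics when a two-half mean-shift test fires. The window length, detection threshold, and forced-exploration rate below are illustrative, not tuned as in the paper.

```python
import numpy as np

def change_detected(window, threshold):
    """Two-half mean-shift test on a sliding window of rewards."""
    w = len(window)
    if w < 4:
        return False
    first, second = window[: w // 2], window[w // 2:]
    return abs(np.mean(second) - np.mean(first)) > threshold

def m_ucb(rewards, w=50, threshold=0.2, gamma=0.05, seed=0):
    """M-UCB sketch. rewards[t][k] is the (piecewise-stationary) reward
    of arm k at time t; statistics restart whenever a change is found."""
    rng = np.random.default_rng(seed)
    T, K = len(rewards), len(rewards[0])
    pulls = np.zeros(K)
    means = np.zeros(K)
    tau = 0                                  # time of the last restart
    windows = [[] for _ in range(K)]
    for t in range(T):
        if rng.random() < gamma:             # forced uniform exploration
            k = int(rng.integers(K))
        else:
            ucb = means + np.sqrt(2 * np.log(t - tau + 1)
                                  / np.maximum(pulls, 1))
            ucb[pulls == 0] = np.inf
            k = int(np.argmax(ucb))
        r = rewards[t][k]
        pulls[k] += 1
        means[k] += (r - means[k]) / pulls[k]
        windows[k] = (windows[k] + [r])[-w:]
        if len(windows[k]) == w and change_detected(windows[k], threshold):
            pulls[:] = 0; means[:] = 0; tau = t   # restart after a change
            windows = [[] for _ in range(K)]
    return means
```

The forced exploration is what guarantees every arm's window stays fresh enough for the detector to notice a change on arms UCB would otherwise stop pulling.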
Offline data-driven evaluation is considered a low-cost and more accessible alternative to the online empirical method of assessing the quality of recommender systems. Despite their popularity and effectiveness, most offline approaches are unsuitable for evaluating interactive recommender systems. In this article, we attempt to address this issue by simulating user interactions with the system as part of the evaluation process. Particularly, we demonstrate that simulated users find their desired item more efficiently when recommendations are presented as a list of carousels compared...