Branislav Kveton

ORCID: 0000-0002-3965-1367
Research Areas
  • Advanced Bandit Algorithms Research
  • Machine Learning and Algorithms
  • Reinforcement Learning in Robotics
  • Optimization and Search Problems
  • Data Stream Mining Techniques
  • Recommender Systems and Techniques
  • Mobile Crowdsensing and Crowdsourcing
  • Auction Theory and Applications
  • Topic Modeling
  • Smart Grid Energy Management
  • Bayesian Modeling and Causal Inference
  • Domain Adaptation and Few-Shot Learning
  • Advanced Multi-Objective Optimization Algorithms
  • Stochastic Gradient Optimization Techniques
  • Machine Learning and Data Classification
  • Face and Expression Recognition
  • Consumer Market Behavior and Pricing
  • Privacy-Preserving Technologies in Data
  • Anomaly Detection Techniques and Applications
  • Human Mobility and Location-Based Analysis
  • Advanced Database Systems and Queries
  • Time Series Analysis and Forecasting
  • Sparse and Compressive Sensing Techniques
  • Advanced Control Systems Optimization
  • Complex Network Analysis Techniques

Amazon (United States)
2022-2024

Seattle University
2024

Amazon (Germany)
2021-2024

Technicolor (United States)
2011-2021

Google (United States)
2018-2021

Adobe Systems (United States)
2014-2020

Technicolor (Germany)
2007-2014

Intel (United States)
2007-2013

Emerson (United States)
2013

University of Pittsburgh
2003-2012

A search engine usually outputs a list of $K$ web pages. The user examines this list, from the first page to the last, and chooses the first attractive page. This model of user behavior is known as the cascade model. In this paper, we propose cascading bandits, a learning variant of the cascade model where the objective is to identify the $K$ most attractive items. We formulate our problem as a stochastic combinatorial partial monitoring problem. We propose two algorithms for solving it, CascadeUCB1 and CascadeKL-UCB. We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound in...

10.48550/arxiv.1502.02763 preprint EN other-oa arXiv (Cornell University) 2015-01-01
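The learning loop described in the abstract can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the Bernoulli attraction probabilities, the exploration constant 1.5, and the warm-up via infinite initial UCBs are assumptions.

```python
import math
import random

def cascade_ucb1(attraction, K, horizon, seed=0):
    """CascadeUCB1-style sketch: recommend the K items with the highest UCBs,
    then observe cascade feedback (items up to and including the first click)."""
    rng = random.Random(seed)
    L = len(attraction)
    pulls = [0] * L    # observations per item
    means = [0.0] * L  # empirical attraction estimates
    for t in range(1, horizon + 1):
        def ucb(i):
            if pulls[i] == 0:
                return float("inf")  # force each item to be observed once
            return means[i] + math.sqrt(1.5 * math.log(t) / pulls[i])
        ranked = sorted(range(L), key=ucb, reverse=True)[:K]
        # cascade model: the user scans the list, clicks the first attractive item
        click = next((k for k, i in enumerate(ranked)
                      if rng.random() < attraction[i]), None)
        observed = ranked if click is None else ranked[:click + 1]
        for k, i in enumerate(observed):
            reward = 1.0 if k == click else 0.0
            pulls[i] += 1
            means[i] += (reward - means[i]) / pulls[i]
    return sorted(range(L), key=lambda i: means[i], reverse=True)[:K]
```

Items below the first click are not observed, which is exactly the partial-monitoring aspect the abstract refers to.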

We propose a practical methodology to protect a user's private data, when he wishes to publicly release data that is correlated with his private data, in the hope of getting some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a privacy-preserving probabilistic mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information...

10.1109/globalsip.2013.6736867 article EN IEEE Global Conference on Signal and Information Processing 2013-12-01

The probability that a user will click on a search result depends both on its relevance and its position on the results page. The position-based model explains this behavior by ascribing to every item an attraction probability, and to every position an examination probability. To be clicked, an item must be both attractive and examined. The probabilities of an item-position pair being clicked thus form the entries of a rank-1 matrix. We propose a stochastic rank-1 Bernoulli bandit, a learning problem where at each step the agent chooses a pair of row and column arms, and receives the product of their Bernoulli-distributed...

10.24963/ijcai.2017/278 preprint EN 2017-07-28
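The rank-1 structure of the position-based model is easy to verify numerically. A small sketch, with hypothetical attraction and examination probabilities:

```python
u = [0.9, 0.5, 0.2]  # hypothetical item attraction probabilities
v = [1.0, 0.6, 0.3]  # hypothetical position examination probabilities
P = [[ui * vj for vj in v] for ui in u]  # click-probability matrix

# P is rank-1 by construction: every 2x2 minor of P vanishes.
minors = [P[i][j] * P[k][l] - P[i][l] * P[k][j]
          for i in range(3) for k in range(i + 1, 3)
          for j in range(3) for l in range(j + 1, 3)]
print(all(abs(m) < 1e-12 for m in minors))  # True
```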

We propose a practical methodology to protect a user's private data, when he wishes to publicly release data that is correlated with his private data, in the hope of getting some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a privacy-preserving probabilistic mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information...

10.1109/jstsp.2015.2442227 article EN IEEE Journal of Selected Topics in Signal Processing 2015-06-05

This paper considers stochastic bandits with side observations, a model that accounts for both the exploration/exploitation dilemma and relationships between arms. In this setting, after pulling an arm i, the decision maker also observes the rewards of some other actions related to i. We will see that this model is suited to content recommendation in social networks, where users' reactions may be endorsed or not by their friends. We provide efficient algorithms based on upper confidence bounds (UCBs) that leverage this additional...

10.48550/arxiv.1210.4839 preprint EN other-oa arXiv (Cornell University) 2012-01-01
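The side-observation idea can be sketched as a UCB loop in which one pull updates several arms. This is a minimal sketch under assumed Bernoulli rewards; the neighbourhood `graph` and the exploration constant 2.0 are illustrative choices, not the paper's exact algorithm.

```python
import math
import random

def ucb_side_obs(means, graph, horizon, seed=0):
    """UCB sketch with side observations: pulling arm i also reveals the
    rewards of its neighbours in `graph`, updating their statistics for free."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    est = [0.0] * n
    for t in range(1, horizon + 1):
        idx = max(range(n), key=lambda i: float("inf") if counts[i] == 0
                  else est[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
        for j in {idx} | set(graph[idx]):  # pulled arm plus side observations
            r = 1.0 if rng.random() < means[j] else 0.0
            counts[j] += 1
            est[j] += (r - est[j]) / counts[j]
    return est
```

The denser the graph, the more free observations each pull yields, which is why the regret can scale with graph quantities rather than the number of arms.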

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes the weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient, and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its $n$-step regret, where $L$ is the number of ground items and $K$ is the maximum...

10.48550/arxiv.1410.0949 preprint EN other-oa arXiv (Cornell University) 2014-01-01
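A CombUCB1-style loop is short when the constraint is a uniform matroid (choose any K of L items), so the offline oracle is just top-K selection. A sketch under assumed Bernoulli item weights; the constant 1.5 is an illustrative choice:

```python
import math
import random

def comb_ucb1(weights, K, horizon, seed=0):
    """Combinatorial semi-bandit sketch: choose K of L items by UCB, observe
    each chosen item's weight (semi-bandit feedback), receive their sum."""
    rng = random.Random(seed)
    L = len(weights)
    counts, est = [0] * L, [0.0] * L
    for t in range(1, horizon + 1):
        ucb = [float("inf") if counts[i] == 0
               else est[i] + math.sqrt(1.5 * math.log(t) / counts[i])
               for i in range(L)]
        # the offline oracle for a uniform matroid is top-K selection
        chosen = sorted(range(L), key=lambda i: ucb[i], reverse=True)[:K]
        for i in chosen:
            w = 1.0 if rng.random() < weights[i] else 0.0  # Bernoulli weight
            counts[i] += 1
            est[i] += (w - est[i]) / counts[i]
    return sorted(range(L), key=lambda i: est[i], reverse=True)[:K]
```

Semi-bandit feedback (per-item weights, not just the sum) is what makes the item-level estimates above possible.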

A search engine recommends to the user a list of web pages. The user examines this list, from the first page to the last, and clicks on all attractive pages until the user is satisfied. This behavior can be described by the dependent click model (DCM). We propose DCM bandits, an online learning variant of the DCM where the goal is to maximize the probability of recommending satisfactory items, such as web pages. The main challenge of our problem is that we do not observe which attractive item is satisfactory. We propose a computationally-efficient algorithm for solving our problem, dcmKL-UCB;...

10.48550/arxiv.1602.03146 preprint EN other-oa arXiv (Cornell University) 2016-01-01

We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of "best influencers" in a network while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under a stochastic semi-bandit feedback model, we propose and analyze IMLinUCB, a computationally efficient UCB-based...

10.48550/arxiv.1605.06593 preprint EN other-oa arXiv (Cornell University) 2016-01-01
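The independent cascade model the abstract builds on is simple to simulate: each newly influenced node gets a single chance to influence each outgoing neighbour. A minimal sketch, with illustrative edge and probability structures:

```python
import random

def simulate_cascade(edges, prob, seeds, seed=0):
    """One roll-out of the independent cascade model over a directed graph.
    `edges` maps a node to its outgoing neighbours; `prob[(u, v)]` is the
    activation probability of edge (u, v)."""
    rng = random.Random(seed)
    influenced = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in edges.get(u, []):
                if v not in influenced and rng.random() < prob[(u, v)]:
                    influenced.add(v)
                    nxt.append(v)
        frontier = nxt
    return influenced
```

In the online problem, only the influenced portion returned here is observed per round, which is the limited-feedback challenge (ii) above.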

Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batch-mode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernel-based RL on a set of k representative states. Similarly to kernel-based RL, our solution is a fixed point of a kernelized Bellman operator and can approximate the optimal solution to an arbitrary level of granularity. Unlike kernel-based RL, our method is fast. In particular, its policies can be computed in O(n)...

10.1609/aaai.v26i1.8294 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-09-20

We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.

10.1609/aaai.v39i15.33765 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

A matroid is a notion of independence in combinatorial optimization which is closely related to computational efficiency. In particular, it is well known that the maximum of a constrained modular function can be found greedily if and only if the constraints are associated with a matroid. In this paper, we bring together the ideas of bandits and matroids, and propose a new class of combinatorial bandits, matroid bandits. The objective in these problems is to learn how to maximize a modular function on a matroid. This function is stochastic and initially unknown. We propose a practical algorithm for solving our problem,...

10.48550/arxiv.1403.5045 preprint EN other-oa arXiv (Cornell University) 2014-01-01
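The greedy-on-a-matroid idea combines naturally with optimistic estimates: add items in decreasing UCB order while the chosen set stays independent. A sketch under assumed Bernoulli weights, using a partition matroid (at most one item per group) as the illustrative independence system:

```python
import math
import random

def matroid_bandit_greedy(weights, groups, horizon, seed=0):
    """Optimistic greedy sketch for a matroid bandit: each round, greedily
    build a basis by UCB subject to a partition-matroid constraint."""
    rng = random.Random(seed)
    L = len(weights)
    counts, est = [0] * L, [0.0] * L
    for t in range(1, horizon + 1):
        ucb = [float("inf") if counts[i] == 0
               else est[i] + math.sqrt(1.5 * math.log(t) / counts[i])
               for i in range(L)]
        chosen, used = [], set()
        for i in sorted(range(L), key=lambda i: ucb[i], reverse=True):
            if groups[i] not in used:  # independence test of the matroid
                chosen.append(i)
                used.add(groups[i])
        for i in chosen:  # semi-bandit feedback on the chosen basis
            w = 1.0 if rng.random() < weights[i] else 0.0
            counts[i] += 1
            est[i] += (w - est[i]) / counts[i]
    # greedy on empirical means yields the estimated best basis
    best, used = [], set()
    for i in sorted(range(L), key=lambda i: est[i], reverse=True):
        if groups[i] not in used:
            best.append(i)
            used.add(groups[i])
    return sorted(best)
```

The matroid exchange property is what makes this greedy step optimal for the offline problem, per the abstract's if-and-only-if claim.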

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes the weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization and, as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both are computationally efficient as long as the offline version of the problem can be solved efficiently. We...

10.48550/arxiv.1406.7443 preprint EN other-oa arXiv (Cornell University) 2014-01-01

Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing estimators are not guaranteed to be statistically efficient in our problem, because the number of recommended lists can grow exponentially with their length. To overcome this challenge, we use...

10.1145/3219819.3220028 preprint EN 2018-07-19
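The baseline the abstract improves on is plain inverse propensity scoring over whole lists, which is easy to state but suffers from the exponential number of lists. A hedged sketch; the log format and `target_prob` interface are assumptions for illustration, not the paper's estimator:

```python
def ips_clicks(logs, target_prob):
    """List-level IPS estimate of expected clicks under a new ranking policy.
    Each log entry is (ranked_list, clicks, logging_probability), and
    target_prob(ranked_list) is the new policy's probability of showing it."""
    total = 0.0
    for ranked, clicks, p_log in logs:
        total += target_prob(ranked) / p_log * clicks
    return total / len(logs)
```

When the target policy equals the logging policy the weights cancel and the estimate reduces to the empirical mean of clicks; when they differ on long lists, the weights blow up, which motivates the structured estimators the paper proposes.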

Most recommender systems recommend a list of items. The user examines the list, from the first item to the last, and often chooses the first attractive item and does not examine the rest. This type of behavior can be modeled by the cascade model. In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend the $K$ most attractive items from a large set of $L$ candidate items. We propose two algorithms for solving this problem, which are based on the idea of linear generalization. The key idea in our solutions is to learn a predictor of the attraction...

10.48550/arxiv.1603.05359 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Multi-armed bandits (MABs) are a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting arms to pull, with unknown reward distributions. We consider a scenario where the reward distributions may change in a piecewise-stationary fashion at unknown time steps. We show that by incorporating a simple change-detection component into classic UCB algorithms to detect and adapt to changes, our so-called M-UCB algorithm can achieve a nearly optimal regret bound on the order of $O(\sqrt{MKT\log...

10.48550/arxiv.1802.03692 preprint EN other-oa arXiv (Cornell University) 2018-01-01
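The change-detection component mentioned above can be sketched as a two-sided test on a window of one arm's recent rewards: flag a change when the two halves of the window disagree by more than a threshold. A minimal sketch; in the paper the threshold is set from theory, here it is left as a free parameter `b`:

```python
def change_detected(window, b):
    """Two-sided window test for piecewise-stationary rewards: compare the
    sums of the two halves of the most recent observations of one arm and
    flag a change when they differ by more than b."""
    half = len(window) // 2
    recent = window[-2 * half:]  # drop the oldest element if the length is odd
    return abs(sum(recent[half:]) - sum(recent[:half])) > b
```

Coupling such a detector with UCB (restart the UCB statistics whenever a change fires) is the adapt-to-changes mechanism the abstract describes.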

Offline data-driven evaluation is considered a low-cost and more accessible alternative to the online empirical method of assessing the quality of recommender systems. Despite their popularity and effectiveness, most offline approaches are unsuitable for evaluating interactive recommender systems. In this article, we attempt to address this issue by simulating user interactions with the system as part of the evaluation process. Particularly, we demonstrate that simulated users find the desired item more efficiently when recommendations are presented in a list of carousels compared...

10.1145/3643709 article EN cc-by ACM Transactions on Recommender Systems 2024-01-30