- Advanced Bandit Algorithms Research
- Auction Theory and Applications
- Machine Learning and Algorithms
- Optimization and Search Problems
- Data Stream Mining Techniques
- Smart Grid Energy Management
- Distributed Sensor Networks and Detection Algorithms
- Reinforcement Learning in Robotics
- Game Theory and Voting Systems
- Machine Learning and Data Classification
- Consumer Market Behavior and Pricing
- Financial Markets and Investment Strategies
- Face and Expression Recognition
- Adversarial Robustness in Machine Learning
- Gaussian Processes and Bayesian Inference
- Statistical Methods in Clinical Trials
- Scheduling and Timetabling Solutions
- Target Tracking and Data Fusion in Sensor Networks
- Game Theory and Applications
- Experimental Behavioral Economics Studies
- Neural Networks and Applications
- Cognitive Radio Networks and Spectrum Sensing
- Smart Systems and Machine Learning
- Age of Information Optimization
- Retirement, Disability, and Employment
New York University
2021-2024
The University of Tokyo
2013-2019
We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it has been revealed to have an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and discuss its regret analysis. We prove that MP-TS with binary rewards has a regret upper bound that matches the lower bound provided by...
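A minimal sketch of the multiple-play Thompson sampling idea under a Bernoulli reward assumption: draw one Beta-posterior sample per arm and play the $L$ arms with the largest draws. The arm means, horizon, and constants below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, T = 5, 2, 10_000                     # arms, plays per round, horizon
mu = np.array([0.1, 0.3, 0.5, 0.6, 0.7])   # illustrative Bernoulli means
alpha = np.ones(K)                         # Beta posterior: successes + 1
beta = np.ones(K)                          # Beta posterior: failures + 1

for t in range(T):
    theta = rng.beta(alpha, beta)          # one posterior sample per arm
    chosen = np.argsort(theta)[-L:]        # play the L arms with largest samples
    rewards = rng.random(L) < mu[chosen]   # binary rewards
    alpha[chosen] += rewards
    beta[chosen] += 1 - rewards

print("posterior means:", alpha / (alpha + beta))
```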
We study an online forecasting setting in which, over $T$ rounds, $N$ strategic experts each report a forecast to a mechanism, the mechanism selects one forecast, and then the outcome is revealed. In any given round, each expert has a belief about the outcome, but wishes to select its report so as to maximize the total number of times it is selected. The goal of the mechanism is to obtain low regret: the difference between its cumulative loss (based on the selected forecasts) and that of the best expert in hindsight (as measured by the experts' beliefs). We consider exactly truthful mechanisms...
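The regret notion here is just bookkeeping, so a tiny computation can make it concrete. The sketch below scores a mechanism's selections against the best single expert in hindsight, assuming truthful reports (report = belief) and using a quadratic (Brier) loss; the beliefs, outcomes, and selection rule are all synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(8)
T, N = 1000, 5
beliefs = rng.uniform(0.1, 0.9, (T, N))    # experts' beliefs about a binary outcome
outcomes = (rng.random(T) < 0.5).astype(float)

def loss(p, y):                            # quadratic (Brier) loss, for illustration
    return (p - y) ** 2

selected = rng.integers(0, N, T)           # whichever experts the mechanism picked
mech_loss = loss(beliefs[np.arange(T), selected], outcomes).sum()
best_loss = loss(beliefs, outcomes[:, None]).sum(axis=0).min()
print("regret =", mech_loss - best_loss)   # mechanism vs. best expert in hindsight
```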
Existing data-dependent and best-of-both-worlds regret bounds for multi-armed bandit problems have limited adaptivity as they are either data-dependent but not best-of-both-worlds (BOBW), BOBW but not data-dependent, or achieve only a sub-optimal $O(\sqrt{T\ln{T}})$ worst-case guarantee in the adversarial regime. To overcome these limitations, we propose real-time stability-penalty matching (SPM), a new method for obtaining regret bounds that are simultaneously data-dependent, BOBW, and $T$-optimal for multi-armed bandit problems. In particular, we show that SPM obtains bounds with worst-case guarantees of order $O(\sqrt{T})$ in the adversarial regime...
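SPM itself tunes the learning rate of a follow-the-regularized-leader (FTRL) scheme so that its stability and penalty terms match in real time. As background, here is a minimal sketch of the kind of Tsallis-entropy FTRL bandit this line of work builds on (a Tsallis-INF-style update with a simple $\eta_t \propto 1/\sqrt{t}$ schedule and an importance-weighted loss estimator); this is a related baseline, not the SPM algorithm, and the loss sequence is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 4, 5000

def tsallis_weights(Lhat, eta):
    """Solve sum_i 4 / (eta*(Lhat_i - x))^2 = 1 for x < min(Lhat) by bisection."""
    lo = Lhat.min() - 2.0 * np.sqrt(K) / eta   # here the sum is <= 1
    hi = Lhat.min() - 2.0 / eta                # here the sum is >= 1
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        s = np.sum(4.0 / (eta * (Lhat - mid)) ** 2)
        lo, hi = (mid, hi) if s < 1.0 else (lo, mid)
    w = 4.0 / (eta * (Lhat - 0.5 * (lo + hi))) ** 2
    return w / w.sum()                          # renormalize residual error

Lhat = np.zeros(K)                              # importance-weighted loss estimates
for t in range(1, T + 1):
    eta = 2.0 / np.sqrt(t)                      # deterministic learning-rate schedule
    w = tsallis_weights(Lhat, eta)
    arm = rng.choice(K, p=w)
    loss = float(rng.random() < (0.6 if arm != 0 else 0.4))  # illustrative losses
    Lhat[arm] += loss / w[arm]                  # unbiased importance weighting
```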
We analyze statistical discrimination in hiring markets using a multi-armed bandit model. Myopic firms face workers arriving with heterogeneous observable characteristics. The association between a worker's skill and characteristics is unknown ex ante; thus, firms need to learn it. Laissez-faire causes perpetual underestimation: minority workers are rarely hired, and therefore the underestimation tends to persist. Even a marginal imbalance in the population ratio frequently results in perpetual underestimation. We demonstrate that a subsidy...
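A toy simulation of the perpetual-underestimation mechanism (not the paper's model): once a group's empirical mean falls below the firm's outside option, the myopic firm stops hiring from that group, so its estimate is never corrected even though true skill is identical across groups. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50_000
p_minority = 0.1                          # population share of the minority group
true_skill = {"maj": 0.6, "min": 0.6}     # identical true skill in both groups
outside = 0.5                             # firm's payoff from not hiring
stats = {g: [1.0, 2.0] for g in true_skill}   # [successes, trials] pseudo-counts

hired = {"maj": 0, "min": 0}
for t in range(T):
    g = "min" if rng.random() < p_minority else "maj"
    est = stats[g][0] / stats[g][1]       # myopic firm uses the empirical mean only
    if est >= outside:                    # hire only if the estimate beats the outside option
        hired[g] += 1
        stats[g][0] += float(rng.random() < true_skill[g])
        stats[g][1] += 1                  # the estimate updates only when the group is hired

print(hired, {g: s[0] / s[1] for g, s in stats.items()})
```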
The sea crossing from Libya to Italy is one of the world's most dangerous and politically contentious migration routes, yet over half a million people have attempted it since 2014. Leveraging data on aggregate flows and individual incidents, we estimate how migrants and smugglers reacted to changes in the border enforcement regime, namely the rise of interceptions by the Libyan Coast Guard starting in 2017 and the corresponding decrease in the probability of rescue to Europe. We find support for a deterrence effect in which crossings along the Central...
Emerging patterns are patterns whose support significantly differs between two databases. We study the problem of listing emerging patterns with a multiple testing guarantee. Recently, Terada et al. proposed the Limitless Arity Multiple-testing Procedure (LAMP) that controls the family-wise error rate (FWER) in statistical association mining. LAMP reduces the number of "untestable" hypotheses without compromising its statistical power. Still, FWER is restrictive, and as a result, its power is inherently unsatisfying when the number of patterns is large. On the other hand,...
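LAMP-style procedures refine exactly this kind of multiple-testing correction by discarding untestable hypotheses. As a baseline illustration of the underlying test, here is a plain Fisher's exact test on a pattern's support in two databases with a naive Bonferroni correction; all counts are hypothetical.

```python
from scipy.stats import fisher_exact

# Illustrative example: one pattern's support in two transaction databases.
n1, n2 = 1000, 1000        # database sizes (hypothetical)
supp1, supp2 = 120, 60     # transactions containing the pattern in each database

table = [[supp1, n1 - supp1], [supp2, n2 - supp2]]
_, p = fisher_exact(table)

n_patterns = 10_000        # number of candidate patterns tested (hypothetical)
print(f"p = {p:.2e}, significant under Bonferroni at 0.05: {p < 0.05 / n_patterns}")
```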
We study the $K$-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.
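A loose sketch of the RMED flavor, not the paper's exact algorithm: track pairwise win counts, form an empirical winner, and duel it against the rival whose comparison is least explored in the KL-divergence sense (small $N_{ij}\,d(\hat{\mu}_{ij}, 1/2)$). The preference matrix is illustrative.

```python
import numpy as np

def kl_bern(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

rng = np.random.default_rng(3)
K, T = 4, 20_000
P = np.array([[0.5, 0.6, 0.7, 0.8],       # illustrative preference matrix:
              [0.4, 0.5, 0.6, 0.7],       # P[i, j] = Pr(arm i beats arm j)
              [0.3, 0.4, 0.5, 0.6],
              [0.2, 0.3, 0.4, 0.5]])
wins = np.ones((K, K))                    # smoothed pairwise win counts

for t in range(T):
    n = wins + wins.T
    phat = wins / n
    champ = int(np.argmax((phat > 0.5).sum(axis=1)))   # empirical winner
    stat = n[champ] * kl_bern(phat[champ], 0.5)        # KL-exploration statistic
    stat[champ] = np.inf
    rival = int(np.argmin(stat))          # least KL-explored challenger
    if rng.random() < P[champ, rival]:
        wins[champ, rival] += 1
    else:
        wins[rival, champ] += 1
```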
The Multi-Armed Bandit (MAB) problem is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): the goal is not only to maximize the cumulative rewards, i.e., to choose the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem...
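A toy UCB-flavored version of the "how many arms" rule, not the paper's algorithm: rank arms by optimism and grow the played set only while the optimistic payoff still covers a fixed per-arm selection cost. All constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
K, T, cost = 6, 10_000, 0.45              # cost per selected arm (illustrative)
mu = rng.uniform(0.2, 0.8, K)             # hidden Bernoulli reward means
n = np.ones(K)                            # pull counts (one optimistic fake pull)
s = np.ones(K)                            # reward sums

for t in range(1, T + 1):
    ucb = s / n + np.sqrt(2 * np.log(t) / n)
    order = np.argsort(-ucb)              # arms by decreasing optimism
    chosen = []
    for k in order:                       # grow the played set while the expected
        cand = chosen + [int(k)]          # payoff still covers the selection cost
        if ucb[cand].sum() >= cost * len(cand):
            chosen = cand
        else:
            break
    for k in chosen:
        s[k] += float(rng.random() < mu[k])
        n[k] += 1
```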
Algorithmic decision making now affects many aspects of our lives. Standard tools for machine learning, such as classification and regression, are subject to the bias in data, and thus direct application of such off-the-shelf tools could lead to a specific group being unfairly discriminated against. Removing sensitive attributes from the data does not solve this problem because a \textit{disparate impact} can arise when non-sensitive attributes are correlated with them. Here, we study a fair machine learning algorithm that avoids a disparate impact when making a decision...
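Disparate impact is commonly quantified by the ratio of positive-decision rates between groups (the "four-fifths rule"). A minimal check on synthetic decisions, where the decision rates are deliberately skewed for illustration:

```python
import numpy as np

# yhat: binary decisions, a: binary sensitive attribute (both synthetic).
rng = np.random.default_rng(5)
a = rng.integers(0, 2, 10_000)
yhat = (rng.random(10_000) < np.where(a == 1, 0.35, 0.50)).astype(int)

rate1 = yhat[a == 1].mean()         # positive-decision rate, protected group
rate0 = yhat[a == 0].mean()         # positive-decision rate, reference group
print(f"disparate impact ratio = {rate1 / rate0:.2f} (four-fifths rule flags < 0.8)")
```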
This paper considers the capacity expansion problem in two-sided matchings, where the policymaker is allowed to allocate some extra seats as well as the standard seats. In a medical residency match, each hospital accepts a limited number of doctors. Such capacity constraints are typically given in advance. However, such exogenous constraints can compromise the welfare of the doctors; popular hospitals inevitably dismiss some of their favorite doctors. Meanwhile, it is often the case that the hospitals also benefit from accepting a few extra doctors. To tackle the problem, we propose an anytime method...
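To make the setting concrete, here is doctor-proposing deferred acceptance with hospital capacities, the standard mechanism underlying residency matches. The tiny instance is hypothetical; it shows how one "extra seat" changes the outcome.

```python
def deferred_acceptance(doc_prefs, hosp_prefs, capacity):
    """Doctor-proposing deferred acceptance with hospital capacities."""
    rank = {h: {d: r for r, d in enumerate(ps)} for h, ps in hosp_prefs.items()}
    nxt = {d: 0 for d in doc_prefs}          # next hospital each doctor proposes to
    held = {h: [] for h in hosp_prefs}       # tentatively accepted doctors
    free = list(doc_prefs)
    while free:
        d = free.pop()
        if nxt[d] >= len(doc_prefs[d]):
            continue                         # d exhausted its list, stays unmatched
        h = doc_prefs[d][nxt[d]]
        nxt[d] += 1
        held[h].append(d)
        held[h].sort(key=lambda x: rank[h][x])
        if len(held[h]) > capacity[h]:
            free.append(held[h].pop())       # reject the least-preferred held doctor
    return held

doc_prefs = {"d1": ["h1", "h2"], "d2": ["h1", "h2"], "d3": ["h1"]}
hosp_prefs = {"h1": ["d1", "d2", "d3"], "h2": ["d2", "d1"]}
print(deferred_acceptance(doc_prefs, hosp_prefs, {"h1": 1, "h2": 1}))
# Raising h1's capacity to 2 (one "extra seat") lets d2 keep its first choice.
print(deferred_acceptance(doc_prefs, hosp_prefs, {"h1": 2, "h2": 1}))
```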
We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest number of other arms, is characterized by deriving an asymptotic regret bound. We propose Copeland Winners Relative Minimum Empirical Divergence (CW-RMED) and derive an asymptotically optimal regret bound for it. However, it is not known whether the algorithm can be efficiently computed or not. To address this...
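Copeland winners are easy to read off from a preference matrix. The illustrative matrix below is cyclic, so no Condorcet winner exists and every arm is a Copeland winner; this is exactly the regime where Copeland-based recommendation matters.

```python
import numpy as np

# P[i, j] = Pr(arm i beats arm j); illustrative matrix with no Condorcet winner.
P = np.array([[0.5, 0.6, 0.3],
              [0.4, 0.5, 0.7],
              [0.7, 0.3, 0.5]])
copeland = (P > 0.5).sum(axis=1)            # number of arms each arm beats
winners = np.flatnonzero(copeland == copeland.max())
print("Copeland scores:", copeland, "winners:", winners)
```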
We study the budgeted multi-armed bandit problem with stochastic action costs. In this problem, a player not only receives a reward but also pays a cost for an action of his/her choice. The goal of the player is to maximize the cumulative reward he/she receives before the total cost exceeds the budget. In the classical setting, a policy called KL-UCB is known to perform well. We propose KL-UCB-SC, an extension of this policy to the budgeted bandit problem. We prove that KL-UCB-SC is asymptotically optimal for the case of Bernoulli costs and rewards. To the best of our knowledge, this is the first result that shows asymptotic optimality in the budgeted bandit problem; in fact, the regret...
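A sketch of the index computation such a policy needs, assuming (this is my assumption, not the paper's stated formula) the index is a ratio of a KL-UCB upper bound on the reward to a KL lower bound on the cost; both bounds come from the usual bisection on the Bernoulli KL divergence, and the statistics are hypothetical.

```python
import numpy as np

def kl_bern(p, q):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_ucb(mean, n, rhs):
    """Largest q >= mean with n * KL(mean, q) <= rhs (bisection)."""
    lo, hi = mean, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if n * kl_bern(mean, mid) <= rhs else (lo, mid)
    return lo

def kl_lcb(mean, n, rhs):
    """Smallest q <= mean with n * KL(mean, q) <= rhs (bisection)."""
    lo, hi = 0.0, mean
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if n * kl_bern(mean, mid) <= rhs else (mid, hi)
    return hi

# One round of index computation for one arm's (hypothetical) statistics.
t, n = 1000, 50
r_mean, c_mean = 0.6, 0.3          # empirical reward and cost means
index = kl_ucb(r_mean, n, np.log(t)) / max(kl_lcb(c_mean, n, np.log(t)), 1e-6)
print(f"reward-per-cost index = {index:.3f}")   # play the arm with the largest index
```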
Partial monitoring is a general model for sequential learning with limited feedback formalized as a game between two players. In this game, the learner chooses an action and, at the same time, the opponent chooses an outcome; then the learner suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed...
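The game is fully specified by a loss matrix and a signal matrix. A minimal encoding of the classic "apple tasting" example (loss values are the usual textbook choice, here just illustrative), showing one round of action, outcome, loss, and signal:

```python
import numpy as np

rng = np.random.default_rng(6)
# Apple tasting: actions are 0 = taste the apple, 1 = sell it untasted;
# outcomes are 0 = apple is good, 1 = apple is rotten.
L = np.array([[1, 0],               # tasting: lose a good apple, catch a rotten one
              [0, 1]])              # selling: fine if good, costly if rotten
S = np.array([["good", "rotten"],   # tasting reveals the outcome
              ["none", "none"]])    # selling reveals nothing

p_rotten = 0.3                      # the opponent's stochastic outcome distribution
action = 0                          # the learner's choice this round
outcome = int(rng.random() < p_rotten)
print("loss:", L[action, outcome], "signal:", S[action, outcome])
```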
This work is motivated by the growing demand for reproducible machine learning. We study the stochastic multi-armed bandit problem. In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset. We observe that existing algorithms require $O(1/\rho^2)$ times more regret than nonreplicable algorithms, where $\rho$ is the level of nonreplication. However, we demonstrate that this additional cost is unnecessary when the time horizon...
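To illustrate what replicability rules out: running standard UCB1 on two independent datasets drawn from the same instance almost surely yields different action sequences, whereas a replicable algorithm would make the two sequences agree with probability at least $1-\rho$. A small demonstration (instance and horizon are illustrative):

```python
import numpy as np

def ucb_actions(seed_env, K=3, T=200):
    """Action sequence of standard UCB1 on one sampled dataset."""
    rng = np.random.default_rng(seed_env)
    mu = np.array([0.4, 0.5, 0.6])
    n = np.zeros(K); s = np.zeros(K); acts = []
    for t in range(T):
        if t < K:
            a = t                                   # pull each arm once
        else:
            a = int(np.argmax(s / n + np.sqrt(2 * np.log(t) / n)))
        acts.append(a)
        s[a] += float(rng.random() < mu[a]); n[a] += 1
    return acts

# Two runs on independent datasets from the same instance: the sequences differ,
# i.e., UCB1 is not replicable in the sense above.
print(ucb_actions(0) == ucb_actions(1))
```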
We consider the fixed-confidence best arm identification (FC-BAI) problem in the Bayesian setting. This problem aims to find the arm of the largest mean with a fixed confidence level when the bandit model has been sampled from a known prior. Most studies on FC-BAI have been conducted in the frequentist setting, where the bandit model is predetermined before the game starts. We show that the traditional algorithms studied in the frequentist setting, such as track-and-stop and top-two algorithms, result in arbitrarily suboptimal performances in the Bayesian setting. We also prove a lower bound on the expected number...
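For reference, a minimal sketch of the top-two Thompson sampling idea discussed as a frequentist baseline, under a Gaussian instance drawn from a standard normal prior with unit observation noise; the sampling budget and $\beta$ are illustrative, and no stopping rule is implemented.

```python
import numpy as np

rng = np.random.default_rng(7)
K = 4
mu = rng.normal(0, 1, K)            # instance sampled from a known N(0,1) prior
n = np.ones(K)
s = rng.normal(mu)                  # one initial unit-noise observation per arm
beta = 0.5                          # probability of sampling the posterior leader

for t in range(2000):
    post_mean = s / (n + 1.0)       # Gaussian posterior under the N(0,1) prior
    post_sd = 1.0 / np.sqrt(n + 1.0)
    leader = int(np.argmax(rng.normal(post_mean, post_sd)))
    arm = leader
    if rng.random() > beta:
        for _ in range(100):        # resample until a different arm wins (capped)
            arm = int(np.argmax(rng.normal(post_mean, post_sd)))
            if arm != leader:
                break
    s[arm] += rng.normal(mu[arm]); n[arm] += 1

print("guess:", int(np.argmax(s / (n + 1.0))), "truth:", int(np.argmax(mu)))
```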