Junpei Komiyama

ORCID: 0000-0003-0095-6558
Research Areas
  • Advanced Bandit Algorithms Research
  • Auction Theory and Applications
  • Machine Learning and Algorithms
  • Optimization and Search Problems
  • Data Stream Mining Techniques
  • Smart Grid Energy Management
  • Distributed Sensor Networks and Detection Algorithms
  • Reinforcement Learning in Robotics
  • Game Theory and Voting Systems
  • Machine Learning and Data Classification
  • Consumer Market Behavior and Pricing
  • Financial Markets and Investment Strategies
  • Face and Expression Recognition
  • Adversarial Robustness in Machine Learning
  • Gaussian Processes and Bayesian Inference
  • Statistical Methods in Clinical Trials
  • Scheduling and Timetabling Solutions
  • Target Tracking and Data Fusion in Sensor Networks
  • Game Theory and Applications
  • Experimental Behavioral Economics Studies
  • Neural Networks and Applications
  • Cognitive Radio Networks and Spectrum Sensing
  • Smart Systems and Machine Learning
  • Age of Information Optimization
  • Retirement, Disability, and Employment

New York University
2021-2024

The University of Tokyo
2013-2019

We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it was revealed to have an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play problem, and discuss its regret analysis. We prove that MP-TS with binary rewards has a regret upper bound that matches the lower bound provided by...

10.48550/arxiv.1506.00779 preprint EN other-oa arXiv (Cornell University) 2015-01-01
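The core of MP-TS is easy to sketch: maintain a Beta posterior for each arm, draw one sample per posterior, and play the L arms with the largest samples. The following is a minimal illustration of this multiple-play Thompson sampling idea under assumed Bernoulli rewards; the arm means, horizon, and number of plays are hypothetical, and this is a sketch rather than the paper's evaluated implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, T = 5, 2, 10000                      # K arms, L plays per round, horizon T
mu = np.array([0.7, 0.6, 0.5, 0.4, 0.3])   # hypothetical Bernoulli means
alpha = np.ones(K)                         # Beta posterior: successes + 1
beta = np.ones(K)                          # Beta posterior: failures + 1
cum_reward = 0.0

for t in range(T):
    theta = rng.beta(alpha, beta)          # one posterior sample per arm
    plays = np.argsort(theta)[-L:]         # the L arms with the largest samples
    rewards = rng.binomial(1, mu[plays])
    alpha[plays] += rewards                # conjugate Bernoulli/Beta update
    beta[plays] += 1 - rewards
    cum_reward += rewards.sum()

print("average per-round reward:", cum_reward / T)
```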

We study an online forecasting setting in which, over $T$ rounds, $N$ strategic experts each report a forecast to a mechanism, the mechanism selects one forecast, and then the outcome is revealed. In any given round, each expert has a belief about the outcome, but wishes to select its forecast so as to maximize the total number of times it is selected. The goal is to obtain low regret: the difference between the cumulative loss (based on the selected forecasts) and that of the best expert in hindsight (as measured by the experts' beliefs). We consider exactly truthful mechanisms...

10.48550/arxiv.2502.11483 preprint EN arXiv (Cornell University) 2025-02-17

Existing data-dependent and best-of-both-worlds regret bounds for multi-armed bandit problems have limited adaptivity, as they are either data-dependent but not best-of-both-worlds (BOBW), BOBW but not data-dependent, or have a sub-optimal $O(\sqrt{T\ln{T}})$ worst-case guarantee in the adversarial regime. To overcome these limitations, we propose real-time stability-penalty matching (SPM), a new method for obtaining regret bounds that are simultaneously data-dependent, BOBW, and $T$-optimal for multi-armed bandit problems. In particular, we show that SPM obtains bounds with worst-case guarantees of order $O(\sqrt{T})$ in the adversarial regime...

10.48550/arxiv.2502.08143 preprint EN arXiv (Cornell University) 2025-02-12

We analyze statistical discrimination in hiring markets using a multiarmed bandit model. Myopic firms face workers arriving with heterogeneous observable characteristics. The association between the worker's skill and characteristics is unknown ex ante; thus, firms need to learn it. Laissez-faire causes perpetual underestimation: minority workers are rarely hired, and, therefore, the underestimation tends to persist. Even a marginal imbalance in the population ratio frequently results in perpetual underestimation. We demonstrate that a subsidy...

10.1287/mnsc.2022.00893 article EN Management Science 2024-03-29
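The perpetual-underestimation mechanism can be reproduced in a few lines: a myopic firm hires only when its running estimate of a group's skill clears a bar, and it observes skill only for hired workers, so an unlucky early estimate for the rarely arriving minority group is never corrected. A toy simulation follows; the population share, skill level, threshold, and warmup length are all illustrative assumptions, not the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(1)
p_minority = 0.1       # illustrative population share of the minority group
skill = 0.6            # both groups are equally skilled
threshold = 0.5        # hire only if the estimated group skill clears this bar
warmup, T = 200, 50000

def run_market():
    est = {"maj": 0.0, "min": 0.0}
    n = {"maj": 0, "min": 0}
    for t in range(T):
        g = "min" if rng.random() < p_minority else "maj"
        # forced exploration during warmup, myopic thresholding afterwards
        if t < warmup or (n[g] > 0 and est[g] >= threshold):
            y = rng.binomial(1, skill)     # skill is observed only upon hiring
            n[g] += 1
            est[g] += (y - est[g]) / n[g]  # running-mean update
    return est

runs = [run_market() for _ in range(20)]
print("runs where the minority stays underestimated:",
      sum(r["min"] < threshold for r in runs), "of", len(runs))
print("runs where the majority stays underestimated:",
      sum(r["maj"] < threshold for r in runs), "of", len(runs))
```

Because the minority group collects far fewer warmup observations, its estimate is noisier, falls below the bar more often, and, once below, is frozen forever by the myopic hiring rule.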

The sea crossing from Libya to Italy is one of the world's most dangerous and politically contentious migration routes, yet over half a million people have attempted it since 2014. Leveraging data on aggregate flows and individual incidents, we estimate how migrants and smugglers reacted to changes in the border enforcement regime, namely the rise in interceptions by the Libyan Coast Guard starting in 2017 and the corresponding decrease in the probability of rescue to Europe. We find support for a deterrence effect in which crossings along the Central...

10.1371/journal.pone.0300553 article EN cc-by PLoS ONE 2024-04-19

Emerging patterns are patterns whose support significantly differs between two databases. We study the problem of listing emerging patterns with a multiple testing guarantee. Recently, Terada et al. proposed the Limitless Arity Multiple-testing Procedure (LAMP) that controls the family-wise error rate (FWER) in statistical association mining. LAMP reduces the number of "untestable" hypotheses without compromising its power. Still, FWER is restrictive, and as a result, the power is inherently unsatisfying when the number of patterns is large. On the other hand,...

10.1145/3097983.3098137 article EN 2017-08-04
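As a simpler stand-in for the multiple-testing machinery discussed above, the sketch below tests each candidate pattern with Fisher's exact test on a 2x2 contingency table (pattern occurrences versus database) and applies a plain Bonferroni correction for FWER. LAMP and the paper's procedure are strictly more powerful; the pattern counts here are hypothetical.

```python
from scipy.stats import fisher_exact

# hypothetical supports: (count in DB1, size of DB1, count in DB2, size of DB2)
candidates = {
    "{a,b}":   (60, 200, 20, 200),
    "{a,c}":   (35, 200, 30, 200),
    "{b,c,d}": (5, 200, 25, 200),
}
alpha = 0.05
bonferroni = alpha / len(candidates)   # naive FWER control over all candidates

for pattern, (x1, n1, x2, n2) in candidates.items():
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    _, p = fisher_exact(table)         # exact test of support difference
    verdict = "emerging" if p < bonferroni else "not significant"
    print(f"{pattern}: p={p:.4g} -> {verdict}")
```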

We study the $K$-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed one significantly outperforms existing ones.

10.48550/arxiv.1506.02550 preprint EN other-oa arXiv (Cornell University) 2015-01-01
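The feedback model and the information quantity behind the lower bound are both compact enough to state in code: each round reveals only a Bernoulli comparison between a chosen pair, and the difficulty of ruling out a suboptimal arm scales, roughly, with the inverse of the Bernoulli KL divergence between its win rate against the best arm and 1/2. The preference matrix below is hypothetical, and this illustrates the problem setting and the divergence term, not the full RMED algorithm.

```python
import numpy as np

def kl_bernoulli(p, q):
    """KL divergence d(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# hypothetical preference matrix: P[i, j] = Pr(arm i beats arm j)
P = np.array([[0.5, 0.6, 0.7],
              [0.4, 0.5, 0.6],
              [0.3, 0.4, 0.5]])
rng = np.random.default_rng(2)

def duel(i, j):
    """One round of relative feedback: 1 if arm i beats arm j."""
    return rng.binomial(1, P[i, j])

# arm 0 is the Condorcet winner here; the sample cost of dismissing arm i
# grows roughly like 1 / d(P[i, 0], 1/2)
for i in (1, 2):
    print(f"arm {i}: d(P[i,0], 1/2) = {kl_bernoulli(P[i, 0], 0.5):.4f}")
```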

The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): the goal is not only to maximize the cumulative rewards, i.e., to choose the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem...

10.1145/3292500.3330862 article EN 2019-07-25
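A naive baseline for the "how many arms" decision can be sketched directly on top of Thompson sampling: rank arms by their posterior samples and keep every arm whose sampled reward exceeds an assumed per-arm selection cost. The cost, means, and horizon below are hypothetical, and this greedy rule is a simple baseline for the S-MAB setting, not the algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
K, T, cost = 8, 5000, 0.45              # illustrative per-arm selection cost
mu = rng.uniform(0.2, 0.8, size=K)      # hypothetical Bernoulli means
alpha, beta = np.ones(K), np.ones(K)    # Beta posteriors

for t in range(T):
    theta = rng.beta(alpha, beta)       # Thompson samples
    order = np.argsort(theta)[::-1]
    # keep adding arms while the sampled reward still covers the cost
    chosen = [a for a in order if theta[a] > cost]
    if not chosen:
        chosen = [order[0]]             # always play at least one arm
    rewards = rng.binomial(1, mu[chosen])
    alpha[chosen] += rewards
    beta[chosen] += 1 - rewards

print("arms kept in the last round:", chosen)
print("their true means:", np.round(mu[chosen], 2))
```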

Algorithmic decision making processes now affect many aspects of our lives. Standard tools for machine learning, such as classification and regression, are subject to the bias in data, and thus direct application of such off-the-shelf tools could lead to a specific group being unfairly discriminated against. Removing sensitive attributes from the data does not solve this problem because a \textit{disparate impact} can arise when non-sensitive attributes are correlated with sensitive ones. Here, we study a fair machine learning algorithm that avoids a disparate impact when making a decision...

10.48550/arxiv.1710.04924 preprint EN other-oa arXiv (Cornell University) 2017-01-01
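Disparate impact is commonly quantified by the ratio of positive-decision rates between groups, with the "80% rule" of the US EEOC as the conventional yardstick in this line of work. A minimal check on hypothetical decisions:

```python
import numpy as np

# hypothetical binary decisions and a sensitive attribute (0 = group A, 1 = group B)
decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
sensitive = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

rate_a = decisions[sensitive == 0].mean()   # positive rate for group A
rate_b = decisions[sensitive == 1].mean()   # positive rate for group B
di_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)

print(f"positive rate A={rate_a:.2f}, B={rate_b:.2f}, DI ratio={di_ratio:.2f}")
print("violates the 80% rule" if di_ratio < 0.8 else "passes the 80% rule")
```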

This paper considers the capacity expansion problem in two-sided matchings, where the policymaker is allowed to allocate some extra seats as well as the standard seats. In a medical residency match, each hospital accepts a limited number of doctors. Such capacity constraints are typically given in advance. However, such exogenous constraints can compromise the welfare of the doctors; popular hospitals inevitably dismiss some of their favorite doctors. Meanwhile, it is often the case that the hospitals are also benefited by accepting a few extra doctors. To tackle the problem, we propose an anytime method...

10.24963/ijcai.2022/1 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01
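The welfare effect of an extra seat can be probed with a plain doctor-proposing deferred acceptance routine: run the match with the baseline capacities, add one seat, and compare the outcomes. The preferences below are hypothetical, and this is only the matching subroutine; the paper's anytime search over seat allocations is more involved.

```python
def deferred_acceptance(doc_prefs, hosp_prefs, capacity):
    """Doctor-proposing deferred acceptance with hospital capacities."""
    rank = {h: {d: r for r, d in enumerate(p)} for h, p in hosp_prefs.items()}
    next_choice = {d: 0 for d in doc_prefs}
    held = {h: [] for h in hosp_prefs}
    free = list(doc_prefs)
    while free:
        d = free.pop()
        if next_choice[d] >= len(doc_prefs[d]):
            continue                        # d has exhausted their list
        h = doc_prefs[d][next_choice[d]]
        next_choice[d] += 1
        held[h].append(d)
        held[h].sort(key=lambda x: rank[h][x])
        if len(held[h]) > capacity[h]:
            free.append(held[h].pop())      # reject the least-preferred holder
    return held

doc_prefs = {"d1": ["h1", "h2"], "d2": ["h1", "h2"], "d3": ["h1", "h2"]}
hosp_prefs = {"h1": ["d1", "d2", "d3"], "h2": ["d1", "d2", "d3"]}

base = deferred_acceptance(doc_prefs, hosp_prefs, {"h1": 1, "h2": 2})
plus = deferred_acceptance(doc_prefs, hosp_prefs, {"h1": 2, "h2": 2})
print("baseline:", base)                # d2 and d3 are pushed to h2
print("one extra seat at h1:", plus)    # d2 now gets their first choice
```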

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest number of other arms, is characterized by deriving an asymptotic regret bound. We propose Copeland Winners Relative Minimum Empirical Divergence (CW-RMED) and derive an asymptotically optimal regret bound for it. However, it is not known whether the algorithm can be efficiently computed or not. To address this...

10.48550/arxiv.1605.01677 preprint EN other-oa arXiv (Cornell University) 2016-01-01
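Copeland winners can be read off a preference matrix directly: count, for each arm, how many opponents it beats with probability above 1/2, and take the arms with the maximal count. A small sketch with a hypothetical cyclic instance, where no Condorcet winner exists but Copeland winners always do:

```python
import numpy as np

# hypothetical P[i, j] = Pr(arm i beats arm j); cyclic, so no Condorcet winner
P = np.array([[0.5, 0.7, 0.3, 0.8],
              [0.3, 0.5, 0.6, 0.7],
              [0.7, 0.4, 0.5, 0.6],
              [0.2, 0.3, 0.4, 0.5]])

beats = (P > 0.5).sum(axis=1)          # Copeland score: opponents beaten
copeland_winners = np.flatnonzero(beats == beats.max())
print("Copeland scores:", beats, "winners:", copeland_winners)
```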

We study the budgeted multi-armed bandit problem with stochastic action costs. In this problem, a player not only receives a reward but also pays a cost for an action of his/her choice. The goal of the player is to maximize the cumulative reward he/she receives before the total cost exceeds the budget. In the classical bandit problem, a policy called KL-UCB is known to perform well. We propose KL-UCB-SC, an extension of this policy to the budgeted bandit problem. We prove that KL-UCB-SC is asymptotically optimal for the case of Bernoulli costs and rewards. To the best of our knowledge, this is the first result that shows asymptotic optimality; in fact, the regret...

10.1587/transfun.e100.a.2470 article EN IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences 2017-01-01
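The KL-UCB index that this policy builds on is the largest mean q whose Bernoulli KL divergence from the empirical mean fits within a log(t)/n budget; it has no closed form, but the divergence is increasing in q above the mean, so bisection suffices. A sketch of the classical index follows; the paper's KL-UCB-SC additionally accounts for the stochastic costs.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence d(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, n, t):
    """Largest q >= mean with n * d(mean, q) <= log(t), found by bisection."""
    budget = math.log(max(t, 2)) / n
    lo, hi = mean, 1.0
    for _ in range(50):                 # 50 halvings give ~1e-15 precision
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# upper confidence bound for an arm with empirical mean 0.4 after 100 of 1000 rounds
print(kl_ucb_index(mean=0.4, n=100, t=1000))
```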

Partial monitoring is a general model for sequential learning with limited feedback, formalized as a game between two players. In this game, the learner chooses an action and at the same time the opponent chooses an outcome; the learner then suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed...

10.48550/arxiv.1509.09011 preprint EN other-oa arXiv (Cornell University) 2015-01-01
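A partial monitoring instance is fully specified by two matrices over actions and outcomes: a loss matrix L and a feedback matrix H, where the learner observes only H[action, outcome] and never the loss itself. The classic "apple tasting" game makes this concrete; the stochastic opponent and the uniform policy below are illustrative placeholders, not the paper's algorithm.

```python
import numpy as np

# apple tasting: actions = {taste, sell}, outcomes = {good apple, rotten apple}
L = np.array([[1.0, 0.0],    # tasting destroys a good apple, catches a rotten one
              [0.0, 1.0]])   # selling a rotten apple is the costly mistake
H = np.array([["good", "rotten"],   # tasting reveals the outcome
              ["none", "none"]])    # selling reveals nothing

rng = np.random.default_rng(4)
p_rotten = 0.3               # stochastic opponent (assumed parameter)
total_loss = 0.0
for t in range(1000):
    action = rng.integers(2)            # placeholder policy: uniform play
    outcome = int(rng.random() < p_rotten)
    total_loss += L[action, outcome]    # suffered but never shown to the learner
    signal = H[action, outcome]         # all the learner ever observes

print("average loss of the uniform policy:", total_loss / 1000)
```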

This work is motivated by the growing demand for reproducible machine learning. We study the stochastic multi-armed bandit problem. In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset. We observe that existing algorithms require $O(1/\rho^2)$ times more regret than nonreplicable algorithms, where $\rho$ is the level of nonreplication. However, we demonstrate that this additional cost is unnecessary when the time...

10.48550/arxiv.2402.07391 preprint EN arXiv (Cornell University) 2024-02-11

We consider the fixed-confidence best arm identification (FC-BAI) problem in the Bayesian setting. This problem aims to find the arm of the largest mean with a fixed confidence level when the bandit model has been sampled from a known prior. Most studies on FC-BAI have been conducted in the frequentist setting, where the bandit model is predetermined before the game starts. We show that the traditional FC-BAI algorithms studied in the frequentist setting, such as track-and-stop and top-two algorithms, result in arbitrarily suboptimal performances in the Bayesian setting. We also prove a lower bound on the expected number...

10.48550/arxiv.2402.10429 preprint EN arXiv (Cornell University) 2024-02-15
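In the Bayesian setting, the natural stopping statistic is the posterior probability that the current leader is the best arm, which can be estimated by Monte Carlo sampling from the posterior. A minimal sketch with Gaussian arms and a known standard Gaussian prior; the round-robin sampling rule is used only to keep the illustration short and is not the paper's recommended rule.

```python
import numpy as np

rng = np.random.default_rng(5)
K, sigma, delta = 3, 1.0, 0.05
mu = rng.normal(0.0, 1.0, size=K)   # bandit model drawn from the N(0, 1) prior
counts = np.zeros(K)
sums = np.zeros(K)

for t in range(1, 100001):
    a = t % K                        # round-robin sampling (illustrative)
    sums[a] += rng.normal(mu[a], sigma)
    counts[a] += 1
    if t % 300 == 0:                 # check the stopping rule occasionally
        # Gaussian posterior per arm: N(0, 1) prior + known-variance likelihood
        post_var = 1.0 / (1.0 + counts / sigma**2)
        post_mean = post_var * (sums / sigma**2)
        draws = rng.normal(post_mean, np.sqrt(post_var), size=(10000, K))
        p_best = np.bincount(draws.argmax(axis=1), minlength=K) / 10000
        if p_best.max() >= 1 - delta:
            print(f"stop at t={t}: arm {p_best.argmax()} (true best {mu.argmax()})")
            break
```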