- Auction Theory and Applications
- Advanced Bandit Algorithms Research
- Supply Chain and Inventory Management
- Consumer Market Behavior and Pricing
- Reinforcement Learning in Robotics
- Optimization and Search Problems
- Mobile Crowdsensing and Crowdsourcing
- Machine Learning and Algorithms
- Distributed Sensor Networks and Detection Algorithms
- Digital Platforms and Economics
University of British Columbia, 2025
Stanford University, 2020-2021
Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback
Curious about how players can learn and adapt in unknown games without knowing the game's dynamics? In this paper, Ba, Lin, Zhang, and Zhou present a novel bandit learning algorithm for no-regret learning in which each player observes only its reward, determined by all players' current joint action, and not its gradient. Focusing on smooth and strongly monotone games, they introduce an approach based on self-concordant barrier functions. This...
We consider online no-regret learning in unknown games with bandit feedback, where each agent only observes its reward at each time, determined by all players' current joint action, rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a bandit convex optimization algorithm and show that it achieves a single-agent regret of $\tilde{\Theta}(\sqrt{T})$ under strongly concave payoff functions. Then, if...
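The sampling step behind self-concordant-barrier bandit methods can be sketched as follows. This is a minimal sketch, not the authors' algorithm: it assumes a box domain $[lo, hi]^n$ with the standard log-barrier, queries the payoff once at $y = x + H^{-1/2}u$ (with $H$ the barrier Hessian at $x$ and $u$ uniform on the unit sphere), and returns the one-point estimate $n\,f(y)\,H^{1/2}u$, which is unbiased for the gradient of the ellipsoid-smoothed payoff. It omits the regularization and step-size schedule that a full no-regret algorithm needs, and the function names are hypothetical.

```python
import math
import random


def barrier_hessian_diag(x, lo, hi):
    """Diagonal Hessian of the log-barrier R(x) = -sum_i [log(x_i - lo) + log(hi - x_i)]."""
    return [1.0 / (xi - lo) ** 2 + 1.0 / (hi - xi) ** 2 for xi in x]


def random_unit_vector(n, rng):
    """Uniformly random direction on the unit sphere (normalized Gaussian vector)."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(zi * zi for zi in z))
        if norm > 0.0:
            return [zi / norm for zi in z]


def one_point_gradient(f, x, lo, hi, rng):
    """Query f once on the Dikin ellipsoid around x; return (query point, gradient estimate).

    With A = H^{-1/2} held fixed at x, E[n * f(x + A u) * A^{-1} u] equals the gradient
    of the smoothed payoff E_v[f(x + A v)], v uniform in the unit ball.
    """
    n = len(x)
    h = barrier_hessian_diag(x, lo, hi)
    u = random_unit_vector(n, rng)
    # 1/sqrt(h_i) is smaller than the distance from x_i to either face,
    # so the query point y is strictly inside the box.
    y = [xi + ui / math.sqrt(hd) for xi, ui, hd in zip(x, u, h)]
    fy = f(y)  # the only feedback: a single payoff value
    g = [n * fy * math.sqrt(hd) * ui for ui, hd in zip(u, h)]
    return y, g


if __name__ == "__main__":
    rng = random.Random(0)
    target = (0.3, 0.7)
    payoff = lambda y: -((y[0] - target[0]) ** 2 + (y[1] - target[1]) ** 2)
    samples = 50_000
    total = [0.0, 0.0]
    for _ in range(samples):
        _, g = one_point_gradient(payoff, [0.5, 0.5], 0.0, 1.0, rng)
        total = [t + gi for t, gi in zip(total, g)]
    print([t / samples for t in total])  # close to the exact gradient [-0.4, 0.4]
```

For the quadratic payoff in the usage example, smoothing only shifts the function by a constant, so the averaged estimates approach the exact gradient $(-0.4, 0.4)$ at $x = (0.5, 0.5)$. Every query point stays strictly feasible, which is the practical reason for sampling along the barrier's geometry rather than a fixed-radius sphere.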
We study the implications of selling through a voice-based virtual assistant (VA). The seller has a set of products available, and the VA decides which product to offer and at what price, seeking to maximize its revenue, consumer surplus, or total surplus. The consumer is impatient and rational, maximizing her expected utility given the information available to her. The VA selects products based on the consumer's request and other information available to it, and then presents them sequentially. Once a product is presented and priced, the consumer evaluates whether to make a purchase. Her valuation of each product comprises a pre-evaluation value, common...
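The sequential-presentation mechanism can be illustrated with a toy expected-revenue calculation. This is a hypothetical sketch, not the paper's model: it assumes each product's realized valuation is its pre-evaluation value plus noise drawn uniformly from $[-w, w]$, that the consumer buys the first presented product whose valuation reaches its price (a simplified acceptance rule, not the rational behavior analyzed in the abstract), and that impatience means she leaves after at most `patience` presentations. Only the revenue objective is shown; `purchase_prob`, `expected_revenue`, and `best_order` are illustrative names.

```python
from itertools import permutations


def purchase_prob(value, price, noise_width):
    """P(value + eps >= price) for eps ~ Uniform[-noise_width, noise_width], noise_width > 0."""
    q = (value + noise_width - price) / (2.0 * noise_width)
    return min(max(q, 0.0), 1.0)


def expected_revenue(offers, noise_width, patience):
    """Expected revenue of presenting (pre_evaluation_value, price) offers in the given order.

    Hypothetical rules: the consumer buys the first offer whose realized valuation reaches
    its price, and leaves after `patience` presentations without buying.
    """
    revenue = 0.0
    still_browsing = 1.0  # probability that no earlier offer was bought
    for value, price in list(offers)[:patience]:
        q = purchase_prob(value, price, noise_width)
        revenue += still_browsing * q * price
        still_browsing *= 1.0 - q
    return revenue


def best_order(offers, noise_width, patience):
    """Brute-force the revenue-maximizing presentation order (only viable for a few products)."""
    return max(permutations(offers),
               key=lambda seq: expected_revenue(seq, noise_width, patience))


if __name__ == "__main__":
    offers = [(1.0, 1.0), (2.0, 1.5)]
    print(expected_revenue(offers, 0.5, 2))
    print(best_order(offers, 0.5, 2))
```

For offers `[(1.0, 1.0), (2.0, 1.5)]` with $w = 0.5$ and patience 2, presenting them in that order yields $0.5 \cdot 1.0 + 0.5 \cdot 1.0 \cdot 1.5 = 1.25$, whereas leading with the second product, which is bought for sure, yields $1.5$; `best_order` picks the latter. The calculation shows why the order of presentation matters once the consumer may stop early.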
We consider online no-regret learning in unknown games with bandit feedback, where each player can only observe its reward at each time, determined by all players' current joint action, rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a new algorithm and show that it achieves a single-agent regret of $\tilde{\Theta}(n\sqrt{T})$ under smooth concave functions ($n \geq 1$ is the problem...
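How players that observe only their own payoffs can still approach a Nash equilibrium can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: it uses a hypothetical two-player game with actions in $[0, 1]$ and payoffs $u_i(y) = -(y_i - 0.5 - 0.2\,y_j)^2$, which is strongly monotone and has its unique Nash equilibrium at $y_1 = y_2 = 0.625$. Each player perturbs its action to an endpoint of its one-dimensional Dikin interval of radius $r_i$ in a random direction $s_i \in \{-1, +1\}$, forms the one-point estimate $u_i(y)\,s_i / r_i$, and takes a gradient step of size $1/(t+10)$. The step-size schedule and the clipping interval $[0.05, 0.95]$ are illustrative choices, and the sketch makes no claim about regret or convergence rates.

```python
import math
import random

LO, HI = 0.05, 0.95  # hypothetical clipping interval for the players' base actions


def dikin_radius(x):
    """Half-width of the Dikin interval of the log-barrier on [0, 1] at x."""
    return 1.0 / math.sqrt(1.0 / x ** 2 + 1.0 / (1.0 - x) ** 2)


def payoff(i, y):
    """Hypothetical strongly monotone game: u_i(y) = -(y_i - 0.5 - 0.2 * y_j)^2."""
    j = 1 - i
    return -(y[i] - 0.5 - 0.2 * y[j]) ** 2


def play(rounds, seed=0):
    rng = random.Random(seed)
    x = [0.2, 0.8]  # arbitrary interior starting actions
    for t in range(rounds):
        radius = [dikin_radius(xi) for xi in x]
        signs = [rng.choice((-1.0, 1.0)) for _ in x]  # the unit "sphere" in 1-D
        # Joint action actually played; stays strictly inside (0, 1).
        y = [xi + r * s for xi, r, s in zip(x, radius, signs)]
        step = 1.0 / (t + 10)
        for i in range(2):
            # One-point estimate from player i's own payoff only (dimension n = 1).
            grad = payoff(i, y) * signs[i] / radius[i]
            x[i] = min(max(x[i] + step * grad, LO), HI)
    return x


if __name__ == "__main__":
    print(play(20_000))  # both entries should be near the Nash equilibrium 0.625
```

In this quadratic game the estimate is unbiased for each player's own payoff gradient $-2(x_i - 0.5 - 0.2\,x_j)$, because the opponent's independent, zero-mean perturbation averages out; the decreasing step size then damps the remaining noise.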
We present a data-driven algorithm that advertisers can use to automate their digital ad-campaigns at online publishers. The algorithm enables the advertiser to search across available target audiences and ad-media to find the best possible combination for its campaign via experimentation. The problem of finding the best audience-ad combination is complicated by a number of distinctive challenges, including (a) the need for active exploration to resolve prior uncertainty and to speed up the discovery of profitable combinations, (b) the many combinations to choose from, giving rise...
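The explore-exploit search over audience-ad combinations can be illustrated with Beta-Bernoulli Thompson sampling. This is a hypothetical sketch, not the authors' algorithm: it treats every (audience, ad) pair as an independent arm with an unknown click probability, starts from uniform Beta(1, 1) priors, and in each round serves the pair with the highest sampled rate. Because the arms share no information, it does not address challenge (b); the function name and the simulated click rates are illustrative.

```python
import random
from itertools import product


def thompson_campaign(audiences, ads, true_rate, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over all (audience, ad) combinations.

    true_rate maps each combination to a simulated click probability (hypothetical data);
    the learner never reads it directly, only the simulated click outcomes.
    """
    rng = random.Random(seed)
    arms = list(product(audiences, ads))
    alpha = {arm: 1.0 for arm in arms}  # 1 + observed clicks
    beta = {arm: 1.0 for arm in arms}   # 1 + observed non-clicks
    pulls = {arm: 0 for arm in arms}
    for _ in range(rounds):
        # Draw one plausible click rate per arm from its posterior; serve the best draw.
        arm = max(arms, key=lambda a: rng.betavariate(alpha[a], beta[a]))
        pulls[arm] += 1
        if rng.random() < true_rate[arm]:  # simulated impression outcome
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0
    return pulls, alpha, beta


if __name__ == "__main__":
    audiences, ads = ["A", "B", "C"], ["x", "y"]
    rates = {arm: 0.03 for arm in product(audiences, ads)}
    rates[("B", "y")] = 0.15  # the single profitable combination in this toy setup
    pulls, _, _ = thompson_campaign(audiences, ads, rates, rounds=5_000)
    print(max(pulls, key=pulls.get))  # the most-served combination
```

Posterior sampling explores only as much as the remaining uncertainty warrants, so clearly unprofitable combinations are served rarely once their posteriors concentrate. With many more combinations, a model that shares information across audiences and ads would likely be needed to keep exploration affordable.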