- Advanced Bandit Algorithms Research
- Optimization and Search Problems
- Data Stream Mining Techniques
- Auction Theory and Applications
- Smart Grid Energy Management
- Machine Learning and Algorithms
- Mobile Crowdsensing and Crowdsourcing
- User Authentication and Security Systems
- Recommender Systems and Techniques
- RFID technology advancements
- Advanced Wireless Network Optimization
- Reinforcement Learning in Robotics
- IoT and Edge/Fog Computing
- Advanced MIMO Systems Optimization
- Distributed and Parallel Computing Systems
- Caching and Content Delivery
- Advanced Data Processing Techniques
- Energy Efficient Wireless Sensor Networks
- Fault Detection and Control Systems
- Cognitive Radio Networks and Spectrum Sensing
- Advanced Malware Detection Techniques
- Physical Unclonable Functions (PUFs) and Hardware Security
- Complex Network Analysis Techniques
- ICT in Developing Communities
- Evacuation and Crowd Dynamics
City University of Hong Kong
2024-2025
California Institute of Technology
2023-2024
University of Massachusetts Amherst
2023-2024
Amherst College
2024
Carnegie Mellon University
2019-2023
Nanjing University of Posts and Telecommunications
2015-2019
The Probabilistic Maximum Coverage (PMC) problem plays a pivotal role in modeling various network applications, such as mobile crowdsensing, which involves selecting nodes within graph that probabilistically cover other nodes. Our study focuses on PMC the framework of online learning, termed bandit, where parameters are initially unknown. In this scenario, decision-maker is tasked with learning these to maximize cumulative rewards from covered Despite prior research we propose novel variant,...
The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework, extensively studied over the past decade. However, existing work primarily focuses on online setting, overlooking substantial costs of interactions and readily available offline datasets. To overcome these limitations, we introduce Off-CMAB, first learning framework for CMAB. Central to our lower confidence bound (CLCB) algorithm, which combines pessimistic reward estimations with solvers....
We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms. In this framework, each of the M has unique reward distribution over K arms, and T rounds, they observe arm pull only if no other agent pulls that arm. The goal is maximize total utility by querying minimal necessary hints without achieving time-independent regret. HMA2B both centralized decentralized setups. Our main algorithm,...
The open radio access network (O-RAN) architecture provides enhanced opportunities for integrating machine learning in 5G/6G resource management by decomposing RAN functionalities. Yet, generic mechanisms either do not fully exploit the disaggregated non-real-time and near-real-time controllers or ignore potential elasticity of application demands, another degree freedom managing resources. We introduce a two-timescale framework aimed at optimizing users' long-term total QoS. Rather than...
Passive RFID technology is widely used in user authentication and access control. We propose RF-Rhythm, a secure usable two-factor system with strong resilience to lost/stolen/cloned cards. In each legitimate performs sequence of taps on his/her card according self-chosen secret melody. Such rhythmic can induce phase changes the backscattered signals, which reader detect recover user’s tapping rhythm. addition verifying card’s identification information as usual, backend server compares...
We introduce and study the online pause resume problem. In this problem, a player attempts to find k lowest (alternatively, highest) prices in sequence of fixed length T, which is revealed sequentially. At each time step, presented with price decides whether accept or reject it. The incurs aswitching cost whenever their decision changes consecutive steps, i.e., they purchasing. This problem motivated by goal carbon-aware load shifting, where workload may be paused during periods high carbon...
Probabilistic maximum coverage (PMC) is an important problem that can model many network applications, including mobile crowdsensing, content delivery, and dynamic channel allocation, where operator chooses nodes in a graph probabilistically cover other nodes. In this paper, we study PMC under the online learning context: bandit. For bandit parameters are not known priori, decision maker needs to learn unknown goal maximize total rewards from covered Though has been studied previously,...
We introduce and study the online pause resume problem. In this problem, a player attempts to find k lowest (alternatively, highest) prices in sequence of fixed length T, which is revealed sequentially. At each time step, presented with price decides whether accept or reject it. The incurs switching cost whenever their decision changes consecutive steps, i.e., they purchasing. This problem motivated by goal carbon-aware load shifting, where workload may be paused during periods high carbon...
We consider the stochastic multi-armed bandit (MAB) problem in a setting where player can, at cost, pre-observe one or multiple arms before playing of them each round. Apart from classic trade-off between exploration (trying out more to find best one) and exploitation (sticking with arm believed offer highest reward), we encounter an additional dilemma single round, i.e., pre-observing gives higher chance play one, but incurs larger cost which decreases overall reward. design...
Recently, mobile social networks in proximity (MSNP) has gained tremendous attentions, which explicitly explores the physical of individuals, and greatly facilitates their face-to-face interactions through direct communications. Generally, Proximity awareness can be enabled with two methodologies: Over-The-Top (OTT), centralized server determines based on users' location updates interests, intermediate between proximal users, Device-to-Device (D2D), allows devices radio range to discovery...
The recent advances of conversational recommendations provide a promising way to efficiently elicit users' preferences via interactions. To achieve this, the recommender system conducts conversations with users, asking their for different items or item categories. Most existing systems cold-start users utilize multi-armed bandit framework learn preference in an online manner. However, they rely on pre-defined conversation frequency about categories instead individual items, which may incur...
We study the sequential resource allocation problem where a decision maker repeatedly allocates budgets between resources. Motivating examples include allocating limited computing time or wireless spectrum bands to multiple users (i.e., resources). At each timestep, should distribute its available among different resources maximize expected reward, equivalently minimize cumulative regret. In doing so, learn value of allocated for user from feedback on user's received reward. For example, may...
Passive RFID technology is widely used in user authentication and access control. We propose RF-Rhythm, a secure usable two-factor system with strong resilience to lost/stolen/cloned cards. In each legitimate performs sequence of taps on his/her card according self-chosen secret melody. Such rhythmic can induce phase changes the backscattered signals, which reader detect recover user's tapping rhythm. addition verifying card's identification information as usual, backend server compares...
Recently, WiFi networks have witnessed an unprecedented deployment, and APs are ubiquitous around the world. On one hand, clients to actively discover new access points (APs), try them, even though most of privately owned, which wastes precious energy mobile devices due excessive listening scanning operations network interface cards. other many wireless technologies coexist in overlapping channels (e.g., 2.4 GHz ISM bands). It provides a opportunity cross-technology communication (CTC) that...
Online influence maximization has attracted much attention as a way to maximize spread through social network while learning the values of unknown parameters. Most previous works focus on single-item diffusion. In this paper, we introduce new Competitive Influence Maximization (OCIM) problem, where two competing items (e.g., products, news stories) propagate in same and probabilities edges are unknown. We adopt combinatorial multi-armed bandit (CMAB) framework for OCIM, but unlike...
We consider the stochastic multi-armed bandit (MAB) problem in a setting where player can pay to pre-observe arm rewards before playing an each round. Apart from usual trade-off between exploring new arms find best one and exploiting believed offer highest reward, we encounter additional dilemma: pre-observing more gives higher chance play one, but incurs larger cost. For single-player setting, design Observe-Before-Play Upper Confidence Bound (OBP-UCB) algorithm for K with Bernoulli...
In this paper, we study the combinatorial semi-bandits (CMAB) and focus on reducing dependency of batch-size $K$ in regret bound, where is total number arms that can be pulled or triggered each round. First, for setting CMAB with probabilistically (CMAB-T), discover a novel (directional) triggering probability variance modulated (TPVM) condition replace previously-used smoothness various applications, such as cascading bandits, online network exploration influence maximization. Under new...
Brief Biography: Jinhang Zuo is a joint postdoc at UMass Amherst and Caltech. He received his Ph.D. in ECE from CMU 2022. His main research interests include online learning, resource allocation, networked systems. the recipient of CDS Postdoctoral Fellowship Amherst, SIGMETRICS 2022 Best Poster Award, AAAI-20 Student Scholarship, Carnegie Institute Technology Dean's Fellowship.
Foundation models (FMs) emerge as a promising solution to harness distributed and diverse environmental data by leveraging prior knowledge understand the complicated temporal spatial correlations within heterogeneous datasets. Unlike learning frameworks such federated learning, which often struggle with multimodal data, FMs can transform inputs into embeddings. This process facilitates integration of information from various modalities application new domains. However, deploying in...
We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome each arm is $d$-dimensional random variable feedback follows general process. Compared existing CMAB works, CMAB-MT not only enhances modeling power but also allows improved results by leveraging distinct statistical properties for variables. For CMAB-MT, we propose 1-norm probability-modulated smoothness condition, an optimistic...
We introduce and study the online pause resume problem. In this problem, a player attempts to find k lowest (alternatively, highest) prices in sequence of fixed length T, which is revealed sequentially. At each time step, presented with price decides whether accept or reject it. The incurs switching cost whenever their decision changes consecutive steps, i.e., they purchasing. This problem motivated by goal carbon-aware load shifting, where workload may be paused during periods high carbon...
Foundation models (FMs) emerge as a promising solution to harness distributed and diverse environmental data by leveraging prior knowledge understand the complicated temporal spatial correlations within heterogeneous datasets.Unlike learning frameworks such federated learning, which often struggle with multimodal data, FMs can transform inputs into embeddings.This process facilitates integration of information from various modalities application new domains.However, deploying in...
This paper investigates stochastic multi-armed bandit algorithms that are robust to adversarial attacks, where an attacker can first observe the learner's action and {then} alter their reward observation. We study two cases of this model, with or without knowledge attack budget $C$, defined as upper bound summation difference between actual altered rewards. For both cases, we devise types regret bounds having additive multiplicative $C$ dependence terms. known case, prove our achieve...