Jiafan He

ORCID: 0009-0008-0815-5783
Research Areas
  • Advanced Bandit Algorithms Research
  • Reinforcement Learning in Robotics
  • Military Defense Systems Analysis
  • Guidance and Control Systems
  • Adversarial Robustness in Machine Learning
  • Optimization and Search Problems
  • UAV Applications and Optimization
  • Aerospace and Aviation Technology
  • Musculoskeletal Pain and Rehabilitation
  • Smart Grid Energy Management
  • Auction Theory and Applications
  • Robotics and Sensor-Based Localization
  • Data Stream Mining Techniques
  • Pain Management and Opioid Use
  • Machine Learning and ELM
  • Simulation Techniques and Applications
  • Age of Information Optimization
  • Machine Learning and Algorithms
  • Influenza Virus Research Studies
  • Advanced Image Processing Techniques
  • Game Theory and Voting Systems
  • Spacecraft Dynamics and Control
  • Face and Expression Recognition
  • Pain Management and Placebo Effect
  • Image Processing Techniques and Applications

Institute of Electronics
2024

Nanjing University of Information Science and Technology
2020-2024

Tsinghua University
2019

This study aimed to investigate the pain situation, functional limitations, treatments used, care-seeking behaviors, and educational preferences of adults with pain in mainland China. An online questionnaire was developed through expert validation, and participants were recruited via social media platforms. Inclusion criteria required access to the Internet via smartphones, while individuals with significant cognitive impairments or severe mental illness were excluded. Of the 1566 participants, predominantly male (951), a...

10.3390/healthcare13030289 article EN Healthcare 2025-01-31

We study the problem of allocating $T$ indivisible items that arrive online to agents with additive valuations. The allocation must satisfy a prominent fairness notion, envy-freeness up to one item (EF1), at each round. To make this possible, we allow the reallocation of previously allocated items, but aim to minimize these so-called adjustments. For the case of two agents, we show that algorithms that are informed about the values of future items can get by without any adjustments, whereas uninformed algorithms require $\Theta(T)$ adjustments. For the general case of three or...

10.24963/ijcai.2019/49 article EN 2019-07-28
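The EF1 condition described in the abstract above can be stated concretely: an allocation is envy-free up to one item if every agent's envy toward another agent vanishes after removing some single item from the other's bundle. A minimal sketch of this check, assuming additive valuations and illustrative variable names (`valuations[i][g]` is agent `i`'s value for item `g`; none of these names come from the paper):

```python
def is_ef1(valuations, bundles):
    """Check envy-freeness up to one item (EF1) for additive valuations.

    valuations[i][g]: agent i's value for item g.
    bundles[i]: list of item indices allocated to agent i.
    """
    n = len(bundles)
    for i in range(n):
        v_own = sum(valuations[i][g] for g in bundles[i])
        for j in range(n):
            if i == j:
                continue
            v_other = sum(valuations[i][g] for g in bundles[j])
            if v_own >= v_other:
                continue  # no envy toward j at all
            # Envy must vanish after dropping j's single most valuable
            # item, as valued by agent i.
            best = max(valuations[i][g] for g in bundles[j])
            if v_own < v_other - best:
                return False
    return True
```

For two agents this check runs in linear time per round, which is why the adjustment count, rather than the check itself, is the quantity the paper minimizes.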

Semi-supervised learning has been proven to be effective in utilizing unlabeled samples to mitigate the problem of limited labeled data. Traditional semi-supervised methods generate pseudo-labels for unlabeled samples and train the classifier using both labeled and pseudo-labeled samples. However, in data-scarce scenarios, reliance on the initial pseudo-label generation can degrade performance. Methods based on consistency regularization have shown promising results by encouraging consistent outputs for different semantic variations of the same sample obtained...

10.3390/a17030091 article EN cc-by Algorithms 2024-02-20

We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We propose a simple algorithm named \texttt{FedLinUCB} based on the principle of optimism. We prove that the regret is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^M T_m})$ and the communication complexity is $\tilde{O}(dM^2)$, where $d$ is the dimension of the contextual vector and $T_m$...

10.48550/arxiv.2207.03106 preprint EN other-oa arXiv (Cornell University) 2022-01-01
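The principle of optimism underlying LinUCB-style algorithms such as the one above can be sketched in a few lines: maintain regression statistics $(A, b)$, and select the arm maximizing the optimistic estimate $\langle\hat\theta, x\rangle + \beta\sqrt{x^\top A^{-1}x}$. This is a generic single-agent sketch only; the federated/asynchronous bookkeeping that is the paper's contribution is omitted, and the names are illustrative:

```python
import numpy as np

def linucb_select(arms, A, b, beta):
    """Pick the arm maximizing the optimistic reward estimate
    <theta_hat, x> + beta * sqrt(x^T A^{-1} x)."""
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    scores = [x @ theta_hat + beta * np.sqrt(x @ A_inv @ x) for x in arms]
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update of the regression statistics after observing a reward."""
    return A + np.outer(x, x), b + reward * x
```

In the federated setting, each agent would keep local copies of $(A, b)$ and occasionally synchronize them with the server; the regret/communication trade-off comes from when that synchronization is triggered.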

Command and control in modern air combat shows a pressing demand for intelligent decision-making. Research on decision-making techniques relies on data accumulation by simulations, which still remains underdeveloped at present. In this paper we introduce a tactical-level simulation system for decision-making, which can simulate confrontation between formations and has multiple application modes, namely Man-Man, Man-Machine, and Machine-Machine. This system can collect diversified fine-grained data and provide support for the training of air-combat...

10.1109/ihmsc49165.2020.10102 article EN 2020-08-01

In emergency search and rescue scenarios, the quick location of trapped people is essential. However, disasters can render the Global Positioning System (GPS) unusable. Unmanned aerial vehicles (UAVs) with localization devices can serve as mobile anchors due to their agility and high line-of-sight (LoS) probability. Nonetheless, the number of available UAVs during the initial stages of disaster relief is limited, and innovative methods are needed to quickly plan UAV trajectories to locate non-uniformly distributed dynamic targets...

10.48550/arxiv.2401.07256 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Aligning large language models (LLMs) with human preference plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF). Despite their superior performance, current RLHF approaches often require a large amount of human-labelled preference data, which is expensive to collect. In this paper, inspired by the success of active learning, we address this problem by proposing query-efficient RLHF methods. We first formalize the alignment problem as a contextual dueling bandit and design an...

10.48550/arxiv.2402.09401 preprint EN arXiv (Cornell University) 2024-02-14

We study constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that incurs only finite regret over infinite episodes with high probability. We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs), where both the transition kernel and the reward function can be approximated by some linear function up to a misspecification level $\zeta$. At the core of Cert-LSVI-UCB is an innovative certified estimator, which facilitates a fine-grained concentration...

10.48550/arxiv.2404.10745 preprint EN arXiv (Cornell University) 2024-04-16

Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs). However, the effectiveness of this approach can be influenced by adversaries, who may intentionally provide misleading preferences to manipulate the output in an undesirable or harmful direction. To tackle this challenge, we study a specific model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose...

10.48550/arxiv.2404.10776 preprint EN arXiv (Cornell University) 2024-04-16

10.1109/infocomwkshps61880.2024.10620725 article EN IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) 2024-05-20

Although Convolutional Neural Networks have significantly improved the development of SAR image super-resolution (SR) technology in recent years, it remains a very challenging problem to reconstruct images with large-scale factors, such as ×4 and ×8, due to the limited information available from the low-resolution image. The co-registered high-resolution optical image has been successfully applied to enhance the reconstruction quality thanks to its discriminative characteristics. Compared with single-frame SR reconstruction technology, optical-image-guided methods better...

10.1080/01431161.2024.2408039 article EN International Journal of Remote Sensing 2024-10-07

We study multi-agent reinforcement learning in the setting of episodic Markov decision processes, where multiple agents cooperate via communication through a central server. We propose a provably efficient algorithm based on value iteration that enables asynchronous communication while ensuring the advantage of cooperation with low communication overhead. With linear function approximation, we prove that our algorithm enjoys an $\tilde{\mathcal{O}}(d^{3/2}H^2\sqrt{K})$ regret with $\tilde{\mathcal{O}}(dHM^2)$ communication complexity, where $d$ is the feature dimension, $H$...

10.48550/arxiv.2305.06446 preprint EN other-oa arXiv (Cornell University) 2023-01-01

This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-square regression is often employed for value function estimation. However, these techniques cannot be directly applied to model-based RL. In this paper, we focus on model-based RL and take the maximum likelihood estimation (MLE) approach to learn the transition model. Our work encompasses both online...

10.48550/arxiv.2402.08991 preprint EN arXiv (Cornell University) 2024-02-14

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which...

10.48550/arxiv.2212.06132 preprint EN other-oa arXiv (Cornell University) 2022-01-01

We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary, and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is $C\geq 0$. The best-known algorithms in this setting are limited in that they are either computationally inefficient, require a strong assumption on the corruption, or have regret at least $C$ times worse than the regret without corruption. In this paper, to overcome these limitations, we propose a new algorithm based on the principle of optimism in the face...

10.48550/arxiv.2205.06811 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Recently, several studies (Zhou et al., 2021a; Zhang et al., 2021b; Kim et al., 2021; Zhou and Gu, 2022) have provided variance-dependent regret bounds for linear contextual bandits, which interpolate between the worst-case regime and the deterministic reward regime. However, these algorithms are either computationally intractable or unable to handle unknown variance of the noise. In this paper, we present a novel solution to this open problem by proposing the first computationally efficient algorithm for linear bandits with heteroscedastic noise. Our algorithm is adaptive to the unknown variance of noise...

10.48550/arxiv.2302.10371 preprint EN other-oa arXiv (Cornell University) 2023-01-01

We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when $\zeta$ is dominated by $\tilde O(\Delta/\sqrt{d})$, with $\Delta$ being the minimal sub-optimality gap and $d$ the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound...

10.48550/arxiv.2303.09390 preprint EN other-oa arXiv (Cornell University) 2023-01-01
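The data selection scheme described above admits a simple sketch: a contextual vector is added to the regression set only if its elliptical-norm uncertainty $\sqrt{x^\top A^{-1}x}$ exceeds a threshold, and only selected points update the statistics. This is an illustrative reconstruction under assumed names (`threshold` is not the paper's exact choice):

```python
import numpy as np

def select_uncertain(samples, d, threshold):
    """Keep only samples whose uncertainty under the current Gram matrix
    exceeds `threshold`; skipped samples never update the statistics."""
    A = np.eye(d)  # regularized Gram matrix
    kept = []
    for x in samples:
        uncertainty = np.sqrt(x @ np.linalg.inv(A) @ x)
        if uncertainty > threshold:
            kept.append(x)
            A += np.outer(x, x)  # rank-one update from the selected point
    return kept
```

The intuition is that low-uncertainty points contribute little new information but still inject misspecification error into the regression, so discarding them keeps the estimator's bias controlled.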