Gen Li

ORCID: 0009-0005-9782-7649
Research Areas
  • Sparse and Compressive Sensing Techniques
  • Reinforcement Learning in Robotics
  • Image and Signal Denoising Methods
  • Face and Expression Recognition
  • Multimodal Machine Learning Applications
  • Adversarial Robustness in Machine Learning
  • Distributed Sensor Networks and Detection Algorithms
  • Human Pose and Action Recognition
  • Robot Manipulation and Learning
  • Machine Learning and Algorithms
  • Advanced Optical Imaging Technologies
  • Age of Information Optimization
  • Tensor decomposition and applications
  • Blind Source Separation Techniques
  • Advanced Vision and Imaging
  • Image and Video Quality Assessment
  • Advanced Wireless Network Optimization
  • Numerical methods in inverse problems
  • Microwave Imaging and Scattering Analysis
  • Natural Language Processing Techniques
  • Robotic Mechanisms and Dynamics
  • Domain Adaptation and Few-Shot Learning
  • Advanced Bandit Algorithms Research
  • Video Surveillance and Tracking Methods
  • Human Motion and Animation

Toshiba (Japan)
2024

University of Pennsylvania
2023

Nanjing University of Aeronautics and Astronautics
2023

Suzhou Research Institute
2023

Guangzhou Vocational College of Science and Technology
2023

Tsinghua University
2017-2022

Center for Life Sciences
2021-2022

Princeton University
2020

Peking University
2020

Kwangwoon University
2009-2012

We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. Borrowing ideas from cross-lingual pre-trained models, such as XLM (Lample and Conneau 2019) and Unicoder (Huang et al. 2019), both visual and linguistic contents are fed into a multi-layer Transformer (Vaswani et al. 2017) for the cross-modal pre-training, where three pre-trained tasks are employed, including Masked Language Modeling (MLM), Masked Object Classification (MOC) and Visual-linguistic Matching (VLM)...

10.1609/aaai.v34i07.6795 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. Borrowing ideas from cross-lingual pre-trained models, such as XLM and Unicoder, both visual and linguistic contents are fed into a multi-layer Transformer for the cross-modal pre-training, where three pre-trained tasks are employed, including Masked Language Modeling (MLM), Masked Object Classification (MOC) and Visual-linguistic Matching (VLM). The first two tasks learn context-aware representations for input tokens based on...

10.48550/arxiv.1908.06066 preprint EN other-oa arXiv (Cornell University) 2019-01-01

We study a spectral initialization method that serves a key role in recent work on estimating signals in non-convex settings. Previous analysis of this method focuses on the phase retrieval problem and provides only performance bounds. In this paper, we consider arbitrary generalized linear sensing models and present a precise asymptotic characterization of the performance of the method in the high-dimensional limit. Our analysis also reveals a phase transition phenomenon that depends on the ratio between the number of samples and the signal dimension. When the ratio is below a minimum threshold,...

10.1093/imaiai/iaz020 article EN Information and Inference: A Journal of the IMA 2019-08-17
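
As a rough illustration of the kind of spectral initialization the abstract above refers to, the sketch below uses the noiseless phase retrieval model y_i = (a_i^T x)^2 with Gaussian sensing vectors and takes the leading eigenvector of D = (1/m) Σ y_i a_i a_i^T as the estimate. The function names, the dimensions, and the choice to weight by the raw measurements y_i are assumptions made for this example; the paper's general sensing models and any preprocessing of y_i are not reproduced here.

```python
import numpy as np

def spectral_init(A, y):
    """Unit-norm leading eigenvector of D = (1/m) * sum_i y_i a_i a_i^T."""
    m = A.shape[0]
    D = (A.T * y) @ A / m            # weight each outer product a_i a_i^T by y_i
    _, eigvecs = np.linalg.eigh(D)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]            # eigenvector of the largest eigenvalue

def demo(n=50, m=4000, seed=0):
    """Absolute cosine similarity between the estimate and the true signal."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)           # hypothetical unit-norm ground truth
    A = rng.standard_normal((m, n))  # rows are the sensing vectors a_i
    y = (A @ x) ** 2                 # phaseless measurements (sign is lost)
    x_hat = spectral_init(A, y)
    return abs(x_hat @ x)            # the global sign cannot be recovered

if __name__ == "__main__":
    print(f"cosine similarity at m/n = 80: {demo():.3f}")
```

Lowering `m` relative to `n` in `demo` gives an informal feel for how the quality of the estimate depends on the sampling ratio.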

This paper investigates a problem of broad practical interest, namely, the reconstruction of a large-dimensional low-rank tensor from highly incomplete and randomly corrupted observations of its entries. Although a number of papers have been dedicated to this tensor completion problem, prior algorithms either are computationally too expensive for large-scale applications or come with suboptimal statistical performance. Motivated by this, we propose a fast two-stage nonconvex algorithm: a gradient method following...

10.1287/opre.2021.2106 article EN Operations Research 2021-06-03

This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed...

10.48550/arxiv.2005.12900 preprint EN cc-by arXiv (Cornell University) 2020-01-01

This paper is concerned with estimating the column space of an unknown low-rank matrix $A^\star \in \mathbb{R}^{d_1 \times d_2}$, given noisy and partial observations of its entries. There is no shortage of scenarios where the observations, while being too noisy to support faithful recovery of the entire matrix, still convey sufficient information to enable reliable estimation of the column space of interest. This is particularly evident and crucial for the highly unbalanced case where the column dimension $d_2$ far exceeds the row dimension $d_1$, which is the focal point of the current paper. We investigate an efficient spectral method,...

10.1214/20-aos1986 article EN The Annals of Statistics 2021-04-01

10.1109/cvpr52733.2024.01374 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by the recent advances in offline reinforcement learning, we develop an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning, which penalizes infrequently-visited state-action pairs based on suitable lower confidence bounds (LCBs). This framework leads to, among other things, improved sample efficiency and enhanced adaptivity in the presence of near-expert data. Our...

10.1109/tit.2023.3299840 article EN cc-by-nc-nd IEEE Transactions on Information Theory 2023-07-28

Sparse subspace clustering (SSC) is a state-of-the-art method for clustering high-dimensional data points lying in a union of low-dimensional subspaces. However, while $\ell_1$ optimization-based SSC algorithms suffer from high computational complexity, other variants of SSC, such as orthogonal-matching-pursuit-based SSC (OMP-SSC), lose clustering accuracy in pursuit of improving time efficiency. In this letter, we propose a novel...

10.1109/lsp.2017.2741509 article EN IEEE Signal Processing Letters 2017-08-29

We study a simple spectral method that serves as a key ingredient in a growing line of work using efficient iterative algorithms for estimating signals in nonconvex settings. Unlike previous work, which focuses on the phase retrieval setting and provides only bounds on the performance, we consider arbitrary generalized linear sensing models and provide an exact characterization of the performance of the spectral method in the high-dimensional regime. Our analysis reveals a phase transition phenomenon that depends on the sampling ratio. When the ratio is below a critical...

10.1109/isit.2017.8007083 article EN 2017 IEEE International Symposium on Information Theory (ISIT) 2017-06-01

We present a method for inferring diverse 3D models of human-object interactions from images. Reasoning about how humans interact with objects in complex scenes from a single 2D image is a challenging task given ambiguities arising from the loss of information through projection. In addition, modeling 3D interactions requires the generalization ability towards diverse object categories and interaction types. We propose an action-conditioned modeling of interactions that allows us to infer diverse 3D arrangements of humans and objects without supervision on contact regions or 3D scene geometry. Our...

10.1109/3dv57658.2022.00047 article EN 2022 International Conference on 3D Vision (3DV) 2022-09-01

Subspace clustering (SC) refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces. In many practical scenarios, one may have access only to compressed data due to constraints of measurement or computation. In this paper, based on the recently proposed restricted isometric property of Gaussian random projection for subspaces, we propose a general framework for analyzing the performance of various subspace clustering algorithms when applied to the compressed data. Our framework captures the connection between the problems of compressed subspace clustering (CSC)...

10.1109/jstsp.2018.2879743 article EN IEEE Journal of Selected Topics in Signal Processing 2018-11-05

Mobile video streaming occupies three-quarters of today's cellular network traffic. The quality of mobile videos becomes increasingly important for video providers to attract more users. For example, they invest in bandwidth resources and conduct adaptive bitrate techniques to improve video quality. Prior adaptive bitrate (ABR) algorithms perform well under given throughput traces on broadband and WiFi networks. They may perform poorly due to the high dynamics of mobile networks. To study the properties of mobile networks, we collect 4G throughput traces over four months in two large cities,...

10.1109/tmc.2020.3036707 article EN IEEE Transactions on Mobile Computing 2020-11-09

In this paper, we approach the challenging problem of motion planning for knot tying. We propose a hierarchical approach in which the top layer produces a topological plan and the bottom layer translates this plan into continuous robot motion. The top layer decomposes a knotting task into sequences of abstract topological actions based on knot theory. The bottom layer translates each of these abstract actions into robot motion trajectories through learned topological motion primitives. To adapt each topological action to the specific rope geometry, the motion primitives take the observed rope configuration as input. We train the motion primitives by imitating human demonstrations and reinforcement learning...

10.1109/iros45743.2020.9341330 article EN 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020-10-24

Dimension reduction plays an essential role when decreasing the complexity of solving large-scale problems. The well-known Johnson-Lindenstrauss (JL) Lemma and Restricted Isometry Property (RIP) admit the use of random projection to reduce the dimension while keeping the Euclidean distance, which leads to the boom of sparsity related signal processing. Recently, successful applications of sparse models in computer vision and machine learning have increasingly hinted that the underlying structure of high dimensional data looks...

10.1109/icassp.2017.7952899 article EN 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017-03-01
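
The distance-preserving behavior of random projection mentioned in the abstract above can be checked numerically. The sketch below is only an illustration of the classical Johnson-Lindenstrauss setting for point sets, not of the paper's results: it projects Gaussian points from R^N to R^n with an i.i.d. Gaussian matrix scaled by 1/sqrt(n) and reports the ratio of projected to original pairwise Euclidean distances. All function names and dimensions are assumptions chosen for this example.

```python
import numpy as np

def gaussian_projection(X, n, rng):
    """Project the rows of X from R^N to R^n with a scaled Gaussian matrix."""
    N = X.shape[1]
    Phi = rng.standard_normal((n, N)) / np.sqrt(n)  # E||Phi v||^2 = ||v||^2
    return X @ Phi.T

def pairwise_distances(X):
    """Dense matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def distance_ratios(num_points=20, N=2000, n=500, seed=0):
    """Ratios (projected distance / original distance) over all distinct pairs."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((num_points, N))        # hypothetical data set
    Y = gaussian_projection(X, n, rng)
    iu = np.triu_indices(num_points, k=1)           # each unordered pair once
    return pairwise_distances(Y)[iu] / pairwise_distances(X)[iu]

if __name__ == "__main__":
    r = distance_ratios()
    print(f"min ratio {r.min():.3f}, max ratio {r.max():.3f}")
```

Ratios concentrated near 1 indicate that the projection approximately preserves distances; shrinking `n` widens the spread.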

An OMP-like covariance-assisted matching pursuit (CAMP) method has recently been proposed. Given a prior knowledge of the covariance and mean of the sparse coefficients, CAMP balances the least squares estimator and the prior knowledge by leveraging the Gauss–Markov theorem. In this letter, we study the performance of CAMP in the framework of restricted isometry property (RIP). It is shown that under some conditions on RIP and the minimum magnitude of the nonzero elements of the sparse signal, CAMP with sparse level...

10.1109/lsp.2018.2816573 article EN IEEE Signal Processing Letters 2018-03-16

Restricted Isometry Property (RIP) is of fundamental importance in the theory of compressed sensing and forms the base of many exact and robust recovery guarantees in this field. Quantitative description of RIP involves bounding the so-called RIP constants of measurement matrices. In this respect, it is noteworthy that most results in the literature concerning RIP are upper bounds of RIP constants, which can be interpreted as a theoretical guarantee of successful sparse recovery. On the contrary, the land of lower bounds for RIP constants remains uncultivated except for some numerical...

10.1109/tsp.2020.2985848 article EN IEEE Transactions on Signal Processing 2020-01-01

Dimensionality reduction is in demand to reduce the complexity of solving large-scale problems with data lying in latent low-dimensional structures in machine learning and computer vision. Motivated by such need, in this work we study the Restricted Isometry Property (RIP) of Gaussian random projections for low-dimensional subspaces in $\mathbb{R}^N$, and rigorously prove that the projection Frobenius norm distance between any two subspaces spanned by the projected data in $\mathbb{R}^n$ ($n < N$) remains almost the same as the distance between the original subspaces with probability no less than $1 - e^{-O(n)}$. Previously the well-known...

10.1016/j.acha.2019.11.002 article EN cc-by-nc-nd Applied and Computational Harmonic Analysis 2019-11-12

The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as...

10.48550/arxiv.2102.11270 preprint EN cc-by arXiv (Cornell University) 2021-01-01

A new approach to effectively extract the depth data of three-dimensional (3D) objects in space, using the elemental images picked up from the 3D objects and their computationally reconstructed plane object images (POIs), is proposed. For this approach, two image concepts are introduced: one is the mapped elemental image (MEI), defined as an inversely flipped and magnified version of the elemental image, and the other is the area (RAI), defined as a part of the POI segmented so that it exactly overlaps with the MEI. Then, the depth data can be extracted through correlations between the MEIs and the RAIs along the output plane. That is, the depth map...

10.1143/jjap.48.042401 article EN Japanese Journal of Applied Physics 2009-04-01

Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning, namely, the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function, is at most on the order of...

10.48550/arxiv.2006.03041 preprint EN other-oa arXiv (Cornell University) 2020-01-01
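
To make the setting of the abstract above concrete, the following is a minimal sketch of classical asynchronous Q-learning on a small tabular MDP: a single Markovian trajectory is generated by a behavior policy, and only the visited state-action entry is updated at each step. The random MDP, the uniform behavior policy, the step-size rule `1 / (1 + (1 - gamma) * visits)`, and all function names are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def optimal_q(P, R, gamma, iters=500):
    """Q* by value iteration; P has shape (S, A, S), R has shape (S, A)."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)   # Bellman optimality update
    return Q

def async_q_learning(P, R, gamma, steps, seed=0):
    """Asynchronous Q-learning along one trajectory of a uniform behavior policy."""
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    visits = np.zeros((S, A), dtype=int)
    s = 0
    for _ in range(steps):
        a = rng.integers(A)                  # behavior policy: uniform over actions
        s_next = rng.choice(S, p=P[s, a])    # sample the next state
        visits[s, a] += 1
        eta = 1.0 / (1.0 + (1.0 - gamma) * visits[s, a])  # assumed step size
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += eta * (target - Q[s, a])  # update only the visited entry
        s = s_next                           # continue the same trajectory
    return Q

def demo(steps=40000, seed=0):
    """Entrywise (l_infinity) error of the estimate on a hypothetical random MDP."""
    rng = np.random.default_rng(seed)
    S, A, gamma = 3, 2, 0.5
    P = rng.dirichlet(np.ones(S), size=(S, A))  # random transition kernel
    R = rng.random((S, A))                      # deterministic rewards in [0, 1)
    Q_star = optimal_q(P, R, gamma)
    Q_hat = async_q_learning(P, R, gamma, steps, seed=seed + 1)
    return np.abs(Q_hat - Q_star).max()

if __name__ == "__main__":
    print(f"l_inf error after 40000 samples: {demo():.4f}")
```

The quantity returned by `demo` is the entrywise error that the sample complexity in the abstract is measured against.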