- Sparse and Compressive Sensing Techniques
- Reinforcement Learning in Robotics
- Image and Signal Denoising Methods
- Face and Expression Recognition
- Multimodal Machine Learning Applications
- Adversarial Robustness in Machine Learning
- Distributed Sensor Networks and Detection Algorithms
- Human Pose and Action Recognition
- Robot Manipulation and Learning
- Machine Learning and Algorithms
- Advanced Optical Imaging Technologies
- Age of Information Optimization
- Tensor decomposition and applications
- Blind Source Separation Techniques
- Advanced Vision and Imaging
- Image and Video Quality Assessment
- Advanced Wireless Network Optimization
- Numerical methods in inverse problems
- Microwave Imaging and Scattering Analysis
- Natural Language Processing Techniques
- Robotic Mechanisms and Dynamics
- Domain Adaptation and Few-Shot Learning
- Advanced Bandit Algorithms Research
- Video Surveillance and Tracking Methods
- Human Motion and Animation
Toshiba (Japan), 2024
University of Pennsylvania, 2023
Nanjing University of Aeronautics and Astronautics, 2023
Suzhou Research Institute, 2023
Guangzhou Vocational College of Science and Technology, 2023
Tsinghua University, 2017-2022
Center for Life Sciences, 2021-2022
Princeton University, 2020
Peking University, 2020
Kwangwoon University, 2009-2012
We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. Borrowing ideas from cross-lingual pre-trained models, such as XLM (Lample and Conneau 2019) and Unicoder (Huang et al. 2019), both visual and linguistic contents are fed into a multi-layer Transformer (Vaswani et al. 2017) for the cross-modal pre-training, where three pre-trained tasks are employed, including Masked Language Modeling (MLM), Masked Object Classification (MOC), and Visual-linguistic Matching (VLM)....
We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. Borrowing ideas from cross-lingual pre-trained models, such as XLM and Unicoder, both visual and linguistic contents are fed into a multi-layer Transformer for the cross-modal pre-training, where three pre-trained tasks are employed, including Masked Language Modeling (MLM), Masked Object Classification (MOC), and Visual-linguistic Matching (VLM). The first two tasks learn context-aware representations for input tokens based on...
We study a spectral initialization method that serves a key role in recent work on estimating signals in non-convex settings. Previous analysis of this method focuses on the phase retrieval problem and provides only performance bounds. In this paper, we consider arbitrary generalized linear sensing models and present a precise asymptotic characterization of the performance of the method in the high-dimensional limit. Our analysis also reveals a phase transition phenomenon that depends on the ratio between the number of samples and the signal dimension. When the ratio is below a minimum threshold,...
This paper investigates a problem of broad practical interest, namely, the reconstruction of a large-dimensional low-rank tensor from highly incomplete and randomly corrupted observations of its entries. Although a number of papers have been dedicated to this tensor completion problem, prior algorithms either are computationally too expensive for large-scale applications or come with suboptimal statistical performance. Motivated by this, we propose a fast two-stage nonconvex algorithm, a gradient method following...
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed...
This paper is concerned with estimating the column space of an unknown low-rank matrix $A^\star \in \mathbb{R}^{d_1 \times d_2}$, given noisy and partial observations of its entries. There is no shortage of scenarios where the observations, while being too noisy to support faithful recovery of the entire matrix, still convey sufficient information to enable reliable estimation of the column space of interest. This is particularly evident and crucial for the highly unbalanced case where the column dimension $d_2$ far exceeds the row dimension $d_1$, which is the focal point of the current paper. We investigate an efficient spectral method,...
This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by recent advances in offline reinforcement learning, we develop an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning, which penalizes infrequently-visited state-action pairs based on suitable lower confidence bounds (LCBs). This framework leads to, among other things, improved sample efficiency and enhanced adaptivity in the presence of near-expert data. Our...
Sparse subspace clustering (SSC) is a state-of-the-art method for clustering high-dimensional data points lying in a union of low-dimensional subspaces. However, while $\ell_1$ optimization-based SSC algorithms suffer from high computational complexity, other variants of SSC, such as orthogonal-matching-pursuit-based SSC (OMP-SSC), lose clustering accuracy in pursuit of improving time efficiency. In this letter, we propose a novel...
We study a simple spectral method that serves as a key ingredient in a growing line of work using efficient iterative algorithms for estimating signals in nonconvex settings. Unlike previous work, which focuses on the phase retrieval setting and provides only bounds on the performance, we consider arbitrary generalized linear sensing models and provide an exact characterization of the performance of the spectral method in the high-dimensional regime. Our analysis reveals a phase transition phenomenon that depends on the sampling ratio. When the ratio is below a critical...
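As a rough illustration of the kind of spectral method this abstract refers to, the sketch below treats real-valued phase retrieval, which is only one instance of a generalized linear sensing model. It forms the data matrix D = (1/m) Σ y_i a_i a_iᵀ and takes its leading eigenvector by power iteration. The function names, the identity preprocessing of y_i, the Gaussian sensing vectors, and the problem sizes are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def spectral_estimate(A, y, power_iters=200, seed=0):
    """Return a unit vector approximating the top eigenvector of
    D = (1/m) * sum_i y_i * a_i a_i^T, computed by power iteration."""
    m, n = len(A), len(A[0])
    # Build the n x n data matrix D (assumed preprocessing: use y_i as is).
    D = [[0.0] * n for _ in range(n)]
    for a, y_i in zip(A, y):
        for j in range(n):
            w = y_i * a[j] / m
            row = D[j]
            for k in range(n):
                row[k] += w * a[k]

    # Power iteration from a random start; D is positive semidefinite
    # because every y_i is nonnegative in this example.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(power_iters):
        w = [sum(D[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def demo_alignment(n=10, m=3000, seed=1):
    """Cosine similarity (up to sign) between the estimate and the true signal."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_norm = math.sqrt(sum(c * c for c in x))
    x = [c / x_norm for c in x]  # hypothetical unit-norm ground truth
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    # Phase retrieval measurements: y_i = (a_i^T x)^2, sign information lost.
    y = [sum(a_j * x_j for a_j, x_j in zip(a, x)) ** 2 for a in A]
    x_hat = spectral_estimate(A, y)
    return abs(sum(u * v for u, v in zip(x, x_hat)))
```

Calling `demo_alignment()` uses a sampling ratio m/n of 300, far above any threshold regime, so the estimate should align closely with the true signal. Shrinking `m` toward `n` is one way to observe the alignment degrade.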
We present a method for inferring diverse 3D models of human-object interactions from images. Reasoning about how humans interact with objects in complex scenes from a single 2D image is a challenging task given ambiguities arising from the loss of information through projection. In addition, modeling 3D interactions requires generalization ability towards diverse object categories and interaction types. We propose an action-conditioned modeling of interactions that allows us to infer diverse 3D arrangements of humans and objects without supervision on contact regions or 3D scene geometry. Our...
Subspace clustering (SC) refers to the problem of clustering unlabeled high-dimensional data into a union of low-dimensional linear subspaces. In many practical scenarios, one may have access only to compressed data due to constraints of measurement or computation. In this paper, based on the recently proposed restricted isometric property of Gaussian random projection for subspaces, we propose a general framework for analyzing the performance of various subspace clustering algorithms when applied to the compressed data. Our framework captures the connection between the problems of compressed subspace clustering (CSC)...
Mobile video streaming occupies three-quarters of today's cellular network traffic. The quality of mobile videos becomes increasingly important for video providers to attract more users. For example, they invest in bandwidth resources and conduct adaptive bitrate techniques to improve video quality. Prior adaptive bitrate (ABR) algorithms perform well under given throughput traces on broadband and WiFi networks. They may perform poorly on cellular networks due to their high dynamics. To study the properties of cellular networks, we collect 4G throughput traces over four months in two large cities,...
In this paper, we approach the challenging problem of motion planning for knot tying. We propose a hierarchical approach in which the top layer produces a topological plan and the bottom layer translates this plan into continuous robot motion. The top layer decomposes a knotting task into sequences of abstract topological actions based on knot theory. The bottom layer translates each of these abstract actions into robot motion trajectories through learned topological motion primitives. To adapt each topological action to the specific rope geometry, the motion primitives take the observed rope configuration as input. We train the motion primitives by imitating human demonstrations and reinforcement learning...
Dimension reduction plays an essential role when decreasing the complexity of solving large-scale problems. The well-known Johnson-Lindenstrauss (JL) Lemma and Restricted Isometry Property (RIP) admit the use of random projection to reduce the dimension while keeping the Euclidean distance, which leads to the boom of sparsity-related signal processing. Recently, successful applications of sparse models in computer vision and machine learning have increasingly hinted that the underlying structure of high-dimensional data looks...
An OMP-like covariance-assisted matching pursuit (CAMP) method has recently been proposed. Given prior knowledge of the covariance and mean of the sparse coefficients, CAMP balances the least squares estimator and the prior knowledge by leveraging the Gauss–Markov theorem. In this letter, we study the performance of CAMP in the framework of the restricted isometry property (RIP). It is shown that under some conditions on the RIP and the minimum magnitude of the nonzero elements of the sparse signal, CAMP with sparse level <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML"...
Restricted Isometry Property (RIP) is of fundamental importance in the theory of compressed sensing and forms the base of many exact and robust recovery guarantees in this field. Quantitative description of RIP involves bounding the so-called RIP constants of measurement matrices. In this respect, it is noteworthy that most results in the literature concerning RIP are upper bounds of RIP constants, which can be interpreted as a theoretical guarantee of successful sparse recovery. On the contrary, the land of lower bounds for RIP constants remains uncultivated except for some numerical...
Dimensionality reduction is in demand to reduce the complexity of solving large-scale problems with data lying in latent low-dimensional structures in machine learning and computer vision. Motivated by such need, in this work we study the Restricted Isometry Property (RIP) of Gaussian random projections for low-dimensional subspaces in $\mathbb{R}^N$, and rigorously prove that the projection Frobenius norm distance between any two subspaces spanned by the projected data in $\mathbb{R}^n$ ($n<N$) remains almost the same as the distance between the original subspaces with probability no less than $1-e^{-O(n)}$. Previously the well-known...
The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as...
A new approach to effectively extract the depth data of three-dimensional (3D) objects in space using the elemental images picked up from the 3D objects and their computationally reconstructed plane object images (POIs) is proposed. For this approach, two image concepts are introduced: one is the mapped elemental image (MEI), defined as an inversely flipped and magnified version of the elemental image, and the other is the reconstructed area image (RAI), defined as a part of the POI segmented so that it exactly overlaps with the MEI. Then, the depth data can be extracted through correlations between the MEIs and RAIs along the output plane. That is, the depth map...
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning, namely, the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function, is at most on the order of...
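For readers unfamiliar with the setting, the following minimal sketch shows classical asynchronous Q-learning on a single trajectory: only the state-action pair visited at each step is updated. The two-state MDP (`TOY_P`, `TOY_R`), its deterministic transitions, the uniform behavior policy, and the constant learning rate are all hypothetical choices made for illustration. They are not the setting or the parameters analyzed in the paper.

```python
import random

# Hypothetical toy MDP: taking action a moves to state a (deterministic).
TOY_P = [[0, 1], [0, 1]]          # TOY_P[s][a] = next state
TOY_R = [[0.0, 0.0], [0.0, 1.0]]  # TOY_R[s][a] = reward


def async_q_learning(P, R, gamma, eta, num_steps, seed=0):
    """Run asynchronous Q-learning along one trajectory generated by a
    uniformly random behavior policy; update only the visited (s, a)."""
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(num_steps):
        a = rng.randrange(n_actions)  # behavior policy: uniform over actions
        s_next = P[s][a]
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] = (1.0 - eta) * Q[s][a] + eta * target
        s = s_next                    # continue along the same trajectory
    return Q


def value_iteration(P, R, gamma, iters=2000):
    """Reference optimal Q-function for a deterministic MDP."""
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        Q = [[R[s][a] + gamma * max(Q[P[s][a]]) for a in range(n_actions)]
             for s in range(n_states)]
    return Q


def max_entrywise_error(Q, Q_ref):
    """Entrywise (l-infinity) distance between two Q tables."""
    return max(abs(q - r) for row, ref in zip(Q, Q_ref) for q, r in zip(row, ref))
```

Comparing `async_q_learning(TOY_P, TOY_R, 0.9, 0.5, 20000)` with `value_iteration(TOY_P, TOY_R, 0.9)` via `max_entrywise_error` gives the entrywise error that the sample complexity statement is about. Because this toy MDP has no transition noise, a constant learning rate suffices here, while stochastic MDPs generally call for more careful step-size choices.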