Jingen Liu

ORCID: 0000-0003-3133-3644
Research Areas
  • Human Pose and Action Recognition
  • Video Analysis and Summarization
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Video Surveillance and Tracking Methods
  • Anomaly Detection Techniques and Applications
  • Image Retrieval and Classification Techniques
  • Advanced Vision and Imaging
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Neural Network Applications
  • Face Recognition and Analysis
  • Domain Adaptation and Few-Shot Learning
  • Gait Recognition and Analysis
  • Visual Attention and Saliency Detection
  • Robotics and Sensor-Based Localization
  • Venous Thromboembolism Diagnosis and Management
  • Machine Learning and Data Classification
  • Computer Graphics and Visualization Techniques
  • Industrial Vision Systems and Defect Detection
  • Acute Ischemic Stroke Management
  • 3D Shape Modeling and Analysis
  • Adversarial Robustness in Machine Learning
  • Infrared Target Detection Methodologies
  • Natural Language Processing Techniques
  • Image Enhancement Techniques

Amazon (United States)
2024

Walt Disney (United States)
2023

JDA Software (United States)
2022

JDSU (United States)
2020-2022

Wuhan University of Technology
2009-2020

SRI International
2012-2017

Rutgers Sexual and Reproductive Health and Rights
2015

Fujian Blood Center
2014

Princeton University
2013

University of Michigan
2010-2011

In this paper, we present a systematic framework for recognizing realistic actions from videos “in the wild.” Such unconstrained videos are abundant in personal collections as well as on the web. Recognizing actions in such videos has not been addressed extensively, primarily due to the tremendous variations that result from camera motion, background clutter, changes in object appearance, scale, etc. The main challenge is how to extract reliable and informative features from such videos. We extract both motion and static features. Since the raw features of both types are dense yet...

10.1109/cvpr.2009.5206744 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01

In this paper we explore the idea of using high-level semantic concepts, also called attributes, to represent human actions from videos, and argue that attributes enable the construction of more descriptive models for action recognition. We propose a unified framework wherein manually specified attributes are: i) selected in a discriminative fashion so as to account for intra-class variability; ii) coherently integrated with data-driven attributes to make the attribute set more descriptive. Data-driven attributes are automatically inferred from training...

10.1109/cvpr.2011.5995353 article EN 2011-06-01
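The attribute idea above can be illustrated with a minimal sketch: detected attribute probabilities for a clip are matched against per-class attribute signatures. The attribute names, signatures, and scores below are hypothetical, not from the paper.

```python
def attribute_score(detections, signature):
    """Score an action class by how well detected attribute probabilities
    match the class's (manually specified) attribute signature."""
    return sum(d * s for d, s in zip(detections, signature))

def classify(detections, signatures):
    """Pick the class whose attribute signature best matches the detections."""
    return max(signatures, key=lambda c: attribute_score(detections, signatures[c]))

# Hypothetical attributes: [arm-motion, leg-motion, body-translation]
signatures = {"wave": (1, 0, 0), "run": (0, 1, 1)}
detections = (0.2, 0.9, 0.8)   # attribute detector outputs for one clip
label = classify(detections, signatures)
```

A clip with strong leg-motion and translation evidence scores highest for "run", even though no "run" classifier was trained directly on raw features.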

In this paper, we present a novel approach for automatically learning a compact and yet discriminative appearance-based human action model. A video sequence is represented by a bag of spatiotemporal features called video-words, obtained by quantizing the extracted 3D interest points (cuboids) from videos. Our proposed approach is able to discover the optimal number of video-word clusters by utilizing Maximization of Mutual Information (MMI). Unlike the k-means algorithm, which is typically used to cluster cuboids into video-words based on their...

10.1109/cvpr.2008.4587723 article EN 2008 IEEE Conference on Computer Vision and Pattern Recognition 2008-06-01
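The MMI idea can be sketched in pure Python: greedily merge video-word rows of a word-by-class co-occurrence table, at each step keeping the merge that preserves the most mutual information with the action classes. The toy counts are hypothetical, and this greedy agglomeration is only a simplified stand-in for the paper's procedure.

```python
import math

def mutual_information(counts):
    """Mutual information (nats) between rows (video-words) and columns
    (action classes) of a co-occurrence count table."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                # p(x,y) * log( p(x,y) / (p(x) p(y)) )
                mi += (c / total) * math.log(c * total / (row_sums[i] * col_sums[j]))
    return mi

def merge_rows(counts, a, b):
    merged = [x + y for x, y in zip(counts[a], counts[b])]
    return [row for k, row in enumerate(counts) if k not in (a, b)] + [merged]

def mmi_cluster(counts, n_clusters):
    """Greedily merge word rows, keeping the merge that retains the most MI."""
    while len(counts) > n_clusters:
        best = None
        for a in range(len(counts)):
            for b in range(a + 1, len(counts)):
                cand = merge_rows(counts, a, b)
                mi = mutual_information(cand)
                if best is None or mi > best[0]:
                    best = (mi, cand)
        counts = best[1]
    return counts

# Toy co-occurrence of 4 video-words (rows) with 2 action classes (columns).
words = [[9, 1], [8, 2], [1, 9], [2, 8]]
clusters = mmi_cluster(words, 2)
# The two class-1-heavy words merge together, as do the two class-2-heavy ones.
```

Unlike appearance-based k-means, the merges here are driven purely by how words co-occur with the class labels.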

In this paper, we present a novel approach to recognizing human actions from different views by view knowledge transfer. An action is originally modelled as a bag of visual-words (BoVW), which is sensitive to view changes. We argue that, as opposed to visual words, there exist some higher-level features which can be shared across views and enable the connection of action models for different views. To discover these features, we use a bipartite graph to model two view-dependent vocabularies, then apply bipartite graph partitioning to co-cluster the two vocabularies into...

10.1109/cvpr.2011.5995729 article EN 2011-06-01

In this paper, we propose a Customizable Architecture Search (CAS) approach to automatically generate a network architecture for semantic image segmentation. The generated network consists of a sequence of stacked computation cells. A cell is represented as a directed acyclic graph, in which each node is a hidden representation (i.e., a feature map) and each edge is associated with an operation (e.g., convolution or pooling) that transforms the data to a new layer. During training, the CAS algorithm explores the search space for an optimized computation cell to build...

10.1109/cvpr.2019.01191 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
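The cell-as-DAG representation can be sketched concretely. Below, plain numbers stand in for feature maps and tiny lambdas stand in for operations like convolution and pooling; the edge list and operation names are hypothetical, for illustration only.

```python
def run_cell(inputs, edges, ops, n_nodes):
    """Evaluate a cell represented as a DAG: node values start from the
    inputs; each edge (src, dst, op) transforms src's value and adds the
    result into dst. Processing edges in order of dst works because every
    edge here goes from a lower-numbered node to a higher-numbered one."""
    values = list(inputs) + [0.0] * (n_nodes - len(inputs))
    for src, dst, op in sorted(edges, key=lambda e: e[1]):
        values[dst] += ops[op](values[src])
    return values[-1]  # the cell's output node

ops = {"identity": lambda v: v,
       "double":   lambda v: 2 * v,   # stand-in for a convolution
       "halve":    lambda v: v / 2}   # stand-in for a pooling op

# Two input nodes (0, 1), one intermediate node (2), one output node (3).
edges = [(0, 2, "double"), (1, 2, "identity"),
         (2, 3, "halve"),  (0, 3, "identity")]
out = run_cell([1.0, 3.0], edges, ops, n_nodes=4)
```

An architecture search then amounts to scoring many such edge/operation assignments and keeping the best-performing cell.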

In this paper, we propose a framework that fuses multiple features for improved action recognition in videos. The fusion of multiple features is important for recognizing actions, as often a single feature based representation is not enough to capture the imaging variations (view-point, illumination, etc.) and attributes of individuals (size, age, gender, etc.). Hence, we use two types of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (or cuboids), and ii) spin-images, which aim to capture the shape deformation of the actor by...

10.1109/cvpr.2008.4587527 article EN 2008 IEEE Conference on Computer Vision and Pattern Recognition 2008-06-01
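A simple way to combine two feature channels, sketched here as late fusion of per-class scores. The class names, scores, and equal weights are hypothetical; the paper's actual fusion scheme may differ.

```python
def late_fuse(score_sets, weights):
    """Combine per-class scores from several feature channels by weighted sum.
    score_sets: list of {class: score} dicts, one per feature type.
    weights:    one weight per feature type (typically summing to 1)."""
    fused = {}
    for scores, w in zip(score_sets, weights):
        for cls, s in scores.items():
            fused[cls] = fused.get(cls, 0.0) + w * s
    return fused

def predict(score_sets, weights):
    fused = late_fuse(score_sets, weights)
    return max(fused, key=fused.get)

# Hypothetical per-class scores from the two channels in the abstract:
# a spatio-temporal cuboid vocabulary and spin-image shape features.
cuboid_scores = {"wave": 0.6, "jump": 0.4}
spin_scores   = {"wave": 0.3, "jump": 0.7}
label = predict([cuboid_scores, spin_scores], [0.5, 0.5])
```

When one channel is ambiguous (here the cuboids mildly prefer "wave"), the complementary shape channel can flip the final decision.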


In this paper, we propose a novel approach for learning a generic visual vocabulary. We use diffusion maps to automatically learn a semantic visual vocabulary from abundant quantized midlevel features. Each midlevel feature is represented by a vector of pointwise mutual information (PMI). In this space, we believe features produced by similar sources must lie on a certain manifold. To capture the intrinsic geometric relations between features, we measure their dissimilarity using diffusion distance. The underlying idea is to embed the midlevel features into...

10.1109/cvpr.2009.5206845 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01
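The PMI representation mentioned above is easy to compute from a co-occurrence table. A minimal sketch, with hypothetical toy counts; zero counts are mapped to 0.0, a common convention rather than anything specified by the paper.

```python
import math

def pmi_vectors(cooc):
    """Represent each midlevel feature (row) by its vector of pointwise
    mutual information with each context (column)."""
    total = sum(sum(row) for row in cooc)
    row_sums = [sum(row) for row in cooc]
    col_sums = [sum(col) for col in zip(*cooc)]
    vectors = []
    for i, row in enumerate(cooc):
        vec = []
        for j, c in enumerate(row):
            if c > 0:
                # PMI(i, j) = log p(i, j) / (p(i) p(j))
                vec.append(math.log(c * total / (row_sums[i] * col_sums[j])))
            else:
                vec.append(0.0)  # convention for unseen pairs
        vectors.append(vec)
    return vectors

# Toy co-occurrence counts of 3 features against 2 contexts.
V = pmi_vectors([[4, 0], [0, 4], [2, 2]])
# The third feature co-occurs evenly with both contexts, so its PMI
# values are lower than the "exclusive" features' PMI with their own context.
```

Distances between these PMI vectors (diffusion distance in the paper) then drive the grouping of features into a semantic vocabulary.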

The success of deep neural networks generally requires a vast amount of training data to be labeled, which is expensive and infeasible at scale, especially for video collections. To alleviate this problem, in this paper, we propose 3DRotNet: a fully self-supervised approach to learn spatiotemporal features from unlabeled videos. A set of rotations is applied to all videos, and a pretext task is defined as the prediction of these rotations. When accomplishing this task, 3DRotNet is actually trained to understand the semantic concepts...

10.48550/arxiv.1811.11387 preprint EN other-oa arXiv (Cornell University) 2018-01-01
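The pretext task described above needs no labels: each clip yields four rotated copies whose rotation index becomes the training target. A minimal sketch with tiny 2x2 "frames" as nested lists; the real method of course operates on video tensors through a 3D network.

```python
def rotate90(frame):
    """Rotate a 2D frame (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*frame[::-1])]

def rotation_pretext(clip):
    """Build the self-supervised pretext dataset from one clip: four rotated
    copies, labeled 0..3 by the number of 90-degree turns applied. Every
    frame in a clip receives the same rotation."""
    samples = []
    for k in range(4):
        rotated = clip
        for _ in range(k):
            rotated = [rotate90(f) for f in rotated]
        samples.append((rotated, k))
    return samples

# A "clip" of two tiny 2x2 frames; no human labels are needed.
clip = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
samples = rotation_pretext(clip)
```

A network trained to predict the label from the rotated clip must learn orientation-sensitive spatiotemporal features, which is the point of the pretext task.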

Low-level appearance as well as spatio-temporal features, appropriately quantized and aggregated into Bag-of-Words (BoW) descriptors, have been shown to be effective in many detection and recognition tasks. However, their efficacy for complex event recognition in unconstrained videos has not been systematically evaluated. In this paper, we use the NIST TRECVID Multimedia Event Detection (MED11 [1]) open source dataset, containing annotated data for 15 high-level events, as a standardized test bed for evaluating the low-level features. This...

10.1109/cvpr.2012.6248114 article EN 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012-06-01

Weakly supervised temporal action localization aims to detect and localize actions in untrimmed videos with only video-level labels during training. However, without frame-level annotations, it is challenging to achieve localization completeness and relieve background interference. In this paper, we present an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which can mitigate the above two challenges by learning an action unit memory bank. In the proposed AUMN, attention modules are designed to update the memory bank adaptively...

10.1109/cvpr46437.2021.00984 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The challenge was composed of four sub-challenges focusing on different tracking domains: (i) VOT-ST2021 focused on short-term tracking in RGB, (ii) VOT-RT2021 focused on "real-time" short-term tracking, and (iii) VOT-LT2021 focused on long-term tracking, namely coping with target disappearance and reappearance...

10.1109/iccvw54120.2021.00305 article EN 2021-10-01

In this paper, we propose a novel approach for scene modeling. The proposed method is able to automatically discover the intermediate semantic concepts. We utilize Maximization of Mutual Information (MMI) co-clustering to discover clusters of semantic concepts. Each concept corresponds to a cluster of visterms in the bag of visterms (BOV) paradigm for classification. MMI co-clustering results in fewer but meaningful clusters. Unlike k-means, which is used to cluster image patches based on their appearances in BOV, MMI co-clustering can group visterms that are highly correlated...

10.1109/iccv.2007.4408866 article EN 2007-01-01

Action recognition methods suffer from many drawbacks in practice, which include (1) the inability to cope with incremental recognition problems; (2) the requirement of an intensive training stage to obtain good performance; (3) the inability to recognize multiple simultaneous actions; and (4) the difficulty of performing recognition frame by frame. In order to overcome all these drawbacks using a single method, we propose a novel framework involving a feature-tree that indexes large scale motion features using a Sphere/Rectangle-tree (SR-tree). The framework consists of the following...

10.1109/iccv.2009.5459374 article EN 2009-09-01
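The feature-tree framework can be sketched without the SR-tree itself: index training motion features with their labels, then let each query feature vote with its k nearest neighbours. A linear scan is used here as a stand-in (the SR-tree only accelerates the same search); the 2-D features and labels are hypothetical.

```python
import heapq
import math

def knn_vote(features, index, k=5):
    """Classify a query video by letting each of its local motion features
    vote with its k nearest neighbours from the indexed training features.
    index: list of (feature_vector, action_label) pairs."""
    votes = {}
    for q in features:
        nearest = heapq.nsmallest(k, index, key=lambda e: math.dist(q, e[0]))
        for _, label in nearest:
            votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Hypothetical 2-D motion features indexed for two actions.
train = [((0.0, 0.0), "walk"), ((0.1, 0.2), "walk"), ((0.2, 0.1), "walk"),
         ((5.0, 5.0), "run"),  ((5.1, 4.9), "run"),  ((4.9, 5.2), "run")]
query = [(0.05, 0.1), (0.15, 0.05), (5.0, 5.1)]
action = knn_vote(query, train, k=3)
```

Because each feature votes independently, the same index supports incremental additions and per-frame decisions, which is the appeal claimed in the abstract.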

We propose to use action, scene and object concepts as semantic attributes for classification of video events in "in the wild" content, such as YouTube videos. We model events using a variety of complementary semantic attribute features developed in the concept space. Our contribution is to systematically demonstrate the advantages of this concept-based event representation (CBER) in applications of video event understanding. Specifically, CBER has better generalization capability, which enables it to recognize events with few training examples. In addition,...

10.1109/wacv.2013.6475038 article EN 2013-01-01

Contrastive learning, which aims at minimizing the distance between positive pairs while maximizing that of negative ones, has been widely and successfully applied in unsupervised feature learning, where the design of positive/negative (pos/neg) pairs is one of its keys. In this paper, we attempt to devise a feature-level data manipulation, differing from data augmentation, to enhance generic contrastive self-supervised learning. To this end, we first design a visualization scheme for the pos/neg score...

10.1109/iccv48922.2021.01014 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
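The pos/neg objective at the heart of contrastive learning can be made concrete with an InfoNCE-style loss on unit-normalised vectors. A minimal sketch, not the paper's specific formulation; the vectors and temperature below are hypothetical.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the positive pair together,
    push the negatives apart. Inputs are unit-normalised feature vectors."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    pos = math.exp(dot(anchor, positive) / temperature)
    negs = sum(math.exp(dot(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + negs))

a   = (1.0, 0.0)
pos = (0.9578, 0.2873)            # close to the anchor -> high pos score
neg = [(0.0, 1.0), (-1.0, 0.0)]   # far from the anchor
loss_good = info_nce(a, pos, neg)
# Swapping a negative into the positive slot makes the loss much larger.
loss_bad = info_nce(a, neg[0], [pos] + neg[1:])
```

The gap between `loss_good` and `loss_bad` is exactly the signal that pos/neg pair design (the paper's focus) manipulates.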

In this paper, we propose a simple yet effective video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones. Previous methods predominantly leverage temporally neighboring frames to assist the super-resolution of the current frame. Those methods achieve limited performance as they suffer from challenges in spatial frame alignment and the lack of useful information from similar LR frames. In contrast, we devise a cross-frame non-local attention mechanism that allows super-resolution without...

10.1109/cvpr52688.2022.01731 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
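The non-local attention primitive underlying the method can be sketched in a few lines: a query patch attends over patches of another frame via a softmax over similarities, so no explicit spatial alignment is needed. The vectors and temperature below are hypothetical, and this is the generic mechanism rather than the paper's exact module.

```python
import math

def attend(query, keys, values, temperature=1.0):
    """Non-local attention of one query patch over patches from another
    frame: a softmax-weighted sum of values by query-key similarity."""
    scores = [sum(q * k for q, k in zip(query, key)) / temperature
              for key in keys]
    m = max(scores)                              # stabilise the softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

q    = (1.0, 0.0)                 # query patch descriptor
keys = [(1.0, 0.0), (0.0, 1.0)]   # patch descriptors in another frame
vals = [(10.0,), (0.0,)]          # the content those patches contribute
out = attend(q, keys, vals, temperature=0.1)
# out[0] is close to 10: the matching patch dominates the output.
```

A low temperature makes the attention nearly hard (pick the best-matching patch); a high one averages over all patches.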

We propose a novel method for automatically discovering key motion patterns happening in a scene by observing the scene over an extended period. Our method does not rely on object detection and tracking, and uses low level features, the direction of pixel-wise optical flow. We first divide the video into clips and estimate a sequence of flow-fields. Each moving pixel is quantized based on its location and direction. This essentially gives a bag of words representation for the clips. Once this representation is obtained, we proceed to a screening stage, using a measure called 'conditional...

10.1109/iccv.2009.5459376 article EN 2009-09-01
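The location-plus-direction quantization described above can be sketched directly: each flow sample maps to a word (grid cell, direction bin), and a clip becomes a histogram of such words. The cell size, number of direction bins, and sample values are hypothetical.

```python
import math

def flow_word(x, y, dx, dy, cell=16, n_dirs=8):
    """Quantize one pixel's optical-flow vector into a visual word by its
    grid-cell location and its motion direction."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    direction = int(angle / (2 * math.pi / n_dirs)) % n_dirs
    return (x // cell, y // cell, direction)

def bag_of_flow_words(flow, cell=16, n_dirs=8):
    """Bag-of-words histogram for a clip.
    flow: iterable of (x, y, dx, dy) samples with non-negligible motion."""
    bag = {}
    for x, y, dx, dy in flow:
        w = flow_word(x, y, dx, dy, cell, n_dirs)
        bag[w] = bag.get(w, 0) + 1
    return bag

# Two pixels in one cell moving right, one pixel in another cell moving up.
bag = bag_of_flow_words([(3, 4, 1.0, 0.0), (5, 6, 0.9, 0.1),
                         (40, 8, 0.0, 1.0)])
```

Recurring words across many clips are candidates for the scene's key motion patterns; the paper's screening measure then filters them.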

We present a method whereby an embodied agent using visual perception can efficiently create a model of a local indoor environment from its experience of moving within it. Our method uses motion cues to compute likelihoods of structure hypotheses, based on simple, generic geometric knowledge about points, lines, planes, and motion. Unlike single-image analysis, we do not attempt to identify a single accurate model, but propose a set of plausible hypotheses from the initial frame. We then use data from subsequent frames to update a Bayesian...

10.1109/iccv.2011.6126233 article EN International Conference on Computer Vision 2011-11-01

Region sampling or weighting is significantly important to the success of modern region-based object detectors. Unlike some previous works, which only focus on "hard" samples when optimizing the objective function, we argue that sample weighting should be data-dependent and task-dependent. The importance of a sample for the objective function optimization is determined by its uncertainties in both the classification and bounding box regression tasks. To this end, we devise a general loss function to cover most region-based object detectors with various sampling strategies, and then based on it...

10.1109/cvpr42600.2020.01418 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

10.1016/j.cviu.2011.08.010 article EN Computer Vision and Image Understanding 2011-11-08

Virtual try-on methods aim to generate images of fashion models wearing arbitrary combinations of garments. This is a challenging task because the generated image must appear realistic and accurately display the interaction between garments. Prior works produce images that are filled with artifacts and fail to capture important visual details necessary for commercial applications. We propose Outfit Visualization Net (OVNet) to capture these important details (e.g. buttons, shading, textures, hemlines, and interactions between garments) and produce high quality...

10.1109/cvpr46437.2021.01529 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

We propose a novel statistical manifold modeling approach that is capable of classifying poses of object categories from video sequences by simultaneously minimizing the intra-class variability and maximizing the inter-pose distance. Following the intuition that an appropriate part-based representation with a suitable selection process may help achieve our purpose, we formulate the problem from this perspective and treat it as adjusting the (parameterized pose) manifold by means of "alignment" and "expansion" operations. We show that alignment and expansion are equivalent to...

10.1109/iccv.2011.6126340 article EN International Conference on Computer Vision 2011-11-01

Multimedia event detection has drawn a lot of attention in recent years. Given a recognized event, in this paper, we conduct a pilot study on the multimedia event recounting problem, which answers the question of why a video is recognized as an event, i.e. what evidences the decision is made on. In order to provide semantic evidences, we adopt a concept-based representation for learning the discriminative model. Then, we present an approach that exactly recovers the contribution of each evidence to the classification decision. This approach can be applied to any additive classifier. The...

10.1145/2393347.2396386 article EN Proceedings of the 30th ACM International Conference on Multimedia 2012-10-29
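For an additive classifier whose score is a weighted sum of concept features, each concept's contribution to the decision is simply its weight times its feature value, which makes recounting straightforward. A minimal sketch; the event, concept names, weights, and features below are hypothetical.

```python
def recount(weights, features, top=3):
    """For an additive classifier with score = sum_i w_i * x_i, the
    contribution of concept i is exactly w_i * x_i; report the largest."""
    contrib = {c: weights[c] * features.get(c, 0.0) for c in weights}
    return sorted(contrib, key=contrib.get, reverse=True)[:top]

# Hypothetical concept weights for a "birthday party" event model,
# and concept detector outputs for one video.
weights  = {"cake": 2.0, "candles": 1.5, "car": -0.5, "singing": 1.0}
features = {"cake": 0.9, "candles": 0.8, "car": 0.1, "singing": 0.4}
evidence = recount(weights, features, top=2)
```

The returned concepts are the semantic evidences behind the classification, exactly summing (over all concepts) to the classifier's score.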