- Video Surveillance and Tracking Methods
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Image Retrieval and Classification Techniques
- Gait Recognition and Analysis
- Speech and Audio Processing
- Music and Audio Processing
- Face recognition and analysis
- Speech Recognition and Synthesis
- Neuroscience and Neuropharmacology Research
- Domain Adaptation and Few-Shot Learning
- Video Analysis and Summarization
- Neurotransmitter Receptor Influence on Behavior
- Visual Attention and Saliency Detection
- Robotics and Sensor-Based Localization
- Anomaly Detection Techniques and Applications
- Photoreceptor and optogenetics research
- Neural dynamics and brain function
- Advanced Memory and Neural Computing
- Receptor Mechanisms and Signaling
- Machine Learning and ELM
- Face and Expression Recognition
- Emotion and Mood Recognition
Alibaba Group (China)
2022-2025
Peking University
2016-2025
National Institute on Drug Abuse
2014-2024
Ningde Normal University
2012-2024
Alibaba Group (United States)
2019-2024
Peng Cheng Laboratory
2022-2023
Tianjin University
2023
National Institutes of Health
2015-2023
Northwestern Polytechnical University
2023
King University
2019-2021
Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network. To facilitate research towards conquering those issues, this paper contributes a new dataset called MSMT17 with many important features, e.g., 1) the raw videos are taken by a 15-camera network deployed in both indoor and outdoor scenes, 2)...
Feature extraction and matching are two crucial components in person Re-Identification (ReID). The large pose deformations and the complex view variations exhibited by the captured person images significantly increase the difficulty of learning and matching features from person images. To overcome these difficulties, in this work we propose a Pose-driven Deep Convolutional (PDC) model to learn improved feature extraction and matching models end to end. Our deep architecture explicitly leverages human part cues to alleviate pose variations and learn robust feature representations from both the global...
The huge variance of human pose and the misalignment of detected human images significantly increase the difficulty of person Re-Identification (Re-ID). Moreover, efficient Re-ID systems are required to cope with the massive visual data being produced by video surveillance systems. To solve these problems, this work proposes a Global-Local-Alignment Descriptor (GLAD) and an efficient indexing and retrieval framework, respectively. GLAD explicitly leverages the local and global cues in the human body to generate a discriminative and robust...
Speech emotion recognition is challenging because of the affective gap between subjective emotions and low-level features. Integrating multilevel feature learning and model training, deep convolutional neural networks (DCNNs) have exhibited remarkable success in bridging the semantic gap in visual tasks like image classification and object detection. This paper explores how to utilize a DCNN to bridge the affective gap in speech signals. To this end, we first extract three channels of log Mel-spectrograms (static, delta, and delta delta)...
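As a rough illustration of the three-channel input mentioned in this abstract, the sketch below derives delta and delta-delta channels from a precomputed log Mel-spectrogram. The excerpt does not say how the deltas are computed; the regression formula, the window `width`, the edge handling, and the names `delta` and `three_channels` are assumptions made for this example, and the log-Mel frames are taken as given.

```python
from typing import List, Tuple

Frames = List[List[float]]  # frames[t][m]: log-Mel energy of band m at frame t


def delta(frames: Frames, width: int = 2) -> Frames:
    """Regression-based time derivative (assumed formula, not from the abstract).

    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2),
    with the first and last frames repeated at the edges.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Frames = []
    for t in range(n_frames):
        row = []
        for m in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][m]
                behind = frames[max(t - n, 0)][m]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def three_channels(log_mel: Frames, width: int = 2) -> Tuple[Frames, Frames, Frames]:
    """Return (static, delta, delta-delta) channels of equal shape."""
    first = delta(log_mel, width)
    second = delta(first, width)
    return log_mel, first, second
```

The three returned arrays have the same shape, so they can be stacked like the colour channels of an image before being passed to a convolutional network.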
The challenge of unsupervised person re-identification (ReID) lies in learning discriminative features without true labels. This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels. Our method starts by assigning each person image a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction. The label prediction comprises similarity computation and cycle consistency to ensure the quality of predicted labels. To boost the ReID model training efficiency in multi-label classification, we further propose...
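One simple way to read "similarity computation and cycle consistency" is as a mutual nearest-neighbour check: image j becomes a positive label for image i only if each ranks the other among its most similar images. The sketch below follows that reading. It is not the paper's actual procedure; the cosine similarity, the parameter `k`, and the function names are assumptions for illustration, and feature vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(features: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k images most similar to image i (excluding i itself)."""
    scored = [(cosine(features[i], f), j) for j, f in enumerate(features) if j != i]
    scored.sort(key=lambda s: (-s[0], s[1]))  # by similarity, ties by index
    return [j for _, j in scored[:k]]


def predict_multilabels(features: Sequence[Sequence[float]], k: int = 2) -> List[List[int]]:
    """Assign each image its own class plus every mutually consistent neighbour."""
    neighbours = [set(top_k(features, i, k)) for i in range(len(features))]
    labels = []
    for i in range(len(features)):
        positives = {i}  # the initial single-class label
        for j in neighbours[i]:
            if i in neighbours[j]:  # cycle check: j must also retrieve i
                positives.add(j)
        labels.append(sorted(positives))
    return labels
```

For example, with `features = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` and `k=1`, images 0 and 1 label each other, as do images 2 and 3.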
Emotion recognition is challenging due to the emotional gap between emotions and audio-visual features. Motivated by the powerful feature learning ability of deep neural networks, this paper proposes to bridge the emotional gap using a hybrid deep model, which first produces audio-visual segment features with Convolutional Neural Networks (CNNs) and a 3D-CNN, then fuses the audio-visual segment features in Deep Belief Networks (DBNs). The proposed method is trained in two stages. First, CNN and 3D-CNN models pre-trained on corresponding large-scale image and video classification tasks are...
Exploiting multi-scale representations is critical to improve edge detection for objects at different scales. To extract edges at dramatically different scales, we propose a Bi-Directional Cascade Network (BDCN) structure, where an individual layer is supervised by labeled edges at its specific scale, rather than directly applying the same supervision to all CNN outputs. Furthermore, to enrich the multi-scale representations learned by BDCN, we introduce a Scale Enhancement Module (SEM) which utilizes dilated convolution to generate multi-scale features, instead of using...
Learning discriminative representations for unseen person images is critical for person Re-Identification (ReID). Most current approaches learn deep representations in classification tasks, which essentially minimize the empirical classification risk on the training set. As shown in our experiments, such representations commonly focus on several body parts discriminative to the training set, rather than the entire human body. Inspired by the structural risk minimization principle in SVM, we revise the traditional deep representation learning procedure to minimize both the empirical classification risk and the representation learning risk. The representation learning risk is evaluated by the proposed part loss,...
Previous works on vehicle Re-ID mainly focus on extracting global features and learning distance metrics. Because some vehicles commonly share the same model and maker, it is hard to distinguish them based on their global appearances. Compared with the global appearance, local regions such as decorations and inspection stickers attached to the windshield may be more distinctive for vehicle Re-ID. To embed the detailed visual cues in those local regions, we propose a Region-Aware deep Model (RAM). Specifically, in addition to extracting global features, RAM also extracts...
This paper proposes the Global-Local Temporal Representation (GLTR) to exploit the multi-scale temporal cues in video sequences for video person Re-Identification (ReID). GLTR is constructed by first modeling the short-term temporal cues among adjacent frames, then capturing the long-term relations among inconsecutive frames. Specifically, the short-term temporal cues are modeled by parallel dilated convolutions with different temporal dilation rates to represent the motion and appearance of the pedestrian. The long-term relations are captured by a temporal self-attention model to alleviate the occlusions and noises...
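To make "parallel dilated convolutions with different temporal dilation rates" concrete, here is a minimal sketch over a sequence of per-frame feature vectors. The kernel size of 3, the fixed difference kernel, the rates `(1, 2, 3)`, zero padding, and concatenating the branch outputs are all assumptions for illustration; the excerpt does not state how GLTR configures or combines its branches, and a real model would learn the kernel weights.

```python
from typing import List, Sequence

Vector = List[float]


def dilated_temporal_conv(
    frames: Sequence[Vector], kernel: Sequence[float], dilation: int
) -> List[Vector]:
    """Kernel-size-3 convolution along time with taps `dilation` frames apart.

    Frames outside the sequence are treated as zeros, so the output keeps
    the input length.
    """
    assert len(kernel) == 3
    dim = len(frames[0])
    zero = [0.0] * dim
    out = []
    for t in range(len(frames)):
        taps = []
        for offset in (-dilation, 0, dilation):
            idx = t + offset
            taps.append(frames[idx] if 0 <= idx < len(frames) else zero)
        out.append([sum(w * tap[d] for w, tap in zip(kernel, taps)) for d in range(dim)])
    return out


def parallel_dilated(
    frames: Sequence[Vector],
    kernel: Sequence[float] = (-1.0, 0.0, 1.0),  # temporal difference, hypothetical weights
    rates: Sequence[int] = (1, 2, 3),
) -> List[Vector]:
    """Run one branch per dilation rate and concatenate their outputs per frame."""
    branches = [dilated_temporal_conv(frames, kernel, r) for r in rates]
    return [sum((branch[t] for branch in branches), []) for t in range(len(frames))]
```

With the difference kernel, each branch measures change over a different time span, which is one way to see why several dilation rates capture motion at several temporal scales.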
We propose a novel Multi-Task Learning with Low Rank Attribute Embedding (MTL-LORAE) framework for person re-identification. Re-identifications from multiple cameras are regarded as related tasks to exploit shared information to improve re-identification accuracy. Both low level features and semantic/data-driven attributes are utilized. Since attributes are generally correlated, we introduce a low rank attribute embedding into the MTL formulation to embed the original binary attributes into a continuous attribute space, where incorrect and incomplete...
Most unsupervised person Re-Identification (Re-ID) works produce pseudo-labels by measuring feature similarity without considering the distribution discrepancy among cameras, leading to degraded accuracy in label computation across cameras. This paper addresses this challenge by studying a novel intra-inter camera similarity for pseudo-label generation. We decompose the sample similarity computation into two stages, i.e., the intra-camera and inter-camera computations, respectively. The intra-camera computation directly leverages the CNN features for similarity computation within...
Dorsal raphe (DR) serotonin neurons provide a major input to the ventral tegmental area (VTA). Here, we show that DR serotonin transporter (SERT) neurons establish both asymmetric and symmetric synapses on VTA dopamine neurons, but most of these synapses are asymmetric. Moreover, the DR-SERT terminals making asymmetric synapses on VTA dopamine neurons coexpress vesicular glutamate transporter 3 (VGluT3; transporter for accumulation of glutamate for its synaptic release), suggesting the excitatory nature of these synapses. VTA photoactivation of DR-SERT fibers promotes conditioned place preference, elicits excitatory currents on mesoaccumbens...
This paper proposes a two-stream convolution network to extract spatial and temporal cues for video-based person Re-Identification (ReID). A temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN network. The resulting M3D convolution network introduces a fraction of parameters into the 2D CNN, but gains the ability of multi-scale temporal feature learning. With this compact architecture, the M3D convolution network is also more efficient and easier to optimize than existing 3D convolution networks. The temporal stream further involves Residual Attention Layers (RAL) to refine...
In person re-identification (re-ID), extracting part-level features from person images has been verified to be crucial to offer fine-grained information. Most of the existing CNN-based methods only locate the human parts coarsely, or rely on pretrained human parsing models and fail in locating the identifiable nonhuman parts (e.g., knapsack). In this article, we introduce an alignment scheme in transformer architecture for the first time and propose the auto-aligned transformer (AAformer) to automatically locate both the human parts and nonhuman ones at patch level. We introduce the "Part tokens...
The Bag-of-visual Words (BoW) image representation has been applied to various problems in the fields of multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to the words in texts. However, massive experiments show that the commonly used visual words are not as expressive as text words, which is not desirable because it hinders their effectiveness in various applications. In this paper, Descriptive Visual Words (DVWs) and Descriptive Visual Phrases (DVPs) are proposed as the visual correspondences...
Despite decades of research on neurobiological mechanisms of psychostimulant addiction, the only effective treatment for many addicts is contingency management, a behavioral treatment that uses alternative non-drug reward to maintain abstinence. However, when contingency management is discontinued, most addicts relapse to drug use. The brain mechanisms underlying relapse after cessation of contingency management are largely unknown, and, until recently, an animal model of this human condition did not exist. Here we used a novel rat model, in which the availability of a mutually exclusive...
The huge variance of human pose and the misalignment of detected human images significantly increase the difficulty of pedestrian image matching in person Re-Identification (Re-ID). Moreover, the massive visual data being produced by surveillance video cameras requires highly efficient person Re-ID systems. To solve the first problem, this work proposes a robust and discriminative pedestrian image descriptor, namely, the Global-Local-Alignment Descriptor (GLAD). For the second problem, this work treats person Re-ID as image retrieval and proposes an efficient indexing and retrieval framework. GLAD explicitly...