- Human Pose and Action Recognition
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Anomaly Detection Techniques and Applications
- Hand Gesture Recognition Systems
- Face Recognition and Analysis
- Visual Attention and Saliency Detection
- Image Enhancement Techniques
- Image Retrieval and Classification Techniques
- Advanced Image Processing Techniques
- Adversarial Robustness in Machine Learning
- Face and Expression Recognition
- Machine Learning and Data Classification
- Video Analysis and Summarization
- Gait Recognition and Analysis
- Service-Oriented Architecture and Web Services
- Software Testing and Debugging Techniques
- Credit Risk and Financial Regulations
- Sparse and Compressive Sensing Techniques
- Biometric Identification and Security
- Medical Image Segmentation Techniques
Beijing Institute of Technology
2024
Microsoft (Finland)
2023-2024
Purple Mountain Laboratories
2024
Microsoft Research (United Kingdom)
2022-2023
Lanzhou Jiaotong University
2023
Beijing University of Posts and Telecommunications
2023
Suzhou City University
2023
National University of Defense Technology
2017-2022
Sichuan University
2018-2022
Tianjin University
2022
Incorporating multi-scale features in fully convolutional neural networks (FCNs) has been a key element to achieving state-of-the-art performance on semantic image segmentation. One common way to extract multi-scale features is to feed multiple resized input images to a shared deep network and then merge the resulting features for pixelwise classification. In this work, we propose an attention mechanism that learns to softly weight the multi-scale features at each pixel location. We adapt a state-of-the-art semantic image segmentation model, which we jointly train with multi-scale input images and the attention model. The proposed attention model...
Learning fine-grained image similarity is a challenging task. It needs to capture between-class and within-class image differences. This paper proposes a deep ranking model that employs deep learning techniques to learn a similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is also proposed to learn the model with distributed asynchronized stochastic gradient....
This paper examines the illiquidity of corporate bonds and its asset-pricing implications. Using transactions data from 2003 to 2009, we show that the illiquidity in corporate bonds is substantial, significantly greater than what can be explained by bid-ask spreads. We establish a strong link between bond illiquidity and bond prices. In aggregate, changes in market-level illiquidity explain a substantial part of the time variation in yield spreads of high-rated (AAA through A) bonds, overshadowing the credit risk component. In the cross-section, the bond-level illiquidity measure explains...
We present an approach that exploits hierarchical Recurrent Neural Networks (RNNs) to tackle the video captioning problem, i.e., generating one or multiple sentences to describe a realistic video. Our hierarchical framework contains a sentence generator and a paragraph generator. The sentence generator produces one simple short sentence that describes a specific short video interval. It exploits both temporal- and spatial-attention mechanisms to selectively focus on visual elements during generation. The paragraph generator captures the inter-sentence dependency by taking as input the sentential embedding...
Existing methods on video-based action recognition are generally view-dependent, i.e., performing recognition from the same views seen in the training data. We present a novel multiview spatio-temporal and-or graph (MST-AOG) representation for cross-view action recognition, i.e., the recognition is performed on the video from an unknown and unseen view. As a compositional model, MST-AOG compactly represents the hierarchical combinatorial structures of cross-view actions by explicitly modeling the geometry, appearance and motion variations. This paper proposes effective methods to...
While feedforward deep convolutional neural networks (CNNs) have been a great success in computer vision, it is important to note that the human visual cortex generally contains more feedback than feedforward connections. In this paper, we will briefly introduce the background of feedbacks in the human visual cortex, which motivates us to develop a computational feedback mechanism in deep neural networks. In addition to the feedforward inference in traditional neural networks, a feedback loop is introduced to infer the activation status of hidden layer neurons according to the "goal" of the network, e.g., high-level...
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three...
Convolutional Neural Networks (CNNs) with Bilinear Pooling, initially in their full form and later using compact representations, have yielded impressive performance gains on a wide range of visual tasks, including fine-grained visual categorization, visual question answering, face recognition, and description of texture and style. The key to their success lies in the spatially invariant modeling of pairwise (2nd order) feature...
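This paper presents a novel locality sensitive histogram algorithm for visual tracking. Unlike the conventional image histogram that counts the frequency of occurrences of each intensity value by adding ones to the corresponding bin, a locality sensitive histogram is computed at each pixel location and a floating-point value is added to the corresponding bin for each occurrence of an intensity value. The floating-point value declines exponentially with respect to the distance to the pixel location where the histogram is computed; thus every pixel is considered but those that are far away can be neglected due to the very small weights assigned. An efficient algorithm is proposed that enables the locality sensitive histograms to be computed in...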
We propose a novel attention based deep learning architecture for the visual question answering task (VQA). Given an image and an image related natural language question, VQA generates the natural language answer for the question. Generating the correct answers requires the model's attention to focus on the regions corresponding to the question, because different questions inquire about the attributes of different image regions. We introduce an attention based configurable convolutional neural network (ABC-CNN) to learn such question-guided attention. ABC-CNN determines an attention map for an image-question pair by convolving...
In this paper, we address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, our method is able to efficiently hypothesize the semantic meaning of new words and add them to its word dictionary so that they can be used to describe images which contain these novel concepts. Our method has an image captioning module based on [38] with several improvements. In particular, we propose a transposed weight sharing scheme, which not only improves...
Fine-grained recognition is challenging due to its subtle local inter-class differences versus large intra-class variations such as poses. A key to addressing this problem is to localize discriminative parts to extract pose-invariant features. However, ground-truth part annotations can be expensive to acquire. Moreover, it is hard to define parts for many fine-grained classes. This work introduces Fully Convolutional Attention Networks (FCANs), a reinforcement learning framework to optimally glimpse local discriminative regions adaptive...
Learning fine-grained image similarity is a challenging task. It needs to capture between-class and within-class image differences. This paper proposes a deep ranking model that employs deep learning techniques to learn a similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is proposed to learn the model with distributed asynchronized stochastic gradient. Extensive...
The labeling cost of a large number of bounding boxes is one of the main challenges for training modern object detectors. To reduce the dependence on expensive bounding box annotations, we propose a new semi-supervised object detection formulation, in which a few seed box level annotations and a large scale of image level annotations are used to train the detector. We adopt a training-mining framework, which is widely used in weakly supervised object detection tasks. However, the mining process inherently introduces various kinds of labelling noises: false negatives, false positives and inaccurate boundaries,...
The popular bag of words approach for action recognition is based on classifying the density of quantized local features. This approach focuses excessively on the local features but discards all information about the interactions among them. Local features themselves may not be discriminative enough, but combined with their contexts, they can be very useful for the recognition of some actions. In this paper, we present a novel representation that captures contextual interactions between interest points, based on the density of all features observed in each interest point's multiscale spatio-temporal contextual domain. We...
A key challenge in fine-grained recognition is how to find and represent discriminative local regions. Recent attention models are capable of learning discriminative region localizers only from category labels with reinforcement learning. However, not utilizing any explicit part information, they are not able to accurately find multiple distinctive regions. In this work, we introduce an attribute-guided attention localization scheme where the local region localizers are learned under the guidance of part attribute descriptions. By designing a novel reward strategy, we are able to learn...
Temporal misalignment and duration variation in video actions largely influence the performance of action recognition, but it is very difficult to specify effective temporal alignment on action sequences. To address this challenge, this paper proposes a novel discriminative learning-based temporal alignment method, called maximum margin temporal warping (MMTW), to align two action sequences and measure their matching score. Based on the latent structure SVM formulation, the proposed MMTW method is able to learn a phantom action template to represent an action class for...
Unsupervised domain adaptive person re-identification (ReID) has been extensively investigated to mitigate the adverse effects of domain gaps. Those works assume the target domain data can be accessible all at once. However, for real-world streaming data, this hinders the timely adaptation to changing data statistics and sufficient exploitation of increasing samples. In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID. This is challenging because it requires...