Heng Tao Shen

ORCID: 0000-0002-2999-2088
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Image Retrieval and Classification Techniques
  • Video Analysis and Summarization
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Advanced Neural Network Applications
  • Data Management and Algorithms
  • Anomaly Detection Techniques and Applications
  • Topic Modeling
  • Adversarial Robustness in Machine Learning
  • Generative Adversarial Networks and Image Synthesis
  • Face and Expression Recognition
  • Advanced Vision and Imaging
  • Gait Recognition and Analysis
  • Advanced Image Processing Techniques
  • Visual Attention and Saliency Detection
  • Remote-Sensing Image Classification
  • Natural Language Processing Techniques
  • Human Mobility and Location-Based Analysis
  • Recommender Systems and Techniques
  • Face recognition and analysis
  • Web Data Mining and Analysis
  • Algorithms and Data Compression

University of Electronic Science and Technology of China
2016-2025

Tongji University
2024-2025

Peng Cheng Laboratory
2021-2024

State Key Laboratory of Quantum Optics and Quantum Optics Devices
2024

Shanxi University
2024

Beijing University of Posts and Telecommunications
2014-2023

National University of Singapore
2000-2023

Huazhong Agricultural University
2023

Hefei University of Technology
2020-2023

Chinese Academy of Sciences
2022

Recently, learning-based hashing techniques have attracted broad research interest because they can support efficient storage and retrieval for high-dimensional data such as images, videos, documents, etc. However, a major difficulty of learning to hash lies in handling the discrete constraints imposed on the pursued hash codes, which typically makes hash optimizations very challenging (NP-hard in general). In this work, we propose a new supervised hashing framework, where the learning objective is to generate the optimal binary hash codes for linear...

10.1109/cvpr.2015.7298598 preprint EN 2015-06-01

Nearest neighbor search is the problem of finding the data points in a database whose distances to the query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of learning to hash algorithms, categorize them according to the manner of preserving similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations. We separate quantization from pairwise similarity preserving as the objective function is very different though...

10.1109/tpami.2017.2699960 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2017-05-02
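
The nearest neighbor search problem stated in the abstract above can be illustrated with a minimal sketch of hash-based retrieval. This is not the survey's method: it assumes fixed sign-of-projection hash functions (the hyperplanes in `PLANES` are hypothetical, where a learned method would fit them to data) and ranks database items by Hamming distance to the query code.

```python
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def sign_hash(x: Sequence[float], planes: Sequence[Sequence[float]]) -> int:
    """Encode x with one bit per hyperplane: 1 if the dot product is non-negative."""
    code = 0
    for w in planes:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def nearest(query_code: int, db_codes: Sequence[int], k: int) -> List[int]:
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]


# Hypothetical hyperplanes and toy data, chosen only for illustration.
PLANES = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
DB = [(1.0, 2.0), (-1.0, -2.0), (2.0, 1.0), (-2.0, 1.0)]
DB_CODES = [sign_hash(x, PLANES) for x in DB]

query = sign_hash((1.5, 1.0), PLANES)
print(nearest(query, DB_CODES, k=2))  # indices of the two closest items
```

Comparing short binary codes with XOR and a bit count is what makes this kind of search cheap in storage and time; the quality of the result depends entirely on how well the hash functions preserve similarity.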

Cross-modal retrieval aims to enable a flexible retrieval experience across different modalities (e.g., texts vs. images). The core of cross-modal retrieval research is to learn a common subspace where the items of different modalities can be directly compared to each other. In this paper, we present a novel Adversarial Cross-Modal Retrieval (ACMR) method, which seeks an effective common subspace based on adversarial learning. Adversarial learning is implemented as an interplay between two processes. The first process, a feature projector, tries to generate a modality-invariant...

10.1145/3123266.3123326 article EN Proceedings of the 25th ACM International Conference on Multimedia 2017-10-19

Recent progress in using long short-term memory (LSTM) for image captioning has motivated the exploration of its applications for video captioning. By taking a video as a sequence of features, an LSTM model is trained on video-sentence pairs and learns to associate a video with a sentence. However, most existing methods compress an entire video shot or frame into a static representation, without considering an attention mechanism which allows for selecting salient features. Furthermore, existing approaches usually model the translating error, but ignore...

10.1109/tmm.2017.2729019 article EN IEEE Transactions on Multimedia 2017-07-19

In this paper, we present a new multimedia retrieval paradigm to innovate large-scale search of heterogeneous multimedia data. It is able to return results of different media types from heterogeneous data sources, e.g., using a query image to retrieve relevant text documents or images from different data sources. This utilizes the widely available data from different sources and caters for the current users' demand of receiving a result list simultaneously containing multiple types of data to obtain a comprehensive understanding of the query's results. To enable inter-media retrieval,...

10.1145/2463676.2465274 article EN 2013-06-22

Compared with supervised learning for feature selection, it is much more difficult to select the discriminative features in unsupervised learning due to the lack of label information. Traditional unsupervised feature selection algorithms usually select the features which best preserve the data distribution, e.g., manifold structure, of the whole feature set. Under the assumption that the class label of input data can be predicted by a linear classifier, we incorporate discriminative analysis and l2,1-norm minimization into a joint framework for unsupervised feature selection. Different from existing unsupervised feature selection algorithms, our algorithm...

10.5591/978-1-57735-516-8/ijcai11-267 article EN International Joint Conference on Artificial Intelligence 2011-07-16
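
The l2,1-norm named in the abstract above is the sum of the Euclidean norms of a matrix's rows, so minimizing it pushes whole rows toward zero. A minimal sketch follows, assuming a weight matrix `W` with one row per feature; the ranking step (`top_features`) is a common way such a norm is used for selection and is an assumption here, not a detail confirmed by the abstract.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def row_norms(W: Matrix) -> List[float]:
    """Euclidean (l2) norm of each row of W."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W: Matrix) -> float:
    """l2,1-norm: the sum of the l2 norms of the rows."""
    return sum(row_norms(W))


def top_features(W: Matrix, k: int) -> List[int]:
    """Indices of the k rows (features) with the largest l2 norm."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: -norms[i])[:k]


# Toy weight matrix: 3 features (rows) by 2 outputs (columns).
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
print(l21_norm(W))         # 5.0 + 0.0 + 1.0
print(top_features(W, 2))  # features with the two largest row norms
```

A feature whose row is entirely zero contributes nothing to any output, which is why row-sparsity of this kind can be read as feature selection.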

Clustering is a long-standing important research problem; however, it remains challenging when handling large-scale image data from diverse sources. In this paper, we present a novel Binary Multi-View Clustering (BMVC) framework, which can dexterously manipulate multi-view image data and easily scale to large data. To achieve this goal, we formulate BMVC by two key components: compact collaborative discrete representation learning and binary clustering structure learning, in a joint learning framework. Specifically, BMVC collaboratively encodes...

10.1109/tpami.2018.2847335 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-06-18

As mobile devices with positioning capabilities continue to proliferate, data management for so-called trajectory databases that capture the historical movements of populations of moving objects becomes important. This paper considers the querying of such databases for convoys, a convoy being a group of objects that have traveled together for some time. More specifically, this paper formalizes the concept of a convoy query using density-based notions, in order to capture groups of arbitrary extents and shapes. Convoy discovery is relevant for real-life applications in throughput...

10.14778/1453856.1453971 article EN Proceedings of the VLDB Endowment 2008-08-01

Hashing-based methods have attracted considerable attention for efficient cross-modal retrieval on large-scale multimedia data. The core problem of cross-modal hashing is how to learn compact binary codes that construct the underlying correlations between heterogeneous features from different modalities. A majority of recent approaches aim at learning hash functions to preserve the pairwise similarities defined by given class labels. However, these methods fail to explicitly explore the discriminative property of class labels during...

10.1109/tip.2017.2676345 article EN IEEE Transactions on Image Processing 2017-03-01

Similarity search (nearest neighbor search) is the problem of pursuing the data items whose distances to a query item are the smallest from a large database. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two categories: locality sensitive hashing, which designs hash functions without exploring the data distribution...

10.48550/arxiv.1408.2927 preprint EN other-oa arXiv (Cornell University) 2014-01-01

Recent vision and learning studies show that learning compact hash codes can facilitate massive data processing with significantly reduced storage and computation. Particularly, learning deep hash functions has greatly improved the retrieval performance, typically under semantic supervision. In contrast, current unsupervised deep hashing algorithms can hardly achieve satisfactory performance due to either the relaxed optimization or the absence of a similarity-sensitive objective. In this work, we propose a simple yet effective unsupervised hashing framework,...

10.1109/tpami.2018.2789887 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-01-05

The booming industry of location-based services has accumulated a huge collection of users' location trajectories of driving, cycling, hiking, etc. In this work, we investigate the problem of discovering the Most Popular Route (MPR) between two locations by observing the traveling behaviors of many previous users. This new query is beneficial to travelers who are asking directions or planning a trip in an unfamiliar city/area, as historical traveling experiences can reveal how people usually choose routes between locations. To...

10.1109/icde.2011.5767890 article EN 2011-04-01

Near-duplicate video retrieval (NDVR) has recently attracted a lot of research attention due to the exponential growth of online videos. It helps in many areas, such as copyright protection, video tagging, online video usage monitoring, etc. Most existing approaches use only a single feature to represent a video for NDVR. However, a single feature is often insufficient to characterize the video content. Besides, while accuracy is the main concern in the previous literature, the scalability of NDVR algorithms for large-scale video datasets has been rarely addressed. In this paper, we...

10.1145/2072298.2072354 article EN Proceedings of the 19th ACM International Conference on Multimedia 2011-11-28

Hashing methods for efficient image retrieval aim at learning hash functions that map similar images to semantically correlated binary codes in the Hamming space with similarity well preserved. The traditional hashing methods usually represent image content by hand-crafted features. Deep hashing methods based on deep neural network (DNN) architectures can generate more effective image features and obtain better retrieval performance. However, the underlying data structure is hardly captured by existing DNN models. Moreover, the similarity (either visually or...

10.1109/tfuzz.2020.2984991 article EN IEEE Transactions on Fuzzy Systems 2020-04-03

Currently, unsupervised heterogeneous domain adaptation in a generalized setting, which is the most common scenario in real-world applications, is insufficiently explored. Existing approaches either are limited to special cases or require labeled target samples for training. This paper aims to overcome these limitations by proposing a generalized framework, named transfer independently together (TIT). Specifically, we learn multiple transformations, one for each domain (independently), to map data onto a shared latent...

10.1109/tcyb.2018.2820174 article EN IEEE Transactions on Cybernetics 2018-04-13

Most existing cross-modal hashing methods suffer from the scalability issue in the training phase. In this paper, we propose a novel cross-modal hashing approach with a linear time complexity to the training data size, to enable scalable indexing for multimedia search across multiple modals. Taking both the intra-similarity in each modal and the inter-similarity across different modals into consideration, the proposed approach aims at effectively learning hash functions from large-scale training datasets. More specifically, for each modal, we first partition the training data into $k$ clusters and then represent...

10.1145/2502081.2502107 article EN 2013-10-21

Unsupervised domain adaptation addresses the problem of transferring knowledge from a well-labeled source domain to an unlabeled target domain where the two domains have distinctive data distributions. Thus, the essence of domain adaptation is to mitigate the distribution divergence between the two domains. The state-of-the-art methods practice this very idea by either conducting adversarial training or minimizing a metric which defines the distribution gaps. In this paper, we propose a new domain adaptation method named Adversarial Tight Match (ATM) which enjoys the benefits of both adversarial training and metric learning....

10.1109/tpami.2020.2991050 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-04-28

Near-duplicate video retrieval (NDVR) has recently attracted much research attention due to the exponential growth of online videos. It has many applications, such as copyright protection, automatic video tagging and online video monitoring. Many existing approaches use only a single feature to represent a video for NDVR. However, a single feature is often insufficient to characterize the video content. Moreover, while accuracy is the main concern in the previous literature, the scalability of NDVR algorithms for large-scale video datasets has been rarely addressed. In this paper, we...

10.1109/tmm.2013.2271746 article EN IEEE Transactions on Multimedia 2013-07-03

Domain adaptation aims to leverage knowledge from a well-labeled source domain for a poorly labeled target domain. A majority of existing works transfer the knowledge at either the feature level or the sample level. Recent studies reveal that both paradigms are essentially important, and optimizing one of them can reinforce the other. Inspired by this, we propose a novel approach to jointly exploit feature adaptation with distribution matching and sample adaptation with landmark selection. During the knowledge transfer, we also take the local consistency between samples into consideration so...

10.1109/tip.2019.2924174 article EN IEEE Transactions on Image Processing 2019-06-26

Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches heavily rely on static visual information or partially capture the local temporal knowledge (e.g., within 16 frames), thus hardly describing motions accurately from a global view. In this paper, we propose a novel video captioning framework, which integrates bidirectional long short-term memory (BiLSTM) and a soft attention mechanism to generate better global representations for videos as well as enhance...

10.1109/tcyb.2018.2831447 article EN IEEE Transactions on Cybernetics 2018-05-25

Video captioning, in essence, is a complex natural process, which is affected by various uncertainties stemming from video content, subjective judgment, and so on. In this paper, we build on the recent progress in using the encoder-decoder framework for video captioning and address what we find to be a critical deficiency of the existing methods: most of the decoders propagate deterministic hidden states. Such uncertainty cannot be modeled efficiently by the deterministic models. In this paper, we propose a generative approach, referred to as multimodal stochastic...

10.1109/tnnls.2018.2851077 article EN IEEE Transactions on Neural Networks and Learning Systems 2018-08-16

Recent progress has been made in using the attention-based encoder-decoder framework for image and video captioning. Most existing decoders apply the attention mechanism to every generated word including both visual words (e.g., “gun” and “shooting”) and non-visual words (e.g., “the”, “a”). However, these non-visual words can be easily predicted using a natural language model without considering visual signals or attention. Imposing the attention mechanism on non-visual words could mislead and decrease the overall performance of image and video captioning. Furthermore, the hierarchy of LSTMs enables more complex representation of visual data,...

10.1109/tpami.2019.2894139 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-01-01

In this paper, we propose a novel approach to video captioning based on adversarial learning and long short-term memory (LSTM). With this solution concept, we aim at compensating for the deficiencies of LSTM-based video captioning methods that generally show potential to effectively handle the temporal nature of video data when generating captions but also typically suffer from exponential error accumulation. Specifically, we adopt a standard generative adversarial network (GAN) architecture, characterized by an interplay of two competing processes:...

10.1109/tip.2018.2855422 article EN IEEE Transactions on Image Processing 2018-07-12

In real-world transfer learning tasks, especially in cross-modal applications, the source domain and the target domain often have different features and distributions, which is well known as the heterogeneous domain adaptation (HDA) problem. Yet, existing HDA methods focus on either alleviating the feature discrepancy or mitigating the distribution divergence due to the challenges of HDA. In fact, optimizing one of them can reinforce the other. In this paper, we propose a novel HDA method that can optimize both feature discrepancy and distribution divergence in a unified objective function....

10.1109/tnnls.2018.2868854 article EN IEEE Transactions on Neural Networks and Learning Systems 2018-09-27

The task of image-text matching refers to measuring the visual-semantic similarity between an image and a sentence. Recently, fine-grained matching methods that explore the local alignment between the image regions and the sentence words have shown advances in inferring the image-text correspondence by aggregating pairwise region-word similarity. However, the local alignment is hard to achieve as some important image regions may be inaccurately detected or even missing. Meanwhile, some words with high-level semantics cannot be strictly corresponding to a single image region. To tackle these problems,...

10.1109/tnnls.2020.2967597 article EN IEEE Transactions on Neural Networks and Learning Systems 2020-02-11