- Music and Audio Processing
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Video Analysis and Summarization
- Speech and Audio Processing
- Speech Recognition and Synthesis
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Anomaly Detection Techniques and Applications
- Face and Expression Recognition
- Text and Document Classification Technologies
- Visual Attention and Saliency Detection
- Topic Modeling
- Water Systems and Optimization
- Natural Language Processing Techniques
- Hand Gesture Recognition Systems
- Speech and dialogue systems
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Tensor decomposition and applications
- Infrared Target Detection Methodologies
- Recommender Systems and Techniques
- Gaze Tracking and Assistive Technology
- Robotics and Automated Systems
- Advanced Adaptive Filtering Techniques
Shandong University
2022-2024
Norwegian University of Science and Technology
2019-2022
Xi'an Jiaotong University
2020-2021
Shandong Academy of Sciences
2021
Qilu University of Technology
2021
Harbin Institute of Technology
2017
Recently, label distribution learning (LDL) has drawn much attention in machine learning, where an LDL model is learned from labeled instances. Different from single-label and multi-label annotations, label distributions describe an instance by multiple labels with different intensities and accommodate more general scenes. Since most existing datasets merely provide logical labels, label distributions are unavailable in many real-world applications. To handle this problem, we propose two novel label enhancement methods, i.e., Label...
Video moment retrieval aims to retrieve a golden moment in a video for a given natural language query. The main challenges of this task include 1) the requirement of accurately localizing (i.e., the start time and the end time of) the relevant moment in an untrimmed video stream, and 2) bridging the semantic gap between the textual query and the video contents. To tackle those problems, early approaches adopt a sliding window or uniform sampling to collect video clips first and then match each clip with the query to identify relevant clips. Obviously, these strategies are time-consuming...
In recent years, the task of salient object detection in optical remote sensing images (RSI-SOD) has received extensive attention. Benefiting from the development of deep learning, much progress has been made in the RSI-SOD field. However, existing methods still face challenges in addressing various issues present in RSI, including uncertain numbers of objects, cluttered backgrounds, and interference from shadows. To address these challenges, we propose a novel approach, Adaptive Edge-aware Semantic Interaction Network...
Without valuable label information to guide the learning process, it is demanding to fully excavate and integrate the underlying information from different views to learn a unified multi-view representation. This paper focuses on this challenge and presents a novel method, termed Graph-guided Unsupervised Multi-view Representation Learning (GUMRL), taking full advantage of the graph information during the learning process. To be specific, GUMRL jointly conducts view-specific feature representation learning, which is under the guidance of the graph information, and unified feature representation learning, which fuses...
Compared with single-label and multi-label annotations, a label distribution describes the instance by multiple labels with different intensities and accommodates more general conditions. Nevertheless, label distribution learning is unavailable in many real-world applications because most existing datasets merely provide logical labels. To handle this problem, a novel label enhancement method, Label Enhancement with Sample Correlations via low-rank representation, is proposed in this paper. Unlike existing methods, a low-rank representation method is employed so...
Understanding what is happening in surveillance video is important for the human-machine interface in transportation systems, where temporal language grounding is one of the key tasks. It aims to localize the desired moment in an untrimmed video with a given sentence query that is relevant to the moment. This task is challenging for the following reasons: 1) the requirement of understanding the video contents and the sentence semantics comprehensively, and 2) building a bridge between the cross-modal semantics. To tackle these problems, early methods first sample...
This paper presents a gesture recognition system for human-computer interaction based on 24 GHz radars. We describe this system, designed for the detection of frequently used gestures such as hand pushing, pulling, lifting, and shaking. A decision tree was established to classify the original signals into the four sets of gestures. Through a set of tests of the gesture-recognition system, we found that the proposed system could achieve an overall accuracy rate higher than 92%.
Localizing a desired moment within an untrimmed video via a given natural language query, i.e., cross-modal moment localization, has attracted widespread research attention recently. However, it is a challenging task because it requires not only accurately understanding intra-modal semantic information, but also explicitly capturing inter-modal semantic correlations (consistency and complementarity). Existing efforts mainly focus on alignment, while ignoring the necessary supplement. Consequently, we present...
Localizing the desired video clip for a given query in an untrimmed video has been a hot research topic in multimedia understanding. Recently, a new task named video relocalization, in which the query is a video clip, has been raised. Some methods have been developed for this task; however, these methods often require dense annotations of the temporal boundaries inside long videos for training. A more practical solution is the weakly-supervised approach, which only needs the matching information between the query and the video.
Failure of critical industrial equipment leads to significant costs due to downtime and unplanned maintenance interventions, which in turn drives the demand for Acoustic Anomaly Detection (AAD). Previous approaches based on a deep Auto-Encoder (AE) followed by Gaussian Mixture Models (GMMs) have made progress. However, these tandem approaches suffer from a major weakness: the dimensionality reduction of the feature stage and the density estimation are optimised separately. This paper proposes an unsupervised Density...
Multi-view clustering is a hot research topic, and many approaches have been proposed over the past few years. Nevertheless, most existing algorithms merely take the consensus information among different views into consideration for clustering. This may hinder clustering performance in real-life applications, since different views usually contain diverse statistical properties. To address this problem, we propose a novel Tensor-based Intrinsic Subspace Representation Learning (TISRL) in this paper. Concretely, the rank preserving...
With the development of internet of things technologies, tremendous sensor audio data has been produced, which poses great challenges to audio-based event detection in smart cities. In this paper, we target a challenging task, namely, text-to-audio grounding. In addition to precisely localizing all of the desired on- and off-sets in the untrimmed audio, this new task requires extensive acoustic and linguistic comprehension as well as reasoning for the crossmodal matching relations between the audio and the query. The current approaches often treat...