- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Advanced Image Fusion Techniques
- Music and Audio Processing
- Image Enhancement Techniques
- Advanced Vision and Imaging
- Visual Attention and Saliency Detection
- Image Processing Techniques and Applications
- Remote-Sensing Image Classification
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Speech and Audio Processing
- Photoacoustic and Ultrasonic Imaging
- Advanced Image Processing Techniques
- Advanced Neural Network Applications
- Thermography and Photoacoustic Techniques
- Digital Media Forensic Detection
- Web Data Mining and Analysis
- Industrial Vision Systems and Defect Detection
- Data Management and Algorithms
- Network Security and Intrusion Detection
- Multimedia Communication and Technology
- Adversarial Robustness in Machine Learning
- Infrared Target Detection Methodologies
Dalian University of Technology
2018-2024
State Grid Corporation of China (China)
2017-2022
Alibaba Group (Cayman Islands)
2022
Beijing University of Posts and Telecommunications
2021
Yanshan University
2020
Taizhou University
2020
Alibaba Group (United States)
2019
Bellevue Hospital Center
2019
AT&T (United States)
2006-2018
China Ocean Shipping (China)
2016
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation. Early efforts focus on boosting the performance for only one task, e.g., fusion or segmentation, making it hard to reach the 'Best of Both Worlds'. To overcome this issue, in this paper, we propose a Multi-interactive Feature learning architecture for image fusion and Segmentation, namely SegMiF, and exploit dual-task correlation to promote the performance of both tasks. The SegMiF is of a cascade structure, containing a fusion sub-network and a commonly used segmentation sub-network. By...
Recently, multi-modality scene perception tasks, e.g., image fusion and scene understanding, have attracted widespread attention for intelligent vision systems. However, early efforts always consider boosting a single task unilaterally while neglecting others, seldom investigating their underlying connections for joint promotion. To overcome these limitations, we establish a hierarchical dual tasks-driven deep model to bridge these tasks. Concretely, we firstly construct an image fusion module to fuse complementary...
Infrared-visible image fusion (IVIF) is a fundamental and critical task in the field of computer vision. Its aim is to integrate the unique characteristics of both infrared and visible spectra into a holistic representation. Since 2018, a growing number and diversity of IVIF approaches have stepped into the deep-learning era, introducing a broad spectrum of networks or loss functions for improving visual enhancement. As research deepens and practical demands grow, several intricate issues like data compatibility, perception...
Infrared and visible image fusion is a powerful technique that combines complementary information from different modalities for downstream semantic perception tasks. Existing learning-based methods show remarkable performance, but suffer from an inherent vulnerability to adversarial attacks, causing a significant decrease in accuracy. In this work, a perception-aware fusion framework is proposed to promote segmentation robustness in adversarial scenes. We first conduct systematic analyses about the components of image fusion,...
Multi-modality image fusion refers to generating a complementary image that integrates typical characteristics from source images. In recent years, we have witnessed the remarkable progress of deep learning models for multi-modality fusion. Existing CNN-based approaches strain every nerve to design various architectures for realizing these tasks in an end-to-end manner. However, these handcrafted designs are unable to cope with highly demanding fusion tasks, resulting in blurred targets and lost textural details. To...
A video sequence usually consists of separate scenes, and each scene includes many shots. For video understanding purposes, it is most important to detect scene breaks. To analyze the content of a scene, detection of shot breaks is also required. Usually, a scene break is associated with a simultaneous change in image, motion, and audio characteristics, while a shot break is only accompanied with changes in image or motion or both. We propose to use audio information along with image and motion information to accomplish video segmentation at different levels. Promising results have been obtained with videos...
Infrared-visible image fusion (IVIF) is a critical task in computer vision, aimed at integrating the unique features of both infrared and visible spectra into a unified representation. Since 2018, the field has entered the deep learning era, with an increasing variety of approaches introducing a range of networks and loss functions to enhance visual performance. However, challenges such as data compatibility, perception accuracy, and efficiency remain. Unfortunately, there is a lack of recent comprehensive surveys that...
Scene classification and segmentation are fundamental steps for efficient accessing, retrieving and browsing of large amounts of video data. We have developed a scene classification scheme using a Hidden Markov Model (HMM)-based classifier. By utilizing the temporal behaviors of different scene classes, the HMM classifier can effectively classify presegmented clips into one of the predefined scene classes. In this paper, we describe three approaches for joint classification and segmentation based on HMM, which search for the most likely class transition path by using the dynamic programming...
Video copy detection techniques are essential for a number of applications including discovering copyright infringement of multimedia content, monitoring commercial air time, and querying videos by example. Over the last decade, video copy detection has received rapidly growing attention from the research community. To encourage more innovative technology and benchmark the state of the art approaches in this field, the TRECVID conference series, sponsored by NIST, initiated an evaluation task on content based video copy detection in 2008. In this paper, we...
In recent years, learning-based methods have achieved significant advancements in multi-exposure image fusion. However, two major stumbling blocks hinder the development, including pixel misalignment and inefficient inference. Reliance on aligned image pairs in existing methods causes susceptibility to artifacts due to device motion. Additionally, existing techniques often rely on handcrafted architectures with huge network engineering, resulting in redundant parameters, adversely impacting inference efficiency and flexibility....
Video deraining is an important issue for outdoor vision systems and has been investigated extensively. However, designing optimal architectures by aggregating the model formation and data distribution is a challenging task for video deraining. In this paper, we develop a model-guided triple-level optimization framework to deduce the network architecture with a cooperating optimization and auto-searching mechanism, named Triple-level Model Inferred Cooperating Searching (TMICS), for dealing with various video rain circumstances. In particular,...
The proposed shot boundary determination (SBD) algorithm contains a set of finite state machine (FSM) based detectors for pure cut, fast dissolve, fade in, fade out, and wipe. Support vector machines (SVM) are applied to the cut and dissolve detectors to further boost performance. Our SBD system was highly effective when evaluated in TRECVID 2006 (TREC video retrieval evaluation), and its performance was ranked highest overall.
Infrared and visible image fusion plays a vital role in the field of computer vision. Previous approaches make efforts to design various fusion rules or loss functions. However, these experimentally designed fusion rules make the methods more and more complex. Besides, most of them only focus on boosting the visual effects, thus showing unsatisfactory performance for the follow-up high-level vision tasks. To address these challenges, in this letter, we develop a semantic-level fusion network to sufficiently utilize the semantic guidance, freeing it from such fusion rules. In...
With the rapid development of smart grids, the number of various types of power IoT terminal devices has grown by leaps and bounds. An attack on either the difficult-to-protect end devices or any node in a large and complex network can put the grid at risk. The traffic generated by Distributed Denial of Service (DDoS) attacks is characterised by short bursts of time, making it difficult to apply existing centralised detection methods that rely on manual setting of attack characteristics to changing scenarios. In this paper, a DDoS attack detection model based...
This paper addresses the problem of recovering the semantic structure of broadcast news. A hierarchy of retrievable units is automatically constructed by integrating information from different media. The hierarchy provides a compact, yet meaningful, abstraction of the broadcast news data, similar to a conventional table of content, that can serve as an effective index table, facilitating the capability of browsing through large amounts of data in a nonlinear fashion. The recovery of the semantic structure further enables automated solutions in constructing visual...
Video classification and segmentation are fundamental steps for efficient accessing, retrieval and browsing of large amounts of video data. We have developed a scene classification scheme using a hidden Markov model (HMM) based classifier. By utilizing the temporal behaviors of different scene classes, the HMM classifier can effectively classify video segments into one of the pre-defined scene classes. In this paper, we describe two approaches for joint classification and segmentation based on HMM, which work by searching for the most likely class transition path using the dynamic programming technique.
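The search for the most likely class transition path by dynamic programming can be illustrated with a generic Viterbi decoder. This is a minimal sketch, not the authors' implementation: it assumes each clip has already been reduced to one discrete symbol, and the class names, symbols, and probabilities below are hypothetical values chosen only for the example. All probabilities are assumed to be nonzero so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_prob) for a discrete HMM via dynamic programming."""
    log = math.log
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev] + log(trans_p[prev][s])
                          + log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-class model: all names and numbers are illustrative only.
states = ["news", "commercial"]
start_p = {"news": 0.5, "commercial": 0.5}
trans_p = {"news": {"news": 0.9, "commercial": 0.1},
           "commercial": {"news": 0.1, "commercial": 0.9}}
emit_p = {"news": {"speech": 0.9, "music": 0.1},
          "commercial": {"speech": 0.2, "music": 0.8}}
clips = ["speech", "speech", "speech", "music", "music", "music"]

path, log_prob = viterbi(clips, states, start_p, trans_p, emit_p)
print(path)
# ['news', 'news', 'news', 'commercial', 'commercial', 'commercial']
```

The change of label between the third and fourth clip is the segment boundary, so a single pass yields both the classification and the segmentation.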
To effectively extract the typical features of the bearing, a new method that combines local mean decomposition, Shannon entropy, and an improved kernel principal component analysis model was proposed. First, the features are extracted by a time–frequency domain method, local mean decomposition, using the Shannon entropy to process the original separated product functions, so as to get the original features. However, the extracted features still contain superfluous information; a nonlinear multi-features process technique, kernel principal component analysis, is introduced to fuse the characters. The weight factor....
In recent years, there has been a growing interest in combining learnable modules with numerical optimization to solve low-level vision tasks. However, most existing approaches focus on designing specialized schemes to generate image/feature propagation. There is a lack of unified consideration to construct propagative modules, provide theoretical analysis tools, and design effective learning mechanisms. To mitigate the above issues, this paper proposes a unified optimization-inspired learning framework to aggregate...
Major casts, for example, the anchor persons or reporters in news broadcast programs and principal characters in movies, play an important role in video, and their occurrences provide meaningful indices for organizing and presenting video content. This paper describes a new approach for automatically generating a list of major casts in a video sequence based on multiple modalities, specifically, speaker information in the audio track and face information in the video track. The core algorithm is composed of three steps. First, speaker boundaries are detected and speaker segments...