- Music and Audio Processing
- Speech and Audio Processing
- Speech Recognition and Synthesis
- Video Coding and Compression Technologies
- Anomaly Detection Techniques and Applications
- Advanced Data Compression Techniques
- Water Systems and Optimization
- Image and Signal Denoising Methods
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Image and Video Quality Assessment
- Image Enhancement Techniques
- Advanced Adaptive Filtering Techniques
- Video Analysis and Summarization
- Cloud Computing and Resource Management
- Topic Modeling
- Software-Defined Networks and 5G
- IoT and Edge/Fog Computing
- Machine Fault Diagnosis Techniques
- Multimodal Machine Learning Applications
- Software Engineering Research
- Network Security and Intrusion Detection
- Integrated Circuits and Semiconductor Failure Analysis
- Human Motion and Animation
- Computer Graphics and Visualization Techniques
KU Leuven
2022-2025
University of California, San Diego
2025
IMEC
2022-2024
Shanghai University of Engineering Science
2022
Wuhan University of Science and Technology
2021
Beijing Institute of Technology
2016-2019
Peking University Shenzhen Hospital
2019
Peking University
2018
University of Electronic Science and Technology of China
2016-2017
Microsoft (United States)
2015
Can we get network latency between any two servers at any time in large-scale data center networks? The collected latency data can then be used to address a series of challenges: telling whether an application-perceived latency issue is caused by the network or not, defining and tracking network service level agreements (SLAs), and automatic network troubleshooting. We have developed the Pingmesh system for large-scale data center network latency measurement and analysis to answer the above question affirmatively. Pingmesh has been running in Microsoft data centers for more than four years, and it collects tens of terabytes of latency data per...
Tonic-clonic seizures (TCSs), which present a significant risk for sudden unexpected death in epilepsy (SUDEP), require accurate detection to enable effective long-term monitoring. Previous studies have demonstrated the advantages of multimodal seizure detection systems in reliably detecting TCSs over extended periods. However, the effectiveness of these data-driven systems depends heavily on the availability of reliable training data. To address this need, we propose an innovative data selection method designed...
This paper presents a novel convolutional neural network (CNN) based image compression framework via a scalable auto-encoder (SAE). Specifically, our SAE-based deep codec consists of hierarchical coding layers, each of which is an end-to-end optimized auto-encoder. The coarse image content and texture are encoded through the first (base) layer, while the consecutive (enhance) layers iteratively code the pixel-level reconstruction errors between the original and the former reconstructed images. The proposed structure alleviates the need...
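The base-plus-enhancement layering described in the abstract above can be illustrated with a minimal sketch. It is not the paper's codec: each learned auto-encoder layer is replaced here by a plain uniform quantizer (an assumption made purely for illustration), and the names `quantize`, `scalable_encode`, and the step sizes are hypothetical. Only the layering idea is kept: every layer codes the residual left by the layers before it.

```python
def quantize(values, step):
    """Stand-in for one coding layer: uniform quantization with the given step."""
    return [round(v / step) * step for v in values]


def scalable_encode(signal, steps):
    """Code `signal` in layers; each layer codes the previous reconstruction error.

    The first step plays the role of the base layer, the remaining steps play
    the role of enhancement layers. Returns the per-layer codes and the final
    reconstruction (the sum of all layers).
    """
    recon = [0.0] * len(signal)
    layers = []
    for step in steps:
        residual = [s - r for s, r in zip(signal, recon)]  # what is still missing
        coded = quantize(residual, step)
        layers.append(coded)
        recon = [r + c for r, c in zip(recon, coded)]
    return layers, recon


signal = [10.3, 4.7, -2.2]
layers, recon = scalable_encode(signal, steps=[4.0, 1.0, 0.25])
max_error = max(abs(s - r) for s, r in zip(signal, recon))
# After the last layer the error is bounded by half of the finest step.
print(len(layers), max_error <= 0.125 + 1e-9)
```

A decoder that receives only the first layer still gets a coarse reconstruction, and each further layer it receives tightens the error bound, which is the property that makes such a scheme scalable.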
We propose a novel chroma upsampling scheme for YCbCr 4:2:0 videos. In the state-of-the-art work, upsampling is performed by simply duplicating each stored U and V value into 4 copies and saving them to the corresponding locations in 4:4:4. In the proposed method, instead of directly copying the current values, we use their neighboring values as weights to redistribute the total energy. The experimental results show that the proposed method is better than the state-of-the-art method on all tested videos in terms of CPSNR; the average gain is 0.13 dB.
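The duplication baseline that the abstract above compares against can be sketched in a few lines. This shows only that baseline (each stored chroma sample copied into a 2x2 block); the neighbor-weighted redistribution of the proposed method is not specified in the abstract and is not reproduced here. The function name and the sample values are hypothetical.

```python
def upsample_chroma_duplicate(plane):
    """Upsample one chroma plane from 4:2:0 to 4:4:4 by sample duplication.

    `plane` is a list of rows; every sample is copied into a 2x2 block.
    """
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]  # duplicate horizontally
        out.append(wide)
        out.append(list(wide))                     # duplicate vertically
    return out


u_420 = [[10, 20],
         [30, 40]]
u_444 = upsample_chroma_duplicate(u_420)
for row in u_444:
    print(row)
```

The same function would be applied to the U and V planes separately; the luma plane is already at full resolution in 4:2:0 and is left untouched.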
Enhancing speech captured by distant microphones is a challenging task. In this study, we investigate the multichannel signal properties of a single acoustic vector sensor (AVS) to obtain the inter-sensor data ratio (ISDR) model in the time-frequency (TF) domain. Then, the monotone functions describing the relationship between the ISDRs and the direction of arrival (DOA) of the target speaker are derived. For the speech enhancement (SE) task, the DOA is given, and the ISDRs are calculated. Hence, the TF components dominated by the target speech are extracted with high probability using...
Sequential audio event tagging can provide not only the type information of audio events, but also the order between events and the number of events that occur in an audio clip. Most previous works on audio event sequence analysis rely on connectionist temporal classification (CTC). However, CTC's conditional independence assumption prevents it from effectively learning correlations between diverse audio events. This paper first introduces the Transformer into sequential audio tagging, since Transformers perform well in sequence-related tasks. To better utilize...
H.264 is the most commonly used video compression technology. As videos are transmitted, packet losses inevitably occur, and error concealment can mitigate the problem. For a lost block, we use the motion vectors of neighboring available blocks to estimate its motion vector, and the estimates propagate to predict all other missing motion vectors. Compared with the state-of-the-art method, the proposed algorithm increases PSNR by 1.93 dB on average.
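The neighbor-based estimation and propagation described in the abstract above can be sketched as follows. The abstract does not give the estimator or the propagation order, so this sketch assumes the simplest choices: a lost vector is the mean of its available 4-neighbors, and vectors filled in one pass become sources for the next pass. The function name and the grid layout (one motion vector per block, `None` for a lost block) are hypothetical.

```python
def conceal_motion_vectors(mv_grid):
    """Fill lost motion vectors (None) from already-known 4-neighbours.

    Each pass estimates a lost vector as the mean of its available
    neighbours; vectors filled in one pass serve as sources in the next.
    """
    grid = [list(row) for row in mv_grid]
    rows, cols = len(grid), len(grid[0])
    while any(mv is None for row in grid for mv in row):
        updates = {}
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] is not None:
                    continue
                neighbours = [
                    grid[nr][nc]
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None
                ]
                if neighbours:
                    n = len(neighbours)
                    updates[(r, c)] = (
                        sum(v[0] for v in neighbours) / n,
                        sum(v[1] for v in neighbours) / n,
                    )
        if not updates:
            break  # no known vector anywhere, so nothing to propagate from
        for (r, c), mv in updates.items():
            grid[r][c] = mv
    return grid


# One received block followed by two lost blocks in the same row.
filled = conceal_motion_vectors([[(2, 0), None, None]])
print(filled)
```

In this toy row the middle block is estimated from the received block in the first pass, and the last block is estimated from that estimate in the second pass, which is the propagation step.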
This paper proposes a method for Acoustic Constrained Segmentation (ACS) in audio recordings of vehicles driven through a production test track, delimiting the boundaries of the surface types in the track. ACS is a variant of classical acoustic segmentation where the sequence of labels is known, contiguous and invariable, which is especially useful in this work as the test track has a standard configuration of surface types. The proposed ConvDTW-ACS method utilizes a Convolutional Neural Network for classifying overlapping image chunks extracted from the full...
As generative models achieve great success, tampering with and modifying sensitive image contents (e.g., human faces, artist signatures, commercial logos) has induced a significant threat with social impact. The backdoor attack is a method that implants vulnerabilities in a target model, which can be activated through a trigger. In this work, we innovatively prevent the abuse of image content modification by implanting a backdoor into image-editing models. Once the protected sensitive content on an image is modified by an editing model, the backdoor will be triggered,...
Many real-world user queries (e.g. "How to make egg fried rice?") could benefit from systems capable of generating responses with both textual steps and accompanying images, similar to a cookbook. Models designed to generate interleaved text and images face challenges in ensuring consistency within and across these modalities. To address these challenges, we present ISG, a comprehensive evaluation framework for interleaved text-and-image generation. ISG leverages a scene graph structure to capture relationships between text and image...
Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models is not sufficiently studied, especially in the black-box scenario. In this work, we introduce the first unified patch attack framework against pixel-wise regression tasks, aiming to identify...
Anomaly detection models can help to automatically and proactively detect faults in industrial machines. Microphones are appealing as they are generally inexpensive and, unlike visual inspection, recording sound samples can give information about the internals of the machine. However, conventional methods based on an AutoEncoder (AE) structure learned from scratch struggle to learn how to robustly reconstruct sound samples with limited available data. This paper addresses this problem by presenting a method for unsupervised...