- Image and Signal Denoising Methods
- Human Pose and Action Recognition
- Sparse and Compressive Sensing Techniques
- Multimodal Machine Learning Applications
- Advanced Image Processing Techniques
- Advanced Neural Network Applications
- Remote-Sensing Image Classification
- Anomaly Detection Techniques and Applications
- Advanced Text Analysis Techniques
- Advanced Image and Video Retrieval Techniques
- Topic Modeling
- Remote Sensing and Land Use
- Advanced Image Fusion Techniques
- Blind Source Separation Techniques
- Advanced Vision and Imaging
- Domain Adaptation and Few-Shot Learning
- Privacy-Preserving Technologies in Data
- Web Data Mining and Analysis
- Music and Audio Processing
- Internet Traffic Analysis and Secure E-voting
- Natural Language Processing Techniques
- Video Analysis and Summarization
- Text and Document Classification Technologies
- Authorship Attribution and Profiling
- Image Processing Techniques and Applications
King Abdullah University of Science and Technology
2019-2025
Nanjing Tech University
2024-2025
Fudan University
2015-2024
Central University of Finance and Economics
2024
Beijing University of Posts and Telecommunications
2021-2024
State Key Laboratory of Networking and Switching Technology
2024
Nanjing Normal University
2024
Research Institute of Petroleum Exploration and Development
2024
Chinese Research Academy of Environmental Sciences
2023
Ministry of Ecology and Environment
2023
Temporal action detection is a fundamental yet challenging task in video understanding. Video context is a critical cue to effectively detect actions, but current works mainly focus on temporal context, while neglecting semantic context as well as other important context properties. In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem. Specifically, we formulate video snippets as graph nodes, snippet-snippet correlations as edges,...
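As a rough illustration of treating video snippets as graph nodes and snippet-snippet correlations as edges, the sketch below builds a small graph and applies one generic graph convolution. It is not the paper's model: the function names, the choice of temporal-neighbour plus cosine-similarity edges, the value of `k`, and the symmetric normalisation are all assumptions made for this example.

```python
import numpy as np

def build_snippet_graph(features, k=2):
    """Hypothetical edge rule: link adjacent snippets and the k most similar ones."""
    n = features.shape[0]
    adj = np.zeros((n, n))
    # Temporal edges between consecutive snippets.
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    # Semantic edges from cosine similarity (self-matches excluded).
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)
    nearest = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1.0
    adj[nearest.ravel(), rows] = 1.0  # keep the graph undirected
    return adj

def gcn_layer(features, adj, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

rng = np.random.default_rng(0)
snippets = rng.standard_normal((10, 16))      # 10 snippets, 16-dim features
graph = build_snippet_graph(snippets, k=2)
updated = gcn_layer(snippets, graph, rng.standard_normal((16, 8)))
```

Each row of `updated` mixes a snippet's own features with those of its temporal and semantic neighbours, which is the sense in which context is incorporated into the features.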
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume...
In order to improve the CS performance of natural images, in this paper we propose a novel framework to design an OPtimization-INspired Explicable deep Network, dubbed OPINE-Net, for adaptive sampling and recovery. Both orthogonal and binary constraints of the sampling matrix are incorporated into OPINE-Net simultaneously. In particular, OPINE-Net is composed of three subnets: a sampling subnet, an initialization subnet and a recovery subnet, and all the parameters in OPINE-Net (e.g. sampling matrix, nonlinear transforms, shrinkage threshold) are learned end-to-end, rather than...
Temporal action localization (TAL) in videos is a challenging task, especially due to the large variation in action temporal scales. Short actions usually occupy a major proportion in the datasets, but tend to have the lowest performance. In this paper, we confront the challenge of short actions and propose a multi-level cross-scale solution dubbed as video self-stitching graph network (VSGN). We have two key components in VSGN: video self-stitching (VSS) and cross-scale graph pyramid network (xGPN). In VSS, we focus on a short period of a video and magnify it along the temporal dimension to obtain a larger scale. We stitch the original clip...
Compressive sensing (CS) has drawn quite an amount of attention as a joint sampling and compression approach. Its theory shows that when the signal is sparse enough in some domain, it can be decoded from many fewer measurements than suggested by the Nyquist sampling theory. So one of the most challenging research problems in CS is to seek a domain where a signal can exhibit a high degree of sparsity and hence be recovered faithfully. Most of the conventional CS recovery approaches, however, exploited a set of fixed bases (e.g., DCT, wavelet, and gradient domain) for...
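The claim that a sufficiently sparse signal can be decoded from far fewer measurements than its length can be tried out numerically. The sketch below is a generic textbook baseline (ISTA applied to an l1-regularised least-squares problem), not the recovery method of this paper; the problem sizes, the Gaussian sampling matrix, the regularisation weight `lam`, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.05, n_iter=5000):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative shrinkage."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

def recovery_error(n=128, m=64, k=5, seed=0):
    """Relative error when recovering a k-sparse length-n signal from m < n measurements."""
    rng = np.random.default_rng(seed)
    x_true = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x_true[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sampling matrix
    y = A @ x_true                                  # noiseless CS measurements
    x_hat = ista(A, y)
    return np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

error = recovery_error()
```

Here the signal is sparse in the identity basis; the point of the abstract is that real images need a better-chosen domain than a fixed basis before this kind of recovery works well.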
Photo cropping is widely used in the printing industry, photography, and cinematography. Conventional photo cropping methods suffer from three drawbacks: 1) the semantics used to describe photo aesthetics are determined by the experience of model designers and specific data sets, 2) image global configurations, an essential cue to capture photo aesthetics, are not well preserved in the cropped photo, and 3) multi-channel visual features from an image region contribute differently to human aesthetics, but state-of-the-art photo cropping methods cannot automatically weight them. Owing to the recent...
Due to independent and coarse quantization of transform coefficients in each block, block-based transform coding usually introduces visually annoying blocking artifacts at low bitrates, which greatly prevents further bit reduction. To alleviate the conflict between bit reduction and quality preservation, deblocking as a post-processing strategy is an attractive and promising solution without changing the existing codec. In this paper, in order to reduce blocking artifacts and obtain a high-quality image, image deblocking is formulated as an optimization problem...
The compressive sensing (CS) theory indicates that robust reconstruction of signals can be obtained from far fewer measurements than those required by the Nyquist-Shannon theorem. Thus, CS has great potential in video acquisition and processing, considering that it makes the subsequent complex data compression unnecessary. In this paper, we propose a novel algorithm for effectively reconstructing videos from CS measurements. The algorithm comprises double phases, of which the first phase exploits intra-frame correlation and provides...
The block discrete cosine transform (BDCT) has been widely used in current image and video coding standards, owing to its good energy compaction and decorrelation properties. However, because of independent quantization of DCT coefficients in each block, BDCT usually gives rise to visually annoying blocking compression artifacts, especially at low bit rates. In this paper, to reduce blocking artifacts and obtain high-quality images, image deblocking is cast as an optimization problem within the maximum a posteriori framework,...
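To see why quantizing each block's DCT coefficients independently produces blocking artifacts, the sketch below applies an 8x8 orthonormal DCT-II to a smooth ramp image, quantizes with a uniform step, and compares pixel jumps across block boundaries with jumps inside blocks. It only illustrates the artifact, not the deblocking method; the uniform quantizer, the step sizes, the ramp test image, and the helper names are assumptions for this example.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def bdct_quantize(image, step, block=8):
    """Block DCT, uniform quantization with the given step, inverse block DCT.

    Assumes both image dimensions are multiples of `block`.
    """
    c = dct_matrix(block)
    out = np.empty_like(image, dtype=float)
    h, w = image.shape
    for r in range(0, h, block):
        for s in range(0, w, block):
            tile = image[r:r + block, s:s + block]
            coeff = c @ tile @ c.T                     # forward 2-D DCT of one block
            coeff = np.round(coeff / step) * step      # each block quantized independently
            out[r:r + block, s:s + block] = c.T @ coeff @ c
    return out

def column_jumps(image, block=8):
    """Mean absolute horizontal pixel difference at block boundaries and inside blocks."""
    diffs = np.abs(np.diff(image, axis=1))
    cols = np.arange(diffs.shape[1])
    at_boundary = (cols + 1) % block == 0
    return diffs[:, at_boundary].mean(), diffs[:, ~at_boundary].mean()

ramp = np.tile(4.0 * np.arange(64), (64, 1))   # smooth horizontal gradient, 64 x 64
coarse = bdct_quantize(ramp, step=200.0)        # coarse step mimics a low bit rate
edge_jump, inner_jump = column_jumps(coarse)
```

In the original ramp every horizontal step is the same size; after coarse quantization the variation inside each block collapses while the steps between neighbouring blocks grow, which is the visible block edge.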
The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitations in these datasets, suggesting that state-of-the-art techniques commonly overfit to hidden dataset biases. In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from...
Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods are training-required. They need to train a time-dependent classifier or a condition-dependent score estimator, which increases the cost of constructing conditional diffusion models and is inconvenient to transfer across different conditions. Some current works aim to overcome this limitation by proposing training-free solutions, but most can only be applied to a specific...
While deep neural networks (NN) significantly advance image compressed sensing (CS) by improving reconstruction quality, the necessity of training current CS NNs from scratch constrains their effectiveness and hampers rapid deployment. Although recent methods utilize pre-trained diffusion models for image reconstruction, they struggle with slow inference and restricted adaptability to CS. To tackle these challenges, this paper proposes Invertible Diffusion Models (IDM), a novel efficient, end-to-end...
Learning-based image super-resolution aims to reconstruct high-frequency (HF) details from the prior model trained by a set of high- and low-resolution image patches. In this paper, the HF to be estimated is considered as a combination of two components: main high-frequency (MHF) and residual high-frequency (RHF), and we propose a novel image super-resolution method via dual-dictionary learning and sparse representation, which consists of the main dictionary learning and the residual dictionary learning, to recover MHF and RHF respectively. Extensive experimental results on test images validate that by employing the proposed two-layer...
Flammability is a significant challenge in polymer-based strain sensing applications. In addition, the existing intrinsic flame retardant is not elastic at room temperature, which may potentially damage flexible equipment. This study presents a series of flame-retardant ionic conductive elastomers (ICEs) (denoted as PCAIPx) containing phosphorus from phytic acid (PA) and nitrogen from choline chloride (ChCl) with multiple hydrogen bonds, synthesized using a simple and efficient one-pot UV-initiated radical...
Compressed Sensing (CS) has drawn quite an amount of attention as a joint sampling and compression methodology. Recent studies further show that image prior models play an important role in CS recovery. By exploiting the non-local self-similarity of natural images and clustering similar patches, a low-rank model is adopted in this paper. Different from the traditional nuclear norm, we extend l...
In compressive sensing (CS), seeking a fair domain is of essential significance to achieve a high enough degree of signal sparsity. Most methods in the literature, however, use a fixed transform domain or prior information that cannot exhibit enough sparsity for various images. In contrast, we propose an algorithm to explore the structured Laplacian sparsity of DCT coefficients, which can adapt to the non-stationarity of natural images. Better sparsity is achieved by utilizing the nonlocal similarity of natural images and constructing image patch groups. Meanwhile, multiple...
Compressive sensing (CS) has drawn an enormous amount of attention in recent years due to its sub-Nyquist sampling rate and low-complexity requirement at the encoder. However, it turns out that the decoder, in lieu of the encoder, suffers from heavy computation in order to decently recover the signal from its CS measurements. With the aim of developing a fast yet accurate CS recovery algorithm, in this paper we propose to leverage a deep convolutional neural network (CNN) prior model in the constrained reconstruction formulation and solve it via alternating...
With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic Memory Benchmark proposed a task for Visual Queries with 3D Localization (VQ3D). Given an egocentric video clip and an image crop depicting a query object, the goal is to localize the 3D position of the center of that query object with respect to the camera pose of a query frame. Current methods tackle the problem of VQ3D by unprojecting the 2D localization results of the sibling task Visual Queries with 2D Localization (VQ2D) into 3D predictions. Yet, we point...
In very recent years, the development of music recommendation systems has become a more heated problem due to a higher level of digital song consumption and the advancement of machine learning techniques. Some traditional approaches such as collaborative filtering, widely used in recommendation systems, have helped give music listeners quick access to music. However, collaborative filtering or model-based algorithms have limitations in giving better results because they ignore the combination of factors such as lyrics and genre. In our paper, we will propose an...