- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Advanced Image Processing Techniques
- Image Enhancement Techniques
- Multimodal Machine Learning Applications
- Visual Attention and Saliency Detection
- Face Recognition and Perception
- Video Surveillance and Tracking Methods
- Video Analysis and Summarization
- Human Pose and Action Recognition
- Data Stream Mining Techniques
- Cell Image Analysis Techniques
- Human Motion and Animation
- Medical Image Segmentation Techniques
- Autonomous Vehicle Technology and Safety
- Robot Manipulation and Learning
- Advanced Memory and Neural Computing
- Image Processing Techniques and Applications
- Robotic Mechanisms and Dynamics
- Piezoelectric Actuators and Control
- Image and Video Quality Assessment
- Surface Treatment and Residual Stress
- Tactile and Sensory Interactions
- Hand Gesture Recognition Systems
Southwest Jiaotong University
2024
Megvii (China)
2021-2024
Vi Technology (United States)
2021-2023
University of Electronic Science and Technology of China
2016-2022
Beijing Normal University
2021
National Institute Of Veterinary Epidemiology And Disease Informatics
2021
Huazhong University of Science and Technology
2020
Communication University of China
2019
China Three Gorges University
2019
China Energy Engineering Corporation (China)
2019
Spotting objects that are visually adapted to their surroundings is challenging for both humans and AI. Conventional generic / salient object detection techniques are suboptimal for this task because they tend to discover only easy and clear objects, while overlooking the difficult-to-detect ones with inherent uncertainties derived from indistinguishable textures. In this work, we contribute a novel approach using a probabilistic representational model in combination with transformers to explicitly reason under...
Diffusion models have achieved promising results in image restoration tasks, yet suffer from long runtimes, excessive computational resource consumption, and unstable restoration. To address these issues, we propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL. Specifically, we present a wavelet-based conditional diffusion model (WCDM) that leverages the generative power of diffusion models to produce results with satisfactory perceptual fidelity. It also takes advantage...
Crowd counting is an important yet challenging task due to large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from an auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a problem. In this paper, we present a novel structure called Hybrid Graph Neural Network (HyGnn), which aims to relieve the problem...
Optical flow is a fundamental method used for quantitative motion estimation on the image plane. In the deep learning era, most works treat it as a task of 'matching features', learning to pull matched pixels as close as possible in feature space and vice versa. However, spatial affinity (smoothness constraint), another important component for motion understanding, has been largely overlooked. In this paper, we introduce a novel approach, called kernel patch attention (KPA), to better resolve the ambiguity in dense matching by explicitly...
Optical flow, or the estimation of motion fields from image sequences, is one of the fundamental problems in computer vision. Unlike most pixel-wise tasks that aim at achieving consistent representations of the same category, optical flow raises extra demands for obtaining local discrimination and smoothness, which are not yet fully explored by existing approaches. In this paper, we push Gaussian Attention (GA) into optical flow models to accentuate these properties during representation learning and to enforce affinity during matching....
Supervised homography estimation methods face a challenge due to the lack of adequate labeled training data. To address this issue, we propose DMHomo, a diffusion model-based framework for supervised learning. This framework generates image pairs with accurate labels, realistic content, and realistic interval motion, so that the generated pairs are adequate for training. We utilize unlabeled image pairs with pseudo labels, such as dominant plane masks computed from existing methods, to train a diffusion model that generates the training dataset. To further enhance performance, we introduce a new...
End-to-end deep learning has gained considerable interest for autonomous driving vehicles in both academic and industrial fields, especially in the decision-making process. One critical issue in the decision-making process is steering control. Researchers have already trained different artificial neural networks to predict the steering angle from a front-facing camera data stream. However, existing end-to-end methods only consider the spatiotemporal relation on a single layer and lack the ability to extract future information. In this paper, we...
In this paper, we propose a novel framework for optical flow estimation that achieves a good balance between performance and efficiency. Our approach involves disentangling global motion learning from local flow estimation, treating global matching and local refinement as two separate stages. We offer two key insights: First, the multi-scale 4D cost-volume based recurrent decoder is computationally expensive and unnecessary for handling small displacement. With the separation, we can utilize lightweight methods for both parts and maintain...
We study the problem of estimating optical flow from event cameras. One important issue is how to build a high-quality event-flow dataset with accurate event values and flow labels. Previous datasets are created by either capturing real scenes with event cameras or synthesizing from images with pasted foreground objects. The former case can produce real event values but with calculated flow labels, which are sparse and inaccurate. The latter case can generate dense flow labels, but the interpolated events are prone to errors. In this work, we propose to render a physically correct event-flow dataset using computer...
SQL injection has always been a major threat in the field of web application security. Traditional methods, such as rule-matching-based detection solutions, are inefficient at coping with ever-changing attack techniques, and there is a risk of being bypassed by variants. In this paper, we extract attack-related payloads from network flow and propose a detection model based on a Convolutional Neural Network (CNN), which can take advantage of high-dimensional features of SQL injection behavior to deal with this issue. The proposed approach was tested on real-traffic...
Semantic correspondence is a fundamental problem in computer vision, which aims at establishing dense correspondences across images depicting different instances under the same category. This task is challenging due to large intra-class variations and a severe lack of ground truth. A popular solution is to learn correspondences from synthetic data. However, because of the limited appearance and background variation within synthetically generated training data, the model's capability for handling "real" image pairs using such a strategy...
We present an unsupervised optical flow estimation method by proposing adaptive pyramid sampling in the deep pyramid network. Specifically, in the pyramid downsampling, we propose a Content-Aware Pooling (CAP) module, which promotes local feature gathering by avoiding cross-region pooling, so that the learned features become more representative. In the pyramid upsampling, we propose an Adaptive Flow Upsampling (AFU) module, where cross-edge interpolation can be avoided, producing sharp motion boundaries. Equipped with these two modules, our method achieves the best...
Based on the theory of acoustic propagation, the realization mechanism of longitudinal-torsional vibration (LTV) by a converter cylinder with multiple diagonal slits (MDS) under single longitudinal excitation was studied. The influences of the structural parameters of composite stepped horns on the resonant frequency were studied by a numerical analysis method, and a horn with LTV (resonant frequency about 20 kHz) was designed. The results show that LTV could be realized under single longitudinal excitation when the frequency is 20498 Hz, and that the amplitude varied periodically. The results can be applied in the design of the system...
Scene parsing, or semantic segmentation, aims at labeling all pixels in an image with the predefined categories of things and stuff. Learning a robust representation for each pixel is crucial for this task. Existing state-of-the-art (SOTA) algorithms employ deep neural networks to learn (discover) the representations needed for parsing from raw data. Nevertheless, these networks discover desired features only from the given image (content), ignoring more generic knowledge contained in the dataset. To overcome this deficiency, we make...
The kinematics model was built based on the principle of ultrasonic deep rolling (UDR) with longitudinal-torsional vibration (LTV), and the trajectory equations of any particle on the edge of the roller involved in the process were given. The influences of parameters, including phase angle, frequency of LTV, amplitude of longitudinal vibration, ratio, radius, depth, feed rate, and rotation, on the trajectory, the acoustic system design, and the processing quality were discussed. The results can provide a theoretical basis for the optimization of parameters.
In this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of the image space as in previous approaches, enabling the encoded features of unpaired low-light and normal-light images to be decomposed into content-rich reflectance maps and content-free illumination maps. Subsequently, map...
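For readers unfamiliar with the Retinex theory mentioned in the last abstract, the underlying idea is that an image can be modeled as the element-wise product of a reflectance map and an illumination map. The sketch below is a classic pixel-space illustration of that idea only; it is not the latent-space, learned decomposition described in the abstract. The illumination estimate (per-pixel maximum over the RGB channels), the gamma value, and the names `decompose` and `enhance` are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # RGB values in [0, 1]
Image = List[List[Pixel]]            # rows of pixels

EPS = 1e-6  # avoids division by zero on fully black pixels


def decompose(image: Image) -> Tuple[Image, List[List[float]]]:
    """Split an image into (reflectance, illumination) so that
    image is approximately reflectance * illumination per pixel.

    Assumption: illumination is estimated as the per-pixel maximum
    over the three colour channels, a simple hand-crafted heuristic.
    """
    illumination = [[max(max(px), EPS) for px in row] for row in image]
    reflectance = [
        [tuple(c / l for c in px) for px, l in zip(row, l_row)]
        for row, l_row in zip(image, illumination)
    ]
    return reflectance, illumination


def enhance(image: Image, gamma: float = 0.5) -> Image:
    """Brighten a low-light image by gamma-correcting the illumination
    map and recombining it with the unchanged reflectance map."""
    reflectance, illumination = decompose(image)
    return [
        [tuple(min(1.0, c * (l ** gamma)) for c in px)
         for px, l in zip(row, l_row)]
        for row, l_row in zip(reflectance, illumination)
    ]


if __name__ == "__main__":
    dark = [[(0.04, 0.02, 0.01), (0.09, 0.09, 0.03)]]
    print(enhance(dark))
```

With `gamma` below 1 the illumination map is lifted toward 1 while the reflectance, which carries the colour ratios of each pixel, is left untouched, which is why the hue of each pixel is preserved in this sketch.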