Xu Jia

ORCID: 0000-0003-3168-3505
Research Areas
  • Advanced Image Processing Techniques
  • Advanced Vision and Imaging
  • Image and Signal Denoising Methods
  • Image Processing Techniques and Applications
  • Advanced Image Fusion Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Advanced SAR Imaging Techniques
  • Image Enhancement Techniques
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Image and Video Quality Assessment
  • Visual Attention and Saliency Detection
  • Video Analysis and Summarization
  • Microwave Imaging and Scattering Analysis
  • Human Motion and Animation
  • Video Surveillance and Tracking Methods
  • Synthetic Aperture Radar (SAR) Applications and Techniques
  • Multimodal Machine Learning Applications
  • 3D Shape Modeling and Analysis
  • Geochemistry and Geologic Mapping
  • Advanced Memory and Neural Computing
  • Human Pose and Action Recognition
  • Computer Graphics and Visualization Techniques
  • Domain Adaptation and Few-Shot Learning
  • Digital Media Forensic Detection

Dalian University of Technology
2012-2025

Southwest Petroleum University
2023-2024

Wuhan University
2023

Liaoning University of Technology
2023

Huawei Technologies (Sweden)
2019-2021

Huawei Technologies (China)
2019-2021

Huawei Technologies (France)
2020

Guangdong Polytechnic Normal University
2020

National Tsing Hua University
2016

Tsinghua University
2006-2013

In a traditional convolutional layer, the learned filters stay fixed after training. In contrast, we introduce a new framework, the Dynamic Filter Network, where filters are generated dynamically conditioned on an input. We show that this architecture is a powerful one, with increased flexibility thanks to its adaptive nature, yet without an excessive increase in the number of model parameters. A wide variety of filtering operations can be learned this way, including local spatial transformations, but also others like selective...

10.48550/arxiv.1605.09673 preprint EN other-oa arXiv (Cornell University) 2016-01-01
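The core operation of the Dynamic Filter Network can be sketched as applying a distinct, input-conditioned filter at every spatial location. This is a minimal numpy illustration: the filter-generating network itself is omitted, and `dynamic_local_filter` and the identity-filter check are illustrative names, not from the paper.

```python
import numpy as np

def dynamic_local_filter(image, filters, k=3):
    """Apply a distinct k x k filter at every pixel (the sample-specific
    'dynamic local filtering' idea, sketched without the
    filter-generating network that would predict `filters`).

    image:   (H, W) input feature map
    filters: (H, W, k*k) per-pixel filters
    """
    H, W = image.shape
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty((H, W), dtype=np.float64)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k].ravel()
            out[i, j] = patch @ filters[i, j]
    return out

# Sanity check: with identity filters (center tap = 1) the layer is a no-op.
H, W, k = 4, 5, 3
identity = np.zeros((H, W, k * k))
identity[..., k * k // 2] = 1.0
x = np.arange(H * W, dtype=np.float64).reshape(H, W)
y = dynamic_local_filter(x, identity, k)
```

Because the filters depend on the input, the same layer can realize shifts, blurs, or adaptive feature extraction per sample, which is the flexibility the abstract refers to.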

In the last few years, image denoising has benefited a lot from the fast development of neural networks. However, the requirement of large amounts of noisy-clean pairs for supervision limits the wide use of these models. Although there have been attempts at training a denoising model with only single noisy images, existing self-supervised approaches suffer from inefficient network training, loss of useful information, or dependence on noise modeling. In this paper, we present a very simple yet effective method named Neighbor2Neighbor...

10.1109/cvpr46437.2021.01454 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
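The key data trick in Neighbor2Neighbor is to split one noisy image into two half-resolution "neighbor" images and train with one as input and the other as target. A minimal sketch of such a random neighbor subsampler (the function name and exact sampling details are illustrative, not the paper's reference code):

```python
import numpy as np

def neighbor_subsample(noisy, rng):
    """Split a noisy image into two half-resolution neighbor images.

    For every 2x2 cell, two different pixels are drawn at random;
    one goes to g1 (network input), the other to g2 (training target).
    Because noise is assumed pixel-wise independent, g2 acts as a
    noisy but unbiased supervision signal for g1.
    """
    H, W = noisy.shape
    h, w = H // 2, W // 2
    g1 = np.empty((h, w), dtype=noisy.dtype)
    g2 = np.empty((h, w), dtype=noisy.dtype)
    for i in range(h):
        for j in range(w):
            cell = noisy[2 * i:2 * i + 2, 2 * j:2 * j + 2].ravel()
            a, b = rng.choice(4, size=2, replace=False)
            g1[i, j] = cell[a]
            g2[i, j] = cell[b]
    return g1, g2

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
g1, g2 = neighbor_subsample(img, rng)
```

A denoising network would then be trained to map `g1` toward `g2`, with the paper's regularization term controlling the gap between the two subsampled views.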

Video super-resolution, which aims at producing a high-resolution video from its corresponding low-resolution version, has recently drawn increasing attention. In this work, we propose a novel method that can effectively incorporate temporal information in a hierarchical way. The input sequence is divided into several groups, with each one corresponding to a kind of frame rate. These groups provide complementary information to recover missing details in the reference frame, which is further integrated with an attention module and a deep...

10.1109/cvpr42600.2020.00803 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Although remarkable progress has been made on single image super-resolution due to the revival of deep convolutional neural networks, deep learning methods are confronted with challenges of computation and memory consumption in practice, especially for mobile devices. Focusing on this issue, we propose an efficient residual dense block search algorithm with multiple objectives to hunt for fast, lightweight and accurate networks for image super-resolution. Firstly, to accelerate the network, we exploit the variation of feature scale adequately...

10.1609/aaai.v34i07.6877 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Usually located at the very early stages of the computational photography pipeline, demosaicing and denoising play important parts in modern camera image processing. Recently, some neural networks have shown effectiveness in joint demosaicing and denoising (JDD). Most of them first decompose a Bayer raw image into a four-channel RGGB image and then feed it into a neural network. This practice ignores the fact that the green channels are sampled at double the rate compared to the red and blue channels. In this paper, we propose a self-guidance network (SGNet), where initially...

10.1109/cvpr42600.2020.00231 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
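The RGGB decomposition criticized in the abstract is easy to make concrete. A minimal sketch of packing a Bayer mosaic into a half-resolution four-channel image (`pack_rggb` is an illustrative name; an RGGB CFA layout is assumed):

```python
import numpy as np

def pack_rggb(bayer):
    """Pack an RGGB Bayer mosaic (H, W) into a half-resolution
    4-channel image (H/2, W/2, 4) ordered R, G1, G2, B.

    Note the two G channels: green is sampled at twice the rate of
    red and blue, the property the SGNet abstract highlights.
    """
    return np.stack(
        [bayer[0::2, 0::2],   # R
         bayer[0::2, 1::2],   # G1 (green on red rows)
         bayer[1::2, 0::2],   # G2 (green on blue rows)
         bayer[1::2, 1::2]],  # B
        axis=-1)

# A 4x4 mosaic where each pixel stores its CFA colour: 0=R, 1=G, 2=B.
cfa = np.array([[0, 1, 0, 1],
                [1, 2, 1, 2],
                [0, 1, 0, 1],
                [1, 2, 1, 2]])
packed = pack_rggb(cfa)
```

Treating G1 and G2 as just two more channels discards the density advantage of green; SGNet's self-guidance idea exploits it instead.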

In this paper, we propose an end-to-end learning framework for event-based motion deblurring in a self-supervised manner, where real-world events are exploited to alleviate the performance degradation caused by data inconsistency. To achieve this end, optical flows are predicted from events, with which the blurry consistency and photometric consistency enable self-supervision on the network with real-world data. Furthermore, a piecewise linear model is proposed to take into account non-linearities and thus leads to an accurate physical formation of...

10.1109/iccv48922.2021.00258 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Temporal modeling is crucial for video super-resolution. Most of the video super-resolution methods adopt optical flow or deformable convolution for explicit motion compensation. However, such temporal modeling techniques increase model complexity and might fail in case of occlusion or complex motion, resulting in serious distortion and artifacts. In this paper, we propose to explore the role of explicit temporal difference modeling in both LR and HR space. Instead of directly feeding consecutive frames into a VSR model, we compute the difference between frames and divide those...

10.1109/cvpr52688.2022.01689 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
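The starting point of explicit temporal difference modeling can be sketched in a few lines. Note the threshold-based split below is our own illustrative assumption; the abstract is truncated before it states how the difference maps are actually divided.

```python
import numpy as np

def temporal_difference_masks(prev, curr, thresh=0.1):
    """Compute the explicit temporal difference between two
    consecutive LR frames and split pixels into slowly- and
    fast-changing regions.

    The simple magnitude threshold is an assumption for
    illustration, not the paper's exact partition.
    """
    diff = curr - prev                # explicit temporal difference
    high = np.abs(diff) > thresh     # fast-changing pixels
    low = ~high                      # nearly static pixels
    return diff, low, high

prev = np.zeros((2, 2))
curr = np.array([[0.05, 0.5],
                 [0.0, -0.3]])
diff, low, high = temporal_difference_masks(prev, curr)
```

Feeding such difference maps to the model, instead of raw consecutive frames, avoids the cost and occlusion failures of flow or deformable-convolution alignment that the abstract criticizes.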

The 3D Lookup Table (3D LUT) is a highly efficient tool for real-time image enhancement tasks, which models a non-linear color transform by sparsely sampling it into a discretized lattice. Previous works have made efforts to learn image-adaptive output values of LUTs for flexible enhancement but neglect the importance of the sampling strategy. They adopt a sub-optimal uniform point allocation, limiting the expressiveness of the learned LUTs, since (tri-)linear interpolation between points in the LUT might fail to model local non-linearities of the color transform....

10.1109/cvpr52688.2022.01700 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
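The trilinear lookup the abstract refers to can be made concrete with a small sketch, assuming a uniformly sampled LUT of shape (N, N, N, 3) over RGB in [0, 1] (`apply_3d_lut` is an illustrative name, not a function from the paper):

```python
import numpy as np

def apply_3d_lut(rgb, lut):
    """Look up one RGB value (each channel in [0, 1]) in a 3D LUT of
    shape (N, N, N, 3) via trilinear interpolation between the 8
    surrounding lattice points -- the uniform sampling the abstract
    argues is sub-optimal for locally non-linear transforms.
    """
    n = lut.shape[0] - 1
    pos = np.clip(np.asarray(rgb, dtype=np.float64), 0.0, 1.0) * n
    i0 = np.minimum(pos.astype(int), n - 1)   # lower lattice corner
    f = pos - i0                              # fractional offsets
    out = np.zeros(3)
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((f[0] if dr else 1 - f[0])
                     * (f[1] if dg else 1 - f[1])
                     * (f[2] if db else 1 - f[2]))
                out += w * lut[i0[0] + dr, i0[1] + dg, i0[2] + db]
    return out

# An identity LUT (lattice point stores its own coordinates) maps
# every colour to itself.
N = 5
axis = np.linspace(0.0, 1.0, N)
r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
identity_lut = np.stack([r, g, b], axis=-1)
out = apply_3d_lut([0.2, 0.55, 0.9], identity_lut)
```

Between lattice points the output is forced to be linear in each channel, which is exactly why a uniform grid struggles where the true color transform bends sharply.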

In the past years, attention-based Transformers have swept across the field of computer vision, starting a new stage for backbones in semantic segmentation. Nevertheless, semantic segmentation under poor light conditions remains an open problem. Moreover, most papers about semantic segmentation work on images produced by commodity frame-based cameras with a limited framerate, hindering their deployment to auto-driving systems that require instant perception and response at milliseconds. An event camera is a sensor that generates data...

10.1109/tip.2023.3249579 article EN IEEE Transactions on Image Processing 2023-01-01

Video super-resolution plays an important role in surveillance video analysis and ultra-high-definition display, and has drawn much attention in both the research and industrial communities. Although many deep learning-based VSR methods have been proposed, it is hard to directly compare them, since different loss functions and training datasets have a significant impact on the results. In this work, we carefully study three temporal modeling methods (2D CNN with early fusion, 3D CNN with slow fusion and Recurrent Neural Network)...

10.48550/arxiv.2008.05765 preprint EN other-oa arXiv (Cornell University) 2020-01-01

The task of single image super-resolution (SISR) aims at reconstructing a high-resolution (HR) image from a low-resolution (LR) image. Although significant progress has been made with deep learning models, they are trained on synthetic paired data in a supervised way and do not perform well on real cases. There are several attempts that directly apply unsupervised image translation models to address such a problem. However, they need to be modified to adapt to low-level vision tasks, which pose a higher requirement on the accuracy...

10.1109/cvprw50498.2020.00242 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01

In recent years, image denoising has benefited a lot from deep neural networks. However, these models need large amounts of noisy-clean pairs for supervision. Although there have been attempts at training networks with only noisy images, existing self-supervised algorithms suffer from inefficient network training, heavy computational burden, or dependence on noise modeling. In this paper, we propose a framework named Neighbor2Neighbor for self-supervised denoising. We develop a theoretical motivation and prove that by...

10.1109/tip.2022.3176533 article EN IEEE Transactions on Image Processing 2022-01-01

Sparse representation has been successfully applied to visual tracking by finding the best candidate with a minimal reconstruction error using target templates. However, most sparse representation-based methods only consider holistic rather than local appearance to discriminate between target and background regions, and hence may not perform well when objects are heavily occluded. In this paper, we develop a simple yet robust tracking algorithm based on a coarse and fine structural model. The proposed method exploits both...

10.1109/tip.2016.2592701 article EN IEEE Transactions on Image Processing 2016-07-18

In this work, we focus on synthesizing high-fidelity novel view images for arbitrary human performers, given a set of sparse multi-view images. It is a challenging task due to the large variation among articulated body poses and heavy self-occlusions. To alleviate this, we introduce an effective generalizable framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images. Specifically, we propose a geometry-guided attention mechanism to register the appearance code from 2D...

10.1109/cvpr52729.2023.01978 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Videos stored on mobile devices or delivered over the Internet are usually in a compressed format with various unknown compression parameters, but most video super-resolution (VSR) methods often assume ideal inputs, resulting in a large performance gap between experimental settings and real-world applications. In spite of a few pioneering works being proposed recently to super-resolve compressed videos, they are not specially designed to deal with videos of various levels of compression. In this paper, we propose a novel and practical...

10.1109/cvpr52729.2023.00200 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Event-based cameras are bio-inspired sensors that capture the brightness change of every pixel in an asynchronous manner. Compared with frame-based sensors, event cameras have microsecond-level latency and a high dynamic range, hence showing great potential for object detection under high-speed motion and poor illumination conditions. Due to the sparsity and asynchronism nature of event streams, most existing approaches resort to hand-crafted methods to convert event data into a 2D grid representation. However, they are sub-optimal in aggregating...

10.1609/aaai.v37i2.25346 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26
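One common hand-crafted conversion the abstract refers to is accumulating the asynchronous event stream into a 2D (or voxel) grid. A minimal sketch, assuming events as (x, y, t, p) rows with polarity p in {-1, +1} (the function name and binning details are illustrative):

```python
import numpy as np

def events_to_voxel_grid(events, H, W, num_bins):
    """Accumulate an asynchronous event stream into a
    (num_bins, H, W) grid by normalising timestamps into temporal
    bins and summing polarities per pixel.
    """
    grid = np.zeros((num_bins, H, W))
    t = events[:, 2]
    t0, t1 = t.min(), t.max()
    # Normalise timestamps into [0, num_bins) and bin them.
    scale = (num_bins - 1e-9) / max(t1 - t0, 1e-9)
    bins = ((t - t0) * scale).astype(int)
    for (x, y, _, p), b in zip(events, bins):
        grid[b, int(y), int(x)] += p
    return grid

events = np.array([
    [1, 2, 0.00, +1],   # two positive events at the same pixel
    [1, 2, 0.01, +1],
    [3, 0, 0.99, -1],   # one late negative event elsewhere
])
grid = events_to_voxel_grid(events, H=4, W=4, num_bins=2)
```

Such fixed binning discards fine temporal structure within each bin, which is the kind of sub-optimal aggregation the paper's learned representation aims to replace.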

10.1016/j.cviu.2024.104094 article FR Computer Vision and Image Understanding 2024-07-28

This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process. Different from typical end-to-end approaches, which model both scene content and dynamics in a single step, we propose to decompose this difficult task into two sub-problems. As current image generation methods do better than video generation in terms of detail, we synthesize high-quality content by only generating the first frame. Then we animate the scene based on its semantic meaning to obtain a temporally coherent video,...

10.48550/arxiv.1903.04480 preprint EN other-oa arXiv (Cornell University) 2019-01-01