Yi Xu

ORCID: 0000-0003-2126-6054
Research Areas
  • Advanced Vision and Imaging
  • Video Surveillance and Tracking Methods
  • Advanced Image Processing Techniques
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Human Pose and Action Recognition
  • Computer Graphics and Visualization Techniques
  • Advanced Neural Network Applications
  • Image and Signal Denoising Methods
  • Stochastic Gradient Optimization Techniques
  • Image Enhancement Techniques
  • Sparse and Compressive Sensing Techniques
  • Image and Video Quality Assessment
  • Anomaly Detection Techniques and Applications
  • Topic Modeling
  • Optical measurement and interference techniques
  • 3D Shape Modeling and Analysis
  • Visual Attention and Saliency Detection
  • Robotics and Sensor-Based Localization
  • Machine Learning and Data Classification
  • Multimodal Machine Learning Applications
  • Natural Language Processing Techniques
  • Advanced Image Fusion Techniques
  • Face recognition and analysis
  • Image Processing Techniques and Applications

Shanghai Jiao Tong University
2015-2024

Fudan University
2019-2024

Changjiang Institute of Survey, Planning, Design and Research
2024

Tsinghua University
2006-2024

Soochow University
2024

Stomatology Hospital
2024

Zhejiang University
2017-2024

Changchun University of Science and Technology
2024

Tianjin University
2024

Shanghai Municipal Education Commission
2021-2024

We present a new image editing method, particularly effective for sharpening major edges by increasing the steepness of transition while eliminating a manageable degree of low-amplitude structures. The seemingly contradictive effect is achieved in an optimization framework making use of L0 gradient minimization, which can globally control how many non-zero gradients remain, approximating prominent structure in a sparsity-controlled manner. Unlike other edge-preserving smoothing approaches, our method...

10.1145/2024156.2024208 article EN 2011-12-12

10.1145/2070781.2024208 article EN ACM Transactions on Graphics 2011-11-30
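The L0 gradient minimization described in the entry above can be illustrated with a minimal sketch. It assumes a 1D signal, periodic boundaries, and an alternating (half-quadratic) scheme: an auxiliary gradient h is hard-thresholded, then the smoothed signal is solved in closed form with the FFT, and the penalty weight beta is doubled each round. The function name `l0_smooth_1d` and the parameter values are illustrative choices for this sketch, not taken from the abstract.

```python
import numpy as np


def l0_smooth_1d(signal, lam=2e-2, kappa=2.0, beta_max=1e5):
    """Sketch of L0 gradient minimization on a 1D signal (periodic boundaries assumed).

    Approximately minimizes  sum((s - signal)**2) + lam * count(s[i+1] - s[i] != 0)
    by alternating between a hard-thresholded auxiliary gradient h and a
    closed-form Fourier-domain solve for s.
    """
    f = np.asarray(signal, dtype=float)
    n = f.size
    # Circular forward-difference kernel: (kernel * s)[i] = s[i+1] - s[i].
    kernel = np.zeros(n)
    kernel[0], kernel[-1] = -1.0, 1.0
    otf = np.fft.fft(kernel)
    grad_energy = np.abs(otf) ** 2
    f_hat = np.fft.fft(f)

    s = f.copy()
    beta = 2.0 * lam
    while beta < beta_max:
        # h-subproblem: keep a gradient only if its squared size exceeds lam / beta.
        grad = np.roll(s, -1) - s
        h = np.where(grad ** 2 > lam / beta, grad, 0.0)
        # s-subproblem: solve (I + beta D^T D) s = f + beta D^T h with the FFT.
        numer = f_hat + beta * np.conj(otf) * np.fft.fft(h)
        s = np.real(np.fft.ifft(numer / (1.0 + beta * grad_energy)))
        beta *= kappa
    return s


# Usage: a noisy step (periodic, so it also drops back down at the wrap-around).
rng = np.random.default_rng(0)
noisy = np.concatenate([np.zeros(50), np.ones(50)]) + rng.normal(0.0, 0.02, 100)
smooth = l0_smooth_1d(noisy)
```

Because only gradients whose squared magnitude exceeds `lam / beta` survive each round, raising `lam` removes more low-amplitude detail while large steps are kept.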

Crowd counting or density estimation is a challenging task in computer vision due to large scale variations, perspective distortions, serious occlusions, etc. Existing methods generally suffer from two issues: 1) the model averaging effects in multi-scale CNNs induced by the widely adopted ℓ2 regression loss; and 2) inconsistent estimation across differently scaled inputs. To explicitly address these issues, we...

10.1109/cvpr.2018.00550 article EN 2018-06-01

Scale problem lies at the heart of object detection. In this work, we develop a novel Scale-Transferrable Detection Network (STDN) for detecting multi-scale objects in images. In contrast to previous methods that simply combine object predictions from multiple feature maps of different network depths, the proposed network is equipped with embedded super-resolution layers (named scale-transfer layer/module in this work) to explicitly explore the inter-scale consistency nature across multiple detection scales. Scale-transfer module naturally...

10.1109/cvpr.2018.00062 article EN 2018-06-01
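The STDN entry above describes the scale-transfer layer only as an embedded super-resolution layer. A minimal sketch, assuming it behaves like parameter-free periodic shuffling (depth-to-space, as in sub-pixel convolution), shows how a coarse feature map with r*r times more channels becomes a finer map without extra parameters. The function name `scale_transfer` and the (C, H, W) layout are assumptions of this sketch.

```python
import numpy as np


def scale_transfer(x, r):
    """Periodic shuffling (depth-to-space): (C*r*r, H, W) -> (C, H*r, W*r).

    Assumed behaviour for illustration; no learnable parameters are involved.
    """
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    # Split channels into (c, r, r) and interleave the two r-axes with H and W,
    # so out[k, i*r + a, j*r + b] = x[k*r*r + a*r + b, i, j].
    y = x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2)
    return y.reshape(c, h * r, w * r)


# Usage: 4 channels of 2x2 become 1 channel of 4x4; y.shape == (1, 4, 4).
x = np.arange(16).reshape(4, 2, 2)
y = scale_transfer(x, 2)
```

The operation only rearranges values, so the finer map keeps every activation of the coarse one.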

The massive multiple-input multiple-output (MIMO) system has drawn increasing attention recently as it is expected to boost the system throughput and result in lower costs. Previous studies mainly focus on time division duplexing (TDD) systems, which are more amenable to practical implementations due to channel reciprocity. However, there are many frequency division duplexing (FDD) systems deployed worldwide. Consequently, it is of great importance to investigate the design and performance of FDD massive MIMO systems. To reduce the overhead of channel estimation in a...

10.1109/access.2014.2353297 article EN cc-by-nc-nd IEEE Access 2014-01-01

Vision-language representation learning largely benefits from image-text alignment through contrastive losses (e.g., InfoNCE loss). The success of this alignment strategy is attributed to its capability in maximizing the mutual information (MI) between an image and its matched text. However, simply performing cross-modal alignment (CMA) ignores data potential within each modality, which may result in degraded representations. For instance, although CMA-based models are able to map image-text pairs close together in the embedding space,...

10.1109/cvpr52688.2022.01522 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
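The cross-modal alignment (CMA) objective mentioned in the entry above can be sketched as a symmetric InfoNCE loss over a batch, where each matched image-text pair is a positive and every other pairing is a negative. This is a minimal NumPy sketch, not the paper's method: the function name `info_nce` and the temperature of 0.07 are assumptions, and only the standard cross-modal term is shown, not the intra-modal component the abstract motivates.

```python
import numpy as np


def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE for a batch where row i of each matrix is a matched pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B) scaled cosine similarities

    def diag_cross_entropy(z):
        # Row-wise log-softmax (max-shifted for stability); targets are the diagonal.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


# Usage: matched pairs give a much lower loss than mismatched ones.
e = np.eye(4)
aligned = info_nce(e, e)
mismatched = info_nce(e, e[[1, 2, 3, 0]])
```

Minimizing this loss pulls matched pairs together and pushes the other pairs in the batch apart, which is the sense in which it maximizes a lower bound on image-text mutual information.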

Delineating organs at risk (OARs) on computed tomography (CT) images is an essential step in radiation therapy; however, it is notoriously time-consuming and prone to inter-observer variation. Herein, we report a deep learning-based automatic segmentation (AS) algorithm (WBNet) that can accurately and efficiently delineate all major OARs in the entire body directly on CT scans. We collected 755 CT scans of the head and neck, thorax, abdomen, and pelvis and manually delineated 50 OARs on the CT images. The CT images with contours were split into...

10.1016/j.radonc.2021.04.019 article EN cc-by-nc-nd Radiotherapy and Oncology 2021-05-04

Visually exploring a real-world 4D spatiotemporal space freely in VR has been a long-term quest. The task is especially appealing when only a few or even a single RGB camera is used for capturing the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to its temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas....

10.1109/tvcg.2023.3247082 article EN IEEE Transactions on Visualization and Computer Graphics 2023-02-22

By leveraging Mobile Cloud Computing (MCC), resource-poor mobile devices are enabled to run rich media applications. In this article, we review mobile cloud computing, with a focus on the technical challenges of MCC for multimedia applications, and briefly review prototypes. The article concludes with a discussion of several open research problems that call for substantial efforts.

10.1109/mwc.2013.6549282 article EN IEEE Wireless Communications 2013-06-01

Emerging applications and operational scenarios raise strict requirements for long-distance data transmission, driving network operators to design wide area networks from a new perspective. Software-defined wide area network, i.e., SD-WAN, has been regarded as the promising architecture of the next-generation wide area network. To demystify software-defined wide area networks, we revisit the status and challenges of legacy wide area networks. We briefly introduce the architecture of SD-WAN. In order from bottom to top, we survey representative advances in each layer of SD-WAN. As SD-WAN is based on multi-objective...

10.1109/icccn.2019.8847124 article EN 2019-07-01

With a focus on fatigue driving detection research, a fully automated driver fatigue status detection algorithm using driving images is proposed. In the proposed algorithm, the multitask cascaded convolutional network (MTCNN) architecture is employed in face detection and feature point location, and the region of interest (ROI) is extracted using feature points. A convolutional neural network, named EM-CNN, is proposed to detect the states of the eyes and mouth from the ROI images. The percentage of eyelid closure over the pupil over time (PERCLOS) and mouth opening degree (POM) are two parameters used for fatigue detection....

10.1155/2020/7251280 article EN cc-by Computational Intelligence and Neuroscience 2020-11-18
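PERCLOS and POM, as named in the entry above, are ratios over a window of per-frame eye and mouth states. A minimal sketch, assuming the per-frame states are already available as booleans (for example from a classifier such as EM-CNN); the function names, the either-ratio decision rule, and the limits of 0.25 and 0.5 are hypothetical and are not given in the abstract.

```python
from typing import Sequence


def perclos(eye_closed: Sequence[bool]) -> float:
    """Fraction of frames in the window whose eye state is 'closed'."""
    return sum(eye_closed) / len(eye_closed)


def pom(mouth_open: Sequence[bool]) -> float:
    """Fraction of frames in the window whose mouth state is 'open'."""
    return sum(mouth_open) / len(mouth_open)


def is_fatigued(eye_closed: Sequence[bool], mouth_open: Sequence[bool],
                perclos_limit: float = 0.25, pom_limit: float = 0.5) -> bool:
    # Hypothetical decision rule and limits: flag fatigue if either ratio is too high.
    return perclos(eye_closed) > perclos_limit or pom(mouth_open) > pom_limit


# Usage: a 10-frame window with the eyes closed in 3 frames and the mouth never open.
eyes = [True, True, True] + [False] * 7
mouth = [False] * 10
alert = is_fatigued(eyes, mouth)
```

In practice the window length and limits would be tuned on labeled driving data; the values here only make the arithmetic concrete.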

To leverage deep learning for image aesthetics assessment, one critical but unsolved issue is how to seamlessly incorporate the information of image aspect ratios to learn more robust models. In this paper, an adaptive fractional dilated convolution (AFDC), which is aspect-ratio-embedded, composition-preserving and parameter-free, is developed to tackle this issue natively at the convolutional kernel level. Specifically, the fractional dilated kernel is adaptively constructed according to the image aspect ratios, where the interpolation of the nearest two integer dilated kernels is used to cope...

10.1109/cvpr42600.2020.01412 preprint EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
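The AFDC entry above builds a fractional dilated kernel by interpolating the nearest two integer dilated kernels. A minimal 1D sketch of that interpolation, assuming an odd-length kernel and a given fractional rate d >= 1; how AFDC derives d from the aspect ratio is not reproduced, and the function names are hypothetical.

```python
import math

import numpy as np


def dilate_kernel(k, d):
    """Insert d - 1 zeros between the taps of a 1D kernel (integer rate d >= 1)."""
    out = np.zeros((len(k) - 1) * d + 1)
    out[::d] = k
    return out


def fractional_dilated_kernel(k, d):
    """Linearly interpolate the two nearest integer-dilated kernels for a rate d >= 1."""
    if len(k) % 2 == 0:
        raise ValueError("odd kernel length assumed so both kernels share a center")
    d0, d1 = math.floor(d), math.ceil(d)
    alpha = d - d0  # weight on the larger integer rate
    k0, k1 = dilate_kernel(k, d0), dilate_kernel(k, d1)
    # Zero-pad the smaller kernel on both sides so the two are centered alike.
    k0 = np.pad(k0, (len(k1) - len(k0)) // 2)
    return (1.0 - alpha) * k0 + alpha * k1


# Usage: a 3-tap kernel at rate 1.5, then a 1D "same"-size convolution.
kernel = fractional_dilated_kernel(np.array([1.0, 2.0, 3.0]), 1.5)
response = np.convolve(np.arange(10.0), kernel, mode="same")
```

Because convolution is linear, convolving with the interpolated kernel equals interpolating the outputs of the two integer-dilated convolutions, which is why a fractional rate adds no parameters.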

Self-supervised depth estimation for indoor environments is more challenging than its outdoor counterpart in at least the following two aspects: (i) the depth range of indoor sequences varies a lot across different frames, making it difficult for the depth network to induce consistent depth cues, whereas the maximum distance in outdoor scenes mostly stays the same as the camera usually sees the sky; (ii) indoor sequences contain much more rotational motion, which causes difficulties for the pose network, while the motion in outdoor sequences is predominantly translational, especially for driving datasets such...

10.1109/iccv48922.2021.01255 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

This work reviews the results of the NTIRE 2021 Challenge on Non-Homogeneous Dehazing. The proposed techniques and their results have been evaluated on a novel dataset that extends the NH-Haze dataset. It consists of an additional 35 pairs of real haze-free and nonhomogeneous hazy images recorded outdoors. The nonhomogeneous haze has been introduced in the outdoor scenes by using a professional setup that imitates the real conditions of hazy scenes. A total of 327 participants registered in the challenge and 23 teams competed in the final testing phase. The proposed solutions gauge the state of the art in image dehazing.

10.1109/cvprw53098.2021.00074 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01

The 3D Lookup Table (3D LUT) is a highly efficient tool for real-time image enhancement tasks, which models a non-linear 3D color transform by sparsely sampling it into a discretized 3D lattice. Previous works have made efforts to learn image-adaptive output color values of LUTs for flexible enhancement but neglect the importance of the sampling strategy. They adopt a sub-optimal uniform sampling point allocation, limiting the expressiveness of the learned LUTs, since the (tri-)linear interpolation between uniform sampling points in the LUT transform might fail to model local non-linearities of the color transform....

10.1109/cvpr52688.2022.01700 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
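The 3D LUT entry above contrasts uniform and adaptive sampling points for (tri-)linear interpolation. A minimal sketch of the lookup itself, assuming per-axis vertex coordinates are given (uniform or not); learning those coordinates, as the paper proposes, is not shown. The function name and argument layout are assumptions, and this is not the authors' implementation.

```python
import numpy as np


def apply_lut_trilinear(rgb, lut, verts):
    """Map colors through a 3D LUT with trilinear interpolation.

    rgb:   (N, 3) input colors in [0, 1]
    lut:   (M, M, M, 3) output colors stored at the lattice vertices
    verts: (3, M) increasing vertex coordinates per axis, from 0 to 1; evenly
           spaced vertices give the usual uniform LUT, uneven spacing
           illustrates adaptive sampling intervals.
    """
    rgb = np.clip(np.asarray(rgb, dtype=float), 0.0, 1.0)
    m = lut.shape[0]
    idx = np.empty(rgb.shape, dtype=int)
    frac = np.empty(rgb.shape)
    for a in range(3):
        # Lower vertex of the lattice cell that contains each value on axis a.
        i = np.clip(np.searchsorted(verts[a], rgb[:, a], side="right") - 1, 0, m - 2)
        lo, hi = verts[a][i], verts[a][i + 1]
        idx[:, a] = i
        frac[:, a] = (rgb[:, a] - lo) / (hi - lo)

    out = np.zeros_like(rgb)
    # Blend the 8 corners of each cell with the trilinear weights.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((frac[:, 0] if dr else 1.0 - frac[:, 0])
                     * (frac[:, 1] if dg else 1.0 - frac[:, 1])
                     * (frac[:, 2] if db else 1.0 - frac[:, 2]))
                corner = lut[idx[:, 0] + dr, idx[:, 1] + dg, idx[:, 2] + db]
                out += w[:, None] * corner
    return out


# Usage: an identity LUT on a non-uniform lattice reproduces its input.
v = np.array([0.0, 0.1, 0.3, 1.0])
identity = np.stack(np.meshgrid(v, v, v, indexing="ij"), axis=-1)
colors = np.array([[0.05, 0.2, 0.9], [1.0, 0.0, 0.3]])
mapped = apply_lut_trilinear(colors, identity, np.stack([v, v, v]))
```

Within each cell the lookup is linear along every axis, so a transform that bends sharply between two vertices is flattened; placing vertices more densely there is the motivation for non-uniform sampling.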

10.1109/cvpr52733.2024.00813 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

10.1016/j.patrec.2008.10.017 article EN Pattern Recognition Letters 2008-11-26

Of late, threats enabled by the ubiquitous use of mobile devices have drawn much interest from the research community. However, prior threats all suffer from a similar, and profound, weakness, namely the requirement that the adversary is either within visual range of the victim (e.g., to ensure that the pop-out events in reflections in the victim's sunglasses can be discerned) or close enough to the target to avoid the use of expensive telescopes. In this paper, we broaden the scope of the attacks by relaxing these requirements and show that breaches of privacy are possible even when...

10.1145/2508859.2516709 article EN 2013-01-01

Video compression artifact reduction aims to recover high-quality videos from low-quality compressed videos. Most existing approaches use a single neighboring frame or a pair of neighboring frames (preceding and/or following the target frame) for this task. Furthermore, as frames of high quality overall may contain low-quality patches, and high-quality patches may exist in frames of low quality overall, current methods focusing on nearby peak-quality frames (PQFs) may miss high-quality details in low-quality frames. To remedy these shortcomings, in this paper we propose a novel end-to-end deep neural network...

10.1109/iccv.2019.00714 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01