- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Image and Signal Denoising Methods
- Generative Adversarial Networks and Image Synthesis
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Image Enhancement Techniques
- Image Processing Techniques and Applications
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Medical Image Segmentation Techniques
- Multimodal Machine Learning Applications
- Medical Imaging Techniques and Applications
- Advanced Neural Network Applications
- Face Recognition and Analysis
- Computer Graphics and Visualization Techniques
- Advanced X-ray and CT Imaging
- Adversarial Robustness in Machine Learning
- Face and Expression Recognition
- Video Surveillance and Tracking Methods
- Advanced Image Fusion Techniques
- Radiomics and Machine Learning in Medical Imaging
- Domain Adaptation and Few-Shot Learning
- 3D Shape Modeling and Analysis
- Image Retrieval and Classification Techniques
Tsinghua University
2009-2025
Tongren Hospital
2021-2025
Chongqing Medical University
2025
Westlake University
2025
Jiangsu University
2023-2025
Panzhihua Central Hospital
2025
China Automotive Technology and Research Center
2025
China Agricultural University
2025
Guiyang Medical University
2025
Affiliated Hospital of Hangzhou Normal University
2025
<h3>Importance</h3> Convalescent plasma is a potential therapeutic option for patients with coronavirus disease 2019 (COVID-19), but further data from randomized clinical trials are needed. <h3>Objective</h3> To evaluate the efficacy and adverse effects of convalescent plasma therapy for patients with COVID-19. <h3>Design, Setting, and Participants</h3> Open-label, multicenter, randomized clinical trial performed in 7 medical centers in Wuhan, China, from February 14, 2020, to April 1, 2020, with final follow-up April 28, 2020. The trial included 103 participants...
In single image deblurring, the "coarse-to-fine" scheme, i.e., gradually restoring the sharp image on different resolutions in a pyramid, is very successful in both traditional optimization-based methods and recent neural-network-based approaches. In this paper, we investigate this strategy and propose a Scale-recurrent Network (SRN-DeblurNet) for this deblurring task. Compared with the many recent learning-based approaches in [25], it has a simpler network structure, a smaller number of parameters, and is easier to train. We evaluate our method...
Haze is one of the major factors that degrade outdoor images. Removing haze from a single image is known to be severely ill-posed, and assumptions made in previous methods do not hold in many situations. In this paper, we systematically investigate different haze-relevant features in a learning framework to identify the best feature combination for image dehazing. We show that the dark-channel feature is the most informative one for this task, which confirms the observation of He et al. [8] from a learning perspective, while other haze-relevant features also contribute significantly in a complementary...
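The dark-channel feature mentioned in the abstract above can be illustrated with a minimal sketch. It follows the commonly used definition (minimum over color channels, then minimum over a local window); the function name `dark_channel`, the edge padding, and the window size are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def dark_channel(image, patch=15):
    """Dark channel of an H x W x 3 image with values in [0, 1].

    Assumed definition: per-pixel minimum over the color channels,
    followed by a minimum over a patch x patch neighborhood.
    `patch` is a hypothetical odd window size.
    """
    if image.ndim != 3:
        raise ValueError("expected an H x W x C image")
    if patch % 2 == 0:
        raise ValueError("patch must be odd")
    min_over_channels = image.min(axis=2)
    pad = patch // 2
    # Replicate border pixels so every window is full-sized.
    padded = np.pad(min_over_channels, pad, mode="edge")
    height, width = min_over_channels.shape
    out = np.empty_like(min_over_channels)
    for y in range(height):
        for x in range(width):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

# A saturated, haze-free color has at least one near-zero channel,
# while a uniformly hazy (grayish) region does not.
clear = np.zeros((8, 8, 3))
clear[..., 0] = 1.0
hazy = np.full((8, 8, 3), 0.8)
clear_dark = dark_channel(clear, patch=3)
hazy_dark = dark_channel(hazy, patch=3)
```

In this toy case the dark channel is low for the saturated image and high for the gray one, which is the kind of contrast that can make the feature useful as a haze cue.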
Motion blur from camera shake is a major problem in videos captured by hand-held devices. Unlike single-image deblurring, video-based approaches can take advantage of the abundant information that exists across neighboring frames. As a result, the best performing methods rely on the alignment of nearby frames. However, aligning images is a computationally expensive and fragile procedure, and methods that aggregate information must therefore be able to identify which regions have been accurately aligned and which have not, a task that requires high level scene...
Previous CNN-based video super-resolution approaches need to align multiple frames to the reference. In this paper, we show that proper frame alignment and motion compensation is crucial for achieving high quality results. We accordingly propose a "sub-pixel motion compensation" (SPMC) layer in a CNN framework. Analysis and experiments show the suitability of this layer in video SR. The final end-to-end, scalable CNN framework effectively incorporates the SPMC layer and fuses multiple frames to reveal image details. Our implementation can generate visually and quantitatively...
Image matting is the problem of determining for each pixel in an image whether it is foreground, background, or the mixing parameter, "alpha", for those pixels that are a mixture of foreground and background. Matting is inherently an ill-posed problem. Previous matting approaches either use naive color sampling methods to estimate foreground and background colors for unknown pixels, or use propagation-based methods to avoid color sampling under weak assumptions about image statistics. We argue that neither method itself is enough to generate good results for complex natural images. We analyze...
Blind image deconvolution, i.e., estimating a blur kernel k and a latent image x from an input blurred image y, is a severely ill-posed problem. In this paper we introduce a new patch-based strategy for kernel estimation in blind deconvolution. Our approach estimates a "trusted" subset of x by imposing a patch prior specifically tailored towards modeling the appearance of image edge and corner primitives. To choose proper patch priors we examine both statistical priors learned from a natural image dataset and a simple patch prior from synthetic structures. Based on the patch priors, we iteratively...
Although tremendous success has been achieved for interactive object cutout in still images, accurately extracting dynamic objects in video remains a very challenging problem. Previous video cutout systems present two major limitations: (1) reliance on global statistics, thus lacking the ability to deal with complex and diverse scenes; and (2) treating segmentation as a global optimization, thus lacking a practical workflow that can guarantee the convergence of the systems to the desired results. We present Video SnapCut, a robust video object cutout system that significantly advances...
The availability of quantitative online benchmarks for low-level vision tasks such as stereo and optical flow has led to significant progress in the respective fields. This paper introduces such a benchmark for image matting. There are three key factors for a successful benchmarking system: (a) a challenging, high-quality ground truth test set; (b) an online evaluation repository that is dynamically updated with new results; (c) perceptually motivated error functions. Our new benchmark strives to meet all three criteria. We evaluated...
Matting refers to the problem of accurate foreground estimation in images and video. It is one of the key techniques in many image editing and film production applications, and thus has been extensively studied in the literature. With the recent advances of digital cameras, using matting techniques to create novel composites or facilitate other editing tasks has gained increasing interest from both professionals and consumers. Consequently, various matting techniques and systems have been proposed to try to efficiently extract high quality mattes from both still images and video sequences. This...
Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets. In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). We are inspired by the recent ImageMAE and propose customized video tube masking with an extremely high ratio. This simple design makes video reconstruction a more challenging self-supervision task, thus encouraging extracting more effective...
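As a rough illustration of the tube masking mentioned in the abstract above, the sketch below draws one random spatial mask and repeats it across all frames, so each masked patch forms a "tube" through time. This reading of "tube", the function name `tube_mask`, the token grid size, and the 0.9 ratio are assumptions made for the example; the abstract only states that the ratio is extremely high.

```python
import numpy as np

def tube_mask(num_frames, height, width, mask_ratio=0.9, seed=0):
    """Boolean mask of shape (num_frames, height, width); True means masked.

    Assumption: the same spatial positions are masked in every frame.
    `mask_ratio` is a hypothetical value chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    num_patches = height * width
    num_masked = int(round(mask_ratio * num_patches))
    flat = np.zeros(num_patches, dtype=bool)
    # Pick distinct patch positions to hide.
    flat[rng.choice(num_patches, size=num_masked, replace=False)] = True
    spatial = flat.reshape(height, width)
    # Repeat the spatial mask along the time axis.
    return np.broadcast_to(spatial, (num_frames, height, width)).copy()

# Example: 8 temporal slots over a 14 x 14 grid of patch tokens.
mask = tube_mask(num_frames=8, height=14, width=14, mask_ratio=0.9)
visible_per_frame = int((~mask[0]).sum())
```

Sharing the mask across frames means a hidden patch cannot simply be copied from the same location in a neighboring frame, which is one plausible reason such a design makes reconstruction harder.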
In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers,...
We present a robust and efficient approach to video stabilization that achieves high-quality camera motion for a wide range of videos. In this article, we focus on the problem of transforming a set of input 2D motion trajectories so that they are both smooth and resemble visually plausible views of the imaged scene; our key insight is that we can achieve this goal by enforcing subspace constraints on feature trajectories while smoothing them. Our approach assembles tracked features in the video into a trajectory matrix, factors it into two low-rank matrices, and performs filtering...
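The factor-then-filter idea in the abstract above can be sketched as follows. This is only an illustration under strong assumptions: every feature is tracked in every frame (so the trajectory matrix has no missing entries), a truncated SVD stands in for the factorization, and a simple moving average stands in for the filter. The name `smooth_in_subspace` and the `rank` and `window` values are hypothetical.

```python
import numpy as np

def smooth_in_subspace(trajectories, rank=9, window=5):
    """Smooth a (2N, F) trajectory matrix while keeping it low-rank.

    Rows hold the x and y coordinates of N features over F frames.
    Assumptions: complete tracks, truncated SVD as the factorization,
    and an odd-length moving average as the temporal filter.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    u, s, vt = np.linalg.svd(trajectories, full_matrices=False)
    r = min(rank, len(s))
    coefficients = u[:, :r] * s[:r]   # (2N, r): per-feature weights
    basis = vt[:r]                    # (r, F): shared temporal basis
    kernel = np.ones(window) / window
    pad = window // 2
    # Filter only the small temporal basis, not every trajectory.
    padded = np.pad(basis, ((0, 0), (pad, pad)), mode="edge")
    smoothed_basis = np.stack(
        [np.convolve(row, kernel, mode="valid") for row in padded]
    )
    return coefficients @ smoothed_basis

# Toy data: 6 features (12 rows) drifting linearly with jitter.
rng = np.random.default_rng(0)
frames = np.arange(40)
clean = np.outer(rng.normal(size=12), frames) + rng.normal(size=(12, 1))
shaky = clean + rng.normal(scale=0.5, size=clean.shape)
stabilized = smooth_in_subspace(shaky, rank=3, window=5)
```

Because the filter acts on the shared basis, all output trajectories stay in the same low-dimensional subspace, which is the constraint the abstract describes.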
Non-blind deconvolution is a key component in image deblurring systems. Previous deconvolution methods assume a linear blur model where the blurred image is generated by a linear convolution of the latent image and the blur kernel. This assumption often does not hold in practice due to various types of outliers in the imaging process. Without proper outlier handling, previous methods may generate results with severe ringing artifacts even when the kernel is estimated accurately. In this paper we analyze a few common types of outliers that cause previous methods to fail, such as pixel saturation and non-Gaussian...
In current clinical practice, the standard evaluation for axillary lymph node (ALN) status in breast cancer has a low efficiency and is based on an invasive procedure that causes operative-associated complications in many patients. Therefore, we aimed to use machine learning techniques to develop an efficient preoperative magnetic resonance imaging (MRI) radiomics evaluation approach of ALN status and explore the association between radiomics and the tumor microenvironment in patients with early-stage breast cancer. In this retrospective multicenter study,...
We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved (e.g., background, appearance, and illumination). We first analyze the challenges of high-fidelity GAN inversion from the perspective of lossy data compression. With a low bitrate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images. Increasing the size of a latent code can improve the accuracy of GAN inversion but at the cost of inferior editability. To improve image fidelity...
We present a new data-driven video inpainting method for recovering missing regions of video frames. A novel deep learning architecture is proposed which contains two subnetworks: a temporal structure inference network and a spatial detail recovering network. The temporal structure inference network is built upon a 3D fully convolutional architecture: it only learns to complete a low-resolution video volume given the expensive computational cost of 3D convolution. The low resolution result provides temporal guidance to the spatial detail recovering network, which performs image-based inpainting with a 2D fully convolutional network to produce recovered video frames in...
Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and memory storage. Each model needs an independent and complete finetuning process to adapt to different tasks, which limits its transferability to different visual domains. To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks...
We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural response to a question about a scene, given video and audio of the scene and the history of previous turns in the dialog. To answer successfully, agents must ground concepts from the question in the video while leveraging contextual cues from the dialog history. To benchmark this task, we introduce the Audio Visual Scene-Aware Dialog (AVSD) Dataset. For each of more than 11,000 videos of human actions from the Charades dataset, our dataset contains a dialog about the video, plus a final summary of the video by one of the dialog participants. We train...
Recent studies in deepfake detection have yielded promising results when the training and testing face forgeries are from the same dataset. However, the problem remains challenging when one tries to generalize the detector to forgeries created by methods unseen in the training dataset. This work addresses generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing...
Image inpainting has made remarkable progress with recent advances in deep learning. Popular networks mainly follow an encoder-decoder architecture (sometimes with skip connections) and possess a sufficiently large receptive field, i.e., larger than the image resolution. The receptive field refers to the set of input pixels that are path-connected to a neuron. For the image inpainting task, however, the size of the surrounding areas needed to repair different kinds of missing regions is different, and a very large receptive field is not always optimal, especially for the local...
Neural Radiance Field (NeRF) has gained considerable attention recently for 3D scene reconstruction and novel view synthesis due to its remarkable synthesis quality. However, image blurriness caused by defocus or motion, which often occurs when capturing scenes in the wild, significantly degrades its reconstruction quality. To address this problem, we propose Deblur-NeRF, the first method that can recover a sharp NeRF from blurry input. We adopt an analysis-by-synthesis approach that reconstructs blurry views by simulating the blurring process, thus...
Previous portrait image generation methods roughly fall into two categories: 2D GANs and 3D-aware GANs. 2D GANs can generate high fidelity portraits but with low view consistency. 3D-aware GAN methods can maintain view consistency but their generated images are not locally editable. To overcome these limitations, we propose FENeRF, a 3D-aware generator that can produce view-consistent and locally-editable portrait images. Our method uses two decoupled latent codes to generate corresponding facial semantics and texture in a spatial-aligned 3D volume with shared geometry....