- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Image Processing Techniques and Applications
- Image and Signal Denoising Methods
- Computer Graphics and Visualization Techniques
- Image Enhancement Techniques
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Image Retrieval and Classification Techniques
- Multimodal Machine Learning Applications
- Topic Modeling
- Optical measurement and interference techniques
- Recommender Systems and Techniques
- Higher Education and Teaching Methods
- Visual Attention and Saliency Detection
- 3D Surveying and Cultural Heritage
- Salmonella and Campylobacter epidemiology
- Diabetic Foot Ulcer Assessment and Management
- Semantic Web and Ontologies
- Speech and dialogue systems
- Human Pose and Action Recognition
- Machine Learning and Algorithms
- Video Surveillance and Tracking Methods
- Automated Road and Building Extraction
- Photoacoustic and Ultrasonic Imaging
University of Maryland, College Park
2024
Xidian University
2023
University of Hong Kong
2022-2023
Renmin University of China
2023
Beijing Institute of Big Data Research
2023
SMART Reading
2022
Hong Kong University of Science and Technology
2022
Shenzhen University
2022
Wuhan University
2022
Nanyang Technological University
2022
Recent studies have shown the importance of modeling long-range interactions in the inpainting problem. To achieve this goal, existing approaches exploit either standalone attention techniques or transformers, but usually under a low resolution in consideration of computational cost. In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. We carefully design each component of our...
Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state - Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image,...
Single image super-resolution (SISR) deals with a fundamental problem of upsampling a low-resolution (LR) image to its high-resolution (HR) version. The last few years have witnessed impressive progress propelled by deep learning methods. However, one critical challenge faced by existing methods is to strike a sweet spot between model complexity and the resulting SISR quality. This paper addresses this pain point by proposing a linearly-assembled pixel-adaptive regression network (LAPAR), which casts the direct LR to HR mapping...
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input. Recently, generative adversarial networks (GANs) have become popular to hallucinate details. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the ill-posed SISR task. Also, GAN-generated fake details may often undermine the realism of the whole image. We address these issues by proposing best-buddy GANs (Beby-GAN)...
Long-range temporal alignment is critical yet challenging for video restoration tasks. Recently, some works attempt to divide the long-range alignment into several sub-alignments and handle them progressively. Although this operation is helpful in modeling distant correspondences, error accumulation is inevitable due to the propagation mechanism. In this work, we present a novel, generic iterative alignment module which employs a gradual refinement scheme for sub-alignments, yielding more accurate motion compensation. To further...
For video frame interpolation (VFI), existing deep-learning-based approaches strongly rely on the ground-truth (GT) intermediate frames, which sometimes ignore the non-unique nature of motion judging from the given adjacent frames. As a result, these methods tend to produce averaged solutions that are not clear enough. To alleviate this issue, we propose to relax the requirement of reconstructing an intermediate frame as close to the GT as possible. Towards this end, we develop a texture consistency loss (TCL) upon the assumption that the interpolated content...
Neural radiance fields (NeRF) show great success in novel view synthesis. However, in real-world scenes, recovering high-quality details from the source images is still challenging for existing NeRF-based approaches, due to potential imperfect calibration information and scene representation inaccuracy. Even with high-quality training frames, the synthetic novel views produced by NeRF models still suffer from notable rendering artifacts, such as noise, blur, etc. To improve the synthesis quality of NeRF-based approaches, we propose NeRFLiX, a...
Cystitis glandularis (CG) is a rare urological condition characterized by glandular metaplasia of the bladder mucosa. Recurrence following transurethral resection (TUR) is a significant clinical challenge. Traditional predictive models often fail to capture the complexity of the data, resulting in insufficient accuracy. In contrast, machine learning (ML) has demonstrated substantial potential in medical prediction by identifying and analyzing complex patterns that are undetectable by conventional methods. This study...
Relevance feedback is a powerful technique to enhance Content-Based Image Retrieval (CBIR) performance. It solicits the user's relevance judgments on the retrieved images returned by the CBIR systems. The user's labeling is then used to learn a classifier to distinguish between relevant and irrelevant images. However, the top returned images may not be the most informative ones. The challenge is thus to determine which unlabeled images would be the most informative (i.e., improve the classifier the most) if they were labeled and used as training samples. In this paper, we propose a novel active...
The manipulation of panoramic/wide-angle images is usually achieved via image warping. Though various techniques have been developed for preserving shapes and straight lines in image warping, these are not sufficient for panoramic/wide-angle images. The projections will turn straight lines into curved "geodesic lines", and it is fundamentally impossible to keep all of these lines straight. In this work, we propose a geodesic-preserving method for content-aware image warping. An energy term is introduced to preserve the geodesic appearance of lines, and it can be used together with shape-preserving terms. Our...
The detection and recognition of traffic signs in complex environments have received extensive attention, and the correct detection of small targets and of occluded targets are two key issues. This paper proposes a context-aware and attention-driven weighted fusion network for traffic sign detection. Specifically, we design a context module that not only enhances the diversity of global features, but also reduces the sensitivity of convolution to objects. In addition, a feature pyramid is designed to efficiently fuse deep semantic information and shallow...
Generative adversarial networks (GANs) have made great success in image inpainting yet still have difficulties tackling large missing regions. In contrast, iterative probabilistic algorithms, such as autoregressive and denoising diffusion models, have to be deployed with massive computing resources for decent effect. To achieve high-quality results with low computational cost, we present a novel pixel spread model (PSM) that iteratively employs decoupled probabilistic modeling, combining the optimization efficiency of...
the Best Paper Award Committee to select the Best Paper. After careful deliberation, the following paper was chosen with unanimous consensus as the winner, on the basis of its intellectual merit and potential impact: Visual attention network [1]. Two other papers were awarded an
Video Multimodal Large Language Models (MLLMs) have shown remarkable capability of understanding the video semantics on various downstream tasks. Despite the advancements, there is still a lack of systematic research on visual context representation, which refers to the scheme to select frames from a video and further select the tokens from a frame. In this paper, we explore the design space for visual context representation, and aim to improve the performance of video MLLMs by finding more effective representation schemes. Firstly, we formulate the task of visual context representation as a constrained optimization problem,...
Motivation: High-resolution DWI plays a crucial role in brain tumor diagnosis. Previous studies have introduced two high-resolution distortion-free DWI techniques: PSF and BLADE. However, no one has yet compared the two techniques in MR imaging. Goal(s): To compare the image quality of PSF and BLADE DWI. Approach: In this study, scan parameters were adjusted to achieve optimized image quality for both techniques. Subsequently, scans were performed on patients, and the final image quality was compared. Results: With scanning times being similar, exhibits superior SNR while its...
There have been recent efforts to extend the Chain-of-Thought (CoT) paradigm to Multimodal Large Language Models (MLLMs) by finding visual clues in the input scene, advancing the visual reasoning ability of MLLMs. However, current approaches are specially designed for tasks where clue finding plays a major role in the whole reasoning process, leading to difficulty in handling complex visual scenes where clue finding does not actually simplify the whole reasoning task. To deal with this challenge, we propose a new visual reasoning paradigm enabling MLLMs to autonomously modify the input scene to new ones based on its reasoning status, such...
In this paper, we introduce a compact random-access vector representation for solid textures made of intermixed regions with relatively smooth internal color variations. It is feature-preserving and resolution-independent. In this representation, a texture volume is divided into multiple regions. Region boundaries are implicitly defined using a signed distance function. Color variations within the regions are represented using compactly supported radial basis functions (RBFs). With a spatial indexing structure, such RBFs...
Although pre-trained language models (PLMs) have shown impressive performance by text-only self-supervised training, they are found to lack visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also conduct the augmentation for the whole input text, without considering whether it is actually needed in specific inputs or tasks. To address these issues, we propose a novel **V**isually-**A**ugmented fine-tuning...
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input. Recently, generative adversarial networks (GANs) have become popular to hallucinate details. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task. Also, GAN-generated fake details may often undermine the realism of the whole image. We address these issues by proposing best-buddy GANs (Beby-GAN) for rich-detail...
Recent works on interactive video object cutout mainly focus on designing dynamic foreground-background (FB) classifiers for segmentation propagation. However, the research on optimally removing errors from the FB classification is sparse, and the errors often accumulate rapidly, causing significant errors in the propagated frames. In this work, we take the initial steps to addressing this problem, and we call this new task \emph{segmentation rectification}. Our key observation is that the possibly asymmetrically distributed false positive and false negative...