Kun Zhou

ORCID: 0000-0001-9592-6575
Research Areas
  • Advanced Vision and Imaging
  • Advanced Image Processing Techniques
  • Image Processing Techniques and Applications
  • Image and Signal Denoising Methods
  • Computer Graphics and Visualization Techniques
  • Image Enhancement Techniques
  • Advanced Image and Video Retrieval Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Image Retrieval and Classification Techniques
  • Multimodal Machine Learning Applications
  • Topic Modeling
  • Optical measurement and interference techniques
  • Recommender Systems and Techniques
  • Higher Education and Teaching Methods
  • Visual Attention and Saliency Detection
  • 3D Surveying and Cultural Heritage
  • Salmonella and Campylobacter epidemiology
  • Diabetic Foot Ulcer Assessment and Management
  • Semantic Web and Ontologies
  • Speech and dialogue systems
  • Human Pose and Action Recognition
  • Machine Learning and Algorithms
  • Video Surveillance and Tracking Methods
  • Automated Road and Building Extraction
  • Photoacoustic and Ultrasonic Imaging

University of Maryland, College Park
2024

Xidian University
2023

University of Hong Kong
2022-2023

Renmin University of China
2023

Beijing Institute of Big Data Research
2023

SMART Reading
2022

Hong Kong University of Science and Technology
2022

Shenzhen University
2022

Wuhan University
2022

Nanyang Technological University
2022

Recent studies have shown the importance of modeling long-range interactions in the inpainting problem. To achieve this goal, existing approaches exploit either standalone attention techniques or transformers, but usually under a low resolution in consideration of computational cost. In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. We carefully design each component of our...

10.1109/cvpr52688.2022.01049 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state - Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image,...

10.1109/iccv.2019.00243 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Single image super-resolution (SISR) deals with a fundamental problem of upsampling a low-resolution (LR) image to its high-resolution (HR) version. The last few years have witnessed impressive progress propelled by deep learning methods. However, one critical challenge faced by existing methods is to strike a sweet spot between model complexity and the resulting SISR quality. This paper addresses this pain point by proposing a linearly-assembled pixel-adaptive regression network (LAPAR), which casts the direct LR to HR mapping...

10.48550/arxiv.2105.10422 preprint EN other-oa arXiv (Cornell University) 2021-01-01

We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input. Recently, generative adversarial networks (GANs) have become popular to hallucinate details. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the ill-posed SISR task. Also, GAN-generated fake details may often undermine the realism of the whole image. We address these issues by proposing best-buddy GANs (Beby-GAN)...

10.1609/aaai.v36i2.20030 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Long-range temporal alignment is critical yet challenging for video restoration tasks. Recently, some works attempt to divide the long-range alignment into several sub-alignments and handle them progressively. Although this operation is helpful in modeling distant correspondences, error accumulation is inevitable due to the propagation mechanism. In this work, we present a novel, generic iterative alignment module which employs a gradual refinement scheme for sub-alignments, yielding more accurate motion compensation. To further...

10.1109/cvpr52688.2022.00596 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

For video frame interpolation (VFI), existing deep-learning-based approaches strongly rely on the ground-truth (GT) intermediate frames, which sometimes ignore the non-unique nature of motion judging from the given adjacent frames. As a result, these methods tend to produce averaged solutions that are not clear enough. To alleviate this issue, we propose to relax the requirement of reconstructing an intermediate frame as close to the GT as possible. Towards this end, we develop a texture consistency loss (TCL) upon the assumption that the interpolated content...

10.1109/cvpr52729.2023.02123 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Neural radiance fields (NeRF) show great success in novel view synthesis. However, in real-world scenes, recovering high-quality details from the source images is still challenging for existing NeRF-based approaches, due to potential imperfect calibration information and scene representation inaccuracy. Even with high-quality training frames, the synthetic novel views produced by NeRF models still suffer from notable rendering artifacts, such as noise, blur, etc. To improve the synthesis quality of NeRF-based approaches, we propose NeRFLiX, a...

10.1109/cvpr52729.2023.01190 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Cystitis glandularis (CG) is a rare urological condition characterized by glandular metaplasia of the bladder mucosa. Recurrence following transurethral resection (TUR) is a significant clinical challenge. Traditional predictive models often fail to capture the complexity of the data, resulting in insufficient accuracy. In contrast, machine learning (ML) has demonstrated substantial potential in medical prediction by identifying and analyzing complex patterns that are undetectable by conventional methods. This study...

10.21037/tau-2024-665 article EN Translational Andrology and Urology 2025-03-01

Relevance feedback is a powerful technique to enhance Content-Based Image Retrieval (CBIR) performance. It solicits the user's relevance judgments on the retrieved images returned by the CBIR systems. The user's labeling is then used to learn a classifier to distinguish between relevant and irrelevant images. However, the top returned images may not be the most informative ones. The challenge is thus to determine which unlabeled images would be the most informative (i.e., improve the classifier the most) if they were labeled and used as training samples. In this paper, we propose a novel active...

10.1145/1277741.1277764 article EN 2007-07-23

The manipulation of panoramic/wide-angle images is usually achieved via image warping. Though various techniques have been developed for preserving shapes and straight lines for warping, these are not sufficient for panoramic images. The image projections will turn the straight lines into curved "geodesic lines", and it is fundamentally impossible to keep all these lines straight. In this work, we propose a geodesic-preserving method for content-aware image warping. An energy term is introduced to preserve the geodesic appearance of the lines, and it can be used with shape-preserving terms. Our...

10.1109/cvpr.2015.7298617 article EN 2015-06-01

The detection and recognition of traffic signs in complex environments has received extensive attention, and the correct detection of small targets and occluded targets are two key issues. This paper proposes a context-aware and attention-driven weighted fusion network for traffic sign detection. Specifically, we design a context module that not only enhances the diversity of global features, but also reduces the sensitivity of convolution to objects. In addition, a feature pyramid is designed to efficiently fuse deep semantic information and shallow...

10.1109/access.2023.3264214 article EN cc-by-nc-nd IEEE Access 2023-01-01

Generative adversarial networks (GANs) have made great success in image inpainting yet still have difficulties tackling large missing regions. In contrast, iterative probabilistic algorithms, such as autoregressive and denoising diffusion models, have to be deployed with massive computing resources for decent effect. To achieve high-quality results with low computational cost, we present a novel pixel spread model (PSM) that iteratively employs decoupled probabilistic modeling, combining the optimization efficiency of...

10.48550/arxiv.2212.02963 preprint EN other-oa arXiv (Cornell University) 2022-01-01

the Best Paper Award Committee to select the Best Paper. After careful deliberation, the following paper was chosen with unanimous consensus as the winner, on the basis of its intellectual merit and potential impact: Visual attention network [1]. Two other papers were awarded an

10.1007/s41095-024-0435-z article EN cc-by Computational Visual Media 2024-05-14

Video Multimodal Large Language Models (MLLMs) have shown remarkable capability of understanding the video semantics on various downstream tasks. Despite the advancements, there is still a lack of systematic research on visual context representation, which refers to the scheme to select frames from a video and further select the tokens from a frame. In this paper, we explore the design space for visual context representation, and aim to improve the performance of video MLLMs by finding more effective representation schemes. Firstly, we formulate the task of visual context representation as a constrained optimization problem,...

10.48550/arxiv.2410.13694 preprint EN arXiv (Cornell University) 2024-10-17

Motivation: High-resolution DWI plays a crucial role in brain tumor diagnosis. Previous studies have introduced two high-resolution distortion-free DWI techniques: PSF and BLADE. However, no one has yet compared the two in MR imaging. Goal(s): To compare the image quality of PSF and BLADE DWI. Approach: In this study, the scan parameters were adjusted to achieve optimized image quality for PSF and BLADE. Subsequently, scans were performed on patients, and the final image quality was compared. Results: With scanning times being similar, exhibits superior SNR while its...

10.58530/2024/3499 article EN Proceedings on CD-ROM - International Society for Magnetic Resonance in Medicine. Scientific Meeting and Exhibition/Proceedings of the International Society for Magnetic Resonance in Medicine, Scientific Meeting and Exhibition 2024-11-26

There have been recent efforts to extend the Chain-of-Thought (CoT) paradigm to Multimodal Large Language Models (MLLMs) by finding visual clues in the input scene, advancing the visual reasoning ability of MLLMs. However, current approaches are specially designed for tasks where clue finding plays a major role in the whole reasoning process, leading to difficulty in handling complex visual scenes where clue finding does not actually simplify the whole reasoning task. To deal with this challenge, we propose a new visual reasoning paradigm enabling MLLMs to autonomously modify the input scene to new ones based on its reasoning status, such...

10.48550/arxiv.2411.18142 preprint EN arXiv (Cornell University) 2024-11-27

In this paper, we introduce a compact random-access vector representation for solid textures made of intermixed regions with relatively smooth internal color variations. It is feature-preserving and resolution-independent. In this representation, a texture volume is divided into multiple regions. Region boundaries are implicitly defined using a signed distance function. Color variations within the regions are represented using compactly supported radial basis functions (RBFs). With a spatial indexing structure, such RBFs...

10.1145/1833349.1778823 article EN 2010-07-15

Although pre-trained language models (PLMs) have shown impressive performance by text-only self-supervised training, they are found to lack visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also conduct the augmentation for the whole input text, without considering whether it is actually needed in specific inputs or tasks. To address these issues, we propose a novel Visually-Augmented fine-tuning...

10.18653/v1/2023.acl-long.833 article EN cc-by 2023-01-01

Recent studies have shown the importance of modeling long-range interactions in the inpainting problem. To achieve this goal, existing approaches exploit either standalone attention techniques or transformers, but usually under a low resolution in consideration of computational cost. In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. We carefully design each component of our...

10.48550/arxiv.2203.15270 preprint EN other-oa arXiv (Cornell University) 2022-01-01

We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input. Recently, generative adversarial networks (GANs) have become popular to hallucinate details. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task. Also, GAN-generated fake details may often undermine the realism of the whole image. We address these issues by proposing best-buddy GANs (Beby-GAN) for rich-detail...

10.48550/arxiv.2103.15295 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Recent works on interactive video object cutout mainly focus on designing dynamic foreground-background (FB) classifiers for segmentation propagation. However, the research on optimally removing errors from the FB classification is sparse, and the errors often accumulate rapidly, causing significant errors in the propagated frames. In this work, we take the initial steps to addressing this problem, and we call this new task segmentation rectification. Our key observation is that the possibly asymmetrically distributed false positive and false negative...

10.48550/arxiv.1602.04906 preprint EN other-oa arXiv (Cornell University) 2016-01-01